OpenAI's ChatGPT is an AI chatbot trained in part on content collected from the web by OpenAI's crawler, GPTBot. While this technology benefits many, some website owners prefer to keep their site's content out of its training data. Whether for privacy or content protection, blocking the bot is simple.
Accessing Robots.txt
Navigate to your site settings in the Umso Dashboard.
Locate the Robots.txt section.
General Blocking
Add the following lines to block OpenAI's bot from accessing any content, then click Save Robots.
User-agent: GPTBot
Disallow: /
To confirm that the robots.txt is working correctly, you can visit `https://yourwebsite.com/robots.txt` in your browser.
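Beyond checking the file in your browser, you can verify the rules programmatically. Here is a minimal sketch using Python's standard `urllib.robotparser`, parsing the rules offline as a string so no network request is needed:

```python
from urllib import robotparser

# The exact rules added above, pasted as a string for an offline check.
rules = """\
User-agent: GPTBot
Disallow: /
"""

parser = robotparser.RobotFileParser()
parser.parse(rules.splitlines())

# GPTBot is blocked from every path on the site.
print(parser.can_fetch("GPTBot", "/"))          # False
print(parser.can_fetch("GPTBot", "/any/page"))  # False
```

To check your live file instead, you could point the parser at `https://yourwebsite.com/robots.txt` with `set_url()` and `read()`.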
Advanced Blocking
To block all pages under a directory, but allow a specific file:
User-agent: GPTBot
Allow: /privatePage/publicFile.jpg
Disallow: /privatePage/
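Combinations of `Allow` and `Disallow` can be sanity-checked the same way. A sketch using Python's `urllib.robotparser` (the paths are just examples); note that this parser applies rules in file order, first match wins, while some crawlers match by longest path instead, so the more specific `Allow` line is listed first here:

```python
from urllib import robotparser

# Block a whole directory while allowing one file inside it.
# urllib.robotparser matches rules in file order (first match wins),
# so the more specific Allow line comes before the Disallow line.
rules = """\
User-agent: GPTBot
Allow: /privatePage/publicFile.jpg
Disallow: /privatePage/
"""

parser = robotparser.RobotFileParser()
parser.parse(rules.splitlines())

print(parser.can_fetch("GPTBot", "/privatePage/publicFile.jpg"))  # True
print(parser.can_fetch("GPTBot", "/privatePage/secret.html"))     # False
print(parser.can_fetch("GPTBot", "/otherPage/"))                  # True
```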
To block a specific directory:
User-agent: GPTBot
Disallow: /privatePage/
To block a specific file:
User-agent: GPTBot
Disallow: /privatePage/privateFile.html
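A quick offline check confirms that blocking a single file leaves its neighbours crawlable (again a sketch with Python's `urllib.robotparser`; the file names are placeholders):

```python
from urllib import robotparser

rules = """\
User-agent: GPTBot
Disallow: /privatePage/privateFile.html
"""

parser = robotparser.RobotFileParser()
parser.parse(rules.splitlines())

# Only the named file is blocked; other files in the directory stay crawlable.
print(parser.can_fetch("GPTBot", "/privatePage/privateFile.html"))  # False
print(parser.can_fetch("GPTBot", "/privatePage/otherFile.html"))    # True
```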
Understanding Robots.txt
Robots.txt is a standard that websites use to tell crawling and scraping bots which pages or files they may or may not request. Not every bot honors the standard, but OpenAI's GPTBot does. With a few lines in your site's `robots.txt` file, you can manage access for a variety of crawlers, including GPTBot.