Analyze Robots.txt of any website

Tinybird

Best data solution infrastructure for software teams

Topdelmes

TV shows and movies reviews and rankings site

Marketgoo

Empower web hosts, agencies, & SMB providers

Faedo Digital

Digital solutions for small and rural businesses

Frequently asked questions

Here are some common questions about robots.txt files and how to use them.

How do I submit a robots.txt file to search engines?

You don't need to submit a robots.txt file to search engines. Crawlers look for a robots.txt file before crawling a site. If they find one, they read it before scanning the rest of your site.

If you make changes to your robots.txt file and want to notify Google, you can submit it to Google Search Console. Use the Robots.txt Tester to paste the text file and click Submit.

How do I add the generated robots.txt file to my website?

Search engines and other crawling bots look for a robots.txt file in the main directory of your website. After generating the robots.txt file, add it to the root folder of your website so that it is reachable at https://yoursite.com/robots.txt.
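To illustrate the root-location rule, here is a minimal sketch that derives the robots.txt address from any page URL on the same site. The helper name robots_url is hypothetical and is not part of this tool.

```python
from urllib.parse import urljoin


def robots_url(page_url: str) -> str:
    """Return the root-level robots.txt URL for the site hosting page_url."""
    # A leading slash makes urljoin drop the page's own path and query string.
    return urljoin(page_url, "/robots.txt")


print(robots_url("https://yoursite.com/blog/post?id=1"))
```

Whatever page you start from, the result points at the root of the host, which is the only place crawlers look for the file.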

The method of adding a robots.txt file depends on the server and CMS you are using. If you can't access the root directory, contact your web hosting provider.

How do I add my Sitemap to the robots.txt file?

You can add your Sitemap to the robots.txt file to make it easier for bots to discover your website content. A Sitemap file is typically located at https://your-site.com/sitemap.xml. Add a directive with the full URL of your Sitemap like this:

User-agent: *
Disallow: /folder1/
Allow: /image1/
Sitemap: https://your-site.com/sitemap.xml
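One way to check that the Sitemap line is picked up is to parse the file with Python's standard urllib.robotparser module. This is only a sketch: it assumes Python 3.8 or later (where site_maps() is available) and reuses the placeholder rules from the example above.

```python
from urllib.robotparser import RobotFileParser

# Placeholder rules copied from the example above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /folder1/
Allow: /image1/
Sitemap: https://your-site.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# site_maps() returns the listed Sitemap URLs, or None when there are none.
sitemaps = parser.site_maps()
print(sitemaps)
```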

How do I use the Allow directive properly?

The Allow directive counteracts the Disallow directive. Using Allow and Disallow together, you can tell search engines to access a specific folder, file, or page within a disallowed directory.

Example: search engines are not allowed to access the /album/ directory

Disallow: /album/
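To see Allow and Disallow working together, here is a small sketch using Python's standard urllib.robotparser module. The /album/public/ folder and the example.com URLs are hypothetical. Note that this particular parser applies the first matching rule, so the Allow line is placed before the Disallow line; crawlers that follow the most specific (longest) matching rule would give the same result regardless of order.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: block /album/ but keep /album/public/ crawlable.
# Allow comes first because urllib.robotparser uses the first matching rule.
RULES = """\
User-agent: *
Allow: /album/public/
Disallow: /album/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

public_ok = parser.can_fetch("*", "https://example.com/album/public/cover.jpg")
private_ok = parser.can_fetch("*", "https://example.com/album/private.jpg")
print(public_ok, private_ok)
```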

How do I use the Disallow directive properly?

After filling in the User-agent directive, specify the behavior of certain (or all) bots by adding crawl instructions. Here are some tips:

  • Don't leave the Disallow directive without a value. An empty Disallow blocks nothing, so the bot will crawl all of the site's content.
  • Don't list every file you want to block from crawling. Just disallow access to a folder, and all files in it will be blocked from crawling.
  • Don't block access to the whole website unless necessary:

Example 1

Disallow: # allow crawling of the entire website

Example 2

Disallow: /folder/

Example 3

Disallow: / # block access to the entire website

Make sure essential website pages are not blocked from crawling: the home page, landing pages, product pages, etc.
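The three cases (empty value, a single folder, and the whole site) can be compared with a short sketch based on Python's standard urllib.robotparser module. The helper name allowed and the example.com paths are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser


def allowed(rules: str, path: str) -> bool:
    """Report whether a generic bot may fetch path under the given rules."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch("*", "https://example.com" + path)


EMPTY = "User-agent: *\nDisallow:"            # empty value: nothing is blocked
FOLDER = "User-agent: *\nDisallow: /folder/"  # only /folder/ is blocked
EVERYTHING = "User-agent: *\nDisallow: /"     # the whole site is blocked

print(allowed(EMPTY, "/folder/page.html"))
print(allowed(FOLDER, "/folder/page.html"))
print(allowed(FOLDER, "/about.html"))
print(allowed(EVERYTHING, "/about.html"))
```

Running a check like this against your own essential pages before publishing the file is a cheap way to catch an accidental site-wide block.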

How do I define the User-agent?

Specify the name of the bot to which you're giving crawl instructions using the User-agent directive.

Keep in mind that each search engine has its own bots, which may differ in name. For example, Yahoo's bot is Slurp. Google has several bots for different purposes:

  • Googlebot-News: crawls news
  • Google Mobile: crawls mobile pages
  • Googlebot-Video: crawls videos
  • Googlebot-Image: crawls images
  • Mediapartners-Google (AdSense): crawls websites to determine content and provide relevant ads

To block or allow all crawlers from accessing some of your content, use an asterisk (*):

User-agent: *

To address your crawl instructions to Google's main crawler only, use:

User-agent: Googlebot
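As a sketch of how per-bot groups behave, the following uses Python's standard urllib.robotparser module. The rules, the bot name SomeOtherBot, and the example.com URLs are hypothetical: Googlebot gets its own group, and every other bot falls back to the asterisk group.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: Googlebot may crawl everything except /drafts/,
# while every other bot is blocked from the whole site.
RULES = """\
User-agent: Googlebot
Disallow: /drafts/

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

googlebot_products = parser.can_fetch("Googlebot", "https://example.com/products/")
other_products = parser.can_fetch("SomeOtherBot", "https://example.com/products/")
googlebot_drafts = parser.can_fetch("Googlebot", "https://example.com/drafts/post")
print(googlebot_products, other_products, googlebot_drafts)
```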

Robots.txt syntax

The robots.txt syntax consists of directives, parameters, and special characters. Follow these rules for proper functionality:

  • Each directive must start on a new line with only one parameter per line.
  • Paths in robots.txt are case-sensitive. Match the case of folder names exactly.
  • Do not use quotation marks, spaces at the beginning of lines, or semicolons at the end of lines.

Example of correct syntax

User-agent: *
Disallow: /folder1/
Disallow: /folder2/

Correct case sensitivity

Disallow: /folder/

Incorrect if the actual folder name is lowercase

Disallow: /Folder/

Incorrect syntax with semicolons and quotes

Disallow: /folder1/;
Disallow: /"folder2"/

Correct syntax

Disallow: /folder1/
Disallow: /folder2/
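The formatting rules above can be checked mechanically. Below is a minimal sketch of such a check; the function name lint_robots_txt and its messages are hypothetical and do not describe how this tool works internally. It only looks for the three mistakes listed above and does not validate directive names or paths.

```python
def lint_robots_txt(text: str) -> list[tuple[int, str]]:
    """Return (line number, problem) pairs for common formatting mistakes."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # blank lines are fine
        if line[0] in " \t":
            problems.append((number, "leading whitespace"))
        rule = line.split("#", 1)[0].strip()  # ignore trailing comments
        if not rule:
            continue  # comment-only line
        if '"' in rule or "'" in rule:
            problems.append((number, "quotation mark"))
        if rule.endswith(";"):
            problems.append((number, "trailing semicolon"))
    return problems


SAMPLE = 'User-agent: *\nDisallow: /folder1/;\nDisallow: /"folder2"/\n Disallow: /folder3/\n'
for number, problem in lint_robots_txt(SAMPLE):
    print(f"line {number}: {problem}")
```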

Full documentation

For more information on the robots.txt file, visit:

Robots.txt

Do you want to contribute?

I know it's sometimes hard to contribute to open source projects, but I'm here to help you. You can contribute to this project by adding new features, fixing bugs, or improving the existing code: just create a pull request or open an issue on our GitHub repository.

Contributing to open source projects is a great way to learn new things, improve your skills, and help the community. You can start by reading the project's documentation, checking the issues, and creating a pull request. If you have any questions, feel free to contact me on Twitter / X.

Thanks for your interest in contributing to this project. I'm looking forward to seeing your contributions. Let's make this project even better together.