Online robots.txt sandbox

If you are writing a robots.txt file and want to see how a specific crawler/bot/user-agent will interpret it, you can test it with this tool.

robots.txt content:

Crawler/Bot Name: (Other bot examples: Googlebot, Googlebot-Image, Googlebot-News, Googlebot-Mobile, AdsBot-Google, bingbot, msnbot, Ask Jeeves)

Url:

 

Some important points about robots.txt

  1. A site should have exactly one robots.txt file, and it must live in the root directory, e.g. http://yoursite.com/robots.txt
  2. Each record must start with at least one User-Agent field.
  3. A robot should match its own name against the User-Agent value using a case-insensitive substring match.
  4. An empty Disallow value indicates that all URIs may be retrieved. At least one Disallow field must be present in each record.
  5. "User-Agent: *" applies to a User-Agent if it does not have its own specific entry. Multiple "*" entries are not allowed.

Related links

  1. The Python robotexclusionrulesparser library used by this tool.
  2. HTML 4.01 specification (Section B.4.1 about search robots)
  3. Wikipedia Robots_exclusion_standard

Sample robots.txt

  • typical WordPress blog robots.txt (allow all except /wp-admin/)
    User-agent: *
    Disallow: /wp-admin/
  • allow all robots.txt
    User-agent: *
    Disallow:
  • allow none robots.txt
    User-agent: *
    Disallow: /
  • allow all except one robots.txt
    User-agent: *
    Disallow:
    User-agent: Somebadbot
    Disallow: /
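The last sample ("allow all except one") can be verified programmatically. A hedged sketch using the standard-library parser rather than the tool's own library; the bot name Somebadbot is just the placeholder from the sample above:

```python
from urllib.robotparser import RobotFileParser

# The "allow all except one" sample: everyone is allowed,
# but Somebadbot gets its own record that disallows everything.
sample = """\
User-agent: *
Disallow:

User-agent: Somebadbot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(sample.splitlines())

print(parser.can_fetch("Googlebot", "/anything"))   # True  (falls under "*")
print(parser.can_fetch("Somebadbot", "/anything"))  # False (its own record wins)
```

This illustrates point 5 above: Somebadbot's specific record takes precedence, while every other robot falls back to the "*" record with its empty (allow-all) Disallow.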