InfoHeap
Tech tutorials, tips, tools and more


How to use wildcards in robots.txt

By admin on Jan 26, 2016

By default, a Disallow entry in robots.txt matches URLs by prefix. To block every URL beginning with /foo/, the following robots.txt can be used:

User-agent: *
Disallow: /foo/
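The prefix behaviour can be checked quickly with Python's standard urllib.robotparser; the example.com URLs below are placeholders for illustration.

```python
# Verify prefix-based Disallow matching with the standard library parser.
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /foo/",
])

# Any URL whose path starts with /foo/ is blocked:
print(rp.can_fetch("*", "https://example.com/foo/page.html"))  # False
# /foo/ deeper in the path does not match the prefix, so it stays crawlable:
print(rp.can_fetch("*", "https://example.com/bar/foo/"))       # True
```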

One can also use wildcards in robots.txt for more flexible rules. Here are some robots.txt examples using a wildcard in the URL pattern.

Using wildcards in robots.txt

To support more complex Disallow rules, a wildcard can be used in the middle of a path. To disallow every URL that contains /foo/ after the first path segment, the following robots.txt can be used:

User-agent: *
Disallow: /*/foo/
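To experiment with such rules, the wildcard can be translated into a regular expression. This is a minimal sketch of Google-style '*' matching, needed because, to my knowledge, Python's standard urllib.robotparser does not implement wildcard semantics; rule_matches is a hypothetical helper written for illustration, not a library function.

```python
import re

def rule_matches(rule: str, path: str) -> bool:
    """Google-style robots.txt rule check: '*' matches any run of
    characters, everything else is literal, and rules match as
    URL-path prefixes (re.match anchors at the start only)."""
    pattern = "".join(".*" if ch == "*" else re.escape(ch) for ch in rule)
    return re.match(pattern, path) is not None

# A segment before /foo/ is required, so the root-level /foo/ is not caught:
print(rule_matches("/*/foo/", "/bar/foo/page.html"))  # True
print(rule_matches("/*/foo/", "/foo/page.html"))      # False
```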

Block 2nd and higher tag pages in WordPress

To block the second and subsequent pages of tag archives in WordPress, one can use the following robots.txt:

User-agent: *
Disallow: /tag/*/page/
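The tag-page rule above can be sketched as a regex ('*' becomes '.*') and tried against sample WordPress paths; the URLs here are made up for illustration.

```python
import re

# Disallow: /tag/*/page/ translated to a start-anchored regex.
tag_rule = re.compile(r"/tag/.*/page/")

print(bool(tag_rule.match("/tag/python/page/2/")))  # True: page 2+ blocked
print(bool(tag_rule.match("/tag/python/")))         # False: first tag page allowed
```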

Block comment feed URLs in WordPress

User-agent: *
Disallow: /*/feed/

Note that this will not block /feed/ itself, since the pattern requires a path segment before /feed/.
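The same regex translation shows why the site-level /feed/ escapes this rule; the sample paths are illustrative.

```python
import re

# Disallow: /*/feed/ translated to a start-anchored regex.
feed_rule = re.compile(r"/.*/feed/")

print(bool(feed_rule.match("/my-post/feed/")))  # True: a comment feed is blocked
print(bool(feed_rule.match("/feed/")))          # False: the main feed is untouched
```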

Posted in Tutorials | Tagged robots.txt, Tutorials, Web Development, Webmaster


Copyright © 2023 InfoHeap.
