Should I block Googlebot from crawling javascript and css?

By admin | Last updated on Mar 11, 2016

I noticed that Googlebot regularly crawls JavaScript and CSS files on my WordPress blog. Here are some entries from my Apache log:

66.249.75.66 - - [18/Mar/2013:08:07:28 +0000] "GET /wp-content/themes/shell-master/media-queries.css?ver=0.1.1 HTTP/1.1" 200 1541 "https://infoheap.com/" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "V:infoheap.com t:20130318080728 D:875 -"
66.249.76.66 - - [18/Mar/2013:18:45:08 +0000] "GET /wp-content/plugins/contact-form-7/includes/js/scripts.js?ver=3.3.3 HTTP/1.1" 301 286 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "V:infoheap.com t:20130318184508 D:323 -"
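To see how often this happens, the log can be filtered for Googlebot requests to .js and .css files. The following is a minimal Python sketch, not a full log parser: the regular expression and the function name `googlebot_asset_hits` are my own assumptions, written against the log format shown above. It matches on the user-agent string only, which anyone can fake, so it does not prove a request really came from Google.

```python
import re
from urllib.parse import urlsplit

# Assumed pattern: request line, status, size, referer, user agent,
# in the order they appear in the log entries above.
LOG_PATTERN = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) HTTP/[\d.]+" '
    r'(?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_asset_hits(lines):
    """Return (path, status) pairs for Googlebot requests to .js or .css files."""
    hits = []
    for line in lines:
        match = LOG_PATTERN.search(line)
        if match is None or "Googlebot" not in match.group("agent"):
            continue
        # Drop the query string (e.g. ?ver=0.1.1) before checking the extension.
        path = urlsplit(match.group("path")).path
        if path.endswith((".js", ".css")):
            hits.append((path, int(match.group("status"))))
    return hits


# The two log entries quoted above, split across string literals for readability.
SAMPLE = [
    '66.249.75.66 - - [18/Mar/2013:08:07:28 +0000] '
    '"GET /wp-content/themes/shell-master/media-queries.css?ver=0.1.1 HTTP/1.1" '
    '200 1541 "https://infoheap.com/" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" '
    '"V:infoheap.com t:20130318080728 D:875 -"',
    '66.249.76.66 - - [18/Mar/2013:18:45:08 +0000] '
    '"GET /wp-content/plugins/contact-form-7/includes/js/scripts.js?ver=3.3.3 HTTP/1.1" '
    '301 286 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" '
    '"V:infoheap.com t:20130318184508 D:323 -"',
]

for path, status in googlebot_asset_hits(SAMPLE):
    print(status, path)
```

To run it against a real log, pass an open file object instead of `SAMPLE`, since the function only needs an iterable of lines.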

Earlier I never paid much attention to this. But recently I did some research on it, thinking that maybe I could block crawling of JavaScript and CSS so that Googlebot would spend its time crawling the other content on the site.

I found an official video on this topic, titled "Don't block Googlebot from crawling JavaScript and CSS", by Matt Cutts (published on the Google Webmaster channel).

This is pretty interesting and makes a lot of sense. Matt Cutts clearly says Google is getting better at processing JavaScript and CSS. And it makes sense from the user's perspective as well. Here are my thoughts on it.

  1. Presentation of the content is becoming more and more important, in addition to the content itself. So it is important for any search engine to crawl JavaScript and CSS.
  2. There may be things hidden in the HTML, so plain text analysis alone may not be a good idea. A responsible search engine should crawl and interpret everything.
  3. There may be sites with malicious JavaScript. They have one thing in the HTML but may show something else to the user. So it makes more sense to crawl everything on such pages.
  4. I have even seen Flash content being shown in search results. It's a good thing for Flash content discovery.
  5. It is also a good idea, from a site performance perspective, to remove unused JavaScript and CSS from pages. That way Google does not have to crawl dead JavaScript and CSS.

So what should be in robots.txt? I think a good robots.txt (at least as a starting point) for a WordPress site is:

User-agent: *
Disallow: /wp-admin/
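A quick way to check what these two lines do is Python's standard `urllib.robotparser` module. This is only a sketch: the `/wp-admin/options.php` path is a hypothetical example, the other two paths come from the log entries above, and Python's parser is only an approximation of how Googlebot itself reads robots.txt (it does not support Google's wildcard extensions, for instance).

```python
from urllib.robotparser import RobotFileParser

# The robots.txt suggested above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
"""


def build_parser(robots_txt):
    """Build a parser from robots.txt text, without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# The first path is a hypothetical admin URL; the other two are from the Apache log.
PATHS = [
    "/wp-admin/options.php",
    "/wp-content/themes/shell-master/media-queries.css?ver=0.1.1",
    "/wp-content/plugins/contact-form-7/includes/js/scripts.js?ver=3.3.3",
]

for path in PATHS:
    allowed = parser.can_fetch("Googlebot", path)
    print("allowed" if allowed else "blocked", path)
```

With this robots.txt, only the admin path should be reported as blocked, while the theme and plugin assets stay crawlable.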

