How to Help Search Engine Robots Scan Your Sites Better

It has traditionally been understood that search engine crawlers skip scripts and stylesheets in favor of the more important chunks of content: the body content returned in the HTTP response and displayed in the browser.

[Screenshot: Google Webmaster Tools]

Doing so left plenty of content ignored, which is why I typically identify and fix the areas of a website that act as “spider traps.” Google, however, came to realize how much these parts of a page were keeping it in the dark: desirable content can sit inside the complex scripts that generate it, in much the same way that content embedded in Flash never gets fully harnessed.

Now that Google has developed the means to execute scripts, webmasters should be wary of the things that make life harder for the ever-wandering search spider:

  • Make sure JavaScript and CSS are not blocked by robots.txt so Googlebot can retrieve and render them. This is especially important for JavaScript-driven navigation and responsive websites, where CSS and JS files play a crucial role in letting Googlebot understand which version of a site is optimized for which devices. A robots.txt check like the one sketched after this list can confirm nothing essential is blocked.
  • Web servers need to keep up with crawling demand, as delays and limitations can adversely impact Google’s ability to render pages promptly. Although there are settings in Google Webmaster Tools that control the crawl rate, a stable and robust server takes higher precedence.
  • An overly complex JavaScript structure can also prevent Googlebot from rendering pages properly and, hence, limit Google’s ability to index their content.
  • Ensure that the site works properly, meaning it displays content completely and accurately, even for users with low bandwidth or outdated browsers. Just because Google said it will start executing scripts doesn’t mean other search engines can do the same; the raw-HTML check sketched after this list gives a rough idea of what such crawlers see.
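
To verify the robots.txt point, a short script along these lines, using Python’s standard urllib.robotparser, can report whether Googlebot is allowed to fetch your script and stylesheet URLs. The domain and asset paths are placeholder assumptions for illustration; substitute your own.

    from urllib.robotparser import RobotFileParser

    # Placeholder site and asset URLs; substitute your own.
    SITE = "https://www.example.com"
    ASSETS = [
        SITE + "/assets/js/navigation.js",
        SITE + "/assets/css/responsive.css",
    ]

    # Load the live robots.txt and test each asset against the Googlebot rules.
    parser = RobotFileParser()
    parser.set_url(SITE + "/robots.txt")
    parser.read()

    for url in ASSETS:
        if parser.can_fetch("Googlebot", url):
            print(url + ": allowed")
        else:
            print(url + ": BLOCKED by robots.txt")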
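
As for the last point, one rough way to approximate what a crawler that does not execute JavaScript sees is to fetch the raw HTML and check whether key content already appears in the server response. The page URL and the expected phrase below are purely illustrative assumptions.

    from urllib.request import Request, urlopen

    # Placeholder page and a phrase that should appear in its main content.
    PAGE_URL = "https://www.example.com/products"
    EXPECTED_PHRASE = "Free shipping on orders over $50"

    # Fetch the page exactly as a client that never runs JavaScript would receive it.
    request = Request(PAGE_URL, headers={"User-Agent": "Mozilla/5.0 (compatible)"})
    raw_html = urlopen(request).read().decode("utf-8", errors="replace")

    if EXPECTED_PHRASE in raw_html:
        print("Content is present in the raw HTML; non-rendering crawlers can see it.")
    else:
        print("Content is missing from the raw HTML; it likely depends on JavaScript.")

If the phrase only shows up after scripts run, consider a server-rendered or plain-HTML fallback so the content survives without script execution.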