Spider Traps and How To Avoid Them

As mentioned before, the first step to good ranking on search engine results is to ensure that the pages you build are search engine friendly. Search engine friendly means that these pages are navigable by search engine spiders and contain no spider traps.

What is a spider trap, and how do you know if your page has one? Spider traps are areas of your web page that stop search robots from scanning all sections of the page, so they cannot follow certain links embedded somewhere inside the code.

In order to prevent spider traps, here are some ideas you can adopt:

  1. Check your links. Checking links helps ensure that every link found in your pages points to the correct URL. Once a spider / bot encounters a link and follows it, it gets the chance to check that page for its title and content. But if the link is broken, there is no way the spider can visit the intended page. Tools like Xenu will help verify sites for broken links.
  2. Carefully check your robots.txt file. You may not intend to block robots, but one wrong directive sends them the wrong message. A robots.txt file is a text file located at the root of your site and contains two operative parameters: User-agent, the name of the robot / spider a rule applies to, and Disallow, which specifies the folders or files to skip (see the robots.txt sketch after this list). Ensure that the files or folders listed under Disallow are the ones you actually want skipped. Also, when giving instructions through the robots meta tag of each page, understand the difference among the following:

    1. <meta name="robots" content="index,follow"> -> index the page, follow the links within the page
    2. <meta name="robots" content="index,nofollow"> -> index the page, do not follow the links within the page
    3. <meta name="robots" content="noindex,follow"> -> do not index the page, follow the links within the page
    4. <meta name="robots" content="noindex,nofollow"> -> do not index the page, do not follow the links within the page

    (1) is ideal for a page you’d like to have indexed, (2) and (3) are more limited in scope, and (4) sets the page completely off-limits, similar to a Disallow instruction.

  3. Avoid using frames. Frames have usability disadvantages, one of which is bookmarking: when you bookmark a page that sits inside a framed document and takes extra clicks to reach, the bookmark usually brings up the default page of the frameset instead. Frames also pose problems for robots, because the content is split across two or more HTML pages referenced by one “parent” document, and spiders can lose the connection between them (see the frameset sketch after this list). So if possible, avoid framed documents.
  4. No to popups, if possible. Popups not only annoy many visitors to your site, they can also isolate certain pages from robots. When a robot visits a page that opens a link in a popup window, it may not recognize the link declared in the <a> tag. If the design calls for popups, you can still use them, but provide another way to follow the links, such as a site map page (see the link sketch after this list).
  5. Eliminate the use of “conditions” to access pages. Certain web pages require cookies to be enabled, generate session IDs, or ask for a name before they display content. Robots cannot type on keyboards; they enter pages only through links, so if a page requires any of the above or the like, spiders are effectively handed a “no entry” message and cannot come in.
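
To illustrate point 2, here is a minimal robots.txt sketch. The folder and file names (/cgi-bin/, /private/, print.html) are only placeholders for whatever you actually want spiders to skip:

```
# Hypothetical robots.txt, served from the site root (e.g. http://www.example.com/robots.txt)
# "*" means the rules apply to all robots / spiders.
User-agent: *
Disallow: /cgi-bin/
Disallow: /private/
Disallow: /print.html
```

Anything not listed under Disallow stays open to spiders, so double-check that no content folder has slipped into the list by mistake.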
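
For point 3, this is the kind of “parent” frameset document that causes the disconnection; menu.html and content.html are hypothetical names for the framed pages, which spiders may end up indexing in isolation, without the surrounding navigation:

```
<!-- The parent document holds no real content; it only stitches two pages together. -->
<frameset cols="25%,75%">
  <frame src="menu.html" name="menu">
  <frame src="content.html" name="content">
  <noframes>
    <body><a href="content.html">View the content without frames</a></body>
  </noframes>
</frameset>
```

The <noframes> fallback at least gives robots (and frame-less browsers) a plain link to follow, but dropping frames altogether remains the simpler fix.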
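
And for point 4, one way to keep a popup link spider-friendly is sketched below (popup.html is a hypothetical page name): the plain href gives robots a normal link to follow, while the JavaScript opens the popup for human visitors:

```
<!-- Spiders follow the href; browsers with JavaScript get the popup instead. -->
<a href="popup.html"
   onclick="window.open(this.href, 'details', 'width=400,height=300'); return false;">
  View details
</a>
```

Listing popup.html on your site map page, as suggested above, gives spiders a second way in.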

There are other traps I have not mentioned here, but the message is clear: robots or spiders are literally just robots that follow instructions and cannot interact like humans. So when they visit your site, be gentle and don’t trap them.