Yahoo! Adds Wildcard Entries for robots.txt

Yahoo! has recently updated its search engine robot, Slurp, to recognize wildcard characters in the path patterns of a robots.txt file.

The supported characters are * (asterisk) and $ (dollar sign). As in Microsoft's DOS, the asterisk stands for any number of characters, so s* can match seo or search. As an example:

User-Agent: Yahoo! Slurp
Allow: /public*/
Disallow: /*_print*.html

The second line allows crawling of all folders whose names begin with public, regardless of length, such as public_html or public_image.

The third line directs Slurp to skip any .html file whose path contains _print.
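
To make the matching concrete, here is a minimal Python sketch, assuming a hand-rolled helper (the matches function is purely illustrative, not Slurp's actual implementation), that translates a wildcard pattern into a regular expression and checks it against a few sample paths:

import re

def matches(pattern: str, path: str) -> bool:
    """Illustrative Slurp-style matcher: '*' spans any run of characters;
    the pattern otherwise matches as a prefix of the URL path."""
    regex = "".join(".*" if ch == "*" else re.escape(ch) for ch in pattern)
    return re.match(regex, path) is not None  # re.match anchors at the start

# The Allow rule matches folders beginning with "public"...
print(matches("/public*/", "/public_html/index.html"))  # True
print(matches("/public*/", "/private/index.html"))      # False
# ...and the Disallow rule catches any .html path containing "_print".
print(matches("/*_print*.html", "/card_print.html"))    # True
print(matches("/*_print*.html", "/card.html"))          # False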

The $ character, by contrast, anchors the match to the end of the URL. For example:

User-Agent: Yahoo! Slurp
Disallow: /*.gif$
Allow: /*?$

The Disallow statement tells Slurp to ignore any URL that ends in .gif. Note that without the $ sign, the rule would instead block any URL that merely contains .gif anywhere.

The Allow statement tells Slurp (or other engines that support this command) to allow any URL that ends with a ? (question mark), but NOT URLs that contain a ? elsewhere.
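
Extending the same illustrative matcher to honor a trailing $, the difference between the anchored and unanchored forms is easy to verify (again, matches is a hypothetical stand-in, not Slurp's real code):

import re

def matches(pattern: str, path: str) -> bool:
    """Illustrative matcher with end-of-URL anchoring via a trailing '$'."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    regex = "".join(".*" if ch == "*" else re.escape(ch) for ch in pattern)
    if anchored:
        regex += "$"  # anchor the match to the end of the URL
    return re.match(regex, path) is not None

print(matches("/*.gif$", "/images/logo.gif"))  # True: ends in .gif
print(matches("/*.gif$", "/logo.gif?v=2"))     # False: .gif is not at the end
print(matches("/*.gif",  "/logo.gif?v=2"))     # True: without $, ".gif" anywhere
print(matches("/*?$",    "/page.php?"))        # True: ends with a question mark
print(matches("/*?$",    "/page.php?id=1"))    # False: '?' appears mid-URL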