Restricting Yahoo Bot using Robots

Yahoo's bot has been indexing more pages than Google's bots. This is because Yahoo indexes even internal site-search result pages. Some webmasters have reported that Yahoo's bot crawls even their drop-down menus and displays their sitemap.xml in its results pages, which is ridiculous.

Such indexing can hurt your rankings and even get your site flagged for spamming. There is an effective way to stop Yahoo from crawling these unnecessary pages and to give the right priority to your main pages.

Yahoo’s bot is called “Slurp”.

One effective way to stop Yahoo! Slurp from indexing a page is to add a "noindex" robots meta tag in its head section.

<META NAME="robots" CONTENT="noindex">

to stop all search engine bots from indexing the page, and

<META NAME="Slurp" CONTENT="noindex">

to stop Yahoo! Slurp alone from indexing.
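For instance, the Slurp-specific tag placed in a page's head section might look like this (a minimal sketch; the title is a placeholder):

```html
<html>
<head>
<title>Internal Search Results</title>
<!-- Slurp-specific: other bots may still index this page -->
<meta name="Slurp" content="noindex">
</head>
</html>
```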

These changes would be difficult if your website has dynamic content, especially if your website is very large. It is not practical to add this tag to every page that you do not want the crawler to index.

Using Wildcards in Robots.txt

Robots.txt is a simple yet effective way to solve this problem. If your URLs are dynamic, or if you wish to block URLs in a certain directory, you can do so using wildcards.

For example, http://www.seo-mind.com/?s=robots.txt can be blocked from being indexed by using the "?" symbol in robots.txt.

The rule Disallow: /? would block this URL from being indexed, since Disallow patterns match the URL as a prefix.
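Placed in a complete robots.txt file, the rule might look like this (a minimal sketch; the User-agent line here targets all bots):

```
User-agent: *
Disallow: /?
```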

User-Agent: slurp
Disallow: /cgi-bin*/
Disallow: /*_search*.html
Disallow: /*?sessionid

Similar rules can be used to block other kinds of URLs from being indexed.
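To sanity-check which URLs rules like these would block, here is a minimal Python sketch of the wildcard matching logic (an illustration only, not Yahoo's actual matcher; the sample URLs are made up):

```python
import re

def rule_matches(pattern, url_path):
    """Check whether a robots.txt Disallow pattern matches a URL.

    Implements prefix matching where '*' matches any run of
    characters, the wildcard extension honored by Yahoo! Slurp
    (and Googlebot).
    """
    # Translate the pattern: escape literal parts, turn '*' into '.*'
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    # A rule matches as a prefix of the URL (path plus query string)
    return re.match(regex, url_path) is not None

# URLs matching the rules shown above
print(rule_matches("/?", "/?s=robots.txt"))                    # True: blocked
print(rule_matches("/cgi-bin*/", "/cgi-bin/search.pl"))        # True: blocked
print(rule_matches("/*_search*.html", "/site_search01.html"))  # True: blocked
print(rule_matches("/*?sessionid", "/page.php?sessionid=42"))  # True: blocked
print(rule_matches("/?", "/about.html"))                       # False: allowed
```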

Read more about "How to remove a page already indexed by Yahoo".