robots.txt


Restricting Yahoo Bot using Robots


Yahoo's bot has been indexing more pages than Google's bot. This is because Yahoo indexes even internal site search result pages. Some webmasters have reported that Yahoo's bot crawls even their drop-down menus and shows their sitemap.xml file in its search results, which is ridiculous.

Such indexing can hurt your rankings and may even get your site flagged for spamming. Fortunately, there are effective ways to stop Yahoo from crawling these unnecessary pages and to give the right priority to your main pages.

Yahoo’s bot is called “Slurp”.

One effective way to stop Yahoo! Slurp from indexing a page is to add a “noindex” meta tag to the page's <head> section:

<META NAME="robots" CONTENT="noindex">

The tag above stops all search engine bots from indexing the page.

<META NAME="Slurp" CONTENT="noindex">

The tag above stops only Yahoo! Slurp from indexing the page.
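The Python sketch below shows how a crawler could interpret these two tags: a page counts as noindexed for a given bot if either the generic “robots” tag or a tag named after that bot contains “noindex”. The class and function names are hypothetical, bot names are assumed to be compared case-insensitively, and this is only an illustration, not Yahoo!'s actual implementation.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta> tags whose NAME is one of `names` (hypothetical helper)."""

    def __init__(self, names):
        super().__init__()
        self.names = {n.lower() for n in names}
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <META NAME=...> is handled too.
        if tag != "meta":
            return
        attrs = dict(attrs)
        name = (attrs.get("name") or "").lower()
        if name in self.names:
            # CONTENT can hold several comma-separated values, e.g. "noindex, nofollow".
            content = attrs.get("content") or ""
            self.directives.update(d.strip().lower() for d in content.split(","))


def is_noindexed(html, bot_name):
    """True if the page asks all robots, or `bot_name` specifically, not to index it."""
    parser = RobotsMetaParser(["robots", bot_name])
    parser.feed(html)
    return "noindex" in parser.directives


all_bots = '<head><META NAME="robots" CONTENT="noindex"></head>'
slurp_only = '<head><META NAME="Slurp" CONTENT="noindex"></head>'

print(is_noindexed(all_bots, "Slurp"))        # True
print(is_noindexed(slurp_only, "Slurp"))      # True
print(is_noindexed(slurp_only, "Googlebot"))  # False
```

The last call shows why the Slurp-specific tag is useful: other crawlers ignore it and can still index the page.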

These changes are difficult to make if your website has dynamic content, and especially if it is very large. It is often not practical to add this tag to every page you do not want the crawler to index.

Using Wildcards in Robots.txt

Robots.txt is a simple yet effective way to overcome this issue. If your URLs are dynamic, or if you wish to block URLs in a certain directory, you can do so using wildcards.

For example, the URL http://www.seo-mind.com/?s=robots.txt can be blocked from being crawled by using the “?” symbol in robots.txt:

Disallow: /?

This rule would block that URL, and any other URL whose path begins with “/?”, from being crawled.
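A rule without wildcards is matched against the beginning of the URL path (including the query string), so “Disallow: /?” only catches query strings on the site's root page. The minimal Python sketch below illustrates that prefix match. The helper names and the /blog/ URL are hypothetical, and real crawlers also handle percent-encoding, Allow lines and user-agent groups, which this sketch ignores.

```python
from urllib.parse import urlsplit


def path_and_query(url):
    """Return the part of a URL that robots.txt rules are matched against."""
    parts = urlsplit(url)
    path = parts.path or "/"
    return path + ("?" + parts.query if parts.query else "")


def is_disallowed(url, rule):
    """Simplified: a plain Disallow rule (no wildcards) is a prefix match."""
    return path_and_query(url).startswith(rule)


for url in [
    "http://www.seo-mind.com/?s=robots.txt",   # search on the home page: blocked
    "http://www.seo-mind.com/",                # the home page itself: allowed
    "http://www.seo-mind.com/blog/?s=robots",  # hypothetical search under /blog/: allowed
]:
    print(url, "blocked" if is_disallowed(url, "/?") else "allowed")
```

The last URL is allowed because its path starts with “/blog/” rather than “/?”; catching a “?” anywhere in a URL needs a wildcard pattern instead.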

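# Rules for Yahoo! Slurp only
User-Agent: slurp
# Block /cgi-bin/ and similarly named directories such as /cgi-bin2/
Disallow: /cgi-bin*/
# Block URLs containing "_search" followed later by ".html"
Disallow: /*_search*.html
# Block URLs containing "?sessionid" (session-ID query strings)
Disallow: /*?sessionid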

Similar rules can be used to block other kinds of URLs from being crawled.
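To check which URLs wildcard rules like these actually catch, the Python sketch below translates each Disallow pattern into a regular expression, following the commonly documented wildcard semantics (later standardized in RFC 9309): “*” matches any run of characters, including “/”, and a trailing “$” anchors the pattern to the end of the URL. The function names are hypothetical; the sketch ignores Allow lines and user-agent groups and is not Yahoo!'s own matcher.

```python
import re


def rule_to_regex(rule):
    """Translate a robots.txt path pattern into a compiled regular expression.

    "*" becomes ".*", a trailing "$" anchors the end, everything else is literal.
    Without "$", the pattern only has to match the beginning of the path.
    """
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    pattern = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile(pattern + ("$" if anchored else ""))


def is_blocked(path, disallow_rules):
    """True if any Disallow pattern matches the start of `path` (path plus query)."""
    return any(rule_to_regex(r).match(path) for r in disallow_rules)


# The Disallow patterns from the Slurp example above.
slurp_rules = ["/cgi-bin*/", "/*_search*.html", "/*?sessionid"]

for path in [
    "/cgi-bin/form.pl",               # blocked by /cgi-bin*/
    "/products_search_results.html",  # blocked by /*_search*.html
    "/cart.php?sessionid=123",        # blocked by /*?sessionid
    "/about.html",                    # allowed
]:
    print(path, "blocked" if is_blocked(path, slurp_rules) else "allowed")
```

Because none of these patterns ends in “$”, each is still a prefix match, so “/*_search*.html” also blocks URLs that continue after “.html”, such as ones with a query string.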


Posted in Articles


Webmaster Live Updated


Microsoft's webmaster tool has recently been updated with several new features to help webmasters.

Webmaster Live, which has now moved out of beta, has gained additional features similar to those in Google Webmaster Tools. Google has been updating its own webmaster tool constantly, and that tool provides much greater insight into a website once the owner verifies the site.

The new “Crawl Issues” section provides information on 404 errors, “Blocked by REP”, “Long Dynamic URL” and “Unsupported content type” issues. REP issues relate to the Robots Exclusion Protocol, that is, the rules in the robots.txt file. These issues can be further filtered by subdomain or subfolder, and the resulting report can then be downloaded as a CSV file.

Similarly, the “Backlink” tool and other tools such as “Outbound Links” now have filtering and download options. The listings also include a Page Score, similar to the way Google ranks pages with PageRank.

This is a good update from Microsoft, but the tool still has a long way to go, and will need frequent updates, before it can compete with Google Webmaster Tools.

Posted in Webmaster

