Articles

How to Remove a Page already Indexed by Yahoo

In recent months, Yahoo has been crawling many unnecessary pages, which can hurt the ranking of a website's high-priority pages. As a result, Yahoo often reports far more indexed pages than the site actually has.
To avoid unnecessary indexing, please read the “Restricting Yahoo Bot using Robots.txt” article.

To remove already indexed pages from Yahoo’s index, you can use wildcards in Yahoo Site Explorer in the same way as in robots.txt:

  1. Log in to Yahoo Site Explorer using your Yahoo username and password. [Note: You need to authenticate your site before you can start removing URLs.]
  2. Once you have authenticated your site, you will see it listed under “My Sites”.
  3. Click on the “Explore” button next to your URL.
  4. A page listing the URLs in the site appears.
  5. Move your mouse over the URL you would like to remove.
  6. A “Delete URL/Path” button appears. Click on it.
  7. On the following page, confirm whether you wish to proceed with deleting the page.
  8. You can delete up to 25 URLs at one time.
  9. If the deletion succeeds, you will find the URLs listed under “Actions”, the last tab in the left corner.

Restricting Yahoo Bot using Robots.txt

Yahoo's bot has been indexing more pages than Google's bot. This is because Yahoo indexes even internal site-search result pages. Some webmasters have reported that Yahoo's bot even crawls their drop-down menus and displays their sitemap.xml in its results pages, which is ridiculous.

Such indexing can hurt your rankings and even get your site flagged for spamming. There is an effective way to stop Yahoo from crawling these unnecessary pages and to give the right priority to your main pages.

Yahoo’s bot is called “Slurp”.

One effective way to stop Yahoo! Slurp from indexing a page is to add a “noindex” meta tag to the page's head section. Use

<META NAME="robots" CONTENT="noindex">

to stop all search engine bots from indexing the page, or

<META NAME="Slurp" CONTENT="noindex">

to stop Yahoo! Slurp alone from indexing.
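
To check that a page really carries one of these tags, a small script can help. The following is a minimal sketch using only Python's standard library; the class and function names (RobotsMetaParser, is_noindexed) are made up for this illustration and are not part of any Yahoo tool.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives found in <meta name="..." content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = {}  # lower-cased meta name -> set of directives

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {k.lower(): (v or "") for k, v in attrs}
        name = attr.get("name", "").strip().lower()
        if name:
            values = {p.strip().lower() for p in attr.get("content", "").split(",")}
            self.directives.setdefault(name, set()).update(values - {""})


def is_noindexed(html, bot="slurp"):
    """Return True if the page asks `bot` (or all robots) not to index it."""
    parser = RobotsMetaParser()
    parser.feed(html)
    for name in ("robots", bot.lower()):
        if "noindex" in parser.directives.get(name, set()):
            return True
    return False


print(is_noindexed('<META NAME="Slurp" CONTENT="noindex">'))
```

The check is case-insensitive for both the tag name and the directive, so it accepts the upper-case form shown above as well as lower-case markup.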

These changes are difficult to make if your website has dynamic content, especially if the site is very large. In that case it is not practical to add this code to every page that you do not want the crawler to index.

Using Wildcards in Robots.txt

Robots.txt is a simple yet effective way to overcome this issue. If your URL is dynamic, or if you wish to block URLs from a certain directory, you can do so using wildcards.

For example, http://www.seo-mind.com/?s=robots.txt can be blocked from being indexed by using the “?” symbol in robots.txt.

The line “Disallow: /?” would block this URL from being indexed.

User-Agent: slurp
Disallow: /cgi-bin*/
Disallow: /*_search*.html
Disallow: /*?sessionid

Statements similar to those shown above can be used to block other kinds of URLs from being indexed.
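
Before uploading such rules, it can be useful to test them against a few sample URLs. The sketch below is a rough approximation and not Yahoo's actual matching code: it assumes that “*” matches any run of characters, that a trailing “$” anchors the end of the URL, and that everything else is a plain prefix match. It ignores Allow lines and rule precedence, and the function names are hypothetical. Python's built-in urllib.robotparser is not used because it does not expand wildcards.

```python
import re


def rule_to_regex(rule):
    """Translate a Disallow path with wildcards into a compiled regex.

    Assumed semantics: '*' matches any run of characters, a trailing '$'
    anchors the end of the URL, everything else is a literal prefix match.
    """
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    pattern = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile(pattern + ("$" if anchored else ""))


def is_blocked(path, disallow_rules):
    """Return True if `path` (path plus query string) matches any rule."""
    return any(rule_to_regex(rule).match(path) for rule in disallow_rules if rule)


# The Disallow values used in this article.
RULES = ["/?", "/cgi-bin*/", "/*_search*.html", "/*?sessionid"]

for sample in ["/?s=robots.txt", "/site_search_results.html", "/about.html"]:
    print(sample, is_blocked(sample, RULES))
```

Only the path and query string are passed in, not the full URL, because Disallow rules are matched against the part of the URL that follows the host name.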

Read more about “How to Remove a Page already Indexed by Yahoo”.

Yahoo

Slurps Hit Sites Repeatedly!

Some webmasters have noticed that Slurp (Yahoo's spider) has been hitting their sites for almost an hour at a time with over 1,000 requests, most of them referrals from Overture advertisements. It has been reported that these PPC clicks are not being charged. Still, rumours suggest that these might be illicit IP addresses trying to earn income through PPC. The IPs were from Inktomi:

66.196.92.14
66.196.92.17
66.196.92.19
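
To see whether these addresses show up in your own logs, you can count their requests. The following is a minimal sketch that assumes an access log in which the client IP is the first whitespace-separated field, as in the common log format. The function name count_hits and the sample lines are made up for illustration and are not real log data.

```python
from collections import Counter

# IP addresses reported above.
REPORTED_IPS = {"66.196.92.14", "66.196.92.17", "66.196.92.19"}


def count_hits(log_lines, watched_ips=REPORTED_IPS):
    """Count requests per watched IP; the IP is assumed to be the first field."""
    hits = Counter()
    for line in log_lines:
        fields = line.split(None, 1)
        if fields and fields[0] in watched_ips:
            hits[fields[0]] += 1
    return hits


# Hypothetical sample lines, not real log data.
SAMPLE_LOG = [
    '66.196.92.14 - - [01/Jan/2000:10:00:01 +0000] "GET / HTTP/1.0" 200 512',
    '66.196.92.14 - - [01/Jan/2000:10:00:02 +0000] "GET /a.html HTTP/1.0" 200 734',
    '192.0.2.7 - - [01/Jan/2000:10:00:03 +0000] "GET / HTTP/1.0" 200 512',
    '66.196.92.19 - - [01/Jan/2000:10:00:04 +0000] "GET /b.html HTTP/1.0" 200 120',
]

for ip, n in count_hits(SAMPLE_LOG).most_common():
    print(ip, n)
```

To run it on a real log, pass an open file object instead of SAMPLE_LOG, since the function only needs an iterable of lines.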