Posted on 13 October 2008. Tags: 404 errors, crawl error, crawl issues, google webmaster tools, sitemap error
New and interesting features are added to Google Webmaster Tools on a regular basis. Yahoo and MSN do update their tools and features, but they lag far behind Google in update frequency. Google now helps webmasters understand the sources of crawl errors.

Webmasters using Google’s Webmaster Tools have often been unable to tell the exact cause of “Not Found” and “Errors for URLs in Sitemaps” errors. Google reports a “Not Found” error when a URL is linked from another page, possibly on another website, but does not exist on the owner’s website.
Google has now added a new “Linked From” column, which lists the number of pages that link to a “Not Found” URL. When you click on the “Linked From” link, a dialog box appears that lists each individual page linking to the specific URL. The linking page can be either within the website or on an external website.
Furthermore, you can download a specific table, all errors for the site, or all sources of errors for the site. This new feature helps webmasters reduce 404 errors by identifying the source of the issue, which can in turn have a positive effect on search engine rankings.
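Once the error sources are downloaded, they can be grouped by linking page to see which pages produce the most broken links. The following Python snippet is only a rough sketch: the column names (`url`, `linked_from`), the sample rows and the helper names are assumptions made for illustration, and the real download may be laid out differently.

```python
import csv
import io
from collections import defaultdict
from urllib.parse import urlparse

# Hypothetical export: one row per (broken URL, linking page) pair.
# The real column names in a Webmaster Tools download may differ.
SAMPLE_CSV = """url,linked_from
http://www.example.com/old-page,http://www.example.com/index.html
http://www.example.com/old-page,http://blog.example.org/post-1
http://www.example.com/missing.pdf,http://www.example.com/index.html
"""


def broken_links_by_source(csv_text):
    """Group "Not Found" URLs by the page that links to them."""
    sources = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        sources[row["linked_from"]].append(row["url"])
    return dict(sources)


def is_internal(page_url, own_host):
    """Return True if the linking page lives on the site's own host."""
    return urlparse(page_url).hostname == own_host


if __name__ == "__main__":
    for source, targets in broken_links_by_source(SAMPLE_CSV).items():
        kind = "internal" if is_internal(source, "www.example.com") else "external"
        print(f"{source} ({kind}): {len(targets)} broken link(s)")
```

Internal sources can be fixed directly by correcting the link, while external sources are usually better handled with a redirect from the missing URL.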
Posted in Webmaster
Posted on 14 August 2008. Tags: crawl issues, dynamic url, googlebot, urls
Googlebot is becoming smarter day by day. It is no longer patient enough to crawl all of your search result pages with their complex URLs. Instead, it sends you a warning that it found a high number of URLs.
Webmasters have been allowing search engines to index their internal search result pages, which portrays a false image that the site is huge. The practice can also get a large number of pages indexed in search engines.
Recently, Google has been sending warning messages to websites that follow such practices or whose URL structure is too complex to understand.
Subject: Googlebot found an extremely high number of URLs on your site: www.example.com
Message: Googlebot encountered problems while crawling your site http://www.example.com/.
Googlebot encountered extremely large numbers of links on your site. This may indicate a problem with your site's URL structure. Googlebot may unnecessarily be crawling a large number of distinct URLs that point to identical or similar content, or crawling parts of your site that are not intended to be crawled by Googlebot. As a result Googlebot may consume much more bandwidth than necessary, or may be unable to completely index all of the content on your site.
The notification mail also includes a list of URLs that face the issue. Though there is no mention of a ban or spamming, Google might consider this to be a serious issue in the near future. Google would likely avoid crawling these pages, as doing so would consume a large amount of resources and time.
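To get a feel for how many distinct URLs point to the same content, the URLs can be reduced to a canonical form and counted. The Python snippet below is a minimal sketch, not Google's method: the list of ignored parameters (`sessionid`, `sort`, `ref`) and the function names are assumptions, and each site would need its own list.

```python
from collections import Counter
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse

# Assumed list of query parameters that do not change the page content.
# This is site specific; adjust it before relying on the result.
IGNORED_PARAMS = {"sessionid", "sort", "ref"}


def canonicalize(url):
    """Drop ignored parameters and sort the rest so equivalent URLs match."""
    parts = urlparse(url)
    kept = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query)
        if key.lower() not in IGNORED_PARAMS
    )
    return urlunparse(parts._replace(query=urlencode(kept), fragment=""))


def duplicate_groups(urls):
    """Return canonical URLs that are reached through more than one URL."""
    counts = Counter(canonicalize(url) for url in urls)
    return {canon: n for canon, n in counts.items() if n > 1}


if __name__ == "__main__":
    crawled = [
        "http://www.example.com/list?cat=1&sort=asc",
        "http://www.example.com/list?sort=desc&cat=1",
        "http://www.example.com/list?cat=2",
    ]
    print(duplicate_groups(crawled))
```

A large number of duplicate groups suggests that the URL structure, rather than the amount of content, is what inflates the URL count.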
Posted in Webmaster
Posted on 10 August 2008. Tags: 404 errors, crawl issues, msn live, robots.txt, webmaster live
Microsoft's Webmaster Tool has had a recent update, with some new features added to help webmasters.
Webmaster Live, which has now moved out of beta, has some additional features similar to Google Webmaster Tools. Google has been constantly updating its webmaster tool, which provides much greater insight into a website once the owner authenticates their site.
The new “Crawl Issues” section provides information related to 404 errors, Blocked by REP, Long Dynamic URL and Unsupported content type. REP issues relate to the Robots Exclusion Protocol, applied through the robots.txt file. These issues can be further filtered at the subdomain or subfolder level. The resulting report can then be downloaded as a CSV file.
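A “Blocked by REP” entry means a rule in robots.txt stops the crawler from fetching that URL. Whether a given URL is blocked can be checked locally with Python's standard `urllib.robotparser` module. The rules, user agent and URLs below are made-up examples, and this is only a sketch of the idea, not how Webmaster Live produces its report.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt content (an assumption for illustration):
# keep all crawlers out of the internal search result pages.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
"""


def is_blocked(robots_txt, user_agent, url):
    """Return True if the given robots.txt rules block user_agent from url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "http://www.example.com/search?q=shoes",
        "http://www.example.com/products/shoes",
    ):
        state = "blocked" if is_blocked(ROBOTS_TXT, "msnbot", url) else "allowed"
        print(url, "->", state)
```

If a URL in the report is blocked unintentionally, the matching Disallow rule in robots.txt is the place to look.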
Similarly, the “Backlink” tool and other tools like “Outbound Links” now have filtering and downloading options. The listing also includes a Page Score, similar to how Google ranks pages with its PageRank.
This is a good update by Microsoft, but it still has a long way to go and will need frequent updates to compete with Google Webmaster Tools.
Posted in Webmaster