How to identify excessive robots traffic on your website: robots.txt

When are search engine robots generating too much traffic on your web site?

With the widespread interest in blogs and blogging software, there has been an explosion of new and fresh content, driving search engines to spider with an increased frequency.

An increase in a blog's popularity has a direct impact on both visible and invisible traffic. Visible traffic is produced by your visitors, the people who read your blog. Invisible traffic is produced by search engines and similar robots that explore (and perhaps steal) your content.

As your blog becomes popular, invisible traffic may get out of hand, and you'll want to make extra sure your GBytes of traffic are all legitimate. Sometimes the excess comes from an error in the site architecture rather than from real demand.

I came across an interesting case of redirect mishandling that was generating a huge amount of useless traffic.

On a web site I reviewed, the ratio of search engine spidering to user traffic was 4 to 1.

Google, Yahoo! and MSN were generating GBytes of traffic I just couldn't explain to myself … until I had the opportunity to download the logs and take a closer look at them.

MSN spidering activity on a web site in 1 month

There was an incredible amount of traffic being systematically generated off the robots.txt file: in some instances more than 1 GByte in just a few days, which is far too much. (The screenshot above was taken from a log analyzer; MSN alone generated 2 GByte in just under 1 month.)
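If you don't have a log analyzer at hand, a rough version of the same check can be scripted. The sketch below is only an illustration, not the tool used for the screenshot: it assumes your server writes the common "combined" access log format, and it recognises the three crawlers by the tokens "googlebot", "slurp" and "msnbot" in the user agent. The function names are made up for this example.

```python
import re
from collections import defaultdict

# Assumed format: host ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "\S+ (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) "[^"]*" "(?P<agent>[^"]*)"'
)

# User-agent substrings (lowercase) mapped to a display name.
CRAWLER_TOKENS = {"googlebot": "Google", "slurp": "Yahoo!", "msnbot": "MSN"}


def classify(agent):
    """Return the crawler name for a user-agent string, or 'visitors'."""
    lowered = agent.lower()
    for token, name in CRAWLER_TOKENS.items():
        if token in lowered:
            return name
    return "visitors"


def summarize(lines):
    """Sum response bytes per source, overall and for /robots.txt only."""
    totals = defaultdict(int)
    robots_txt = defaultdict(int)
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that are not in the assumed format
        size = 0 if match.group("bytes") == "-" else int(match.group("bytes"))
        who = classify(match.group("agent"))
        totals[who] += size
        if match.group("path").split("?")[0] == "/robots.txt":
            robots_txt[who] += size
    return dict(totals), dict(robots_txt)
```

Calling summarize() on an open log file (for example, summarize(open("access.log")), where the file name is a placeholder) returns two dictionaries. A robots.txt file is normally tiny, so if the second dictionary shows megabytes or more per crawler, something other than a robots.txt file is probably being served at that address. Note that if your server answers with an external redirect, the bytes may be logged under the redirect target instead.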

There it was, as plain as the nose on my face: spiders generating tons of useless traffic on the robots.txt file … but there was no robots.txt file on the server.

It was all traffic coming from redirects to the blog home page. In fact, the site had a redirect to the blog home page funneling all traffic from nonexistent pages or files, in particular the robots.txt file! Every time a spider asked for robots.txt, it was sent to the full home page instead.

  • Do you have a robots.txt file on your server?
  • Are you using a redirect to your web site home page?

It might be worth checking out …
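One quick way to check is to request robots.txt yourself without following redirects and look at what comes back. The following is a minimal sketch using only the Python standard library; the function names, the user-agent string and the wording of the verdicts are my own assumptions, so adapt them to your setup.

```python
import http.client
from urllib.parse import urlsplit

REDIRECT_CODES = (301, 302, 303, 307, 308)


def fetch_robots(base_url):
    """Request /robots.txt once, without following redirects.

    Returns (status, content_type, location).
    """
    parts = urlsplit(base_url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    try:
        # The user-agent value is an arbitrary label for this sketch.
        conn.request("GET", "/robots.txt", headers={"User-Agent": "robots-check-sketch"})
        resp = conn.getresponse()
        resp.read()  # drain the body so the connection closes cleanly
        return (
            resp.status,
            resp.getheader("Content-Type", ""),
            resp.getheader("Location", ""),
        )
    finally:
        conn.close()


def diagnose(status, content_type, location):
    """Turn the raw response details into a short verdict."""
    if status in REDIRECT_CODES:
        return "redirect to " + (location or "unknown target")
    if status == 200 and not content_type.lower().startswith("text/plain"):
        return "served, but not as plain text (possibly the home page)"
    if status == 200:
        return "robots.txt looks fine"
    if status == 404:
        return "missing (clean 404)"
    return "unexpected status %d" % status
```

For example, diagnose(*fetch_robots("http://www.example.com")) with your own address in place of the placeholder. A "redirect" or "not as plain text" verdict matches the situation described above. A clean 404 is usually harmless, since crawlers generally treat a missing robots.txt as "no restrictions", but a small real file is the safer choice.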

