Bandwidth on one of our sites was hammered on the 28th of this month. cPanel only keeps daily access logs and wasn't archiving them (it is now). Using AWStats, I found our bot traffic to be as follows:
Unknown robot (identified by 'bot*') | Hits: 91541+417 | Bandwidth: 4.78 GB | Last visit: 28 Jul 2010 - 07:12
I have blocked 'bot*' user agents with .htaccess:
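# mod_rewrite patterns are regexes, not globs: "^bot" = User-Agent starts with "bot"
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^bot [NC]
RewriteRule .* - [F,L]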
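A RewriteCond pattern is a regular expression, not a shell glob, so a trailing "*" applies only to the character before it: "^bot*" means "bo" followed by zero or more "t", and it matches any User-Agent that merely starts with "bo". Neither "^bot*" nor "^bot" catches agents such as Googlebot, whose string only contains "bot" somewhere in the middle. A quick way to see what a pattern really catches is to try it against sample strings. This is a minimal Python sketch, assuming Python's re behaves like mod_rewrite's PCRE for patterns this simple; apart from the Googlebot string, the sample agents are made up.

```python
import re

# Sample User-Agent strings. Only the Googlebot one is real; the others are
# hypothetical, chosen to show where the patterns differ.
AGENTS = [
    "bot-example/1.0",
    "boring-crawler/1.0",
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "SomeCrawler-bot/0.1",
]


def matches(pattern: str, agent: str) -> bool:
    # mod_rewrite searches for the pattern anywhere in the string unless it is
    # anchored; the [NC] flag corresponds to re.IGNORECASE.
    return re.search(pattern, agent, re.IGNORECASE) is not None


for pattern in (r"^bot*", r"^bot", r"bot"):
    caught = [agent for agent in AGENTS if matches(pattern, agent)]
    print(f"{pattern!r:10} -> {caught}")
```

Running it shows that "^bot*" also catches the "boring-crawler" agent, while only the unanchored "bot" pattern catches Googlebot and "SomeCrawler-bot".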
I have been told that this can also block legitimate traffic. What should I do? Should I wait for it to happen again and then check the logs for the IP address and user-agent name, or keep blocking unknown robots?
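If it does happen again, the raw access log can show which client actually used the bandwidth. This is a minimal Python sketch that totals response bytes per (IP, User-Agent) pair, assuming the log is in Apache's "combined" format; the log file name in the usage note is a placeholder, not necessarily where cPanel stores it. Lines in any other format, or with escaped quotes inside the request, are simply skipped.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Apache "combined" log format (an assumption about how the raw log is written):
# host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "[^"]*" \d{3} (?P<bytes>\d+|-) '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def bandwidth_by_client(lines: Iterable[str]) -> Counter:
    """Sum the bytes sent to each (IP, User-Agent) pair."""
    totals: Counter = Counter()
    for line in lines:
        m = COMBINED.match(line)
        if not m:
            continue  # not a combined-format line; ignore it
        size = 0 if m["bytes"] == "-" else int(m["bytes"])  # "-" means no body
        totals[(m["ip"], m["agent"])] += size
    return totals


def top_clients(lines: Iterable[str], n: int = 10) -> List[Tuple[Tuple[str, str], int]]:
    """Return the n (IP, User-Agent) pairs that used the most bandwidth."""
    return bandwidth_by_client(lines).most_common(n)
```

Usage, with "access_log" standing in for the real log path:

```python
with open("access_log") as log:
    for (ip, agent), size in top_clients(log):
        print(f"{size:>12} {ip} {agent}")
```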
I did a DNS lookup on the Googlebot entries I do have, and they check out.
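Google's documented way to confirm a Googlebot visit is two lookups: a reverse DNS lookup on the IP, a check that the hostname ends in googlebot.com or google.com, and then a forward lookup confirming that the hostname resolves back to the same IP. This is a minimal Python sketch of that check. The resolver functions are injectable parameters (a design choice here, so the logic can be tested offline). socket.gethostbyname_ex only returns IPv4 addresses, so IPv6 visitors would need socket.getaddrinfo instead.

```python
import socket
from typing import Callable, List, Tuple

Resolver = Callable[[str], Tuple[str, List[str], List[str]]]

# Hostname suffixes Google documents for Googlebot reverse-DNS names
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_google_hostname(host: str) -> bool:
    # Drop a trailing root dot and compare case-insensitively
    return host.rstrip(".").lower().endswith(GOOGLE_SUFFIXES)


def verify_googlebot(
    ip: str,
    reverse: Resolver = socket.gethostbyaddr,
    forward: Resolver = socket.gethostbyname_ex,
) -> bool:
    """Reverse-then-forward DNS check for a claimed Googlebot IP (IPv4)."""
    try:
        host, _aliases, _addrs = reverse(ip)  # IP -> hostname
    except OSError:  # socket.herror and socket.gaierror subclass OSError
        return False
    if not is_google_hostname(host):
        return False
    try:
        _name, _aliases, addrs = forward(host)  # hostname -> IPs
    except OSError:
        return False
    # A spoofer can control its reverse DNS, but not Google's forward DNS
    return ip in addrs
```

Calling verify_googlebot("66.249.66.1") performs live DNS queries, so the result depends on the network and on Google's current DNS records.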
You should use the Robots Exclusion Protocol (robots.txt). It may not be a spam bot: if you add an entry for it to robots.txt and it still turns up on your site, then you will know whether it is one or not.
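To see how a compliant crawler reads a robots.txt file, Python's standard urllib.robotparser can evaluate one locally. This is a minimal sketch; the robots.txt contents and the crawler name "ExampleBadBot" are hypothetical. Compliance is voluntary, which is exactly why a bot that keeps fetching disallowed URLs gives itself away.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut one (made-up) crawler out completely and keep
# every other crawler out of /private/.
ROBOTS_TXT = """\
User-agent: ExampleBadBot
Disallow: /

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse from a string instead of fetching


def allowed(agent: str, url: str) -> bool:
    """Would a crawler that honours robots.txt fetch this URL?"""
    return parser.can_fetch(agent, url)


print(allowed("ExampleBadBot/1.0", "http://example.com/index.html"))
print(allowed("Googlebot/2.1", "http://example.com/index.html"))
print(allowed("Googlebot/2.1", "http://example.com/private/report.html"))
```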
By the way, Googlebot is Google's indexer. It obeys robots.txt (the Robots Exclusion Protocol). Google also provides Webmaster Tools, which let you configure how Google interacts with your site.
You could lay a trap for the errant bot. Put a link on your home page that is invisible (hidden with CSS), configure robots.txt to tell all bots to ignore that link, and then log the bots that follow it anyway.
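Once the trap is in place, the bots that fell into it can be pulled out of the access log. This is a minimal Python sketch, assuming the Apache "combined" log format and a hypothetical trap path "/bot-trap/" that is both linked invisibly from the home page and listed under "Disallow:" in robots.txt. Any client requesting that path has ignored robots.txt (or is a human who went looking for hidden links).

```python
import re
from typing import Iterable, Set, Tuple

# Hypothetical trap path: hidden link on the home page, disallowed in robots.txt
TRAP_PATH = "/bot-trap/"

# Apache "combined" log format (assumed): captures IP, request path, user agent
LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "[A-Z]+ (?P<path>\S+)[^"]*" \d{3} \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def trapped_clients(lines: Iterable[str], trap: str = TRAP_PATH) -> Set[Tuple[str, str]]:
    """Return the (IP, User-Agent) pairs that requested anything under the trap."""
    caught: Set[Tuple[str, str]] = set()
    for line in lines:
        m = LINE.match(line)
        if m and m["path"].startswith(trap):
            caught.add((m["ip"], m["agent"]))
    return caught
```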
If you have a firewall or some other infrastructure in place, block these IP addresses from accessing your site.
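Without a firewall, the same list of IPs can also be turned into .htaccess access rules. This is a minimal Python sketch that validates and de-duplicates the addresses and emits either Apache 2.4 "Require not ip" lines or the older Apache 2.2 "Order/Deny from" lines; which one applies depends on the server's Apache version, and the host must allow these directives in .htaccess (AllowOverride), both of which are assumptions here.

```python
import ipaddress
from typing import Iterable, List


def deny_rules(ips: Iterable[str], apache24: bool = True) -> List[str]:
    """Build .htaccess lines that deny the given IPs.

    Raises ValueError if any entry is not a valid IPv4/IPv6 address.
    """
    # De-duplicate, then sort IPv4 before IPv6 and numerically within each
    addrs = sorted(
        {ipaddress.ip_address(ip.strip()) for ip in ips},
        key=lambda a: (a.version, int(a)),
    )
    if apache24:
        lines = ["<RequireAll>", "    Require all granted"]
        lines += [f"    Require not ip {a}" for a in addrs]
        lines.append("</RequireAll>")
    else:
        lines = ["Order Allow,Deny", "Allow from all"]
        lines += [f"Deny from {a}" for a in addrs]
    return lines


print("\n".join(deny_rules(["203.0.113.7", "198.51.100.4"])))
```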