Bingbot will hit my site pretty hard for a couple of hours each day, and will be extremely light for the rest of the time.
I'd either like to smooth out its crawls, reduce its rate limit, or block it altogether. It doesn't really send through any real visitors.
Is there a way I can smooth its crawling, or rate limit it?
Bing's webmaster blog says that they support a Crawl-delay directive in your robots.txt file to throttle Bingbot:
User-agent: msnbot
Crawl-delay: 1
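If you want to sanity-check what a standards-style parser makes of those two lines, here is a minimal sketch using Python's standard-library `urllib.robotparser`. The helper name `crawl_delay_for` is made up for illustration, and this only shows how Python's parser reads the file; it says nothing about how Bing itself interprets it.

```python
from urllib.robotparser import RobotFileParser

# The two robots.txt lines from the answer above.
ROBOTS_TXT = """\
User-agent: msnbot
Crawl-delay: 1
"""


def crawl_delay_for(agent, robots_txt=ROBOTS_TXT):
    """Return the Crawl-delay that Python's parser associates with `agent`,
    or None if no stanza in the file applies to that agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(agent)


if __name__ == "__main__":
    for agent in ("msnbot", "bingbot"):
        print(agent, crawl_delay_for(agent))
```

With this parser the stanza matches the agent name "msnbot" but not "bingbot", so if you are unsure which name the crawler identifies itself with, it may be worth checking your access logs and the current Bing documentation.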
There's a bit more explanation in the webmaster FAQ PDF.
These other links might be helpful as well:
http://www.bing.com/toolbox/webmasters
http://www.bing.com/community/webmaster/f/12252/t/651373.aspx
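You can block the bot's IP addresses in .htaccess. The addresses below are only placeholders (they are in private and multicast ranges), so substitute the ones you actually see in your logs: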
order allow,deny
deny from 192.168.44.201
deny from 224.39.163.12
deny from 172.16.7.92
allow from all
You can find more about that here: Blog about bot blocking
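To see what the deny rules above amount to, here is a rough Python sketch of the same check using the standard `ipaddress` module. The names `DENY_LIST`, `is_denied` and `is_placeholder` are hypothetical, and this only models the "deny these addresses, allow everyone else" logic, not Apache itself.

```python
import ipaddress

# Same addresses as in the .htaccess snippet above.
DENY_LIST = ["192.168.44.201", "224.39.163.12", "172.16.7.92"]


def is_denied(client_ip, deny_list=DENY_LIST):
    """True if client_ip matches any deny entry (single IPs or CIDR ranges)."""
    client = ipaddress.ip_address(client_ip)
    return any(
        client in ipaddress.ip_network(entry, strict=False)
        for entry in deny_list
    )


def is_placeholder(ip):
    """True for private or multicast addresses, which a public crawler
    would not be connecting from."""
    addr = ipaddress.ip_address(ip)
    return addr.is_private or addr.is_multicast


if __name__ == "__main__":
    for ip in DENY_LIST + ["8.8.8.8"]:
        print(ip, "denied" if is_denied(ip) else "allowed",
              "(placeholder range)" if is_placeholder(ip) else "")
```

Every address in the list falls in a private or multicast range, which is why it needs replacing with real crawler addresses before the rules will block anything.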
You can limit the number of simultaneous connections from the crawler to, for instance, 5 with an iptables rule (requires root access to the firewall). From the article at 2bits.com, the iptables setting is:
iptables -I INPUT -p tcp -m connlimit --connlimit-above 5 -j REJECT
This limits each IP address to no more than 5 simultaneous connections. It effectively rations connections and prevents a single crawler from opening many connections to the site at once. Note that, as written, the rule applies to all incoming TCP connections, not just web traffic.
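The idea behind the connection limit can be sketched in a few lines of Python. This is only a toy model of per-IP counting (the class `ConnLimiter` and its method names are invented for illustration), not how iptables is implemented.

```python
from collections import defaultdict


class ConnLimiter:
    """Toy model of a per-IP cap on simultaneous connections."""

    def __init__(self, limit=5):
        self.limit = limit
        self.active = defaultdict(int)  # ip -> currently open connections

    def open(self, ip):
        """Try to open a connection; return False if the IP is at its cap."""
        if self.active[ip] >= self.limit:
            return False  # the connection above the limit is rejected
        self.active[ip] += 1
        return True

    def close(self, ip):
        """Record that one connection from ip has finished."""
        if self.active[ip] > 0:
            self.active[ip] -= 1


if __name__ == "__main__":
    limiter = ConnLimiter(limit=5)
    results = [limiter.open("198.51.100.7") for _ in range(6)]
    print(results)  # five accepted, the sixth rejected
```

Because the count is kept per address, a busy crawler on one IP does not use up the allowance of visitors on other addresses, and a rejected client can connect again as soon as one of its earlier connections closes.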