We are running a web scraper that sleeps for a random interval between requests (so the time between scrapes isn't constant), but we are still getting blocked by Yahoo after 20-30 requests.
Does anyone know if there is a limit (e.g. 20 requests per minute, 200 per hour)? Right now the average gap between requests is around 3-6 seconds. Thanks for any help.
One request every 3-6 seconds is quite a low rate, so perhaps there is another problem with your crawler.
A few ideas:
- set the User-Agent to something non-suspicious
- set the Referer header to the same domain
- try running your crawler from a different IP in case your current IP is blacklisted
- try maintaining cookies
This will all be easier if you use a higher-level library like Mechanize.
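For illustration, here is a rough sketch of the header and cookie suggestions using only Python's standard library (not Mechanize). The `USER_AGENT` and `REFERER` values and the helper names `make_opener`, `random_delay` and `fetch_all` are assumptions made up for this example, and nothing here is known to avoid Yahoo's blocking; it only shows where each setting would go.

```python
import random
import time
import urllib.request
from http.cookiejar import CookieJar

# Hypothetical values for illustration only; substitute your own.
USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0"
REFERER = "https://search.yahoo.com/"


def make_opener():
    """Return an opener that keeps cookies and sends browser-like headers."""
    cookie_jar = CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(cookie_jar)
    )
    # Replaces the default "Python-urllib/x.y" User-Agent header.
    opener.addheaders = [("User-Agent", USER_AGENT), ("Referer", REFERER)]
    return opener


def random_delay(low=3.0, high=6.0):
    """Pick a delay in seconds; 3-6 s mirrors the range in the question."""
    return random.uniform(low, high)


def fetch_all(urls):
    """Fetch each URL with one shared opener, pausing between requests."""
    opener = make_opener()
    pages = []
    for url in urls:
        with opener.open(url, timeout=30) as response:
            pages.append(response.read())
        time.sleep(random_delay())
    return pages
```

Reusing a single opener for every request is what lets cookies set by one response be sent back on the next; creating a fresh opener per request would discard them.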
So the answer is 5000 queries. Taken from
http://forums.digitalpoint.com/showthread.php?t=736784
http://developer.yahoo.com/search/rate.html