I just got Scrapy set up and running and it works great, but I have two (noob) questions. I should say first that I am totally new to Scrapy and spidering sites.
Can you limit the number of links crawled? I have a site that doesn't use pagination and just lists a lot of links (which I crawl) on its home page. I feel bad crawling all of those links when I really only need to crawl the first 10 or so.
How do you run multiple spiders at once? Right now I am using the command
scrapy crawl example.com
but I also have spiders for example2.com and example3.com. I would like to run all of my spiders with one command. Is this possible?
For #1: Don't use the rules attribute to extract and follow links; write your own logic in the parse function and yield or return Request objects.
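A minimal sketch of that idea, assuming a recent Scrapy version; the spider name, the CSS selector, and the 10-link cap are placeholders for whatever your listing page actually needs:

    import scrapy

    class ExampleSpider(scrapy.Spider):
        name = "example.com"
        start_urls = ["http://example.com/"]

        def parse(self, response):
            # take only the first 10 links from the listing page instead of all of them
            for href in response.css("a::attr(href)").getall()[:10]:
                yield response.follow(href, callback=self.parse_item)

        def parse_item(self, response):
            # extract whatever fields you need from each linked page
            yield {"url": response.url, "title": response.css("title::text").get()}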
For #2: Try scrapyd.
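Once your project is deployed to a scrapyd instance, you can queue each spider through its schedule.json endpoint. A rough sketch, assuming scrapyd is running on its default port 6800 and your project is called "myproject" (both are assumptions about your setup):

    import requests  # third-party HTTP library, used here only for brevity

    SCRAPYD_URL = "http://localhost:6800/schedule.json"  # scrapyd's default port

    for spider in ["example.com", "example2.com", "example3.com"]:
        # each POST queues one spider run on the scrapyd daemon
        resp = requests.post(SCRAPYD_URL, data={"project": "myproject", "spider": spider})
        print(spider, resp.json())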
Credit goes to Shane, here https://groups.google.com/forum/?fromgroups#!topic/scrapy-users/EyG_jcyLYmU
Using the CloseSpider extension should allow you to specify limits of this sort.
http://doc.scrapy.org/en/latest/topics/extensions.html#module-scrapy.contrib.closespider
Haven't tried it yet since I didn't need it. It looks like you might also have to enable it as an extension (see the top of the same page) in your settings file.
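For example, something along these lines in settings.py should stop the spider after roughly 10 items; the setting names come from the CloseSpider docs linked above, but the exact extension path depends on your Scrapy version, so treat this as a sketch:

    # settings.py
    EXTENSIONS = {
        'scrapy.contrib.closespider.CloseSpider': 500,
    }
    CLOSESPIDER_ITEMCOUNT = 10    # stop after 10 scraped items
    # CLOSESPIDER_PAGECOUNT = 10  # or stop after 10 downloaded responses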