Scrapy Django Limit links crawled


I just got Scrapy set up and running, and it works great, but I have two (noob) questions. I should say first that I am totally new to Scrapy and spidering sites.

  1. Can you limit the number of links crawled? I have a site that doesn't use pagination and just lists a lot of links (which I crawl) on its home page. I feel bad crawling all of those links when I really only need to crawl the first 10 or so.

  2. How do you run multiple spiders at once? Right now I am using the command scrapy crawl example.com, but I also have spiders for example2.com and example3.com. I would like to run all of my spiders using one command. Is this possible?


For #1: Don't use the rules attribute to extract and follow links; write your own logic in the parse function and yield or return Request objects.
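A minimal sketch of that approach, assuming a recent Scrapy API (the spider name, start URL, and selectors below are placeholders, not from the original question):

import scrapy

class ExampleSpider(scrapy.Spider):
    name = "example.com"
    start_urls = ["http://example.com/"]

    def parse(self, response):
        # Instead of a rules attribute, pick the links yourself and
        # only follow the first 10 found on the home page.
        links = response.css("a::attr(href)").extract()[:10]
        for href in links:
            yield scrapy.Request(response.urljoin(href), callback=self.parse_item)

    def parse_item(self, response):
        # Extract whatever fields you need from each followed page.
        yield {"url": response.url, "title": response.css("title::text").extract_first()}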

For #2: Try scrapyd.
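Once scrapyd is running and your project is deployed to it, each spider can be kicked off through its schedule.json endpoint. A hedged sketch (the project name and scrapyd address are assumptions):

import requests

# scrapyd listens on port 6800 by default; "myproject" is a placeholder
# for whatever project name you deployed.
SCRAPYD_URL = "http://localhost:6800/schedule.json"

for spider in ["example.com", "example2.com", "example3.com"]:
    resp = requests.post(SCRAPYD_URL, data={"project": "myproject", "spider": spider})
    print(spider, resp.json())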


Credit goes to Shane, here https://groups.google.com/forum/?fromgroups#!topic/scrapy-users/EyG_jcyLYmU

Using the CloseSpider extension should allow you to specify limits of this sort.

http://doc.scrapy.org/en/latest/topics/extensions.html#module-scrapy.contrib.closespider

I haven't tried it yet since I didn't need it. It looks like you might also have to enable it as an extension (see the top of the same page) in your settings file.
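For example, a settings.py sketch (the extension path matches the scrapy.contrib layout in the docs linked above; the CLOSESPIDER_PAGECOUNT value is just an illustration):

# Enable the extension explicitly, in case it isn't on by default.
EXTENSIONS = {
    "scrapy.contrib.closespider.CloseSpider": 500,
}

# Close the spider once 10 responses have been crawled.
CLOSESPIDER_PAGECOUNT = 10
# Alternatively, close after 10 items have been scraped:
# CLOSESPIDER_ITEMCOUNT = 10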

