
How to scrape the same URL in a loop with Scrapy

开发者 (Developer) https://www.devze.com 2023-03-15 05:27 Source: web

The content I need is on a single page with a static URL.

I created a spider that scrapes this page and stores the items in a CSV file. However, it does so only once and then finishes the crawling process. I need it to repeat the operation continuously. How can I do this?

Scrapy 0.12

Python 2.5


Giving you a specific example is tough because I don't know which spider you're using or how it works internally, but something like this could work: after yielding the item, yield a new request for the same URL.

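from scrapy.http import Request
from scrapy.spider import BaseSpider  # import path in Scrapy releases of that era (e.g. 0.12)

class YourSpider(BaseSpider):
    # ...spider init details...
    def parse(self, response):
        # ...process item...
        yield item
        # Schedule the same URL again so parse() runs in a loop
        yield Request(response.url, callback=self.parse)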


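You are missing dont_filter=True. Without it, Scrapy's duplicate filter drops the repeated request because the URL has already been seen. Here is an example.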

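import scrapy

class MySpider(scrapy.Spider):
    name = 'myspider'  # placeholder name; every spider needs one
    start_urls = ('http://www.test.com',)

    def parse(self, response):
        # Do your processing here
        # dont_filter=True stops the duplicate filter from dropping the repeated URL
        yield scrapy.Request(response.url, callback=self.parse, dont_filter=True)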


I do it this way:

def start_requests(self):
    while True:
        yield scrapy.Request(url, callback=self.parse, dont_filter=True)

I tried the approach below, but there is a problem: when the Internet connection is unstable, the spider stops and the loop breaks.

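from scrapy.http import Request
from scrapy.spider import BaseSpider  # import path in Scrapy releases of that era (e.g. 0.12)

class YourSpider(BaseSpider):
    # ...spider init details...
    def parse(self, response):
        # ...process item...
        yield item
        # Only reached when the download succeeded, so one failure ends the loop
        yield Request(response.url, callback=self.parse)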