Scrapy "parse" function not being executed

Developer https://www.devze.com 2023-04-06 04:11 Source: the web
I have started using Scrapy on Ubuntu 11 and have run into an issue. Specifically, the parse function in the following code does not execute, although the terminal shows that the spider ran and closed successfully:

from scrapy.contrib.spiders import CrawlSpider
from scrapy.selector import HtmlXPathSelector



class myTestSpider(CrawlSpider):
    name="go4mumbai.com"
    domain_name = "go4mumbai.com"
    start_urls = ["http://www.go4mumbai.com/Mumbai_Bus_Route.php?busno=1"]

def parse(self, response):  
    hxs = HtmlXPathSelector(response)
    stopNames=hxs.select('//table[@cellspacing="2"]/tr/td[2]/a/text()').extract()
    print len(stopNames)

SPIDER = myTestSpider()

The following is the output from the terminal:

rupin@rupin-laptop:~/Desktop/ScrappyTest/basetest$ sudo scrapy crawl go4mumbai.com
2011-09-21 15:33:56+0530 [scrapy] INFO: Scrapy 0.12.0.2528 started (bot: basetest)
2011-09-21 15:33:56+0530 [scrapy] DEBUG: Enabled extensions: TelnetConsole,     SpiderContext, WebService, CoreStats, MemoryUsage, CloseSpider
2011-09-21 15:33:56+0530 [scrapy] DEBUG: Enabled scheduler middlewares:     DuplicatesFilterMiddleware
2011-09-21 15:33:56+0530 [scrapy] DEBUG: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, RedirectMiddleware, CookiesMiddleware, HttpCompressionMiddleware, DownloaderStats
2011-09-21 15:33:56+0530 [scrapy] DEBUG: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2011-09-21 15:33:56+0530 [scrapy] DEBUG: Enabled item pipelines: 
2011-09-21 15:33:56+0530 [scrapy] DEBUG: Telnet console listening on 0.0.0.0:6023
2011-09-21 15:33:56+0530 [scrapy] DEBUG: Web service listening on 0.0.0.0:6080
2011-09-21 15:33:56+0530 [go4mumbai.com] INFO: Spider opened
2011-09-21 15:33:58+0530 [go4mumbai.com] DEBUG: Crawled (200) <GET http://www.go4mumbai.com/Mumbai_Bus_Route.php?busno=1> (referer: None)
2011-09-21 15:33:58+0530 [go4mumbai.com] INFO: Closing spider (finished)
2011-09-21 15:33:58+0530 [go4mumbai.com] INFO: Spider closed (finished)

Is there some part of the code I am missing? Please advise.


Your parse() function does not belong to your spider class: because it is not indented, it is defined at module level. Indent the whole function by one level so that it becomes a method of the class and gets called.
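To illustrate why the indentation matters, here is a minimal sketch that does not use Scrapy at all. The names BaseSpiderStandIn, BrokenSpider, FixedSpider and crawl() are hypothetical stand-ins invented for this example. The assumption is only that the framework calls the parse() method it finds on the spider object, so a dedented parse() is never seen and an inherited default runs instead.

```python
# Hypothetical stand-in for a framework base class; this is NOT Scrapy's API.
class BaseSpiderStandIn(object):
    def parse(self, response):
        # Default callback: produces nothing, so a run finishes silently.
        return []

    def crawl(self, response):
        # The framework only calls the parse() bound to the spider object.
        return self.parse(response)


class BrokenSpider(BaseSpiderStandIn):
    name = "broken"

# Dedented: this is a module-level function, not a method of BrokenSpider.
def parse(self, response):
    return ["parsed " + response]


class FixedSpider(BaseSpiderStandIn):
    name = "fixed"

    # Indented one level: this now overrides the inherited parse().
    def parse(self, response):
        return ["parsed " + response]


broken_result = BrokenSpider().crawl("page")
fixed_result = FixedSpider().crawl("page")
print(broken_result)  # []
print(fixed_result)   # ['parsed page']
```

In the question's code the same change applies: moving the whole "def parse(self, response):" block four spaces to the right places it inside myTestSpider.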
