I have started using Scrapy on Ubuntu 11 and am facing an issue. Specifically, the parse function in the following code does not execute, although the terminal shows that the spider ran and closed successfully.
from scrapy.contrib.spiders import CrawlSpider
from scrapy.selector import HtmlXPathSelector

class myTestSpider(CrawlSpider):
    name="go4mumbai.com"
    domain_name = "go4mumbai.com"
    start_urls = ["http://www.go4mumbai.com/Mumbai_Bus_Route.php?busno=1"]

def parse(self, response):
    hxs = HtmlXPathSelector(response)
    stopNames=hxs.select('//table[@cellspacing="2"]/tr/td[2]/a/text()').extract()
    print len(stopNames)

SPIDER = myTestSpider()
The following is the output from the terminal:
rupin@rupin-laptop:~/Desktop/ScrappyTest/basetest$ sudo scrapy crawl go4mumbai.com
2011-09-21 15:33:56+0530 [scrapy] INFO: Scrapy 0.12.0.2528 started (bot: basetest)
2011-09-21 15:33:56+0530 [scrapy] DEBUG: Enabled extensions: TelnetConsole, SpiderContext, WebService, CoreStats, MemoryUsage, CloseSpider
2011-09-21 15:33:56+0530 [scrapy] DEBUG: Enabled scheduler middlewares: DuplicatesFilterMiddleware
2011-09-21 15:33:56+0530 [scrapy] DEBUG: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, RedirectMiddleware, CookiesMiddleware, HttpCompressionMiddleware, DownloaderStats
2011-09-21 15:33:56+0530 [scrapy] DEBUG: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2011-09-21 15:33:56+0530 [scrapy] DEBUG: Enabled item pipelines:
2011-09-21 15:33:56+0530 [scrapy] DEBUG: Telnet console listening on 0.0.0.0:6023
2011-09-21 15:33:56+0530 [scrapy] DEBUG: Web service listening on 0.0.0.0:6080
2011-09-21 15:33:56+0530 [go4mumbai.com] INFO: Spider opened
2011-09-21 15:33:58+0530 [go4mumbai.com] DEBUG: Crawled (200) <GET http://www.go4mumbai.com/Mumbai_Bus_Route.php?busno=1> (referer: None)
2011-09-21 15:33:58+0530 [go4mumbai.com] INFO: Closing spider (finished)
2011-09-21 15:33:58+0530 [go4mumbai.com] INFO: Spider closed (finished)
Is there some part of the code I am missing? Please advise.
Your parse() function does not seem to belong to your spider class.
Indent the whole function by one level so that it becomes a method of the class and gets called.
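To see why a misplaced function can be ignored without any error, here is a minimal sketch in plain Python. It does not use Scrapy: BaseSpider, handle, BrokenSpider and FixedSpider are hypothetical stand-ins, and the sketch assumes the framework's base class supplies its own default parse, which would explain why the log shows a clean run.

```python
class BaseSpider:
    """Hypothetical stand-in for a framework base class (not Scrapy's API)."""

    def parse(self, response):
        # Assumed default callback: runs quietly and produces nothing useful.
        return "default parse"

    def handle(self, response):
        # The callback is looked up on the spider instance, so only
        # methods defined on the class (or its bases) can be found.
        return self.parse(response)


class BrokenSpider(BaseSpider):
    name = "broken"


def parse(self, response):
    # Defined at module level: this is an ordinary function,
    # NOT a method of BrokenSpider, so it is never called.
    return "custom parse"


class FixedSpider(BaseSpider):
    name = "fixed"

    def parse(self, response):
        # Indented one level: now a method that overrides the default.
        return "custom parse"


broken_result = BrokenSpider().handle("<html></html>")
fixed_result = FixedSpider().handle("<html></html>")
```

In this sketch, broken_result comes from the inherited default while fixed_result comes from the overriding method, which mirrors the effect of indenting parse() under the class.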