scrapy
python-scrapy: how to fetch a URL (not via following links) inside a spider?
How can I fetch a URL from inside my spider and extract something from the page via HtmlXPathSelector? I want to supply the URL as a string inside the code, not a...
2023-02-03 20:16 Category: Q&A
Suggestion for building search engine using Django
I'm new to web crawling. I'm going to build a search engine whose crawler saves Rapidshare links, including the URL where each Rapidshare link was found...
2023-02-03 14:30 Category: Q&A
How to match a case insensitive value with XPath
I have an XPath with which I'm trying to match meta tags that have a name attribute with a value that contains the word 'keyword' irrespective of case. Basically, I'm trying to match: ...
2023-02-03 01:02 Category: Q&A
Scrapy Newbie Question - can't get tutorial file working
I am a complete newbie to Python and Scrapy, so I started by trying to replicate the tutorial. I am trying to scrape the www.dmoz.org website as per the tutorial.
2023-01-30 19:39 Category: Q&A
Access django models inside of Scrapy
Is it possible to access my Django models inside a Scrapy pipeline, so that I can save my scraped data straight to my model?
2023-01-27 00:53 Category: Q&A
Scrapy Django Limit links crawled
I just got Scrapy set up and running and it works great, but I have two (noob) questions. I should say first that I am totally new to Scrapy and spidering sites.
2023-01-27 00:22 Category: Q&A
guidance on python scraping packages
I'm still a newcomer to Python, so I hope this question isn't inane. The more I google for web scraping solutions, the more confused I become (unable to see a forest, despite investigating many tr...
2023-01-27 00:07 Category: Q&A
Scrapy pipeline spider_opened and spider_closed not being called
I am having some trouble with a Scrapy pipeline. My information is being scraped from sites OK and the process_item method is being called correctly. However the spider_opened and spider_closed method...
2023-01-24 06:53 Category: Q&A
Can't get Scrapy pipeline to work
I have a spider that I have written using the Scrapy framework. I am having some trouble getting any pipelines to work. I have the following code in my pipelines.py: ...
2023-01-23 14:43 Category: Q&A
web server returns "500 Internal Server Error" after sending this FormRequest using Scrapy
I construct the following FormRequest according to the content shown by HttpFox (a Firefox add-on). However, the web server always returns "500 Internal Server Error".
2023-01-21 12:51 Category: Q&A