web-scraping
best way to emulate a python ie compatible browser
What would be the best way to emulate an IE browser with Python for scraping? I've found this script http://www.mayukhbose.com/python/IEC/index.php and was wondering if there was anythi[...]
2023-04-13 01:46 Category: Q&A
Ruby on Rails safari reader like text extraction and boilerplating
I have a Digg-like web service which, briefly explained, has a page parser: when people submit stories, the parser returns a title and summary based on Hpricot and some other small extraction principle[...]
2023-04-12 20:17 Category: Q&A
Web mining or scraping or crawling? What tool/library should I use? [closed]
Closed. This question is seeking recommendations for books, tools, software libraries, and more. It does not meet Stack Overflow guidelines. It is not currently accepting answers.[...]
2023-04-12 17:08 Category: Q&A
Get content from an external website? [duplicate]
This question already has answers here: Closed 11 years ago. Possible Duplicate: Can Javascript read the source of any web page?[...]
2023-04-12 13:52 Category: Q&A
Multilevel web spider with regex match?
I need a web spider to find certain links with regex. The spider would visit a list of websites, find links that match a list of regex patterns, visit those matched links, and repeat until the configured[...]
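The loop described in this excerpt (visit pages, keep links that match a list of regex patterns, follow them, repeat) could be sketched roughly as below. This is only an illustration under stated assumptions, not the asker's code: the stopping condition is cut off in the excerpt, so the sketch assumes a maximum depth (`max_depth`), and `crawl`, `LinkParser`, and the `fetch` callback are hypothetical names introduced here. `fetch` is supplied by the caller and maps a URL to its HTML, which keeps the sketch independent of any particular HTTP library.

```python
import re
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_urls, patterns, fetch, max_depth=2):
    """Breadth-first crawl that only follows links matching a regex.

    fetch: caller-supplied function url -> HTML string (hypothetical hook).
    max_depth: assumed stopping rule; the excerpt does not say what
    the real limit is.
    Returns the matched URLs in the order they were discovered.
    """
    compiled = [re.compile(p) for p in patterns]
    seen = set(start_urls)
    queue = deque((url, 0) for url in start_urls)
    matched = []
    while queue:
        url, depth = queue.popleft()
        if depth >= max_depth:
            continue  # do not expand pages at the depth limit
        parser = LinkParser()
        parser.feed(fetch(url))
        for href in parser.links:
            link = urljoin(url, href)  # resolve relative links
            if link in seen:
                continue
            if any(rx.search(link) for rx in compiled):
                seen.add(link)
                matched.append(link)
                queue.append((link, depth + 1))
    return matched
```

For real use, `fetch` might wrap `urllib.request.urlopen` and decode the response body; passing it in as a parameter also makes the crawl easy to try against canned pages first.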
2023-04-12 11:01 Category: Q&A
How to connect via HTTPS using Jsoup?
It's working fine over HTTP, but when I try to use an HTTPS source it throws the following exception:[...]
2023-04-12 09:00 Category: Q&A
Why do I get a "IndexError: list index out of range"? (Beautiful Soup)
I am trying to scrape a table here very similar in structure to my previous question. I just changed the attribute names, but I am getting an index out of range error. This is the TR:[...]
2023-04-10 16:06 Category: Q&A
Getting the source code. [closed]
As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, a[...]
2023-04-10 08:02 Category: Q&A
How do I invoke a Javascript that does not have a name using C#
I would like to invoke a Javascript function on a web page that does not have a function name. Using C#, I would normally use Webbrowser.Document.InvokeScript("ScriptName"). In this[...]
2023-04-10 07:03 Category: Q&A
How to scrape the 'More' portion of the Quora profile page?
To determine the list of all topics on Quora, I decided to start from scraping the profile page with many topics followed, e.g. http://www.quora.com/Charlie-Cheever/topics. I scraped the topics from t[...]
2023-04-09 23:47 Category: Q&A