web-crawler
HTML Mixed Encodings?
First, I would like to say thank you in advance for the help. I am currently writing a web crawler that parses HTML content, strips HTML tags, and then spell checks the text which i[details]
2023-04-08 20:25 Category: Q&A
Use one search string to search 4 website catalogs
I frequent many libraries: the Brooklyn Public Libraries, Queens Public Libraries, New York Public Libraries, and CUNY school libraries. When I want a book I have to go to all 4 online catalogs and se[details]
2023-04-08 16:48 Category: Q&A
Website Downloader using Python
I am trying to create a website downloader using Python. I have the code for: Finding all URLs from a page[details]
2023-04-07 21:29 Category: Q&A
get certain data from pages using crawler
I am looking to use a crawler to fetch data from a site. I found "How do I make a simple crawler in PHP?" and it was helpful, but I am looking to use the code on http://findpeopleonplus.com/ to get all[details]
2023-04-07 08:47 Category: Q&A
Mass Downloading of Webpages C#
My application requires that I download a large number of web pages into memory for further parsing and processing. What is the fastest way to do it? My current method (shown below) seems to be too slo[details]
2023-04-06 11:18 Category: Q&A
Suggestions for avoiding duplicate products from scraping
I have written a very basic crawler which scrapes product information from websites to be put into a database.[details]
2023-04-06 04:30 Category: Q&A
Simulate human click in JavaScript
I have a small scraper where I need to click an anchor link using JavaScript. I've tried a few ways: jQuery.click(), document.createEvent('MouseEvents') etc. They all sort of worked, h[details]
2023-04-05 09:18 Category: Q&A
running multiple threads in python, simultaneously - is it possible?
I'm writing a little crawler that should fetch a URL multiple times. I want all of the threads to run at the same time (simultaneously).[details]
2023-04-03 23:03 Category: Q&A
Reliability of using HEAD Request to Check Web Page Status
I've been testing a small app I've written that basically does an HTTP HEAD request to check whether a page exists, redirects, etc. I've noticed that some pages respond differently to HEAD than GET r[details]
2023-04-03 18:59 Category: Q&A
wget not following links with spider
I am trying to check a page and all of its links as well as images. The following is stopping after the initial page and I get very little output.[details]
2023-04-03 02:49 Category: Q&A