web-crawler
How can I add URLs to a crawler4j crawler at arbitrary times while it is running?
I'm working with crawler4j (http://code.google.com/p/crawler4j/), and a simple test crawl of a site succeeded. [Details]
2023-04-13 09:10 Category: Q&A
Web mining or scraping or crawling? What tool/library should I use? [closed]
Closed. This question is seeking recommendations for books, tools, software libraries, and more. It does not meet Stack Overflow guidelines. It is not currently accepting answers. [Details]
2023-04-12 17:08 Category: Q&A
Multilevel web spider with regex match?
I need a web spider to find certain links with regex. The spider would visit a list of websites, find links that match a list of regex patterns, visit those matched links and repeat until the configured [Details]
2023-04-12 11:01 Category: Q&A
Can modernizr and/or yepnope react to bots and spiders?
I have some JS running on a page which pops up a modal localisation select box. I would like to prevent this from happening for bots/crawlers. Is there a way to do this using Modernizr [Details]
2023-04-12 05:54 Category: Q&A
Prevent Custom Web Crawler from being blocked
I am creating a new web crawler in C# to crawl some specific websites. Everything works fine, but the problem is that some websites block my crawler's IP address after some requests. I tried u [Details]
2023-04-10 17:13 Category: Q&A
Fast internet crawler
I'd like to perform data mining on a large scale. For this, I need a fast crawler. All I need is something to download a web page, extract links and follow them recursively, but without visiting t [Details]
2023-04-10 10:18 Category: Q&A
Encoding issues crawling non-English websites
I'm trying to get the contents of a webpage as a string, and I found this question addressing how to write a basic web crawler, which claims to (and seems to) handle the encoding issue, [Details]
2023-04-10 09:26 Category: Q&A
Is it possible to open multiple connections to multiple sites using only one thread?
Update: I already use a FixedThreadPool. What happens is that each thread opens one connection to one site. What I want is something asynchronous. [Details]
2023-04-10 02:29 Category: Q&A
Make a friendly multi-language website
Just to make things clear: I'm trying to figure out how to build a website with a language chooser. [Details]
2023-04-10 02:10 Category: Q&A
Google crawling, AJAX and HTML5
HTML5 allows us to update the current URL without refreshing the browser. I've created a small framework on top of HTML5 which allows me to leverage this transparently, so I can do all requests using [Details]
2023-04-09 08:20 Category: Q&A