Crawl Images, Whole Web Pages and cache them

开发者 https://www.devze.com 2023-01-03 10:49 Source: Internet
I am starting a project and am wondering about the relationship between the characters in images and the whole web page where the images reside.

I want to crawl some images together with the web pages that contain them, and I need to save the crawl results to local disk for further analysis. Is there any open-source tool for this?


Here's a list of open-source crawlers: http://www.google.co.uk/#hl=en&source=hp&q=open+source+web+crawler&aq=f&aqi=g9g-m1&aql=&oq=&gs_rfai=&fp=77130048d7e0701a

Java crawlers appear near the top of that list, and the Wikipedia article lists some more as well.


You can use crawler4j for this purpose. It is a simple Java crawler that can be configured in a few minutes, and it can crawl images as well. The source code also includes an ImageCrawler example.
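
To make the task concrete, here is a minimal sketch of what crawling one page does: it fetches the page, finds its images, and caches both on local disk. It uses only the Python standard library and is not crawler4j's API. The names `extract_image_urls`, `cache_path`, and `crawl_page` are hypothetical, and naming cached files by a SHA-1 hash of the URL is an assumption made for illustration.

```python
import hashlib
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen


class ImageSrcParser(HTMLParser):
    """Collects the src attribute of every <img> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def extract_image_urls(html, page_url):
    """Return absolute image URLs found in html, resolved against page_url."""
    parser = ImageSrcParser()
    parser.feed(html)
    return [urljoin(page_url, src) for src in parser.sources]


def cache_path(cache_dir, url, default_suffix=".bin"):
    """Map a URL to a stable file name (hypothetical scheme: SHA-1 of the URL)."""
    digest = hashlib.sha1(url.encode("utf-8")).hexdigest()
    suffix = Path(urlparse(url).path).suffix or default_suffix
    return Path(cache_dir) / (digest + suffix)


def fetch(url):
    """Download a URL and return its raw bytes."""
    with urlopen(url, timeout=10) as response:
        return response.read()


def crawl_page(page_url, cache_dir, fetcher=fetch):
    """Save one page and its images under cache_dir; return {url: local path}."""
    Path(cache_dir).mkdir(parents=True, exist_ok=True)
    saved = {}

    page_bytes = fetcher(page_url)
    page_file = cache_path(cache_dir, page_url, default_suffix=".html")
    page_file.write_bytes(page_bytes)
    saved[page_url] = page_file

    html = page_bytes.decode("utf-8", errors="replace")
    for image_url in extract_image_urls(html, page_url):
        try:
            data = fetcher(image_url)
        except OSError:
            continue  # skip images that fail to download
        image_file = cache_path(cache_dir, image_url)
        image_file.write_bytes(data)
        saved[image_url] = image_file
    return saved
```

Calling `crawl_page("http://example.com/", "cache")` would write the page and its images into a `cache` directory. The returned mapping records which file came from which URL, so the page and its images stay linked for later analysis. A real crawler would also follow links, respect robots.txt, and throttle requests, which a library such as crawler4j is meant to handle.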
