I am starting a project to study the relationship between the characters that appear in images and the web pages where those images reside.
I want to crawl some images together with their web pages, and I need to save the crawl results to local disk for further analysis. Is there any open source software for this?
Here's a list of open source crawlers: http://www.google.co.uk/#hl=en&source=hp&q=open+source+web+crawler&aq=f&aqi=g9g-m1&aql=&oq=&gs_rfai=&fp=77130048d7e0701a
Near the top of the list are Java crawlers, and the Wikipedia article has some more as well.
You can use crawler4j for this purpose. It is a simple Java crawler that can be configured in a few minutes, and you can use it for crawling images as well. There is also an ImageCrawler example in its source code.
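Whichever crawler you pick, the per-page step you need is roughly "find the images a page references, then store the page and that list on disk". Below is a minimal sketch of that step using only the Java standard library. It is not crawler4j's API: the class name `ImagePageSaver`, the method names, and the output file names `page.html` and `images.txt` are all made up for illustration. The regex is deliberately naive (a real HTML parser would be more robust), and downloading the image bytes themselves is left out.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of crawler4j or any other library.
class ImagePageSaver {
    // Naive match for the quoted src attribute of an <img> tag.
    private static final Pattern IMG_SRC = Pattern.compile(
            "<img\\b[^>]*?\\bsrc\\s*=\\s*[\"']([^\"']+)[\"']",
            Pattern.CASE_INSENSITIVE);

    /** Returns the image URLs referenced by the page, resolved against the page URL. */
    static List<String> extractImageUrls(String html, URI pageUri) {
        List<String> urls = new ArrayList<>();
        Matcher m = IMG_SRC.matcher(html);
        while (m.find()) {
            // resolve() turns relative paths such as "cat.png" into absolute URLs.
            urls.add(pageUri.resolve(m.group(1).trim()).toString());
        }
        return urls;
    }

    /** Writes the page HTML and one image URL per line into outDir. */
    static void savePage(Path outDir, String html, List<String> imageUrls) throws IOException {
        Files.createDirectories(outDir);
        Files.write(outDir.resolve("page.html"), html.getBytes(StandardCharsets.UTF_8));
        Files.write(outDir.resolve("images.txt"), imageUrls, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // In-memory sample page; in a real crawl the HTML would come from the crawler.
        String html = "<html><body><img src=\"cat.png\"><img alt='dog' src='/img/dog.jpg'></body></html>";
        URI pageUri = URI.create("http://example.com/gallery/index.html");

        List<String> images = extractImageUrls(html, pageUri);
        Path outDir = Files.createTempDirectory("crawl");
        savePage(outDir, html, images);
        System.out.println("Saved " + images.size() + " image URLs to " + outDir);
    }
}
```

Keeping each page next to the list of its image URLs preserves the image-to-page link that the later analysis depends on. With a crawler such as crawler4j you would call something like this from the callback that handles each fetched page.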