Hi, I am completing a little hobby project of mine to create a small-scale search engine.
I was wondering if anyone knows of a decent, robust open-source web crawler that they have used? It should be easy for a noob to set up and use.
Thank you for not googling web crawlers and pasting a list.
crawler4j is a pretty decent crawler, multi-threaded and easy to configure and use. It's written in Java.
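If it helps to see what any crawler does at its core before picking a library, here is a minimal sketch of the crawl loop: a frontier queue plus a "seen" set so no page is visited twice. This is not crawler4j's API. The class name `TinyCrawler`, the `crawl` method, and the in-memory `links` map are all made up for illustration, and the map stands in for actually fetching and parsing pages. It is also single-threaded, unlike crawler4j.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

// Hypothetical illustration only; this is not crawler4j's API.
final class TinyCrawler {

    // Breadth-first crawl starting at `seed`, visiting at most `maxPages` pages.
    // `links` maps a URL to the URLs found on that page. A real crawler would
    // download and parse the page here instead of looking it up in a map.
    static List<String> crawl(String seed, Map<String, List<String>> links, int maxPages) {
        List<String> visited = new ArrayList<>();    // pages in the order they were visited
        Set<String> seen = new HashSet<>();          // every URL ever queued
        Queue<String> frontier = new ArrayDeque<>(); // URLs waiting to be visited

        frontier.add(seed);
        seen.add(seed);

        while (!frontier.isEmpty() && visited.size() < maxPages) {
            String url = frontier.remove();
            visited.add(url);
            for (String next : links.getOrDefault(url, List.of())) {
                // Set.add returns false for a URL that was already queued.
                if (seen.add(next)) {
                    frontier.add(next);
                }
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        // Toy link graph standing in for real web pages.
        Map<String, List<String>> web = Map.of(
            "a", List.of("b", "c"),
            "b", List.of("a", "d"),
            "c", List.of("d"));
        System.out.println(crawl("a", web, 10)); // prints [a, b, c, d]
    }
}
```

A library like crawler4j handles the parts this sketch leaves out, such as fetching pages, running several worker threads, and politeness delays, which is why it is usually the easier starting point for a hobby project.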
You can find a list of open-source crawlers on this Wikipedia page.
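I think you should read about a similar experience: the original Google paper by Brin and Page, which describes how they built their crawler and search engine.
http://infolab.stanford.edu/~backrub/google.html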