
Nutch + MySQL integration

Developer https://www.devze.com 2023-01-06 15:46 Source: Internet

When Nutch finishes its cycle (crawl, fetch, parse, index), I do not want it to build a Lucene index during the index phase. Instead, I want Nutch to hand all the crawled data (which I believe it keeps as NutchDocument objects) to my own code, which will store it in MySQL.

Is there any way to do this?

Thanks


Create your own Java class that manages the Nutch cycle. It should be similar to org.apache.nutch.crawl.Crawl, but replace the call to the indexer with a call to your MySQL connector. Alternatively, call your MySQL connector during each cycle. Which one you choose depends on whether you want to update MySQL at the end of the crawl or while it is happening.
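
To make the shape of that driver class concrete, here is a rough sketch in plain Java and JDBC. It deliberately uses no Nutch classes: CrawledPage, CrawlRound, PageSink, MysqlPageSink and CrawlToSink are all hypothetical names invented for this example, and the pages table with its url, title and content columns is an assumption. In a real setup the CrawlRound implementation would wrap whatever generate, fetch and parse steps your Nutch version exposes.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for one parsed page; this is not a Nutch class.
final class CrawledPage {
    final String url;
    final String title;
    final String content;

    CrawledPage(String url, String title, String content) {
        this.url = url;
        this.title = title;
        this.content = content;
    }
}

// Hypothetical hook for one generate/fetch/parse round at a given depth.
interface CrawlRound {
    List<CrawledPage> run(int depth) throws Exception;
}

// Hypothetical destination that takes the place of the indexing step.
interface PageSink {
    void store(List<CrawledPage> pages) throws Exception;
}

// Writes pages to MySQL through plain JDBC. Table and column names are assumed,
// and url is assumed to carry a unique key so repeated crawls update in place.
final class MysqlPageSink implements PageSink {
    static final String UPSERT =
        "INSERT INTO pages (url, title, content) VALUES (?, ?, ?) "
      + "ON DUPLICATE KEY UPDATE title = VALUES(title), content = VALUES(content)";

    private final Connection connection;

    MysqlPageSink(Connection connection) {
        this.connection = connection;
    }

    @Override
    public void store(List<CrawledPage> pages) throws SQLException {
        try (PreparedStatement ps = connection.prepareStatement(UPSERT)) {
            for (CrawledPage page : pages) {
                ps.setString(1, page.url);
                ps.setString(2, page.title);
                ps.setString(3, page.content);
                ps.addBatch();
            }
            ps.executeBatch(); // one round trip per batch of pages
        }
    }
}

// Drives the rounds and calls the sink where the indexer would otherwise run.
final class CrawlToSink {
    // storeEachRound = true writes while the crawl is happening;
    // false collects everything and writes once at the end.
    static int crawl(CrawlRound round, PageSink sink, int maxDepth,
                     boolean storeEachRound) throws Exception {
        List<CrawledPage> pending = new ArrayList<>();
        int stored = 0;
        for (int depth = 0; depth < maxDepth; depth++) {
            List<CrawledPage> pages = round.run(depth);
            if (pages.isEmpty()) {
                break; // nothing new was fetched, so stop early
            }
            if (storeEachRound) {
                sink.store(pages);
                stored += pages.size();
            } else {
                pending.addAll(pages);
            }
        }
        if (!pending.isEmpty()) {
            sink.store(pending);
            stored += pending.size();
        }
        return stored;
    }
}
```

With the MySQL Connector/J driver on the classpath, a Connection for MysqlPageSink could come from DriverManager.getConnection with a URL such as jdbc:mysql://localhost:3306/crawl (host, port and database name are placeholders). The storeEachRound flag corresponds to the two options in the answer: writing during each cycle, or once after the last one.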
