How to configure Heritrix to log all encountered URLs, including those that are filtered out / not crawled?

Developer (开发者) https://www.devze.com 2023-02-23 13:56 Source: Internet
I'm using Heritrix 3.1.1-SNAPSHOT to crawl and archive some website content. I need to log every URL encountered on each page it processes, including URLs that are configured not to be crawled.

I've been searching for a long time and haven't found anything useful :( I hope I can get some help here. Thanks.


Section 6.3.1.4 of http://crawler.archive.org/articles/user_manual/config.html seems to answer your question.
