I'm using Heritrix 3.1.1-SNAPSHOT to crawl and archive some website content. I need to log every URL encountered on each page it processes, including URLs that are configured not to be crawled.
I've been searching for a long time and haven't found anything useful :( I hope I can get some help here. Thanks.
http://crawler.archive.org/articles/user_manual/config.html section 6.3.1.4 seems to answer your question.