Is there a way to get Nutch to crawl pages that are updated frequently more often?
E.g. index pages and feeds.
It would also be valuable to refresh new pages that contain comments more frequently during the first day after the page was created. Any tips are appreciated.
What you need is the Adaptive Fetch Schedule. I have written a blog post about how it works. Basically, this scheduler gradually shortens the re-fetch interval of pages that change often, so they get visited more and more regularly.
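To make the idea concrete, here is a rough Python sketch of how such an adaptive interval can behave. It is a simplified model, not Nutch's actual code: the function names `next_interval` and `simulate` are made up for illustration, the sync-delta adjustment is left out, and the default rates and bounds are what I believe the `db.fetch.schedule.adaptive.*` defaults to be, so check `nutch-default.xml` for your version. As far as I know, the schedule itself is enabled by setting `db.fetch.schedule.class` to `org.apache.nutch.crawl.AdaptiveFetchSchedule` in `nutch-site.xml`.

```python
# Simplified model of an adaptive fetch schedule (illustrative only).
# Intervals are in seconds. The defaults below are assumptions that
# should be verified against nutch-default.xml.

def next_interval(interval, modified, inc_rate=0.4, dec_rate=0.2,
                  min_interval=60.0, max_interval=31536000.0):
    """Return the re-fetch interval to use after one fetch.

    If the page changed since the last fetch, shrink the interval;
    if it did not, grow it. The result is clamped to the given bounds.
    """
    if modified:
        interval *= (1.0 - dec_rate)
    else:
        interval *= (1.0 + inc_rate)
    return max(min_interval, min(interval, max_interval))


def simulate(initial, history, **rates):
    """Apply next_interval over a sequence of fetch outcomes.

    `history` is an iterable of booleans (True = page had changed).
    Returns the list of intervals after each fetch.
    """
    intervals = []
    interval = initial
    for modified in history:
        interval = next_interval(interval, modified, **rates)
        intervals.append(interval)
    return intervals


if __name__ == "__main__":
    one_day = 86400.0
    # A feed that changes on every fetch vs. a page that never changes.
    print("busy feed:  ", simulate(one_day, [True] * 5))
    print("static page:", simulate(one_day, [False] * 5))
```

Running it shows the interval for the busy feed dropping with every fetch while the static page's interval grows, which is the behaviour you want for index pages and feeds. Lowering the minimum interval bound lets fast-changing pages be revisited sooner.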