Nutch crawl path

Developer https://www.devze.com 2023-03-29 17:27 Source: web
I would like to know how to restrict a Nutch crawl not only to the domain I specify, but also to a directory path within that domain. I know this can be configured in regex-urlfilter.txt.


This should restrict the crawl to the domain and path you want:

+.*www\.domain\.com/yourpath/.*  
#skip everything else  
-.*
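
To check the rules before running a crawl, here is a rough Python sketch of how I understand the filter to work: rules are tried top to bottom, the first pattern found in the URL decides, and a URL that matches nothing is dropped. The `accepts` helper and the test URLs are made up for illustration, `www.domain.com` and `yourpath` are the placeholders from the rules above, and Python's `re` is only an approximation of the Java regex engine Nutch uses.

```python
import re

# Rules copied from the answer above; "www.domain.com" and "yourpath"
# are placeholders, not real values.
RULES = [
    ("+", r".*www\.domain\.com/yourpath/.*"),
    ("-", r".*"),  # skip everything else
]


def accepts(url, rules=RULES):
    """Hypothetical helper: apply the rules in order, first match wins.

    This mimics the assumed behaviour of regex-urlfilter.txt; it is not
    Nutch code.
    """
    for sign, pattern in rules:
        if re.search(pattern, url):
            return sign == "+"
    return False  # no rule matched: treat the URL as rejected


if __name__ == "__main__":
    for url in (
        "http://www.domain.com/yourpath/page.html",
        "http://www.domain.com/other/page.html",
        "http://other.example/?next=www.domain.com/yourpath/x",
    ):
        print(url, accepts(url))
```

Note that the third URL is accepted as well, because the pattern is not anchored and matches anywhere in the URL. If that matters, a pattern starting with something like `^https?://www\.domain\.com/yourpath/` may be a safer choice.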