
How to crawl images in Nutch?

Developer https://www.devze.com 2023-01-07 08:05 Source: web

How can I crawl images with Nutch? Or is there another open source search engine that returns results with images?


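Edit regex-urlfilter.txt in the conf directory. It contains a rule like this one, which skips every URL ending in one of the listed suffixes:

-\.(ico|ICO|css|CSS|sit|SIT|eps|EPS|wmf|WMF|zip|ZIP|ppt|PPT|xls|XLS|gz|GZ|rpm|RPM|tgz|TGZ|exe|EXE|js|JS|gif|GIF|png|PNG|jpg|JPG|jpeg|JPEG|bmp|BMP|mpg|MPG|mov|MOV)$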

Delete the image extensions you want to crawl (jpeg, jpg, gif, or whichever picture types you need) from that list, in both lower and upper case.
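If you would rather not edit the rule by hand, the deletion can be scripted. The following is only a minimal sketch: the function name `allow_images` and the default set of extensions are made up for illustration, and it assumes the rule has the `-\.(a|b|c)$` shape shown above (the backslash is treated as optional).

```python
import re

# Extensions to stop excluding; adjust to the image types you want crawled.
IMAGE_EXTS = {"gif", "jpg", "jpeg", "png", "bmp"}

def allow_images(rule: str, allowed=IMAGE_EXTS) -> str:
    """Drop the given extensions from a '-\\.(a|b|c)$' exclusion rule."""
    match = re.fullmatch(r"(-\\?\.)\((.*)\)\$", rule.strip())
    if match is None:
        return rule  # not a suffix-exclusion rule; leave it untouched
    kept = [ext for ext in match.group(2).split("|")
            if ext and ext.lower() not in allowed]  # also skips empty '||' entries
    return match.group(1) + "(" + "|".join(kept) + ")$"
```

Apply it to the one exclusion line and write the result back to regex-urlfilter.txt; every other line in the file should stay as it is.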

Then edit suffix-urlfilter.txt, also in conf:

Put a # in front of the jpeg, gif, or png entries to comment them out.
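The commenting-out step can be sketched the same way. This assumes suffix-urlfilter.txt lists one suffix per line (such as `.gif`), which may not match your copy of the file; `comment_out_suffixes` and its default suffix list are hypothetical names chosen for this example.

```python
def comment_out_suffixes(text: str,
                         suffixes=(".gif", ".jpg", ".jpeg", ".png")) -> str:
    """Prefix matching suffix lines with '#' so the filter ignores them."""
    wanted = {s.lower() for s in suffixes}
    out = []
    for line in text.splitlines():
        if line.strip().lower() in wanted:
            out.append("#" + line)   # disable this suffix entry
        else:
            out.append(line)         # keep comments and other entries as-is
    result = "\n".join(out)
    # Preserve a trailing newline if the input had one.
    return result + "\n" if text.endswith("\n") else result
```

Read the file's contents, pass them through the function, and write the result back. Lines that are already commented are left alone because they no longer match a bare suffix.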

That worked for me!
