What's the best way to crawl a batch of URLs for a specific HTML element and retrieve the image?

Developer https://www.devze.com 2022-12-19 14:38 Source: web
I'm looking to crawl ~100 webpages that are of the same structure, but the image I require is of a different name in each instance.

The image tag is located at:

#content div.artwork img.artwork

and I need the src URL of that element so the image can be downloaded.

Any ideas? I have the URLs in a .txt file, and I'm on a Mac OS X box.


I am not sure how you can run a selector-style query against the raw files, but a Perl regex might do the job just as well:

while read -r url; do wget -q -O- "$url"; done < urls.txt | \
  perl -nle 'print $1 if /<img[^>]*class="artwork"[^>]*src="([^"]+)"/'
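If you would rather match the element by its class the way the CSS selector does, rather than regexing raw HTML, one option is a small parser built on Python's standard-library `html.parser`. This is a sketch under my own naming (`ArtworkImgParser` and `artwork_srcs` are assumptions, not anything from the question); the returned URLs can then be handed to `wget -i` for the actual downloads:

```python
# Sketch: collect the src of every <img class="artwork"> in a page,
# using only the Python standard library. Names here are illustrative.
from html.parser import HTMLParser


class ArtworkImgParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr = dict(attrs)
        # class="artwork foo" should still match, so split on whitespace
        classes = (attr.get("class") or "").split()
        if "artwork" in classes and attr.get("src"):
            self.srcs.append(attr["src"])


def artwork_srcs(html):
    """Return the src URLs of all img.artwork elements in the HTML."""
    parser = ArtworkImgParser()
    parser.feed(html)
    return parser.srcs
```

Unlike the regex, this keeps working if the attribute order changes or the tag spans multiple lines.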
