I'm looking to crawl ~100 webpages that are of the same structure, but the image I require is of a different name in each instance.
The image tag is located at:
#content div.artwork img.artwork
and I need the src url of that result to be downloaded.
Any ideas? I have the URLs in a .txt file, and I'm on a Mac OS X box.
I am not sure how you could run a selector-style query on the file, but a Perl regex might do the job just as well:
while read -r url; do wget -qO- "$url"; done < urls.txt | \
perl -nle 'print $1 if /<img.+?class="artwork".+?src="([^"]+)"/'