Automating JPEG download

Developer, https://www.devze.com, 2023-01-26 15:39, source: Internet

Related topic: python

I need to download JPEG images larger than MIN_SIZE from the pages 1 <= PAGE_NUMBER <= NUM_OF_PAGES of:

http://somewebsite.com/showthread.php?t=12345&page=PAGE_NUMBER

How can I do that in Python? I am new to Python.

Here's how I would do it in Python:

1. Fetch each page you need to grab images from. This is easy: just use mechanize or any other HTTP library.
2. Parse each HTML page to extract the image URLs. This is a bit more involved; have a look at HTMLParser (the html.parser module in Python 3). From memory, you can subclass HTMLParser so that it picks out only the parts you're interested in. In this case, that is the src attribute of the HTML img tag, e.g. something like <img src="this is what you want" width=640 height=480/>
3. Fetch each image obtained above (easy).
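The three steps above might look roughly like this using only the standard library, with urllib.request standing in for mechanize and an html.parser.HTMLParser subclass for step 2. This is a sketch under assumptions, not a tested scraper for any real forum: NUM_OF_PAGES and MIN_SIZE are placeholder values (MIN_SIZE is taken to be in bytes), the helper names are hypothetical, JPEGs are recognised only by a .jpg/.jpeg extension in the URL path, and the size check is done by downloading each image and measuring its length.

```python
import os
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen

# Placeholder values (assumptions): the thread URL from the question,
# a page count, and a minimum size taken to be in bytes.
BASE_URL = "http://somewebsite.com/showthread.php?t=12345&page={}"
NUM_OF_PAGES = 5
MIN_SIZE = 50 * 1024


class ImgSrcParser(HTMLParser):
    """Step 2: collect the src attribute of every <img> tag as an absolute URL."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so "IMG SRC" also matches.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.srcs.append(urljoin(self.page_url, src))


def jpeg_urls(html, page_url):
    """Return the image URLs on one page whose path ends in .jpg or .jpeg."""
    parser = ImgSrcParser(page_url)
    parser.feed(html)
    parser.close()
    return [url for url in parser.srcs
            if urlparse(url).path.lower().endswith((".jpg", ".jpeg"))]


def fetch(url):
    """Steps 1 and 3: download a URL and return the response body as bytes."""
    with urlopen(url, timeout=30) as response:
        return response.read()


def download_jpegs(out_dir="images", fetcher=fetch):
    """Save every JPEG larger than MIN_SIZE bytes from pages 1..NUM_OF_PAGES."""
    os.makedirs(out_dir, exist_ok=True)
    seen, saved = set(), []
    for page in range(1, NUM_OF_PAGES + 1):
        page_url = BASE_URL.format(page)
        html = fetcher(page_url).decode("utf-8", errors="replace")
        for img_url in jpeg_urls(html, page_url):
            if img_url in seen:  # the same image often appears on several pages
                continue
            seen.add(img_url)
            data = fetcher(img_url)
            if len(data) > MIN_SIZE:
                name = os.path.basename(urlparse(img_url).path)
                path = os.path.join(out_dir, name)
                with open(path, "wb") as f:
                    f.write(data)
                saved.append(path)
    return saved
```

Calling download_jpegs() would walk pages 1 to NUM_OF_PAGES and save the qualifying images into an images/ directory, returning the saved paths. Measuring len(data) means small images are still downloaded before being discarded; reading the Content-Length header first could save bandwidth, but not every server sends it. Images with the same file name from different directories would overwrite each other in this sketch.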

Personally, though, I wouldn't use Python for this. The first and last steps above are easily done with wget, the second can be done with grep, and bash can tie everything together. In fact, this is pretty much exactly what I recommended here.

That is, of course, if you're on Linux. If you don't have bash and wget, Python may be your next best option.