I need to download JPEG images of size > MIN_SIZE from the pages 1 <= PAGE_NUMBER <= NUM_OF_PAGES
http://somewebsite.com/showthread.php?t=12345&page=PAGE_NUMBER
How can I do that in Python? I am new to Python.
Here's how I would do it in Python:
- Fetch each page you need to grab images from (easy, just use mechanize or some other HTTP fetcher library)
- Parse each HTML file to grab the image URLs. This is a bit more involved; have a look at HTMLParser. From memory, you can subclass HTMLParser to grab only the parts you're interested in. In this case, that is the src attribute of the HTML img tag, e.g. something like <img src="this is what you want" width=640 height=480/>
- Fetch each image obtained above (easy)
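The parsing step is the least obvious of the three, so here is a minimal sketch of it. It assumes Python 3, where HTMLParser lives in the standard html.parser module; the names ImgSrcParser and extract_image_urls are made up for illustration, and resolving relative paths with urljoin is an extra convenience rather than something the steps above require.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImgSrcParser(HTMLParser):
    """Collect the src attribute of every <img> tag fed to the parser."""

    def __init__(self, base_url=""):
        super().__init__()
        self.base_url = base_url  # page URL, used to resolve relative paths
        self.image_urls = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names in lowercase;
        # attrs is a list of (name, value) pairs.
        if tag != "img":
            return
        src = dict(attrs).get("src")
        if src:
            self.image_urls.append(urljoin(self.base_url, src))


def extract_image_urls(html, base_url=""):
    """Return the image URLs found in one page of HTML (hypothetical helper)."""
    parser = ImgSrcParser(base_url)
    parser.feed(html)
    return parser.image_urls
```

You would call extract_image_urls(page_html, page_url) once per fetched page and collect the results into one list before downloading.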
Personally, though, I wouldn't use Python for this. The first and last steps of the above approach are easily done with wget. The second can be performed with grep, with bash to tie everything together. In fact, this is pretty much exactly what I recommended here.
That is, of course, if you're on Linux. If you don't have bash and wget, Python may be your next best option.
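If you do end up in Python, the fetching side can be sketched with the standard library alone. This is a rough sketch under several assumptions: Python 3, urllib.request in place of mechanize, MIN_SIZE read as a size in bytes (the question does not say whether it means bytes or pixel dimensions), and a JPEG identified by its Content-Type header. The helper names page_urls, fetch and save_large_jpegs are hypothetical.

```python
import os
from urllib.parse import urlsplit
from urllib.request import urlopen

# URL pattern from the question, with PAGE_NUMBER as a placeholder.
BASE_URL = "http://somewebsite.com/showthread.php?t=12345&page={page}"


def page_urls(num_pages):
    """Return the thread URL for every page from 1 to num_pages inclusive."""
    return [BASE_URL.format(page=n) for n in range(1, num_pages + 1)]


def fetch(url):
    """Download a URL and return (content_type, body_bytes)."""
    with urlopen(url) as response:
        return response.headers.get_content_type(), response.read()


def save_large_jpegs(image_urls, min_size, out_dir, fetch=fetch):
    """Save every JPEG whose body is larger than min_size bytes.

    The fetch argument can be swapped out, e.g. for a different HTTP library.
    """
    os.makedirs(out_dir, exist_ok=True)
    saved = []
    for url in image_urls:
        content_type, body = fetch(url)
        # Assumption: "size" means byte size, and the server labels JPEGs correctly.
        if content_type != "image/jpeg" or len(body) <= min_size:
            continue
        name = os.path.basename(urlsplit(url).path) or "image.jpg"
        path = os.path.join(out_dir, name)
        with open(path, "wb") as fh:
            fh.write(body)
        saved.append(path)
    return saved
```

A full run would loop over page_urls(NUM_OF_PAGES), fetch each page, pull the image URLs out of the HTML, and pass them to save_large_jpegs. Note that this downloads each image in full before checking its size, and that images sharing a file name overwrite each other.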