Getting largest image from a page via Hpricot

Developer https://www.devze.com 2023-01-10 21:36 Source: Internet
I'm trying to get the largest image off a page I parse with Hpricot and am not having any luck. How do I access the width and height attributes of an img tag with it?


It is possible, provided the image width/height attributes are present in the HTML for each image.

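require 'hpricot'

# page_html is the HTML source of the page, as a string
hp = Hpricot(page_html)

# get all image tags, sort them by height, then take the largest
# (a missing height attribute is nil, and nil.to_i is 0)
largest_image = hp.search("img").sort_by { |img| img["height"].to_i }.last

url = largest_image["src"]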

Derived from Hpricot Challenge.
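
The snippet above ranks images by height alone. If "largest" should mean pixel area instead, the same idea can be sketched as follows. This is only an illustration: the hashes are hypothetical sample data standing in for img elements so that it runs without Hpricot, and `area` is a helper name made up for this example. With real Hpricot elements, `img["width"]` and `img["height"]` are read the same way.

```ruby
# Each hash stands in for the attributes of one img element
# (hypothetical sample data, not taken from a real page).
images = [
  { "src" => "banner.jpg", "width" => "600", "height" => "80" },
  { "src" => "photo.jpg",  "width" => "400", "height" => "300" },
  { "src" => "spacer.gif" } # no width/height in the markup
]

# Pixel area from the width/height attributes.
# A missing attribute is nil, and nil.to_i is 0, so such images rank last.
def area(img)
  img["width"].to_i * img["height"].to_i
end

largest = images.max_by { |img| area(img) }
puts largest["src"] # prints photo.jpg
```

Here the banner is wider, but the photo wins on area (400 x 300 against 600 x 80).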


Unless the dimensions are in the markup, you won't be able to get them through Hpricot.

An alternative is to use Hpricot to collect the src attribute of every image, then loop through them and request each file. You can then parse each response as an image and read the dimensions from the actual image data.
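
A rough sketch of that approach, under some loud assumptions: it only understands PNG files (whose width and height sit at fixed offsets in the IHDR chunk), it assumes the src values have already been resolved to absolute URLs, and `png_dimensions` and `largest_png` are hypothetical helper names invented for this example. Other formats such as JPEG or GIF would need their own parsing or an image library.

```ruby
require "net/http"
require "uri"

# The fixed 8-byte signature that starts every PNG file.
PNG_SIGNATURE = "\x89PNG\r\n\x1a\n".b

# Returns [width, height] for PNG data, or nil if the bytes are not a PNG.
# Width and height are big-endian 32-bit integers at byte offsets 16 and 20.
def png_dimensions(data)
  bytes = data.b
  return nil unless bytes.start_with?(PNG_SIGNATURE) && bytes.bytesize >= 24
  bytes[16, 8].unpack("NN")
end

# Hypothetical helper: fetches every URL and returns the one whose image has
# the largest pixel area. Non-PNG responses are skipped. The fetch lambda can
# be swapped out, e.g. to read from a cache instead of the network.
def largest_png(urls, fetch: ->(url) { Net::HTTP.get(URI(url)) })
  sized = urls.map { |url| [url, png_dimensions(fetch.call(url))] }
  sized = sized.reject { |_, dims| dims.nil? }
  best = sized.max_by { |_, (w, h)| w * h }
  best && best.first
end
```

With Hpricot, the list of URLs would come from something like `hp.search("img").map { |img| img["src"] }`. Keep in mind that this downloads every image on the page, so it is much slower than reading attributes from the markup.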


As hemal said, the only way is if the image sizes are listed in the img tag's attributes. If they are, they're easy to read: each element's attributes are available by key, as with a hash. For example:

doc = Hpricot("<img src='foo.jpg' width=200 height=200 /><img src='bar.jpg' width=100 height=100 />")

doc.search("//img").each do |image|
  puts "#{image[:src]} => #{image[:width]}x#{image[:height]}"
end

This should result in:

foo.jpg => 200x200
bar.jpg => 100x100