Extract IMG tags in Ruby

Developer https://www.devze.com 2023-03-01 09:33 Source: Internet
Is it possible to extract the IMG tag (or just the src attribute of an IMG tag) from a block of HTML in Ruby?

For example, if I have a block of HTML such as:

<p>Lorem ipsum dolor sit amet, labore et dolore magna aliqua.<img src="example.jpg" alt="" /> Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.</p>

Could I extract just the IMG tag or src of that IMG tag via Regex or some other method?

Thanks in advance for any suggestions!


Using Nokogiri:

require 'nokogiri' # gem install nokogiri
doc = Nokogiri::HTML( my_html_string )
img_srcs = doc.css('img').map{ |i| i['src'] } # Array of strings


You can use this regular expression, which returns the src of the first IMG tag (or nil if nothing matches):

html_str[/img.*?src="(.*?)"/i,1]
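
That one-liner stops at the first match and only handles double-quoted attributes. If you need every image, a rough variation using String#scan might look like the following. The sample input and the names img_src_re, img_srcs and img_tags are made up for illustration, and like any regex approach it will break on unusual markup (unquoted attributes, a ">" inside an attribute value, and so on):

```ruby
# Hypothetical sample input, not taken from the question.
html_str = <<~HTML
  <p>First <img src="one.jpg" alt="" /> and
  second <IMG alt="x" src='two.png'> image.</p>
HTML

# [^>]*? keeps the match inside a single tag, \s before "src" avoids
# matching attributes such as data-src, and (["']) ... \1 accepts
# either quote style as long as the closing quote matches the opening one.
img_src_re = /<img\b[^>]*?\ssrc\s*=\s*(["'])(.*?)\1/i

# With capture groups, scan yields [quote, src] pairs; keep only the src.
img_srcs = html_str.scan(img_src_re).map { |_quote, src| src }

# Without capture groups, scan returns the whole matched tags.
img_tags = html_str.scan(/<img\b[^>]*>/i)

puts img_srcs.inspect
puts img_tags
```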

If you want a more advanced HTML parser, I recommend Nokogiri.


Use Nokogiri to parse the HTML and search for img tags to extract the src attribute from.


There are many ways to do this. I prefer using the Nokogiri gem.

Before you get too far into this, I suggest reading what Jeff Atwood wrote about parsing HTML with regex: "Parsing Html The Cthulhu Way".
