Is it possible to extract the IMG tag (or just the src attribute of an IMG tag) from a block of HTML in Ruby?
For example, if I have a block of HTML such as:
<p>Lorem ipsum dolor sit amet, labore et dolore magna aliqua.<img src="example.jpg" alt="" /> Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.</p>
Could I extract just the IMG tag or src of that IMG tag via Regex or some other method?
Thanks in advance for any suggestions!
Using Nokogiri:
require 'nokogiri' # gem install nokogiri
doc = Nokogiri::HTML( my_html_string )
img_srcs = doc.css('img').map{ |i| i['src'] } # Array of strings
You can use this regular expression, which returns the src of the first IMG tag (or nil if there is none):
html_str[/img.*?src="(.*?)"/i,1]
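If the HTML may contain several images, a slightly stricter pattern combined with String#scan is one way to collect every src. This is only a sketch: the sample string and the variable names first_src and all_srcs are made up for illustration, and the pattern assumes double-quoted attribute values, so it will miss single-quoted or unquoted ones.

```ruby
# Sample input (hypothetical): two IMG tags with different case and attribute order.
html_str = '<p>Lorem <img src="example.jpg" alt="" /> ipsum <IMG alt="x" src="other.png"></p>'

# [^>]*? keeps the match inside a single tag; [^"]* stops at the closing quote.
IMG_SRC = /<img\b[^>]*?\bsrc="([^"]*)"/i

# First match only: String#[] with a group index returns that group, or nil.
first_src = html_str[IMG_SRC, 1]

# Every match: scan returns one array per match when the pattern has a group.
all_srcs = html_str.scan(IMG_SRC).flatten

puts first_src
puts all_srcs.inspect
```

Restricting the match to [^>] avoids running past the end of the tag, which the looser .*? version can do when an IMG tag has no src attribute.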
If you want a more advanced HTML parser, I recommend Nokogiri.
Use Nokogiri to parse the HTML and search for img tags to extract the src attribute from.
There are many ways to do this. I prefer using the Nokogiri gem.
Before you get too far into this, I suggest reading Jeff Atwood's post on parsing HTML with regular expressions: Parsing Html The Cthulhu Way