I know a little bit of regex, but not much. What is the best way to get just the number out of the following HTML? (I want to have 32 returned.) The values of width, rowspan, and size all vary throughout this horrible HTML page. Any help?
<td width=14 rowspan=2 align=right><font size=2 face="helvetica">32</font></td>
How about
>(\d+)<
Or, if you desperately want to avoid using capturing groups at all:
(?<=>)\d+(?=<)
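As a rough sketch of how either pattern could be applied, here is a small Ruby snippet. The variable names are only illustrative, and it assumes the cell from the question is already available as a string:

```ruby
# Sample cell taken from the question.
html = '<td width=14 rowspan=2 align=right><font size=2 face="helvetica">32</font></td>'

# Capturing-group version: the digits end up in group 1.
with_group = html[/>(\d+)</, 1]

# Lookaround version: the whole match is just the digits.
with_lookaround = html[/(?<=>)\d+(?=<)/]

puts with_group
puts with_lookaround
```

Both return the digits as a string, so call to_i on the result if an integer is needed.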
Please, do yourself a favor and use an HTML parser instead of a regex:
#!/usr/bin/env ruby
require 'nokogiri'
require 'test/unit'

class TestExtraction < Test::Unit::TestCase
  def test_that_it_extracts_the_number_correctly
    doc = Nokogiri::HTML('<td width=14 rowspan=2 align=right><font size=2 face="helvetica">32</font></td>')
    assert_equal [32], (doc / '//td/font').map { |el| el.text.to_i }
  end
end
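Maybe (with a capturing group around the digits, so that the number itself can be pulled out of the match):
<td[^>]*><font[^>]*>(\d+)</font></td>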
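To get the numbers themselves back from a pattern anchored on the surrounding tags like this, the digits need to sit in a capturing group. A minimal Ruby sketch under that assumption follows; the second cell is a made-up example with different attribute values, and cell_pattern is just an illustrative name:

```ruby
# First cell is from the question; the second is hypothetical,
# added only to show that varying attribute values are tolerated.
html = '<td width=14 rowspan=2 align=right><font size=2 face="helvetica">32</font></td>' \
       '<td width=20 rowspan=1 align=left><font size=3 face="arial">7</font></td>'

# [^>]* skips whatever attributes each tag carries; (\d+) captures the number.
cell_pattern = /<td[^>]*><font[^>]*>(\d+)<\/font><\/td>/

# scan returns one array of captures per match, so flatten before converting.
numbers = html.scan(cell_pattern).flatten.map(&:to_i)
p numbers
```

Anchoring on the td and font tags makes the match stricter than a bare >(\d+)<, at the cost of breaking if the page nests the tags differently.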