
Getting portion of href attribute using hpricot

Developer https://www.devze.com 2023-01-23 07:49 Source: Internet
I think I need a combination of Hpricot and a regex here. I need to find 'a' tags whose 'href' attribute starts with '/abc/' and return the text that follows, up to the next forward slash '/'.

So, given:

<a href="/abc/12345/xyz123/">One</a>
<a href="/abc/67890/xyzabc/">Two</a>

I need to get back: '12345' and '67890'

Can anyone lend a hand? I've been struggling with this.


You don't need a regex, but you can use one. Here are two examples, one with a regex and one without, using Nokogiri, which should be compatible with Hpricot for your use and supports CSS selectors:

require 'nokogiri'

html = %q[
  <a href="/abc/12345/xyz123/">One</a>
  <a href="/abc/67890/xyzabc/">Two</a>
]

doc = Nokogiri::HTML(html)
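# With a regex: take the first run of digits in each href.
doc.css('a[href]').map { |h| h['href'][/(\d+)/, 1] } # => ["12345", "67890"]
# Without a regex: split on '/' and take the third piece (index 0 is empty).
doc.css('a[href]').map { |h| h['href'].split('/')[2] } # => ["12345", "67890"]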
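
The one-liners above extract a value from every href they are handed. If the '/abc/' prefix from the question also has to be checked, a plain-Ruby helper along these lines might do it. This is only a sketch: `segment_after_prefix` is a hypothetical name, not part of Nokogiri or Hpricot, and it assumes the href begins with a leading slash as in the sample markup.

```ruby
# Hypothetical helper (not from the original answers): returns the path
# segment that follows `prefix`, or nil when the href does not start with it.
def segment_after_prefix(href, prefix = '/abc/')
  return nil unless href.start_with?(prefix)

  # Drop the prefix, then keep everything up to the next '/'.
  href.delete_prefix(prefix).split('/', 2).first
end

# The third href is an extra, made-up entry to show the non-matching case.
hrefs = ['/abc/12345/xyz123/', '/abc/67890/xyzabc/', '/other/999/']
ids = hrefs.map { |href| segment_after_prefix(href) }.compact
p ids # => ["12345", "67890"]
```

In the Nokogiri example, the same helper could be called on `h['href']` inside the `map` block, followed by `compact` to drop links that do not match.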


Or use a regex:

s = '<a href="/abc/12345/xyz123/">One</a>'
s =~ /abc\/([^\/]*)/   # capture everything after "abc/" up to the next "/"
puts $1                # prints 12345
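
The match above handles a single link. To collect the value from every link in a chunk of markup, `String#scan` can apply a similar pattern repeatedly. This is a rough sketch that assumes double-quoted href attributes, as in the question's sample; a regex over raw HTML is brittle, so a parser is usually the safer choice for real pages.

```ruby
html = <<~HTML
  <a href="/abc/12345/xyz123/">One</a>
  <a href="/abc/67890/xyzabc/">Two</a>
HTML

# scan returns one array of captures per match; flatten yields plain strings.
ids = html.scan(%r{href="/abc/([^/"]*)}).flatten
p ids # => ["12345", "67890"]
```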


What about splitting the string by /?

(I don't know Hpricot, but according to the docs):

doc.search("a[@href]").each do |a|
    puts a["href"].split("/")[2]  # index 2, because the string starts with '/'
end
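
To see why index 2 is the right one, it may help to look at what `split` returns for one of the sample hrefs. The snippet below is plain Ruby with no HTML parser involved:

```ruby
href = '/abc/12345/xyz123/'
parts = href.split('/')

# The leading '/' produces an empty first element; the trailing '/' adds nothing.
p parts    # => ["", "abc", "12345", "xyz123"]
p parts[2] # => "12345"
```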