How do I extract URLs from hyperlinks using Hpricot?

Posted on https://www.devze.com, 2023-03-14 03:13 (source: web)
I'd like to get the actual URL strings from the hyperlinks, with all the HTML stripped away.

So, if one of my input strings is

<a href="http://target.com/resource.tar.gz">resource</a>

I'd like to get:

http://target.com/resource.tar.gz

How can I do this?


In Hpricot you access attributes of an element using square brackets (like you would when accessing elements in a Hash). So, to use your example:

require 'hpricot'

doc = Hpricot('<a href="http://target.com/resource.tar.gz">resource</a>')

# at returns the first matching element; ['href'] reads its attribute
puts doc.at('a')['href']  # => http://target.com/resource.tar.gz