
Ruby regex: links not already in an anchor tag

https://www.devze.com 2022-12-30 01:55 Source: the web

I am using Ruby 1.8.7, without Rails.

How do I find all the links that are not already inside an anchor tag?

s = %Q{ <a href='www.a.com'><b>www.a.com</b></a> www.b.com <div>www.c.com</div> }

The output for the string above should be:

www.b.com
www.c.com

I know the "b" tag before www.a.com complicates the case, but that's what I have to work with.


You will want to use a real XML parser (Nokogiri will do). Regexes are unsuitable for a task like this, especially in Ruby 1.8.7, where negative lookbehind is not supported.
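To make the parser suggestion concrete, here is a minimal sketch. It swaps in REXML, the XML parser that ships with Ruby, instead of Nokogiri, so that it runs without extra gems. It assumes the markup is well-formed XML, which real-world HTML often is not; Nokogiri would be the more forgiving choice there. The helper name `links_outside_anchors` and the `www.` link pattern are assumptions made for illustration, since the question does not define what counts as a link.

```ruby
require 'rexml/document'

# Hypothetical helper: walk the tree and collect www-style links from text
# nodes, skipping every <a> element together with everything nested in it.
def links_outside_anchors(node, found = [])
  node.children.each do |child|
    if child.is_a?(REXML::Element)
      # Do not descend into anchors, so <a><b>www.a.com</b></a> is ignored.
      links_outside_anchors(child, found) unless child.name.downcase == 'a'
    elsif child.is_a?(REXML::Text)
      # Assumed link pattern: "www." followed by non-space, non-markup characters.
      found.concat(child.value.scan(/\bwww\.[^\s<>"']+/))
    end
  end
  found
end

s = %Q{ <a href='www.a.com'><b>www.a.com</b></a> www.b.com <div>www.c.com</div> }

# REXML needs a single root element, so the fragment is wrapped in one.
doc = REXML::Document.new("<root>#{s}</root>")
links = links_outside_anchors(doc.root)
puts links
```

Because the walk never enters an anchor element, the nested "b" tag causes no trouble: the decision is made on tree structure, not on the characters surrounding the link.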


Here is a dirty way to get rid of the anchor tags. It doesn't work the way you want if they're nested. Also, use a real parser ;-)

s.gsub(%r[<a\b.*?</a>]i, "")
=> "  www.b.com <div>www.c.com</div> "
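The stripped string still has to be searched for links. A rough sketch of that second step follows, with the same caveats as the one-liner above. The `www.` pattern and the variable names are assumptions, and a link sitting inside an attribute of some other tag (an `img` `src`, for example) would also be picked up.

```ruby
s = %Q{ <a href='www.a.com'><b>www.a.com</b></a> www.b.com <div>www.c.com</div> }

# Step 1: drop each anchor element and its contents. The match is non-greedy
# and case-insensitive; the m flag lets "." cross line breaks.
without_anchors = s.gsub(%r{<a\b.*?</a>}im, "")

# Step 2: scan what is left for anything that looks like a link.
# Assumed pattern: "www." followed by non-space, non-markup characters.
bare_links = without_anchors.scan(/\bwww\.[^\s<>"']+/)

p bare_links
```

For the sample string this leaves only www.b.com and www.c.com, without needing lookbehind, so it stays within what Ruby 1.8.7 regexes support.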