How can I extract HTML escape characters/entities as text when scraping the web? (Ruby & Nokogiri)

Developer https://www.devze.com 2022-12-17 02:24 Source: web
In my ruby+mechanize(nokogiri) script I use this piece of code:

row.at_xpath('td[3]/div[1]/a/text()').to_s.strip

on a forum where the post title html looks like:

<a href="showthread.php?t=233891" >&lt;/body&gt; on Footer ?</a>

and I receive from xpath this string &lt;/body&gt; on Footer ?

I would like to get what I can see in the web browser </body> on Footer ?

How can I do that for all html escape characters/entities?


Please take a look at this post on how to unescape HTML entities

or

There is a Ruby package (gem) called htmlentities
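
As a minimal sketch of the decoding step, the snippet below uses `CGI.unescapeHTML` from Ruby's standard library rather than the htmlentities gem, so it runs without installing anything. The helper name `decode_title` is hypothetical, and the input string is the escaped title from the question. `CGI.unescapeHTML` only covers the basic escapes (`&lt;`, `&gt;`, `&amp;`, `&quot;`, `&#39;`) and numeric references, so for other named entities the htmlentities gem is probably the better fit.

```ruby
require 'cgi'

# Hypothetical helper (not from the original post): decode the basic HTML
# escapes in a scraped string and trim surrounding whitespace.
def decode_title(raw)
  CGI.unescapeHTML(raw.to_s).strip
end

# The string the question reports getting back from
# row.at_xpath('td[3]/div[1]/a/text()').to_s.strip
escaped = '&lt;/body&gt; on Footer ?'

puts decode_title(escaped)
# prints: </body> on Footer ?

# With the htmlentities gem installed, the equivalent call would be:
#   require 'htmlentities'
#   HTMLEntities.new.decode(escaped)
```

It may also be worth trying `.text` instead of `.to_s` on the Nokogiri node, since `to_s` serializes the node back to markup while `text` returns its content, which should already be unescaped.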
