
How do I filter CDATA out and only get the text from HTML?

Developer https://www.devze.com 2023-01-12 00:12 Source: Internet
I want to parse an HTML file using Nokogiri. I am able to do that, but I only want the text and no CDATA or JavaScript, since my script and div tags are all over the file.


You can delete all script elements,

doc.search('script').remove

… and then select all text nodes

doc.xpath('//text()') 

… or just select the text nodes within div elements

doc.xpath('//div//text()') 