Extract snippet out of HTML with Ruby?

Developer https://www.devze.com 2023-01-10 08:08 Source: Internet

I need to show the first 100 characters of an HTML text, which means I have to pick the first 100 characters that are not tags and then close any open tags, leaving balanced HTML. Is there any library that can do it? Or is there a trivial way to do it that I am missing?

The text is originally written in Textile, which can and does contain HTML, so I figured I am better off converting it fully to HTML first and then processing it. If something can do it at the Textile level, though, I'm happy with that too.
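If no library turns up, the "close any open tags" part can be done by hand with a stack once the text is HTML. The sketch below is plain Ruby with no gems; `truncate_html` and `VOID_ELEMENTS` are hypothetical names, and it assumes reasonably well-formed markup. A regex splits the input into tags and text runs, only text counts toward the limit, every non-void opening tag is pushed onto a stack, and whatever is still open at the cut is closed innermost first. Comments, `<script>` contents and entities are not special-cased (an entity such as `&amp;` counts as five characters and could be cut in half), and a stray `<` with no matching `>` is silently dropped.

```ruby
# Void elements never get a closing tag, so they must not go on the stack.
VOID_ELEMENTS = %w[area base br col embed hr img input link meta source track wbr].freeze

# Hypothetical helper: keep at most +limit+ characters of text (markup is not
# counted) and append closing tags for every element still open at the cut.
# Heuristic only: assumes reasonably well-formed input; entities and
# comments are not treated specially.
def truncate_html(html, limit = 100)
  output = +''
  open_tags = []
  remaining = limit

  # Tokens alternate between tags ("<...>") and runs of text.
  html.scan(/<[^>]*>|[^<]+/) do |token|
    break if remaining <= 0

    if token.start_with?('<')
      output << token
      name = token[%r{\A</?\s*([a-zA-Z][a-zA-Z0-9]*)}, 1]
      next unless name # comments, doctypes, etc.

      name = name.downcase
      if token.start_with?('</')
        # Pop back to the matching opener (tolerates minor misnesting).
        index = open_tags.rindex(name)
        open_tags.slice!(index..) if index
      elsif !VOID_ELEMENTS.include?(name) && !token.end_with?('/>')
        open_tags.push(name)
      end
    else
      piece = token[0, remaining]
      output << piece
      remaining -= piece.length
    end
  end

  # Close whatever is still open, innermost first.
  open_tags.reverse_each { |tag| output << "</#{tag}>" }
  output
end
```

For example, `truncate_html('<p>Hello <b>world</b> and <i>more</i></p>', 8)` returns `"<p>Hello <b>wo</b></p>"`: the closing tags after the cut are dropped from the input and regenerated from the stack.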


This is how I would get the first 100 characters of text. You may need to modify it according to your needs:

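require 'nokogiri'

# Return the first 100 characters of the visible text of an HTML file.
# Tags are stripped, so the result is plain text, not balanced HTML.
def get_first_100_chars(path = 'html_file.html')
  doc = Nokogiri::HTML(File.read(path))
  text = doc.at('body').text
  text[0, 100]
end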

I'm not sure about balancing the HTML. I will post if I find out.


Have a look at Nokogiri.
