How can I get Hpricot to play nice with HTML5?

Developer https://www.devze.com 2022-12-30 09:54 Source: web
I am using Hpricot to parse a theme file. I have noticed, however, that if I feed a valid HTML5 document into Hpricot(), it auto-closes HTML5 tags (like <section>) and messes with the DOCTYPE.

Are there any extensions to Hpricot, or perhaps a flag I need to set, that will allow HTML5 documents to be parsed correctly?


I know this sidesteps the direct question, but I would suggest you try Nokogiri (http://nokogiri.org/), as mentioned in some of the comments on your question. I've had no issues with it parsing any HTML- or XML-like structured text, including HTML5.


I think Hpricot's to_original_html method is exactly what you're looking for.

From the docs for to_original_html:

Attempts to preserve the original HTML of the document, only outputting new tags for elements which have changed.
