Nokogiri unescaped html

Developer https://www.devze.com 2023-03-23 10:20 Source: web
I am parsing HTML text using Nokogiri and making some changes to that HTML.

doc = Nokogiri::HTML.parse(html_code)

However, I am using mustache with that HTML, so it contains mustache variables enclosed in double curly braces, e.g. {{mustache_variable}}.

After tinkering with the Nokogiri document, when I call

doc.to_html

the curly braces are escaped and I get something like %7B%7Bmustache_variable%7D%7D


However, not all of the content is escaped. For example, given this HTML:

<label> {{mustache_variable}} </label>

it returns <label> {{mustache_variable}} </label>


But for HTML like <img src='{{mustache_variable}}'>

it returns <img src='%7B%7Bmustache_variable%7D%7D'>

So I am currently using gsub to replace %7B and %7D with { and } respectively so that mustache works.

Is there a way to get the exact HTML back from Nokogiri, or is there a better solution?


You probably need the cgi module:

require 'cgi'
require 'nokogiri'

doc = Nokogiri::HTML.parse(html_code)
# Decode HTML entities (e.g. &amp;) in the serialized output
CGI.unescapeHTML(doc.to_html)

Alternatively, you can use the htmlentities library.

You could also try doc.content instead of doc.to_html (note that doc.content returns only the text content, without the markup).
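
For anyone weighing the cgi suggestion, here is a small check of what each CGI method decodes, using only Ruby's standard library. The sample string is made up for illustration and no Nokogiri is involved, so treat it as a sketch of the two decoders rather than a fix for the original problem.

```ruby
require 'cgi'

# Made-up sample: one percent-escaped mustache tag and one HTML entity.
escaped = "<img src='%7B%7Bmustache_variable%7D%7D'> &amp; more"

# unescapeHTML decodes HTML entities only, so the %7B/%7D escapes survive.
entity_decoded = CGI.unescapeHTML(escaped)

# unescape decodes percent-escapes (and turns "+" into a space),
# so the braces come back but the &amp; entity is left alone.
percent_decoded = CGI.unescape(escaped)

puts entity_decoded
puts percent_decoded
```

Running CGI.unescape over a whole document would also decode any percent-escapes that are meant to stay in real URLs, so a narrower replacement is probably safer.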


I ran into this same problem and ended up using a regular expression to convert the escaped double braces:

html_doc.gsub(/%7B%7B(.+?)%7D%7D/, '{{\1}}')

To make this safer, I'd recommend prefixing each mustache variable with a namespace, just in case some of the HTML does have the escaped double brace pattern intentionally, e.g.

html_doc.gsub(/%7B%7Bnamespace(.+?)%7D%7D/, '{{namespace\1}}')
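
The same idea can be wrapped in a small helper. This is only a sketch: restore_mustache and MUSTACHE_ESCAPED are names I made up, it expects the already serialized HTML string (e.g. the result of doc.to_html), and the case-insensitive flag is my own assumption in case the escapes come out in lowercase.

```ruby
# Hypothetical pattern: a percent-escaped "{{...}}" pair.
# The /i flag (an assumption) also accepts lowercase escapes such as %7b.
MUSTACHE_ESCAPED = /%7B%7B(.+?)%7D%7D/i

# Hypothetical helper: takes a serialized HTML string and turns
# escaped mustache tags back into literal {{...}} tags.
def restore_mustache(html)
  html.gsub(MUSTACHE_ESCAPED, '{{\1}}')
end

puts restore_mustache("<img src='%7B%7Bmustache_variable%7D%7D'>")
# prints <img src='{{mustache_variable}}'>
```

Text that already contains literal braces passes through unchanged. Any other characters that were percent-encoded inside a tag are left as they are.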