Nokogiri unescaped html

Developer https://www.devze.com 2023-03-23 10:20 Source: the web

I am parsing HTML text using Nokogiri and making some changes to that HTML.

doc = Nokogiri::HTML.parse(html_code)

However, I am using Mustache with that HTML, so it contains Mustache variables enclosed in double curly braces, e.g. {{mustache_variable}}.

After tinkering with the Nokogiri document, when I call

doc.to_html

the curly braces are escaped and I get something like %7B%7Bmustache_variable%7D%7D

But not all of the content is escaped. For example, if I have this HTML:

<label> {{mustache_variable}} </label>

it returns <label> {{mustache_variable}} </label>

But for HTML like <img src='{{mustache_variable}}'>

it returns <img src='%7B%7Bmustache_variable%7D%7D'>

So I am currently doing a gsub to replace %7B and %7D with { and } respectively so that Mustache works.

Is there a way to get the exact HTML back from Nokogiri, or is there a better solution?

You probably need the cgi module:

require 'cgi'
doc = Nokogiri::HTML.parse(html_code)
CGI.unescapeHTML(doc.to_html)

Or you can use the htmlentities library.

Also try using doc.content instead of doc.to_html.
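
Before relying on CGI.unescapeHTML for this, it may be worth checking what it does with each kind of escaping. The sketch below uses two made-up sample strings (assumptions for illustration, not output captured from Nokogiri): one with HTML entities and one with the percent-escaped braces from the question.

```ruby
require 'cgi'

# Hypothetical sample strings, chosen only to illustrate the two escaping styles.
entity_escaped  = '&lt;label&gt; {{mustache_variable}} &lt;/label&gt;'
percent_escaped = "<img src='%7B%7Bmustache_variable%7D%7D'>"

# HTML entities such as &lt; and &gt; are decoded back into < and >.
puts CGI.unescapeHTML(entity_escaped)

# %7B and %7D are URL percent-escapes, not HTML entities,
# so this string comes back unchanged.
puts CGI.unescapeHTML(percent_escaped)
```

If that matches what you see, the percent-escaped braces in attributes such as src would still need something like the gsub from the question.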

I ran into this same problem and ended up using a regular expression to convert the escaped double braces:

html_doc.gsub(/%7B%7B(.+?)%7D%7D/, '{{\1}}')

To make this safer, I'd recommend prefixing each mustache variable with a namespace, just in case some of the HTML does have the escaped double brace pattern intentionally, e.g.

html_doc.gsub(/%7B%7Bnamespace(.+?)%7D%7D/, '{{namespace\1}}')
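
The two gsub calls above could be wrapped into one small helper, roughly like the sketch below. The method name restore_mustache_tags and the namespace keyword are hypothetical names I made up for illustration. It also decodes other ASCII percent-escapes inside the tag, on the assumption that a tag written with spaces, such as {{ url }}, comes back as %7B%7B%20url%20%7D%7D. It works on the string returned by to_html and does not try to handle triple braces.

```ruby
# Sketch: restore Mustache tags that were percent-escaped inside attributes.
# `restore_mustache_tags` and `namespace:` are hypothetical names.
def restore_mustache_tags(html, namespace: '')
  prefix = Regexp.escape(namespace)
  # Match %7B%7B ... %7D%7D, optionally requiring the namespace prefix.
  html.gsub(/%7B%7B(#{prefix}.+?)%7D%7D/) do
    # Decode remaining ASCII escapes inside the tag, e.g. %20 for a space.
    inner = Regexp.last_match(1).gsub(/%([0-7][0-9A-Fa-f])/) do
      Regexp.last_match(1).hex.chr
    end
    "{{#{inner}}}"
  end
end

escaped = "<img src='%7B%7Bmustache_variable%7D%7D'>"
puts restore_mustache_tags(escaped)
# prints <img src='{{mustache_variable}}'>
```

With a namespace, only tags starting with that prefix are restored, e.g. restore_mustache_tags(html, namespace: 'ns_') leaves %7B%7Bother%7D%7D alone.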
