Sanitize gem doesn't like colon inside href attribute

Developer https://www.devze.com 2023-03-05 12:37 Source: Internet
Using the Sanitize gem, I'm cleaning some HTML. In the href attribute of my anchor tags, I wish to parse the following:

<a href="#fn:1">1</a>

This is required for implementing footnotes using the Kramdown gem.

However, Sanitize doesn't appear to like the colon inside the href attribute. It simply outputs <a>1</a> instead, skipping the href attribute altogether.

My sanitize code looks like this:

# Setup whitelist of html elements, attributes, and protocols that are allowed.
allowed_elements = ['h2', 'a', 'img', 'p', 'ul', 'ol', 'li', 'strong', 'em', 'cite', 
  'blockquote', 'code', 'pre', 'dl', 'dt', 'dd', 'br', 'hr', 'sup', 'div']
allowed_attributes = {'a' => ['href', 'rel', 'rev'], 'img' => ['src', 'alt'], 
  'sup' => ['id'], 'div' => ['class'], 'li' => ['id']}
allowed_protocols = {'a' => {'href' => ['http', 'https', 'mailto', :relative]}}

# Clean text of any unwanted html tags.
html = Sanitize.clean(html, :elements => allowed_elements, :attributes => allowed_attributes, 
  :protocols => allowed_protocols)

Is there a way to get Sanitize to accept a colon in the href attribute?


Sanitize is doing the safest thing by default here. It treats the portion of the URL before the : as a protocol (a "scheme" in the terminology of RFC 1738), and since #fn isn't in the protocol whitelist, it removes the entire href attribute.

You can allow URLs like this by adding #fn to the protocol whitelist:

allowed_protocols = {'a' => {'href' => ['#fn', 'http', 'https', 'mailto', :relative]}}
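
To make the reasoning above concrete, here is a rough pure-Ruby sketch of that kind of check. It is a simplified illustration, not Sanitize's actual source: the SCHEME_PATTERN constant and the href_allowed? helper are hypothetical names made up for this example, and the real gem's matching rules may differ between versions.

```ruby
# Simplified illustration only; NOT Sanitize's real implementation.
# SCHEME_PATTERN and href_allowed? are hypothetical names for this sketch.

# Capture everything before the first colon and treat it as the scheme.
SCHEME_PATTERN = /\A\s*([^:]*?)\s*:/

# Returns true when the href would be kept under the given whitelist.
def href_allowed?(href, allowed_protocols)
  match = SCHEME_PATTERN.match(href)

  # No colon anywhere: the URL is relative, so it is kept only
  # when :relative is whitelisted.
  return allowed_protocols.include?(:relative) if match.nil?

  # Otherwise the text before the colon must itself be whitelisted.
  allowed_protocols.include?(match[1].downcase)
end

default_list  = ['http', 'https', 'mailto', :relative]
extended_list = ['#fn'] + default_list

puts href_allowed?('#fn:1', default_list)    # "#fn" is not whitelisted
puts href_allowed?('#fn:1', extended_list)   # "#fn" is now whitelisted
puts href_allowed?('/about', default_list)   # no colon, so treated as relative
```

Under this simplified model, "#fn:1" is rejected with the original whitelist because "#fn" is read as the scheme, and it is accepted once '#fn' is added, which matches the behaviour described in the question.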