When using Sanitizer.GetSafeHtmlFragment from Microsoft's AntiXSSLibrary 4.0, I noticed it changes my HTML fragment from:
<pre class="brush: csharp">
</pre>
to:
<pre class="x_brush: x_csharp">
</pre>
Sadly, their API doesn't allow us to disable this behavior. Therefore I'd like to use a regular expression (C#) to replace strings like "x_anything" with "anything" wherever they occur inside a class="" attribute.
Can anyone help me with the RegEx to do this?
Thanks
UPDATE - this worked for me:
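private string FixGetSafeHtmlFragment(string html)
{
    // Look for a class attribute whose value starts with the "x_" prefix.
    Match match = Regex.Match(html, "class=\"(x_).+\"", RegexOptions.IgnoreCase);
    if (match.Success)
    {
        // Group 1 is the literal "x_" prefix. Note that Replace removes every
        // "x_" in the string, not only the ones inside class attributes.
        string key = match.Groups[1].Value;
        return html.Replace(key, "");
    }
    return html;
}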
I'm not 100% sure about the C# @ (verbatim string) syntax, but I think this should match x_ inside any class="" and replace it with an empty string:
string input = "class=\"x_something\"";
// In a verbatim string (@"..."), a double quote is written as "".
Match match = Regex.Match(input, @"class=""(x_).+""", RegexOptions.IgnoreCase);
if (match.Success)
{
    string key = match.Groups[1].Value;   // "x_"
    string v = input.Replace(key, "");    // class="something"
}
It's been over a year since this was posted, but here's a regex you can use that strips the x_ prefix from up to three class names in a class attribute. I'm sure there's a cleaner way, but it gets the job done.
VB.Net Code:
Regex.Replace(myHtml, "(<\w+\b[^>]*?\b)(class="")x[_]([a-zA-Z]*)( )?(?:x[_])?([a-zA-Z]*)?( )?(?:x[_])?([^""]*"")", "$1$2$3$4$5$6$7")
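For comparison, here is a rough C# sketch of the same idea that is not limited to three class names and leaves "x_" alone outside of class attributes. It is only an illustration, not something taken from the AntiXSS library: the names ClassPrefixFixer and StripPrefix are made up for this example, and it assumes the class value is quoted and that every "x_" at the start of a whitespace-separated token was added by the sanitizer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ClassPrefixFixer  // hypothetical helper name
{
    // Group 1: 'class=' including any whitespace around '='.
    // Group 2: the opening quote. Group 3: the attribute value.
    private static readonly Regex ClassAttribute = new Regex(
        @"(\bclass\s*=\s*)([""'])(.*?)\2",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // "x_" at the start of a whitespace-separated token.
    private static readonly Regex Prefix = new Regex(@"(?<!\S)x_");

    public static string StripPrefix(string html)
    {
        // Rewrite only the value of each class attribute; the rest of the
        // string (including any other "x_" text) is left untouched.
        return ClassAttribute.Replace(html, m =>
            m.Groups[1].Value
            + m.Groups[2].Value
            + Prefix.Replace(m.Groups[3].Value, "")
            + m.Groups[2].Value);
    }
}
```

Calling ClassPrefixFixer.StripPrefix("<pre class=\"x_brush: x_csharp\"></pre>") should give back <pre class="brush: csharp"></pre>. Since this is still regex-based, it will also rewrite anything that merely looks like class="..." in text content, so an HTML parser is the safer route if that matters.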