
Remove Encoded HTML from Strings using RegEx

Developer https://www.devze.com 2023-02-10 20:37 Source: web

I currently have an extension method for removing any HTML from strings.

Regex.Replace(s, @"<(.|\n)*?>", string.Empty);
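For reference, the one-liner above can be wrapped as an extension method roughly like this. The class and method names (`HtmlStringExtensions`, `StripHtml`) are hypothetical, since the question does not show them. The sketch also shows why encoded markup survives: `&lt;` contains no literal `<`, so the pattern never matches it.

```csharp
using System.Text.RegularExpressions;

// Hypothetical names: the question does not show its class or method name.
public static class HtmlStringExtensions
{
    // Removes anything that looks like a tag: a '<', then any characters
    // (newlines included, via (.|\n)), matched lazily up to the first '>'.
    public static string StripHtml(this string s)
    {
        return Regex.Replace(s, @"<(.|\n)*?>", string.Empty);
    }
}
```

Called on the sample string from the question, `StripHtml` removes the real `<p>` and `</p>` tags but returns `&lt;p&gt;Sample text&lt;/p&gt;`, because the entity-encoded tags are plain text to the regex.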

This works fine on the whole. However, I am occasionally passed strings that contain both standard HTML markup and encoded markup (I don't control the source data, so I can't correct things at the point of entry), e.g.

&lt;p&gt;<p>Sample text</p>&lt;/p&gt;

I need an expression that will remove both encoded and non-encoded HTML (whether it be paragraph tags, anchor tags, formatting tags etc.) from a string.


I think you can do that in two passes with the same extension method.

First, replace the usual unencoded tags; then HTML-decode the returned string and run the replacement again. Simple.
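A minimal sketch of the two-pass idea, assuming `System.Net.WebUtility.HtmlDecode` is used for the decoding step (the answer does not name a decoder). The class and method names (`HtmlStripper`, `StripHtmlTwoPass`) are hypothetical; the regex is the one from the question.

```csharp
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical class and method names; the pattern is the question's regex.
public static class HtmlStripper
{
    private static readonly Regex TagPattern = new Regex(@"<(.|\n)*?>");

    public static string StripHtmlTwoPass(this string s)
    {
        // Pass 1: remove the ordinary, unencoded tags such as <p>...</p>.
        string withoutTags = TagPattern.Replace(s, string.Empty);

        // Decode entities so that &lt;p&gt; becomes a real <p> tag...
        string decoded = WebUtility.HtmlDecode(withoutTags);

        // Pass 2: ...which the same pattern can now remove.
        return TagPattern.Replace(decoded, string.Empty);
    }
}
```

On the sample string, this returns `Sample text`. A single decode unwraps only one level of encoding, so double-encoded input such as `&amp;lt;p&amp;gt;` would still come out as `&lt;p&gt;`. Decoding also turns other entities (for example `&amp;`) into literal characters, which is usually what you want for plain text.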
