I currently have an extension method for removing any HTML from strings.
Regex.Replace(s, @"<(.|\n)*?>", string.Empty);
This works fine on the whole; however, I am occasionally passed strings that contain both standard HTML markup and encoded markup (I don't have control of the source data, so I can't correct things at the point of entry), e.g.
<p>&lt;p&gt;Sample text&lt;/p&gt;</p>
I need an expression that will remove both encoded and non-encoded HTML (whether it be paragraph tags, anchor tags, formatting tags etc.) from a string.
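I think you can do that in two passes with the same extension method.
First strip the usual un-encoded tags, then HTML-decode the returned string and run the same replacement again. Any encoded tags become ordinary tags after decoding, so the second pass removes them.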
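A minimal sketch of the two-pass idea, reusing the regex from the question. The class name `HtmlStringExtensions` and method name `StripHtml` are made up for illustration, and `WebUtility.HtmlDecode` from `System.Net` is assumed as the decoder (`HttpUtility.HtmlDecode` should work the same way if you are already referencing `System.Web`):

```csharp
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical names; adapt to your existing extension class.
public static class HtmlStringExtensions
{
    // Same pattern as in the question: lazily matches anything between < and >.
    private const string TagPattern = @"<(.|\n)*?>";

    public static string StripHtml(this string s)
    {
        if (string.IsNullOrEmpty(s))
        {
            return s;
        }

        // Pass 1: remove the ordinary, un-encoded tags.
        string withoutTags = Regex.Replace(s, TagPattern, string.Empty);

        // Decode entities, so &lt;p&gt; becomes <p>.
        string decoded = WebUtility.HtmlDecode(withoutTags);

        // Pass 2: remove the tags that were previously encoded.
        return Regex.Replace(decoded, TagPattern, string.Empty);
    }
}
```

With the example from the question, `"<p>&lt;p&gt;Sample text&lt;/p&gt;</p>".StripHtml()` should give back just `Sample text`. One thing to watch for: the result is left decoded, so text that legitimately contained encoded angle brackets (say `a &lt; b and c &gt; d`) will lose everything between them in the second pass. If that matters for your data, a real HTML parser is probably a safer choice than a regex.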