Using regular expressions, how do I remove style tags, CSS, scripts, and HTML tags from HTML to get plain text?
This is in ASP.NET with C#.
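I don't think a regex is really the right tool for this. That said, the following regex will strip the tags if you run a regex replace with it (it removes the tags themselves, but not the text inside script or style elements):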
<[^>]*>
To use it with Regex.Replace, do the following:
// Requires: using System.Text.RegularExpressions;
string myHtmlString = "<html><body>my test text</body></html>";
string myPlainTextString = Regex.Replace(myHtmlString, "<[^>]*>", String.Empty);
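Since the question also asks about scripts and CSS, here is a rough sketch of how the regex approach could be extended to drop script and style blocks (including their contents) before stripping the remaining tags. The class name HtmlStripper and the method name ToPlainText are made up for illustration, and replacing tags with a space rather than String.Empty is my own choice so that adjacent paragraphs don't run together. It will still trip over malformed or unusual HTML, which is why a parser is the safer route.

```csharp
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class HtmlStripper
{
    // Matches <script>...</script> and <style>...</style> including their contents.
    private static readonly Regex ScriptOrStyle = new Regex(
        @"<(script|style)\b[^>]*>.*?</\1\s*>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Matches HTML comments, which may span several lines.
    private static readonly Regex Comment = new Regex(
        @"<!--.*?-->", RegexOptions.Singleline);

    // The tag pattern from the answer above.
    private static readonly Regex Tag = new Regex(@"<[^>]*>");

    // Runs of whitespace left behind after the tags are gone.
    private static readonly Regex Whitespace = new Regex(@"\s+");

    public static string ToPlainText(string html)
    {
        if (string.IsNullOrEmpty(html))
        {
            return string.Empty;
        }

        // 1. Drop script/style blocks and comments together with their contents.
        string text = ScriptOrStyle.Replace(html, string.Empty);
        text = Comment.Replace(text, string.Empty);

        // 2. Replace every remaining tag with a space (assumption: keeps words apart).
        text = Tag.Replace(text, " ");

        // 3. Turn entities such as &amp; back into characters.
        text = WebUtility.HtmlDecode(text);

        // 4. Collapse whitespace and trim the ends.
        return Whitespace.Replace(text, " ").Trim();
    }
}
```

Calling HtmlStripper.ToPlainText on "<html><head><style>p{color:red}</style></head><body>my test text</body></html>" should give "my test text".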
I recommend using something like the Html Agility Pack instead (http://htmlagilitypack.codeplex.com/),
as a helper built on it, called "ConvertToPlainText", makes this even easier:
string myHtmlString = "<html><body>my test text</body></html>";
string myPlainTextString = ConvertToPlainText(myHtmlString);
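The snippet above doesn't show where ConvertToPlainText comes from, so here is a minimal sketch of what such a helper might look like on top of the Html Agility Pack. The class name HtmlTextHelper and the body of the method are my own assumptions, not code taken from the library; it only relies on HtmlDocument.LoadHtml, Descendants, Remove, InnerText, and HtmlEntity.DeEntitize, and it needs the HtmlAgilityPack package referenced in your project.

```csharp
using System.Linq;
using HtmlAgilityPack; // third-party package, must be added to the project

// Hypothetical helper; the name mirrors the call used in the answer above.
public static class HtmlTextHelper
{
    public static string ConvertToPlainText(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html ?? string.Empty);

        // Collect script, style, and comment nodes first, then remove them,
        // so the tree is not modified while it is being enumerated.
        var unwanted = doc.DocumentNode.Descendants()
            .Where(n => n.Name == "script"
                     || n.Name == "style"
                     || n.NodeType == HtmlNodeType.Comment)
            .ToList();

        foreach (var node in unwanted)
        {
            node.Remove();
        }

        // InnerText concatenates the remaining text nodes;
        // DeEntitize turns entities such as &amp; back into characters.
        return HtmlEntity.DeEntitize(doc.DocumentNode.InnerText).Trim();
    }
}
```

Because the parser works on elements rather than on characters, it copes with things the regex does not, such as a ">" inside an attribute value.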