
strip HTML and CSS in C#

Developer (https://www.devze.com), 2023-02-21 16:45. Source: the web

I'm creating mails in one of my solutions and need to provide both HTML and plain-text mails from a given HTML page.

However, I haven't found a really good way to strip HTML, JS and CSS from whatever HTML template the customers might provide.

Is there a simple solution to this, perhaps a component that handles all of it, or do I need to start puzzling with regexps? And is it even possible to create a bulletproof regexp for all possible tags?

Regards


Give HtmlAgilityPack a go. It has methods for extracting the text out of an HTML Document.

You basically just need to do the following:

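  // Requires the HtmlAgilityPack package and: using HtmlAgilityPack;
  var doc = new HtmlDocument();
  doc.LoadHtml(htmlStr);              // htmlStr holds the HTML markup
  var node = doc.DocumentNode;
  var textContent = node.InnerText;   // concatenated text of all descendant nodes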
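One caveat with the snippet above: as far as I know, InnerText on the document node also returns the text inside script and style elements, because the parser keeps their contents as ordinary text. Below is a minimal sketch, assuming the HtmlAgilityPack NuGet package is referenced, that removes those elements (and comments) before reading the text. The names HtmlTextExtractor and ToPlainText are hypothetical, chosen only for illustration.

```csharp
using System.Linq;
using HtmlAgilityPack;

public static class HtmlTextExtractor
{
    // Hypothetical helper: returns the text of an HTML string without JS, CSS or comments.
    public static string ToPlainText(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Collect first, then remove, so the tree is not modified while enumerating.
        var unwanted = doc.DocumentNode
            .Descendants()
            .Where(n => n.Name == "script"
                     || n.Name == "style"
                     || n.NodeType == HtmlNodeType.Comment)
            .ToList();

        foreach (var node in unwanted)
        {
            node.Remove();
        }

        // DeEntitize turns entities such as &amp; back into plain characters.
        return HtmlEntity.DeEntitize(doc.DocumentNode.InnerText).Trim();
    }
}
```

Note that this sketch does not normalize whitespace or insert line breaks between block elements, so the result may still need tidying before it is used as the plain-text part of a mail.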


As a component that can strip HTML: Html Agility Pack.


Take a look here: HTMLAgilityPack parse in the InnerHTML. There is an answer there showing how to do it using Html Agility Pack.


You might find the Html Agility Pack helpful to your situation.


On this page you can find a really fast algorithm to strip HTML from a string input. Although there are some issues with invalid HTML, it's still a great resource: http://www.dotnetperls.com/remove-html-tags
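To give an idea of what a fast tag stripper without a parser can look like, here is a sketch of a single-pass character scan. It is not copied from the linked page; TagStripper and StripTags are hypothetical names, and the only assumption is that everything between '<' and '>' counts as a tag.

```csharp
using System.Text;

public static class TagStripper
{
    // Hypothetical helper: drops every character between '<' and '>' (inclusive).
    public static string StripTags(string source)
    {
        var result = new StringBuilder(source.Length);
        bool insideTag = false;

        foreach (char c in source)
        {
            if (c == '<')
            {
                insideTag = true;       // start skipping
            }
            else if (c == '>')
            {
                insideTag = false;      // tag closed, resume copying
            }
            else if (!insideTag)
            {
                result.Append(c);       // ordinary text
            }
        }

        return result.ToString();
    }
}
```

A scan like this only removes the tags themselves. The contents of script and style elements are kept as text, and a stray '<' in the text swallows everything up to the next '>', which is the kind of trouble with invalid HTML mentioned above.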
