
XDocument - how to convert non-HTML-safe characters

Developer https://www.devze.com 2023-01-27 02:51 Source: Internet

I have a "title" attribute inside elements of my UTF-8 xml, e.g.

<tag title="This is some test with special chars §£" />

As I want the content of this attribute to be printed directly in an HTML page, I'm trying to get output like:

<tag title="This is some test with special chars &#x00a7;&#x00a3;" />

The code fragment where I add the attribute looks like this:

new XElement( "tag",
    new XAttribute( "title" , title)
);

Characters such as & and " are escaped, but §£ are not, as they're valid UTF-8 characters. What should I change?


UTF-8 characters are supported in HTML, if the page is declared as UTF-8.

You should always specify the encoding used for an HTML or XML page. If you don't, you risk that characters in your content are incorrectly interpreted. This is not just an issue of human readability; increasingly, machines need to understand your data too. You should also check that you are not specifying different encodings in different places.

If the default encoding for the page is a character set with a smaller range, then it will not render all of the UTF-8 characters properly. However, if the document is declared as UTF-8 they should display fine.

Rather than replacing characters with entity references, you may need to explicitly declare the encoding of your page as UTF-8.

There are a variety of ways to do this:

  • <meta charset="UTF-8">
  • <meta http-equiv="Content-type" content="text/html;charset=UTF-8">
  • <?xml version="1.0" encoding="UTF-8"?>
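On the C# side, the declared encoding only helps if the bytes are actually written as UTF-8. A minimal sketch, assuming the markup comes from an XDocument and is written through an XmlWriter; the class name Utf8SaveSketch and the helper SaveAsUtf8 are made up for illustration and are not part of the original question:

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Linq;

class Utf8SaveSketch
{
    // Hypothetical helper: serializes the document to UTF-8 bytes.
    // The writer emits an XML declaration that names the encoding it uses.
    public static byte[] SaveAsUtf8(XDocument doc)
    {
        var settings = new XmlWriterSettings
        {
            Encoding = new UTF8Encoding(false), // false = do not write a byte order mark
            Indent = true
        };

        using (var stream = new MemoryStream())
        {
            using (var writer = XmlWriter.Create(stream, settings))
            {
                doc.Save(writer);
            } // disposing the writer flushes it into the stream
            return stream.ToArray();
        }
    }

    static void Main()
    {
        var doc = new XDocument(
            new XElement("tag",
                new XAttribute("title", "This is some test with special chars §£")));

        byte[] bytes = SaveAsUtf8(doc);

        // Decode with the same encoding to inspect the result.
        Console.OutputEncoding = Encoding.UTF8;
        Console.WriteLine(Encoding.UTF8.GetString(bytes));
    }
}
```

With this approach §£ stay as literal characters in the output, so the page that embeds them still has to be served and declared as UTF-8.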


Maybe you can manually encode those characters. I have used this before:

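// Requires: using System.Collections.Generic;
// Maps each numeric character reference to the character it stands for.
Dictionary<string, char> HTMLSymbolMap = new Dictionary<string, char>()
{
    {"&#8211;", '–'},
    {"&#8212;", '—'},
    {"&#8216;", '‘'},
    {"&#8217;", '’'},
    {"&#8218;", '‚'},
    {"&#8220;", '“'},
    {"&#8221;", '”'},
    {"&#8226;", '•'},
    {"&#183;", '·'},
    {"&#8222;", '„'},
    {"&#163;", '£'},
    {"&#167;", '§'},
};

// Replaces every mapped character in docText with its numeric character reference.
public string CleanJunk(string docText)
{
    foreach (var kv in HTMLSymbolMap)
    {
        docText = docText.Replace(kv.Value.ToString(), kv.Key);
    }

    return docText;
}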

Refer to this HTML symbol table for more info.
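The lookup table in CleanJunk only covers the characters listed in it. A possible generalization, sketched here under the assumption that every character outside the ASCII range should become a hexadecimal character reference in the &#x00a7; style the question asks for; the names EntityEscapeSketch and EscapeNonAscii are hypothetical:

```csharp
using System;
using System.Text;
using System.Xml.Linq;

class EntityEscapeSketch
{
    // Hypothetical helper: rewrites every non-ASCII character as &#xhhhh;.
    // ASCII characters, including markup such as < > & and quotes, are left untouched.
    public static string EscapeNonAscii(string text)
    {
        var sb = new StringBuilder(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (c < 128)
            {
                sb.Append(c);
                continue;
            }

            int codePoint = c;
            // Combine a surrogate pair into a single code point.
            if (char.IsHighSurrogate(c) && i + 1 < text.Length && char.IsLowSurrogate(text[i + 1]))
            {
                codePoint = char.ConvertToUtf32(c, text[i + 1]);
                i++;
            }

            // "x4" formats as lowercase hex, zero-padded to at least four digits.
            sb.Append("&#x").Append(codePoint.ToString("x4")).Append(';');
        }
        return sb.ToString();
    }

    static void Main()
    {
        Console.WriteLine(EscapeNonAscii("§£")); // prints &#x00a7;&#x00a3;

        // Apply the helper to the serialized markup, not to the attribute value.
        string xml = new XElement("tag",
            new XAttribute("title", "This is some test with special chars §£")).ToString();
        Console.WriteLine(EscapeNonAscii(xml));
    }
}
```

Like CleanJunk, this is meant to run on the serialized text. If the references are inserted into the title before it is passed to XAttribute, the & in each reference is itself escaped to &amp; and the browser shows the reference literally.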
