
Any idea how to enforce UTF-8 within a document?

Developer https://www.devze.com 2023-02-13 23:47 Source: Internet
I am creating an XML document and attempting to store it as UTF-8. However, I am getting a non-UTF-8 apostrophe in the stored document.

e.g.: <Name=Dave t="Owner(e.g pete’s)">

I have tried the following:

`System.Text.UTF8Encoding encoding = new System.Text.UTF8Encoding();
var docX = encoding.GetBytes(vdd.ToString());
System.IO.StreamWriter s = new StreamWriter(pathAndFileName, false, encoding);
string myString = encoding.GetString(docX);
s.Write(myString);
`

Which should have been overkill, but the '’' inside the brackets is still showing. I have also tried HtmlEncode, which didn't help.

The XML reads fine as UTF-8 in Notepad++, but the ’ character is not parsing on all of my clients' systems.

Help please.....


EDIT: Dour noted something I missed in all the confusion; the sample you pasted is not XML at all, and therefore will not parse. My answer still applies insofar as 'html encoding' and UTF8 encoding were the wrong roads to be going down here.


It's difficult to tell exactly what your problem is, but I've tried to eliminate some of the possibilities and come up with one: the ’ is causing your XML not to be parsed correctly.

This is not an encoding problem. As The Skeet notes, UTF-8 can represent all Unicode characters, including that one. Instead, this is an... umm... an encoding problem. That is: an XML data encoding problem.

The character should be attribute encoded, not HTML encoded.

What API are you using to build the XML? That should be done for you, so you don't need to worry about what to encode, how, and why. But if you attribute encode the character, I think your problem will cease.
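As a minimal sketch of letting the API do the escaping, assuming LINQ to XML (`System.Xml.Linq`) is an option: the element name `Person`, the `note` attribute and its value, and the helper `BuildSample` are all invented for illustration, since the sample in the question is not well formed.

```csharp
using System;
using System.Xml.Linq;

public static class AttributeEscapingSketch
{
    // Hypothetical helper: builds a well-formed element carrying the
    // values from the question's sample.
    public static XElement BuildSample()
    {
        return new XElement("Person",
            new XAttribute("Name", "Dave"),
            // ’ (U+2019) is a legal character in an attribute value.
            new XAttribute("t", "Owner(e.g pete’s)"),
            // Invented value: & and < are special in XML, and the API
            // escapes them when the element is serialized.
            new XAttribute("note", "R&D <team>"));
    }
}
```

Calling `Console.WriteLine(AttributeEscapingSketch.BuildSample())` prints the element with `&` and `<` escaped inside the `note` attribute; no manual HTML or attribute encoding is involved.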

Assuming I understand your problem...


<Name=Dave t="Owner(e.g pete’s)">

This is not XML: the '=' is illegal in a tag name. If `Name` is supposed to be an attribute, it needs an element name in front of it and its value must be quoted. The tag is also unterminated and there is no XML declaration; if this is what you're trying to output, you're not outputting XML. The ’ character is allowed both in UTF-8 and in XML attribute values.
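One quick way to check the point above is to hand the text to a parser. This is only a sketch, assuming `System.Xml.Linq` is available; the class and method names are hypothetical, and the corrected sample in the usage note (`Person` as the element name) is a guess at what was intended.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

public static class WellFormedCheck
{
    // Returns true when the text parses as an XML document,
    // false when the parser rejects it.
    public static bool IsWellFormed(string text)
    {
        try
        {
            XDocument.Parse(text);
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }
}
```

`WellFormedCheck.IsWellFormed("<Name=Dave t=\"Owner(e.g pete’s)\">")` returns false, while `WellFormedCheck.IsWellFormed("<Person Name=\"Dave\" t=\"Owner(e.g pete’s)\" />")` returns true, so the apostrophe is not what the parser objects to.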

System.Text.UTF8Encoding encoding = new System.Text.UTF8Encoding();
var docX = encoding.GetBytes(vdd.ToString());

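docX is a byte array holding the UTF-8 encoding of vdd.ToString(). If the string contains anything that is not valid Unicode (for example an unpaired surrogate), it will not survive the conversion; by default it is replaced with U+FFFD.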

System.IO.StreamWriter s = new StreamWriter(pathAndFileName, false, encoding);

You're opening a UTF-8-encoded output stream, fair enough...

string myString = encoding.GetString(docX);

Now you're converting your UTF-8-encoded array back into a C# string. Why?

s.Write(myString);

Now you're writing the C# string back to a UTF-8 stream, which does a second UTF-8 conversion. This makes no sense; please explain what you're trying to accomplish.
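The walkthrough above suggests the round trip can simply be dropped. A minimal sketch, assuming the goal is just to write the document text to a file as UTF-8; the class and method names are hypothetical. `File.WriteAllText` also closes the file, which the snippet in the question does not show happening for its `StreamWriter`.

```csharp
using System.IO;
using System.Text;

public static class Utf8WriteSketch
{
    // Writes the text as UTF-8 in one step and closes the file.
    public static void Save(string path, string xmlText)
    {
        // UTF8Encoding(false) writes no byte order mark; pass true to emit one.
        File.WriteAllText(path, xmlText, new UTF8Encoding(false));
    }

    // For valid text, GetBytes followed by GetString returns the
    // same string, so the round trip in the question changes nothing.
    public static bool RoundTripIsNoOp(string text)
    {
        var encoding = new UTF8Encoding();
        return encoding.GetString(encoding.GetBytes(text)) == text;
    }
}
```

With the names from the question, the call would be `Utf8WriteSketch.Save(pathAndFileName, vdd.ToString())`. In the resulting file the ’ character is stored as the three bytes E2 80 99, which is its UTF-8 encoding.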

the ’ character is not parsing on all of my clients systems

Then your clients' systems are not accepting UTF-8. Either fix that, or find out what encoding they do accept and use that.
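If a client turns out to accept only ASCII, one option is to let `XmlWriter` write in that encoding; it emits characters the encoding cannot represent as numeric character references, which any XML parser should read back as the original character. This is a sketch under that assumption, not a diagnosis of the clients' systems, and the class and method names are hypothetical.

```csharp
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Linq;

public static class AsciiSafeXml
{
    // Serializes the document using only ASCII bytes. Characters outside
    // ASCII, such as ’, are written as numeric character references.
    public static byte[] ToAsciiBytes(XDocument doc)
    {
        var settings = new XmlWriterSettings { Encoding = Encoding.ASCII };
        using (var stream = new MemoryStream())
        {
            using (var writer = XmlWriter.Create(stream, settings))
            {
                doc.Save(writer);
            }
            return stream.ToArray();
        }
    }
}
```

The bytes can be written out with `File.WriteAllBytes`. Swapping `Encoding.ASCII` for whatever encoding the clients actually accept follows the same pattern.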
