I have some code that dumps strings to stdout so I can check their encoding. It looks like this:
    private void DumpString(string s)
    {
        System.Console.Write("{0}: ", s);
        foreach (byte b in s)
        {
            System.Console.Write("{0}({1}) ", (char)b, b.ToString("x2"));
        }
        System.Console.WriteLine();
    }
Consider two strings, each of which appears as "ë", but with different encodings. DumpString produces the following output:
ë: e(65)(08)
ë: ë(eb)
The code looks like this:
    DumpString(string1);
    DumpString(string2);
How can I convert string2, using System.Text.Encoding, to be byte-equivalent to string1?
They don't have different encodings. Strings in C# are always UTF-16 (thus, you shouldn't use byte
to iterate over strings because you'll lose the top 8 bits of each char; that is why U+0308 shows up as 08 in your output). What they have is different normalization forms.
Your first string is "\u0065\u0308": LATIN SMALL LETTER E + COMBINING DIAERESIS. This is the decomposed form (NFD).
The second is "\u00EB": LATIN SMALL LETTER E WITH DIAERESIS. This is the precomposed form (NFC).
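To see those code units directly, a dump that iterates over char instead of byte keeps all 16 bits. This is only a sketch: the class name CodeUnitDump and the helper ToCodeUnits are made up for illustration, and the two literals are assumed to match the strings from the question.

```csharp
using System;
using System.Linq;

static class CodeUnitDump
{
    // Hypothetical helper: formats each UTF-16 code unit of s as four hex digits.
    public static string ToCodeUnits(string s)
    {
        return string.Join(" ", s.Select(c => ((int)c).ToString("x4")));
    }

    static void Main()
    {
        // Assumed contents of string1 (decomposed) and string2 (precomposed).
        Console.WriteLine(ToCodeUnits("\u0065\u0308")); // 0065 0308
        Console.WriteLine(ToCodeUnits("\u00EB"));       // 00eb
    }
}
```

With this version the combining diaeresis prints as 0308 rather than being cut down to 08.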
You can convert between them with string.Normalize: pass NormalizationForm.FormD to decompose or NormalizationForm.FormC to compose.
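As a rough illustration of that conversion, the sketch below normalizes each string into the other's form and compares them ordinally. The class name NormalizeDemo and the helpers Decompose and Compose are hypothetical, and the literals again assume the two strings from the question.

```csharp
using System;
using System.Text;

static class NormalizeDemo
{
    // Hypothetical wrappers around string.Normalize.
    public static string Decompose(string s)
    {
        return s.Normalize(NormalizationForm.FormD);
    }

    public static string Compose(string s)
    {
        return s.Normalize(NormalizationForm.FormC);
    }

    static void Main()
    {
        string string1 = "\u0065\u0308"; // assumed: e + combining diaeresis (NFD)
        string string2 = "\u00EB";       // assumed: precomposed e with diaeresis (NFC)

        // Ordinal comparison checks the code units, not the rendered glyph.
        Console.WriteLine(string.Equals(Decompose(string2), string1, StringComparison.Ordinal)); // True
        Console.WriteLine(string.Equals(Compose(string1), string2, StringComparison.Ordinal));   // True

        // The UTF-16LE bytes of the decomposed string2 now match those of string1.
        byte[] bytes = Encoding.Unicode.GetBytes(Decompose(string2));
        Console.WriteLine(BitConverter.ToString(bytes)); // 65-00-08-03
    }
}
```

Note that System.Text.Encoding only enters the picture when you actually need bytes; the normalization itself happens on the string.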
You're looking for the String.Normalize method.