Possible Duplicate:
How can I detect the encoding/codepage of a text file
I'm developing a WinForms application that needs to read .txt files.
Unfortunately, the files come in many different encodings, so I can't read them all with one specific encoding.
How can I determine the encoding of a text file?
See this answer here:
How can I detect the encoding/codepage of a text file
You can't detect the codepage, you need to be told it. You can analyse the bytes and guess it, but that can give some bizarre (sometimes amusing) results. I can't find it now, but I'm sure Notepad can be tricked into displaying English text in Chinese.
and the article it links to:
http://www.joelonsoftware.com/printerFriendly/articles/Unicode.html
The Single Most Important Fact About Encodings
If you completely forget everything I just explained, please remember one extremely important fact. It does not make sense to have a string without knowing what encoding it uses. You can no longer stick your head in the sand and pretend that "plain" text is ASCII. There Ain't No Such Thing As Plain Text.
If you have a string, in memory, in a file, or in an email message, you have to know what encoding it is in or you cannot interpret it or display it to users correctly.
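To make the quoted point concrete, here is a minimal sketch (the class name `EncodingDemo` is made up for illustration) that decodes the same two bytes with two different encodings. Nothing in the bytes themselves says which reading is the right one.

```csharp
using System;
using System.Text;

public static class EncodingDemo
{
    public static void Main()
    {
        // U+00E9 (e with acute accent) encoded as UTF-8 is the two bytes 0xC3 0xA9.
        byte[] bytes = Encoding.UTF8.GetBytes("\u00E9");

        // Decoded with the encoding that produced them, the bytes give back one character.
        string asUtf8 = Encoding.UTF8.GetString(bytes);

        // Decoded as ISO-8859-1, every byte maps to its own character,
        // so the same data turns into two unrelated characters.
        string asLatin1 = Encoding.GetEncoding("iso-8859-1").GetString(bytes);

        Console.WriteLine(asUtf8.Length);   // prints 1
        Console.WriteLine(asLatin1.Length); // prints 2
    }
}
```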
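Following the hints from @Gens and @Samuel Neff, I solved the problem. Here is my code. Note that it only recognizes files that start with a byte order mark (BOM); everything else falls back to the default encoding.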
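    using System.IO;
    using System.Text;

    public static Encoding GetFileEncoding(string fileName)
    {
        // Fall back to Encoding.Default (the ANSI code page on .NET Framework) if no BOM is found.
        Encoding encoding = Encoding.Default;
        using (FileStream stream = File.OpenRead(fileName))
        {
            // Read up to 4 bytes and look for a byte order mark (BOM).
            byte[] buff = new byte[4];
            int read = stream.Read(buff, 0, buff.Length);
            if (read >= 3 && buff[0] == 0xEF && buff[1] == 0xBB && buff[2] == 0xBF)
            {
                encoding = Encoding.UTF8;
            }
            else if (read >= 4 && buff[0] == 0xFF && buff[1] == 0xFE && buff[2] == 0 && buff[3] == 0)
            {
                // UTF-32 little-endian; must be tested before UTF-16 LE, which shares the FF FE prefix.
                encoding = Encoding.UTF32;
            }
            else if (read >= 4 && buff[0] == 0 && buff[1] == 0 && buff[2] == 0xFE && buff[3] == 0xFF)
            {
                // UTF-32 big-endian; Encoding.UTF32 is little-endian, so build the encoding explicitly.
                encoding = new UTF32Encoding(true, true);
            }
            else if (read >= 2 && buff[0] == 0xFE && buff[1] == 0xFF)
            {
                encoding = Encoding.BigEndianUnicode;
            }
            else if (read >= 2 && buff[0] == 0xFF && buff[1] == 0xFE)
            {
                encoding = Encoding.Unicode;
            }
            else if (read >= 3 && buff[0] == 0x2B && buff[1] == 0x2F && buff[2] == 0x76)
            {
                // UTF-7; note that Encoding.UTF7 is marked obsolete in .NET 5 and later.
                encoding = Encoding.UTF7;
            }
        }
        return encoding;
    }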
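As an alternative to checking the BOM bytes by hand, `StreamReader` can do the same kind of BOM sniffing itself. The sketch below is only an illustration: the class name `EncodingSniffer` and the method `DetectWithStreamReader` are hypothetical, and, like the code above, it cannot identify files that have no BOM; those simply get the fallback encoding you pass in.

```csharp
using System.IO;
using System.Text;

public static class EncodingSniffer
{
    // Hypothetical helper: returns the encoding StreamReader settles on
    // after looking for a BOM, or the fallback when the file has none.
    public static Encoding DetectWithStreamReader(string fileName, Encoding fallback)
    {
        using (var reader = new StreamReader(fileName, fallback, true))
        {
            // Detection only happens on the first read, so read one character
            // (this returns -1 for an empty file, which is fine here).
            reader.Read();
            return reader.CurrentEncoding;
        }
    }
}
```

For example, `EncodingSniffer.DetectWithStreamReader(path, Encoding.Default)` should give `Encoding.Default` for a BOM-less file, which matches the behavior of `GetFileEncoding` above.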