How to determine the encoding of a txt file [duplicate]

开发者 (https://www.devze.com), 2023-03-13 12:50, source: web
This question already has answers here. Closed 11 years ago. Possible duplicate:

How can I detect the encoding/codepage of a text file

I've been developing a WinForms application that needs to read txt files.

Unfortunately, the files come in many different encodings, so I can't read them all with one specific encoding.

The problem is how to determine the encoding of a given txt file.


See this answer here:

How can I detect the encoding/codepage of a text file

You can't detect the codepage, you need to be told it. You can analyse the bytes and guess it, but that can give some bizarre (sometimes amusing) results. I can't find it now, but I'm sure Notepad can be tricked into displaying English text in Chinese.

and the article it links to:

http://www.joelonsoftware.com/printerFriendly/articles/Unicode.html

The Single Most Important Fact About Encodings

If you completely forget everything I just explained, please remember one extremely important fact. It does not make sense to have a string without knowing what encoding it uses. You can no longer stick your head in the sand and pretend that "plain" text is ASCII. There Ain't No Such Thing As Plain Text.

If you have a string, in memory, in a file, or in an email message, you have to know what encoding it is in or you cannot interpret it or display it to users correctly.


Following the hints from @Gens and @Samuel Neff, I solved the problem. Here is my code.

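// Requires: using System.IO; using System.Text;
public static Encoding GetFileEncoding(string srcFile)
{
    // Fall back to Encoding.Default (the ANSI code page on .NET Framework) when no BOM is found.
    Encoding encoding = Encoding.Default;
    using (FileStream stream = File.OpenRead(srcFile))
    {
        // Read up to 4 bytes; n is the number actually read (short files yield fewer).
        byte[] buff = new byte[4];
        int n = stream.Read(buff, 0, buff.Length);
        if (n >= 3 && buff[0] == 0xEF && buff[1] == 0xBB && buff[2] == 0xBF)
        {
            encoding = Encoding.UTF8;
        }
        else if (n >= 4 && buff[0] == 0xFF && buff[1] == 0xFE && buff[2] == 0 && buff[3] == 0)
        {
            // UTF-32 little-endian; tested before UTF-16 LE, which shares the FF FE prefix.
            encoding = Encoding.UTF32;
        }
        else if (n >= 4 && buff[0] == 0 && buff[1] == 0 && buff[2] == 0xFE && buff[3] == 0xFF)
        {
            // UTF-32 big-endian (Encoding.UTF32 itself is little-endian).
            encoding = new UTF32Encoding(true, true);
        }
        else if (n >= 2 && buff[0] == 0xFE && buff[1] == 0xFF)
        {
            encoding = Encoding.BigEndianUnicode;
        }
        else if (n >= 2 && buff[0] == 0xFF && buff[1] == 0xFE)
        {
            encoding = Encoding.Unicode;
        }
        else if (n >= 3 && buff[0] == 0x2B && buff[1] == 0x2F && buff[2] == 0x76)
        {
            // UTF-7; note that Encoding.UTF7 is marked obsolete in .NET 5 and later.
            encoding = Encoding.UTF7;
        }
    }
    return encoding;
}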
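
The BOM check only helps when the file actually starts with a BOM. For files without one, the most you can do is analyse the bytes and guess, as the first answer says. Here is a minimal sketch of one such guess, assuming the realistic candidates are UTF-8 and a single legacy code page: attempt a strict UTF-8 decode and treat a failure as "not UTF-8". The names EncodingGuess and LooksLikeUtf8 are made up for illustration and are not part of the original code.

```csharp
using System;
using System.Text;

public static class EncodingGuess
{
    // Hypothetical helper: returns true when the bytes form valid UTF-8.
    // This is a heuristic, not proof: pure ASCII also passes, and some
    // legacy-encoded byte sequences can happen to be valid UTF-8.
    public static bool LooksLikeUtf8(byte[] bytes)
    {
        // throwOnInvalidBytes: true makes decoding throw instead of
        // silently substituting U+FFFD for malformed sequences.
        var strictUtf8 = new UTF8Encoding(
            encoderShouldEmitUTF8Identifier: false,
            throwOnInvalidBytes: true);
        try
        {
            strictUtf8.GetString(bytes);
            return true;
        }
        catch (DecoderFallbackException)
        {
            return false;
        }
    }
}
```

One way to use it might be to call GetFileEncoding first and, only when it returns the default, run LooksLikeUtf8 on File.ReadAllBytes(srcFile) and switch to Encoding.UTF8 if it returns true. It is still a guess, so the result can be wrong for short or unusual files.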