Is there a .NET encoding type that will return every byte in the underlying file as a character with the same ordinal value?

Developer https://www.devze.com 2023-03-19 10:32 Source: web
This is my question:

Is there a .NET Encoding object/type that will decode every byte in a file to a character with the exact same ordinal value as the one in the file, basically doing a 1-to-1 mapping between a byte in the file and the character ordinal value?

More details

I'm reading text data that contains some binary values, e.g. an integer encoded as 4 bytes. The data has to be read through a TextReader class because I'm getting it from an external program's standard output. The data I get back is sometimes mangled due to encoding issues. Basically, the .NET streams decode the data from the external program and sometimes swap out a character, so that the byte/character ordinal value the external program output is not the same as the one I read in .NET.

Background information

I am communicating with an external program, Mercurial, over standard input/output, and for some reason they decided to output some data as binary.

The protocol looks like this:

<type:single-byte char><length:32-bit integer><data:string>

The type is a single-byte character that just tells me whether this is error output, standard output, or the result of executing the command.

The length is a 32-bit integer, output as 4 bytes on the stream.

The data is a string, consisting of a sequence of bytes of the aforementioned length, but these characters can be encoded with the default encoding of Mercurial.

For instance, if I ask Mercurial to use codepage 1252 (standard Windows) encoding, then the string will be encoded in that encoding.

However, and here's the problem: the length will not be, of course. It is written as 4 raw bytes, not as text in that encoding.

If I configure the .NET Process object to use Windows-1252 as the encoding for the StandardOutput stream, like this:

psi.StandardOutputEncoding = Encoding.GetEncoding("Windows-1252");
psi.StandardErrorEncoding = Encoding.GetEncoding("Windows-1252");

Then at some point the decoding of data from the client gets out of sync, because one of the binary length values ended up being decoded and thus has a different ordinal value than the byte from the file.

My current example contained the euro-character at some point (as a printable character), however the byte from the file did not have the value 172 which was that of the printable character. Some decoding had taken place.
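To make the mismatch concrete, here is a minimal sketch of what seems to be happening. The class and method names (EuroDemo, DecodeOrdinal) are made up for illustration. In Windows-1252 the euro sign is stored as the single byte 0x80 (128), but it decodes to the character U+20AC, whose ordinal is 8364. Truncating that character back to a byte gives 0xAC (172), which may be where the 172 above comes from.

```csharp
using System;
using System.Text;

public static class EuroDemo
{
    // Decodes a single byte with the named encoding and returns the
    // ordinal value of the resulting character.
    public static int DecodeOrdinal(byte value, string encodingName)
    {
        // On .NET Core / .NET 5+ Windows-1252 is not built in, so the
        // code-pages provider is registered first. On .NET Framework
        // this line can be dropped.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        char ch = Encoding.GetEncoding(encodingName).GetChars(new[] { value })[0];
        return ch;
    }
}
```

Calling EuroDemo.DecodeOrdinal(0x80, "Windows-1252") yields 8364 rather than 128, so any length field containing the byte 0x80 no longer matches what was written to the stream.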

However, let's say I have a file containing every byte value possible.

Then I open the file up through one of the TextReader descendants, and specify an encoding.

Is there any encoding that will let me use the TextReader.Read() method and read every byte from that file, unchanged?

Basically, my decoding loop looks like this:

read one byte, convert to character
if character is 'r', 'e' or 'o':
    read next 4 bytes, assemble to integer
    read next X bytes (x=integer above)
    decode the bytes to a string using the encoding specified

However, I tried this and it tripped when the length contained the euro-character (as a printable character.) Apparently that character had one byte value in the file, but was decoded as another.

So to sum up:

Is there a .NET Encoding object/type that will decode every byte in a file to a character with the exact same ordinal value as the one in the file, basically "no encoding"?


The correct encoding to use is "iso-8859-1"; it decodes every byte to a character with the same ordinal value. Apparently, it is also the only encoding present in .NET (at least on my machine) that has that capability.
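As a quick sanity check of that claim, the following sketch pushes all 256 byte values through iso-8859-1 and back. The class and method names are invented for this example. On .NET 5 and later the same encoding is also exposed as Encoding.Latin1, though the lookup by name used here should work on older runtimes too.

```csharp
using System;
using System.Text;

public static class Latin1RoundTrip
{
    // Returns true if every byte value 0..255 decodes to a character with
    // the same ordinal value and encodes back to the same byte.
    public static bool AllBytesSurvive()
    {
        byte[] original = new byte[256];
        for (int i = 0; i < 256; i++)
            original[i] = (byte)i;

        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");

        // Decode: one char per byte, same ordinal value.
        string text = latin1.GetString(original);
        if (text.Length != original.Length)
            return false;
        for (int i = 0; i < text.Length; i++)
            if (text[i] != i)
                return false;

        // Encode: the original bytes come back unchanged.
        byte[] roundTripped = latin1.GetBytes(text);
        for (int i = 0; i < original.Length; i++)
            if (roundTripped[i] != original[i])
                return false;

        return true;
    }
}
```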

I wrote a LINQPad test-program to figure this out:

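// LINQPad "C# Program" query. In a regular project, add:
// using System.Diagnostics; using System.IO; using System.Text;
void Main()
{
    // Build a buffer containing every possible byte value, 0..255.
    byte[] buffer = new byte[256];
    for (int index = 0; index < 256; index++)
        buffer[index] = (byte)index;

    foreach (var encodingInfo in Encoding.GetEncodings())
    {
        // Decode the buffer with this encoding and check that each
        // character read has the same ordinal value as the source byte.
        using (var stream = new MemoryStream(buffer))
        using (var reader = new StreamReader(stream, encodingInfo.GetEncoding()))
        {
            bool equal = true;
            for (int index = 0; index < 256; index++)
                if (reader.Read() != index)
                {
                    equal = false;
                    break;
                }

            // Print only the encodings that preserved all 256 values.
            if (equal)
                Debug.WriteLine(encodingInfo.Name);
        }
    }
}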
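With the reader opened as iso-8859-1, the decoding loop from the question could be sketched roughly as follows. This is an illustration under assumptions, not a tested Mercurial client. FrameReader, TryReadFrame and ReadBytes are hypothetical names, the length is assumed to be big-endian (the question does not state the byte order), and for simplicity every frame type is treated as carrying a length and data.

```csharp
using System;
using System.IO;
using System.Text;

public static class FrameReader
{
    // Reads one <type><length><data> frame from a reader opened with
    // iso-8859-1, so that each char carries exactly one original byte.
    // Returns false when the input is exhausted.
    public static bool TryReadFrame(TextReader reader, Encoding dataEncoding,
                                    out char type, out string data)
    {
        type = '\0';
        data = null;

        int first = reader.Read();
        if (first < 0)
            return false;
        type = (char)first;

        // Assumption: the 32-bit length is big-endian (network order).
        byte[] len = ReadBytes(reader, 4);
        int length = (len[0] << 24) | (len[1] << 16) | (len[2] << 8) | len[3];

        // Only now apply the real text encoding, to the payload bytes alone.
        data = dataEncoding.GetString(ReadBytes(reader, length));
        return true;
    }

    // Turns the next 'count' chars back into the bytes they came from.
    private static byte[] ReadBytes(TextReader reader, int count)
    {
        var bytes = new byte[count];
        for (int i = 0; i < count; i++)
        {
            int ch = reader.Read();
            if (ch < 0)
                throw new EndOfStreamException("Truncated frame.");
            if (ch > 255)
                throw new InvalidDataException("Reader is not one byte per char.");
            bytes[i] = (byte)ch;
        }
        return bytes;
    }
}
```

The point of the design is that the reader itself never interprets the data: iso-8859-1 only ferries bytes through the TextReader, and the encoding Mercurial was configured with is passed as dataEncoding and applied to the payload only.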