Why do we use the flush parameter with the Encoder.GetBytes method?

Developer https://www.devze.com 2023-01-19 03:32 Source: Internet

This link explains the Encoder.GetBytes method, including a bool parameter called flush. The explanation of flush is:

true if this encoder can flush its state at the end of the conversion; otherwise, false. To ensure correct termination of a sequence of blocks of encoded bytes, the last call to GetBytes can specify a value of true for flush.

But I don't understand what flush does; maybe I am drunk or something :). Can you explain it in more detail, please?


Suppose you receive data over a socket connection. You will receive a long text as several byte[] blocks.

It is possible that 1 Unicode character occupies 2+ bytes in a UTF-8 stream and that it is split over 2 byte blocks. Decoding the 2 byte blocks separately (and concatenating the strings) would damage that character: depending on how the encoding is configured, you get either an error or replacement characters in the output.

So you should specify flush=true only on the last block. And of course, if you only have 1 block then that is also the last.
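A minimal sketch of the receiving side, assuming the default Encoding.UTF8 (which substitutes U+FFFD for invalid input instead of throwing). The class name, the method names, and the sample bytes are made up for illustration; only Decoder.GetCharCount, Decoder.GetChars and Encoding.GetString are real .NET calls.

```csharp
using System;
using System.Text;

public static class SplitDecodeDemo
{
    // Decodes UTF-8 byte blocks with one stateful Decoder,
    // passing flush=true only for the last block.
    public static string DecodeBlocks(byte[][] blocks)
    {
        Decoder decoder = Encoding.UTF8.GetDecoder();
        var text = new StringBuilder();

        for (int i = 0; i < blocks.Length; i++)
        {
            byte[] block = blocks[i];
            bool isLast = i == blocks.Length - 1;

            // GetCharCount takes the bytes held over from the previous block into account.
            char[] chars = new char[decoder.GetCharCount(block, 0, block.Length, isLast)];
            int written = decoder.GetChars(block, 0, block.Length, chars, 0, isLast);
            text.Append(chars, 0, written);
        }
        return text.ToString();
    }

    // Decodes every block on its own, with no state shared between blocks.
    public static string DecodeIndependently(byte[][] blocks)
    {
        var text = new StringBuilder();
        foreach (byte[] block in blocks)
        {
            text.Append(Encoding.UTF8.GetString(block));
        }
        return text.ToString();
    }

    public static void Main()
    {
        // "H€!" in UTF-8 is 48 E2 82 AC 21; the split falls inside the euro sign.
        byte[][] blocks =
        {
            new byte[] { 0x48, 0xE2, 0x82 },
            new byte[] { 0xAC, 0x21 },
        };

        Console.WriteLine(DecodeBlocks(blocks) == "H\u20AC!");        // stateful decoder
        Console.WriteLine(DecodeIndependently(blocks) == "H\u20AC!"); // block-by-block decoding
    }
}
```

The stateful version keeps the two leading bytes of the euro sign until the next block supplies the third one; the block-by-block version has nowhere to keep them, so the character is lost.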

Tip: Use a TextReader and let it handle this problem for you.

Edit

The mirror problem (the one actually asked about: GetBytes) is slightly harder to explain.

Using flush=true is roughly the same as calling Encoder.Reset() after GetBytes(...), except that anything still pending is dealt with first rather than silently dropped. It clears the 'state' of the encoder, including trailing characters at the end of the previous data block, such as an unmatched high surrogate.

The basic idea is the same: when converting from string to blocks of bytes, or vice versa, the blocks are not independent.
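For the GetBytes direction, the dependency between blocks can be sketched as follows. This is an illustration under assumptions, not a reference implementation: the class name, method name and sample text are invented, and the default Encoding.UTF8 is assumed. It splits a surrogate pair across two char blocks and flushes only on the last call.

```csharp
using System;
using System.IO;
using System.Text;

public static class SplitEncodeDemo
{
    // Encodes char blocks with one stateful Encoder,
    // passing flush=true only for the last block.
    public static byte[] EncodeBlocks(char[][] blocks)
    {
        Encoder encoder = Encoding.UTF8.GetEncoder();
        using (var output = new MemoryStream())
        {
            for (int i = 0; i < blocks.Length; i++)
            {
                char[] block = blocks[i];
                bool isLast = i == blocks.Length - 1;

                // GetMaxByteCount is a safe upper bound for the output buffer.
                byte[] bytes = new byte[Encoding.UTF8.GetMaxByteCount(block.Length)];
                int written = encoder.GetBytes(block, 0, block.Length, bytes, 0, isLast);
                output.Write(bytes, 0, written);
            }
            return output.ToArray();
        }
    }

    public static void Main()
    {
        // U+1F600 is the surrogate pair D83D DE00; the split falls inside the pair.
        char[][] blocks =
        {
            new[] { 'a', '\uD83D' },
            new[] { '\uDE00', 'b' },
        };

        byte[] streamed = EncodeBlocks(blocks);
        byte[] whole = Encoding.UTF8.GetBytes("a\uD83D\uDE00b");

        Console.WriteLine(BitConverter.ToString(streamed));
        Console.WriteLine(BitConverter.ToString(streamed) == BitConverter.ToString(whole));
    }
}
```

The first call emits only the byte for 'a' and remembers the high surrogate; the second call combines it with the low surrogate, so the streamed bytes match what encoding the whole string at once would give.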


Internally the Encoder would be implemented with a buffer - this buffer may need to be flushed (cleared) in order to end the read correctly or prepare the Encoder for the next read.

Here is one explanation of buffer flushing.

The exact usage of the flush parameter is described here:

true to clear the internal state of the encoder after the conversion; otherwise, false.


Flushing will reset the internal state of the encoder instance used to encode the text into bytes. Why does it need internal state, you ask? Well, to quote MSDN:

The flush parameter is useful for flushing a high-surrogate at the end of a stream that does not have a low-surrogate. For example, the Encoder created by UTF8Encoding.GetEncoder uses this parameter to determine whether to write out a dangling high-surrogate at the end of a character block.

Hence, if you're making multiple GetBytes() calls, you want to flush the internal state on the final call to terminate any character sequences that need terminating, but only on that call, since flushing earlier could introduce terminating sequences in the middle of words.
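The effect of flushing at the wrong moment can be sketched like this. The helper names are hypothetical, and the sketch assumes the default Encoding.UTF8, whose replacement fallback writes U+FFFD (bytes EF BF BD) for an unpaired surrogate; an encoding configured to throw would raise an exception instead.

```csharp
using System;
using System.Text;

public static class DanglingSurrogateDemo
{
    // Encodes one block with a fresh UTF-8 encoder and returns the bytes as hex.
    public static string EncodeOnce(char[] chars, bool flush)
    {
        Encoder encoder = Encoding.UTF8.GetEncoder();
        byte[] bytes = new byte[Encoding.UTF8.GetMaxByteCount(chars.Length)];
        int written = encoder.GetBytes(chars, 0, chars.Length, bytes, 0, flush);
        return BitConverter.ToString(bytes, 0, written);
    }

    // Encodes two blocks with one encoder; flushFirst decides whether the
    // first call flushes. The second (last) call always flushes.
    public static string EncodeTwoBlocks(char[] first, char[] second, bool flushFirst)
    {
        Encoder encoder = Encoding.UTF8.GetEncoder();
        byte[] bytes = new byte[Encoding.UTF8.GetMaxByteCount(first.Length + second.Length)];
        int written = encoder.GetBytes(first, 0, first.Length, bytes, 0, flushFirst);
        written += encoder.GetBytes(second, 0, second.Length, bytes, written, true);
        return BitConverter.ToString(bytes, 0, written);
    }

    public static void Main()
    {
        char[] high = { '\uD83D' }; // high surrogate of U+1F600
        char[] low = { '\uDE00' };  // low surrogate of U+1F600

        // A dangling high surrogate is held back without flush, written out with it.
        Console.WriteLine("[" + EncodeOnce(high, false) + "]");
        Console.WriteLine("[" + EncodeOnce(high, true) + "]");

        // Flushing only at the end keeps the pair together; flushing in the middle splits it.
        Console.WriteLine(EncodeTwoBlocks(high, low, false));
        Console.WriteLine(EncodeTwoBlocks(high, low, true));
    }
}
```

With flush=false the lone high surrogate produces no bytes yet, because the encoder is still waiting for its partner. With flush=true in the middle of the stream, each half of the pair is treated as unpaired and replaced separately, which is the corruption the paragraph above warns about.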

Note that this may be a purely theoretical problem these days. And, you'd be better off using higher-level wrappers anyway. If you do, being drunk will not be a problem.
