
Memory-wise, is storing a string as bytes cheaper than its UTF equivalent?

Developer https://www.devze.com 2023-02-14 17:27 Source: Internet

If I store a string as a byte array, does it use less memory than if it were stored in UTF-8?

e.g.

string text = "Hello, World!";

Versus encoding it into a byte array?


If you stored that in a byte array it would be more compact than in a string, yes, because all of that text is ASCII, which UTF-8 encodes as a single byte per character. However, that's not universally true for all strings: some characters take 2 bytes in UTF-8, some take 3, and non-BMP characters take 4. It's also a darned sight less convenient to work with in binary form.
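To make the per-character sizes concrete, here is a small sketch that prints the encoded size of one character from each size class. The sample characters are my own picks for illustration, not from the question, and are written as escapes so the source file's encoding doesn't matter.

```csharp
using System;
using System.Text;

// One illustrative character per UTF-8 size class (hypothetical samples).
var samples = new (string Label, string Text)[]
{
    ("U+0041 LATIN CAPITAL LETTER A", "A"),
    ("U+00E9 LATIN SMALL LETTER E WITH ACUTE", "\u00E9"),
    ("U+20AC EURO SIGN", "\u20AC"),
    ("U+1F600 GRINNING FACE (non-BMP)", "\U0001F600"),
};

foreach (var (label, text) in samples)
{
    int utf8 = Encoding.UTF8.GetByteCount(text);     // 1, 2, 3, 4
    int utf16 = Encoding.Unicode.GetByteCount(text); // 2, 2, 2, 4
    Console.WriteLine($"{label}: UTF-8 = {utf8} bytes, UTF-16 = {utf16} bytes");
}
```

Only the first row is a win for UTF-8; the second is a tie, the third a loss, and the last a tie again.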

I would stick with strings unless you had a really really good reason to keep them in memory as byte arrays.


UTF-8 will only use 1 byte per char if you stick to 7-bit ASCII.

But internally .NET uses UTF-16 (often loosely called UCS-2), which uses 2 bytes per char, so yes: storing the text as UTF-8 will use less memory than storing it as a string, as long as the text is mostly ASCII. Note that the accented Latin-1 characters used in Western European languages (such as é) take 2 bytes in UTF-8, so those save nothing over UTF-16.


In the example you gave, UTF-8 encoding would save you some bytes since you only use ASCII characters, but it does depend on the input string: some UTF-8 encoded strings might actually be larger than the corresponding UTF-16 version.

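using System.Text;

// A .NET string is UTF-16 in memory: 13 chars * 2 bytes = 26 bytes of character data
string text = "Hello, World!";

// UTF-8: length will be 13 (only ASCII chars used)
byte[] bytesUTF8 = Encoding.UTF8.GetBytes(text);

// UTF-16 (Encoding.Unicode is little-endian UTF-16): length will be 26
byte[] bytesUTF16 = Encoding.Unicode.GetBytes(text);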
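For the opposite case, where UTF-8 comes out larger, here is a rough sketch using a short Japanese greeting. The sample text is my own choice for illustration; any text made mostly of BMP characters above U+07FF should behave the same way.

```csharp
using System;
using System.Text;

// "konnichiwa" in hiragana: five BMP characters, written as escapes
// so the result doesn't depend on the source file's encoding.
string greeting = "\u3053\u3093\u306B\u3061\u306F";

int utf8Bytes = Encoding.UTF8.GetByteCount(greeting);     // 5 chars * 3 bytes
int utf16Bytes = Encoding.Unicode.GetByteCount(greeting); // 5 chars * 2 bytes

Console.WriteLine($"chars:  {greeting.Length}"); // 5
Console.WriteLine($"UTF-8:  {utf8Bytes} bytes"); // 15
Console.WriteLine($"UTF-16: {utf16Bytes} bytes"); // 10
```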


Strings are sequences of characters, which in .NET are UTF-16 code units. Each char is thus 16 bits wide (twice the space of a byte), and characters outside the Basic Multilingual Plane use a second char to hold the other half of a surrogate pair.
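A quick way to see the two-char case is to compare a BMP character with a non-BMP one. This is only a sketch, and the two sample characters are arbitrary picks of mine.

```csharp
using System;
using System.Globalization;

string bmp = "\u20AC";        // EURO SIGN, inside the BMP
string nonBmp = "\U0001F600"; // GRINNING FACE, outside the BMP

Console.WriteLine(bmp.Length);    // 1 (one UTF-16 code unit)
Console.WriteLine(nonBmp.Length); // 2 (a surrogate pair)

// The two chars are a high surrogate followed by a low surrogate.
Console.WriteLine(char.IsHighSurrogate(nonBmp[0])); // True
Console.WriteLine(char.IsLowSurrogate(nonBmp[1]));  // True

// Recombine the pair into the original code point.
Console.WriteLine(char.ConvertToUtf32(nonBmp, 0).ToString("X")); // 1F600

// StringInfo counts user-visible text elements rather than chars.
Console.WriteLine(new StringInfo(nonBmp).LengthInTextElements); // 1
```

So `Length` counts UTF-16 code units, not characters as a reader would count them.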

If you're only dealing with ASCII, yes, you can put a string in a byte array that takes half the space of a char array and doesn't lose information. However, as Jon said, that's not a very convenient way to work with strings. You have about 2 gigabytes available for a single string. As bytes, yes, you'd get about 2 billion characters, but as a string you still get about 1 billion characters. If you really need more than that in a single string, I worry about what you think you need it for.
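As a rough sketch of the "doesn't lose information" caveat: an ASCII-only string survives a round trip through `Encoding.ASCII`, while anything outside ASCII is replaced with `?`. The `accented` sample is a hypothetical input of mine, not from the question.

```csharp
using System;
using System.Text;

string text = "Hello, World!";

// ASCII-only text: 1 byte per char, and the round trip is lossless.
byte[] asciiBytes = Encoding.ASCII.GetBytes(text);
string roundTripped = Encoding.ASCII.GetString(asciiBytes);

Console.WriteLine(asciiBytes.Length);    // 13
Console.WriteLine(roundTripped == text); // True

// Non-ASCII text: Encoding.ASCII substitutes '?' for what it can't encode.
string accented = "caf\u00E9";
string lossy = Encoding.ASCII.GetString(Encoding.ASCII.GetBytes(accented));

Console.WriteLine(lossy);             // caf?
Console.WriteLine(lossy == accented); // False
```

If the input might ever contain non-ASCII characters, `Encoding.UTF8` is the safer choice for the byte form, at the cost of a variable number of bytes per character.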
