I was wondering, how do I find out how many bytes a character has?
If you want to find out how many bytes a character in a UTF-8 encoded PHP string takes, then:
print strlen(mb_substr($string, 0, 1, "utf-8"));
strlen() returns the raw byte length, while mb_substr() returns a "character" according to the charset/encoding, in this example the one at position 0.
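As a rough sketch of the same idea applied to several characters: the helper name utf8_char_bytes() below is made up for illustration, and the code assumes PHP 7 or later with the mbstring extension enabled. The characters are written as \u{...} escapes so the example does not depend on how this page itself is encoded.

```php
<?php
// Hypothetical helper (not a built-in): byte length of the character
// at position $index in a UTF-8 encoded string.
function utf8_char_bytes(string $string, int $index = 0): int
{
    // mb_substr() extracts one character; strlen() counts its raw bytes.
    return strlen(mb_substr($string, $index, 1, "UTF-8"));
}

// "a", e-acute (U+00E9), euro sign (U+20AC), grinning face emoji (U+1F600)
foreach (["a", "\u{E9}", "\u{20AC}", "\u{1F600}"] as $char) {
    printf("%s: %d byte(s)\n", $char, utf8_char_bytes($char));
}
```

With those inputs the loop should report 1, 2, 3 and 4 bytes respectively, since UTF-8 uses between one and four bytes per code point.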
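- ASCII is 7 bits, normally stored in one 8-bit byte.
- Single-byte encodings such as ISO-8859-1, used for most Western European languages, use 8 bits (1 byte) per character.
- Legacy encodings for East Asian languages (Chinese, Japanese), such as GBK and Shift_JIS, use 1 or 2 bytes per character.
- Unicode depends on the encoding: UTF-8 uses 1 to 4 bytes, UTF-16 uses 2 or 4 bytes, and UTF-32 always uses 4 bytes.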
How a character is stored and represented depends on the programming language and the platform you are using.
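To see how much the answer depends on the encoding, here is a small PHP sketch that converts one character into a few encodings and counts the bytes of each result. The helper name bytes_in_encoding() is hypothetical, and the code assumes PHP 7 or later with the mbstring extension; the little-endian variants are named explicitly so no byte order mark is expected in the output.

```php
<?php
// Hypothetical helper (not a built-in): number of bytes a UTF-8 string
// occupies after being converted to $encoding.
function bytes_in_encoding(string $utf8, string $encoding): int
{
    return strlen(mb_convert_encoding($utf8, $encoding, "UTF-8"));
}

$char = "\u{E9}"; // e-acute, U+00E9
foreach (["ISO-8859-1", "UTF-8", "UTF-16LE", "UTF-32LE"] as $encoding) {
    printf("%s: %d byte(s)\n", $encoding, bytes_in_encoding($char, $encoding));
}
```

For this character the counts should come out as 1, 2, 2 and 4, so the same letter takes anywhere from one to four bytes depending on how it is stored.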