
Multibyte strings vs. normal strings in PHP

开发者 https://www.devze.com 2023-01-02 04:33 Source: the web
How do I know whether a string is a multibyte string, so that I use mb_strlen instead of strlen?


You always need to know what encoding a string is in, and whether that encoding is a multibyte one. After all, you need to pass the string's encoding as the second parameter to mb_strlen() to get reliable results, right?
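To make that concrete, here is a minimal sketch (the sample string is just an assumption for illustration). The same bytes give different lengths depending on which encoding you tell mb_strlen() to use:

```php
<?php
// "\u{00E9}" (PHP 7+) always yields the UTF-8 bytes for "é",
// regardless of how this source file itself is saved.
$text = "h\u{00E9}llo";

$byteLength   = strlen($text);                   // counts bytes
$utf8Length   = mb_strlen($text, 'UTF-8');       // counts UTF-8 characters
$latin1Length = mb_strlen($text, 'ISO-8859-1');  // wrong encoding: one character per byte

echo $byteLength, ' ', $utf8Length, ' ', $latin1Length, PHP_EOL; // prints "6 5 6"
```

With the wrong encoding name, mb_strlen() is no more useful than strlen(), which is why you have to know the encoding up front.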

The encoding of incoming data will always be defined in some way - the page's encoding when processing form data; the database connection's and tables' encoding when processing database data; and so on. It is your job to build the flow in a way that you always know what is in what encoding where.
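One way to build such a flow, sketched under assumptions: pick a single internal encoding and convert everything at the boundary where the declared encoding is still known. The helper name to_internal() and the constant INTERNAL_ENCODING are hypothetical, not part of PHP; mb_check_encoding() and mb_convert_encoding() are the real mbstring functions.

```php
<?php
// Hypothetical convention: everything inside the application is UTF-8.
const INTERNAL_ENCODING = 'UTF-8';

// Hypothetical helper: validate incoming bytes against the encoding the
// source declared (page charset, DB connection charset, ...), then convert.
function to_internal(string $raw, string $declaredEncoding): string
{
    if (!mb_check_encoding($raw, $declaredEncoding)) {
        throw new InvalidArgumentException("Input is not valid $declaredEncoding");
    }
    return mb_convert_encoding($raw, INTERNAL_ENCODING, $declaredEncoding);
}

// Example: a form on an ISO-8859-1 page submitted "café" (é is the byte 0xE9).
$fromForm = "caf\xE9";
$name = to_internal($fromForm, 'ISO-8859-1');

echo mb_strlen($name, INTERNAL_ENCODING), PHP_EOL; // prints 4
```

After the boundary, every mb_* call can use the same encoding name and you never have to ask what a given string is.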

The only exception is when you're dealing with arbitrary third-party data that doesn't declare its encoding properly. It is then (and only then) that it's okay to employ sniffing functions like mb_detect_encoding() and colleagues. Remember that those functions are very error-prone and can give you only an educated guess as to what encoding a string is in, not hard, reliable information.
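For that last-resort case, a sketch of a cautious wrapper might look like this. The function name guess_encoding() and the candidate list are assumptions for illustration; mb_detect_encoding() with its strict flag is the real function.

```php
<?php
// Hypothetical wrapper: returns a guessed encoding name, or null when none
// of the candidates can represent the bytes. Still only a guess.
function guess_encoding(string $bytes, array $candidates): ?string
{
    // Strict mode discards candidates for which the bytes are invalid.
    $guess = mb_detect_encoding($bytes, $candidates, true);
    return $guess === false ? null : $guess;
}

// "été" as ISO-8859-1 bytes; this sequence is not valid UTF-8.
$undeclared = "\xE9t\xE9";
var_dump(guess_encoding($undeclared, ['UTF-8', 'ISO-8859-1']));

// A lone 0xFF byte is never valid UTF-8, so no candidate is left.
var_dump(guess_encoding("\xFF", ['UTF-8']));
```

Treat a non-null result as a hint to be confirmed, not as a fact about the data.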


No. A string is a string. There is no way to tell whether it contains multibyte characters.

You can guess with something like mb_detect_encoding(), but your mileage may vary depending on the charset and encoding. For example, UTF-8 has a very distinct byte pattern and you will get very good results. But other encodings like GB2312 are really hard to detect.
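A small sketch of why guessing is shaky (the byte strings are made-up samples): UTF-8 rejects most byte sequences that were not written as UTF-8, but a single-byte encoding such as ISO-8859-1 accepts any bytes at all, so validity alone cannot tell you which one was meant.

```php
<?php
$utf8Bytes   = "\xC3\xA9"; // "é" encoded as UTF-8
$latin1Bytes = "\xE9";     // "é" encoded as ISO-8859-1

// UTF-8's strict structure makes it easy to rule in or out.
var_dump(mb_check_encoding($utf8Bytes, 'UTF-8'));      // bool(true)
var_dump(mb_check_encoding($latin1Bytes, 'UTF-8'));    // bool(false)

// The UTF-8 bytes are also perfectly valid ISO-8859-1 (they would read "Ã©").
var_dump(mb_check_encoding($utf8Bytes, 'ISO-8859-1')); // bool(true)
```

So mb_check_encoding() can confirm or reject an encoding you already suspect, but it cannot discover the encoding for you.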

If you are designing a new protocol or system, it's best to keep the encoding information.


Compare the strlen and the mb_strlen results, and if they do not match, the string contains multibyte characters.
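A sketch of that comparison, with a hypothetical helper named has_multibyte(). Note that it only works if you already know the encoding to hand to mb_strlen():

```php
<?php
// Hypothetical helper: true when at least one character in $text takes
// more than one byte in the given encoding.
function has_multibyte(string $text, string $encoding): bool
{
    return strlen($text) !== mb_strlen($text, $encoding);
}

$ascii  = "hello";
$accent = "h\u{00E9}llo"; // UTF-8 bytes, "é" takes two of them

var_dump(has_multibyte($ascii, 'UTF-8'));       // bool(false)
var_dump(has_multibyte($accent, 'UTF-8'));      // bool(true)

// With the wrong encoding the check silently reports false.
var_dump(has_multibyte($accent, 'ISO-8859-1')); // bool(false)
```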


Isn't mb_check_encoding or mb_detect_encoding supposed to be used for that?
