PHP has a lot of trouble with multibyte strings (non-ASCII characters). The entire language was built on the assumption that each character is one byte. To solve this, they added the mbstring functions (mb_*), which you can use instead of the standard byte-based functions and which handle multibyte text correctly.
strlen($str);    // counts bytes
mb_strlen($str); // counts characters (correct)
However, this is a real pain, since you have to verify that any code you download or find online uses these functions, or else enable mbstring.func_overload, which might then break code that actually needs char = byte calculations.
Does Ruby share this problem?
It shares the problem (at least in Ruby 1.8). It's covered here on SO. You can use ActiveSupport::Multibyte for mb_chars support.
>> s = "Iñtërnâtiônàlizætiøn"
=> "Iñtërnâtiônàlizætiøn"
>> puts s[0..3]
Iñt
=> nil
>> puts s.mb_chars[0..3]
Iñtë
=> nil
>> puts s.mb_chars.size
20
=> nil
>> puts s.size
27
=> nil
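If pulling in ActiveSupport just for mb_chars is not an option, a common dependency-free workaround on Ruby 1.8 is to decode the string into Unicode code points with unpack('U*') and work on that array instead. This is a different technique from ActiveSupport::Multibyte, sketched here under the assumption that the input is valid UTF-8; the helper names utf8_codepoints, utf8_length and utf8_slice are hypothetical, not part of any library. The same code also runs on Ruby 1.9 and later.

```ruby
# Hypothetical helpers: decode a UTF-8 string into Unicode code points.
# unpack('U*') raises ArgumentError on malformed UTF-8 input.
def utf8_codepoints(str)
  str.unpack('U*')
end

# Character count, independent of how many bytes each character uses.
def utf8_length(str)
  utf8_codepoints(str).length
end

# Character-based slice; pack('U*') re-encodes the code points as UTF-8.
# An out-of-range slice yields nil, which we turn into an empty array.
def utf8_slice(str, range)
  (utf8_codepoints(str)[range] || []).pack('U*')
end

s = "Iñtërnâtiônàlizætiøn"
puts utf8_slice(s, 0..3)   # prints Iñtë (same as s.mb_chars[0..3])
puts utf8_length(s)        # prints 20   (same as s.mb_chars.size)
```

The trade-off is that every call decodes the whole string, so for heavy text processing mb_chars (or upgrading to 1.9) is the more practical route.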
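I think Ruby 1.9 clears up this underlying assumption. Note that the length of 2 below is a byte count, which is the Ruby 1.8 behavior; in 1.9 a UTF-8 string reports a length of 1.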
irb(main):002:0> 'ÿ'.length
=> 2
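A minimal sketch of the 1.9+ behavior, assuming Ruby 2.0 or later (where source files default to UTF-8 and String#byteslice is available). Strings carry an encoding, so the ordinary methods count and index characters, while explicit byte methods still give the byte view for code that needs it:

```ruby
# Assumes Ruby 2.0+, where string literals default to UTF-8.

c = "ÿ"                  # U+00FF, encoded in UTF-8 as the bytes 0xC3 0xBF
puts c.encoding          # prints UTF-8
puts c.length            # prints 1: one character
puts c.bytesize          # prints 2: two bytes (what 1.8's length reported)
p c.bytes                # prints [195, 191]

s = "Iñtërnâtiônàlizætiøn"
puts s[0..3]             # prints Iñtë: character indexing, no mb_chars needed
puts s.size              # prints 20 (characters)
puts s.bytesize          # prints 27 (bytes, what 1.8's size reported)
puts s.byteslice(0, 4)   # prints Iñt: the first 4 bytes, like 1.8's s[0..3]
```

Keeping both views on the same object means byte-oriented code (buffer sizes, protocol limits) asks for bytes explicitly instead of relying on a global switch like PHP's function overloading.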