Length of a unicode string

Developer https://www.devze.com 2023-01-13 23:42 Source: web

In my Rails (2.3, Ruby 1.8.7) application, I need to truncate a string to a certain length. The string is Unicode, and when running tests in the console, such as 'א'.length, I realized that double the expected length is returned. I would like an encoding-agnostic length, so that the same truncation would be done for a Unicode string or a Latin-1 encoded string.

I've gone over most of the unicode material for Ruby, but am still a little in the dark. How should this problem be tackled?

Rails has an mb_chars method, which returns a multibyte-aware proxy for the string so that operations work on characters rather than bytes. Try unicode_string.mb_chars.slice(0, 50).

"ア".size # 3 in 1.8, 1 in 1.9
puts "ア".scan(/./mu).size # 1 in both 1.8 and 1.9
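The same scan trick can be turned into a truncation. This is only a minimal sketch: truncate_utf8 is a hypothetical helper name, and it assumes the input is UTF-8, since the /u flag makes the pattern treat the bytes as UTF-8.

```ruby
# Hypothetical helper: truncate a UTF-8 string to `limit` characters.
def truncate_utf8(str, limit)
  # /u reads the input as UTF-8; /m lets "." match newlines as well,
  # so line breaks count as characters instead of being dropped.
  str.scan(/./mu).first(limit).join
end

puts truncate_utf8("アイウエオ", 3) # アイウ
```

A Latin-1 string would need to be converted to UTF-8 first, so on its own this does not cover the encoding-agnostic case.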

chars and mb_chars don't give you text elements, which is what you seem to be looking for.

For text elements you'll want the unicode gem.

mb_chars:

>> 'กุ'.mb_chars.size
=> 2

>> 'กุ'.mb_chars.first.to_s
=> "ก"

text_elements:

>> Unicode.text_elements('กุ').size
=> 1

>> Unicode.text_elements('กุ').first
=> "กุ"
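If upgrading is an option, newer Rubies can split text elements without a gem. The sketch below assumes Ruby 2.5 or later, which goes beyond the 1.8.7 setup in the question, and truncate_text_elements is a hypothetical helper name.

```ruby
# Hypothetical helper; needs Ruby 2.5+ for String#grapheme_clusters.
def truncate_text_elements(str, limit)
  # Each grapheme cluster is a base character plus its combining marks,
  # so a cut never separates a consonant from its vowel sign.
  str.grapheme_clusters.first(limit).join
end

thai = "\u0E01\u0E38" # "กุ": a consonant followed by a combining vowel sign

puts thai.chars.size             # 2 (code points)
puts thai.grapheme_clusters.size # 1 (text elements)
```

Truncating by text elements keeps the combining mark attached to its base character, which plain character slicing does not guarantee.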

You can use something like str.chars.slice(0, 50).join to get the first 50 characters of a string, no matter how many bytes it uses per character.
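As a rough illustration of the encoding-agnostic behaviour asked for, here is a sketch that assumes Ruby 1.9 or later, where every string carries its own encoding. truncate_chars is a hypothetical helper. It uses first rather than slice because on some Ruby versions chars returns an enumerator, which has no slice method.

```ruby
# Hypothetical helper: keep the first `limit` characters, whatever the encoding.
# Assumes Ruby 1.9+, where strings know their own encoding.
def truncate_chars(str, limit)
  # first(limit) works whether chars returns an Array or an Enumerator.
  str.chars.first(limit).join
end

utf8   = "caf\u00E9 au lait"       # UTF-8: "é" takes two bytes
latin1 = utf8.encode("ISO-8859-1") # same text, one byte per character

puts truncate_chars(utf8, 4).bytesize   # 5
puts truncate_chars(latin1, 4).bytesize # 4
```

Both calls keep the same four characters ("café"); only the byte counts differ.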
