How to get a Ruby substring of a Unicode string?

Developer (https://www.devze.com), 2023-01-05 17:13, source: web

I have a field in my Rails model that has max length 255.

I'm importing data into it, and sometimes the imported data is longer than 255. I'm willing to simply chop it off so that I end up with the largest possible valid string that fits.

I originally tried field[0,255] to get this, but it can cut right through a trailing multibyte Unicode character. When I then save the record to the database, it throws an error saying there is an invalid character: the one that has been halved or quartered.

What's the recommended way to truncate a Unicode string so that it fits in the field, without chopping up individual characters?
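To make the failure concrete, here is a small sketch that imitates a byte-based cut, which is presumably what field[0,255] did here (on Ruby 1.8, String#[] counts bytes). It assumes Ruby 1.9.3 or later for String#byteslice, and the sample value is made up.

```ruby
# Hypothetical sample value: "é" (U+00E9) takes two bytes in UTF-8.
s = "caf\u00e9"
puts s.length    # 4 characters
puts s.bytesize  # 5 bytes

# Cutting at a byte offset can land in the middle of the last character,
# leaving a dangling lead byte that the database rejects.
cut = s.byteslice(0, 4)
puts cut.valid_encoding?  # false
```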


Uh. It seems truncate and friends like to play with chars, but not with their little cousins, bytes. Here's a quick answer to your problem, though there may be a more straightforward and elegant one:

def truncate_bytes(string, size)
  count = 0
  # Keep whole characters while the running byte total still fits in +size+
  string.chars.take_while { |c| (count += c.bytesize) <= size }.join
end
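A swapped-in alternative, not part of the answer itself: on Ruby 2.1 or later, String#byteslice can cut at the byte limit and String#scrub can then drop the incomplete trailing character. The name truncate_bytes_scrub is hypothetical. Note that scrub also removes invalid bytes anywhere else in the string, so this sketch assumes the imported value was valid UTF-8 to begin with.

```ruby
# Hypothetical helper; assumes Ruby >= 2.1 (String#scrub).
# Cut at the byte limit, then replace any invalid (here: truncated
# trailing) byte sequence with an empty string.
def truncate_bytes_scrub(string, size)
  string.byteslice(0, size).scrub("")
end

puts truncate_bytes_scrub("caf\u00e9", 4)  # prints "caf"
puts truncate_bytes_scrub("caf\u00e9", 5)  # prints "café"
```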

Take a look at the Chars class of ActiveSupport (ActiveSupport::Multibyte::Chars).


Use the multibyte proxy method (mb_chars) before manipulating the string:

str.mb_chars[0,255]

See http://api.rubyonrails.org/classes/String.html#method-i-mb_chars.

Note that in Rails 2.1 and earlier the method was called "chars".
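For comparison, a sketch assuming Ruby 1.9 or later, where plain String#[] already counts characters rather than bytes, so s[0, 255] should behave much like mb_chars[0,255] without needing ActiveSupport. Both limit the number of characters, not bytes; if the column limit is measured in bytes, a byte-based cut is still needed. The sample string is hypothetical.

```ruby
# Assumes Ruby >= 1.9: String#[] indexes by character, not by byte.
s = "caf\u00e9" * 100  # hypothetical long import value (400 characters)
cut = s[0, 255]

puts cut.length           # 255 characters
puts cut.valid_encoding?  # true: no character is split
puts cut.bytesize         # 318, more than 255 because "é" takes two bytes
```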