Better alternative for letter substitution

Developer https://www.devze.com 2022-12-18 08:56 Source: web

Is there any better alternative to this?

name.gsub('è','e').gsub('à','a').gsub('ò','o').gsub('ì','i').gsub('ù','u')

thanks


Use tr.

Maybe like string.tr('èàòìù', 'eaoiu').
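
As a rough sketch of the tr approach: each character in the first argument is replaced by the character at the same position in the second. The method name strip_grave_accents is made up for illustration, and the sketch assumes UTF-8 source and strings (Ruby 1.9 or later).

```ruby
# strip_grave_accents is a hypothetical helper name, not part of Ruby.
def strip_grave_accents(name)
  # tr maps characters positionally: è->e, à->a, ò->o, ì->i, ù->u.
  # It returns a new string and leaves the argument unchanged.
  name.tr('èàòìù', 'eaoiu')
end

puts strip_grave_accents('caffè in città')
```

Unlike the chained gsub calls, this walks the string once, but it only handles one-character-to-one-character replacements.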


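# Map each accented letter to its plain replacement.
substitutes = {'è' => 'e', 'à' => 'a', 'ò' => 'o', 'ì' => 'i', 'ù' => 'u'}
# gsub! modifies name in place, making one pass over the string per pair.
substitutes.each do |old, new|
  name.gsub!(old, new)
end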

Or you could use an extension of String such as this one to do it for you.
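
A possible single-pass variant of the loop above, assuming Ruby 1.9 or later, where gsub accepts a hash as its second argument and looks up each match in it. The constant SUBSTITUTES and the method name replace_accents are invented for this sketch.

```ruby
# Same pairs as the loop above, frozen so the table cannot be changed by accident.
SUBSTITUTES = { 'è' => 'e', 'à' => 'a', 'ò' => 'o', 'ì' => 'i', 'ù' => 'u' }.freeze

# replace_accents is a hypothetical helper name.
def replace_accents(name)
  # Regexp.union builds a pattern matching any key; gsub then replaces
  # every match with the value stored under that key in the hash.
  name.gsub(Regexp.union(SUBSTITUTES.keys), SUBSTITUTES)
end

puts replace_accents('più giù')
```

This returns a new string instead of mutating name, and it scans the input once no matter how many pairs the table holds.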


If you really want a full solution, try pulling the tables from Perl's Text::Unidecode module. After translating those tables to Ruby, loop over each character of the input and substitute the table's value for that character.
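
The per-character loop might look roughly like this. The table here is a tiny hand-written stand-in, not the real Unidecode data, and the names TRANSLITERATION and transliterate are made up for the sketch.

```ruby
# Stand-in table for illustration only; a real one would be translated
# from the Unidecode data and cover far more characters.
TRANSLITERATION = {
  'è' => 'e', 'à' => 'a', 'ò' => 'o', 'ì' => 'i', 'ù' => 'u',
  'ß' => 'ss', 'æ' => 'ae'
}.freeze

# transliterate is a hypothetical helper name.
def transliterate(input)
  # Look up each character; fall back to the character itself when the
  # table has no entry for it.
  input.each_char.map { |ch| TRANSLITERATION.fetch(ch, ch) }.join
end

puts transliterate('Straße è')
```

Because the values are strings rather than single characters, a table like this can expand one character into several, which tr cannot do.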


This is a wild stab in the dark, but if you're trying to remove the accented characters because you're using a legacy text encoding, you should look at Iconv.

A great introduction to the subject: http://blog.grayproductions.net/articles/encoding_conversion_with_iconv
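
Iconv was removed from Ruby's standard library in 2.0, so the sketch below swaps in the built-in String#encode to show the same idea: convert to the legacy encoding rather than stripping accents. ISO-8859-1 is only an assumed example of a legacy target, and the sample strings are invented.

```ruby
name = 'città' # string literals are UTF-8 by default in Ruby 2.0+

# Convert to a legacy single-byte encoding; the accented letter survives.
latin1 = name.encode('ISO-8859-1')

# A character the target cannot represent raises an error unless a
# replacement is requested explicitly, as done here for the euro sign.
price = '5 €'.encode('ISO-8859-1', undef: :replace, replace: '?')

# Converting back to UTF-8 shows that nothing was lost for name.
round_trip = latin1.encode('UTF-8')

puts latin1.bytesize
puts round_trip == name
```

If the target system really is limited to plain ASCII, one of the substitution approaches in the other answers is still needed.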


In case you are wondering, the technical terms for what you want to do are case folding and possibly Unicode normalization (and sometimes collation).

Here is a case folding configuration for ThinkingSphinx to give you an idea of how many characters you need to worry about.
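
To make the normalization idea concrete, here is a minimal sketch assuming Ruby 2.2 or later, where String#unicode_normalize is available. It decomposes each accented letter into a base letter plus combining marks and then drops the marks. The method name strip_accents is hypothetical.

```ruby
# strip_accents is a hypothetical helper name.
def strip_accents(text)
  # NFD splits a letter such as 'è' into 'e' followed by a combining
  # grave accent (U+0300).
  decomposed = text.unicode_normalize(:nfd)
  # \p{Mn} matches nonspacing marks, which includes the combining accents.
  decomposed.gsub(/\p{Mn}/, '')
end

puts strip_accents('èàòìù')
```

This covers any accent that has a canonical decomposition without a hand-made table. Letters with no decomposition, such as 'ø' or 'ß', pass through unchanged and would still need a lookup table.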


If JRuby is an option, see the answer to my question:

How do I detect unicode characters in a Java string?

It deals with removing accents from letters, using a Normalizer. You could access that class from JRuby.
