Smarter character replacement using Ruby gsub and regexp

Developer https://www.devze.com 2022-12-27 19:04 Source: web

I'm trying to create permalink-like behavior for some article titles, and I don't want to add a new db field for the permalink. So I decided to write a helper that will convert my article title from:

"O "focoasă" a pornit cruciada, împotriva bărbaţilor zgârciţi" to "o-focoasa-a-pornit-cruciada-impotriva-开发者_开发百科barbatilor-zgarciti".

While I figured out how to replace spaces with hyphens and remove other special characters (other than -) using:

title.gsub(/\s/, "-").gsub(/[^\w-]/, '').downcase

I am wondering if there is a way to replace each character with a specific counterpart in a single .gsub call, so I won't have to chain a title.gsub("ă", "a") call for every UTF-8 special character in my locale.

I was thinking of building a hash with all the special characters and their counterparts but I haven't figured out yet how to use variables with regexps.

What I was looking for is something like:

title.gsub(/\s/, "-").gsub(*replace character goes here*).gsub(/[^\w-]/, '').downcase

Thanks!


I solved this in my application by using the Unidecoder gem:

require 'unidecode'

def uninternationalize(str)
  Unidecoder.decode(str).gsub("[?]", "").gsub(/`/, "'").strip
end


If you want to only transliterate from one character to another, you can use the String#tr method which does exactly the same thing as the Unix tr command: replace every character in the first list with the character in the same position in the second list:

'Ünicöde'.tr('ÄäÖöÜüß', 'AaOoUus') # => "Unicode"
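
Applied to the title from the question, this could look roughly like the sketch below. The permalink helper name and the two character lists are my own assumptions (the lists cover only the Romanian letters, in both their cedilla and comma-below forms), so treat it as a starting point rather than a complete solution:

```ruby
# Hypothetical character lists: each character in ROMANIAN_FROM is replaced
# by the character at the same position in ROMANIAN_TO.
# Assumes the source file is UTF-8 (the default since Ruby 2.0).
ROMANIAN_FROM = 'ăâîşșţțĂÂÎŞȘŢȚ'
ROMANIAN_TO   = 'aaissttAAISSTT'

# Hypothetical helper: transliterate first, then reuse the gsub chain
# from the question to build the slug.
def permalink(title)
  title.tr(ROMANIAN_FROM, ROMANIAN_TO) # one-to-one character replacement
       .gsub(/\s/, '-')                # whitespace becomes hyphens
       .gsub(/[^\w-]/, '')             # drop everything except word chars and hyphens
       .downcase
end

puts permalink('O "focoasă" a pornit cruciada, împotriva bărbaţilor zgârciţi')
# => o-focoasa-a-pornit-cruciada-impotriva-barbatilor-zgarciti
```

Any accented character missing from the lists is silently dropped by the last gsub, so the lists need to cover every letter you expect to see.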

However, I agree with @Daniel Vandersluis: it would probably be a good idea to use a more specialized library. Stuff like this can get really tedious, really fast. Also, a lot of those characters actually have standardized transliterations (ä → ae, ö → oe, ..., ß → ss), and users may expect the transliterations to be correct. (I certainly don't like being called Jorg. If you really must, you may call me Joerg, but I very much prefer Jörg.) If you have a library that provides those transliterations, why not use it? Note that many transliterations are not single characters and thus can't be done with String#tr anyway.
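
If you do want to stay with plain Ruby for the multi-character cases, String#gsub also accepts a Hash as its second argument (Ruby 1.9 and later) and looks up each match in it. That is close to the "hash of special characters" idea from the question. A minimal sketch, where the TRANSLITERATIONS table and the transliterate helper are hypothetical names and the table is deliberately incomplete:

```ruby
# Hypothetical, incomplete mapping; extend it for your own locale.
TRANSLITERATIONS = {
  'ä' => 'ae', 'ö' => 'oe', 'ü' => 'ue',
  'Ä' => 'Ae', 'Ö' => 'Oe', 'Ü' => 'Ue',
  'ß' => 'ss'
}.freeze

# Regexp.union builds one pattern that matches any of the hash keys.
TRANSLITERATION_PATTERN = Regexp.union(TRANSLITERATIONS.keys)

# Hypothetical helper: a single gsub call; each match is replaced
# by the value stored under that key in the hash.
def transliterate(str)
  str.gsub(TRANSLITERATION_PATTERN, TRANSLITERATIONS)
end

puts transliterate('Jörg')   # => Joerg
puts transliterate('Straße') # => Strasse
```

Unlike String#tr, the replacement values can be any length. The price is that you have to maintain the table yourself.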
