开发者

detect cjk characters in a sentence using ruby!

开发者 https://www.devze.com 2023-01-20 21:16 出处:网络
$str = \"This is a string containing 中文 characters. Some more characters - 中华人民共和国 \";
$str = "This is a string containing 中文 characters. Some more characters - 中华人民共和国 ";

How do I detect if there are chinese characters in this string but i have no idea how to do it.Any clu开发者_StackOverflow中文版es?


Regular expressions are not the way to go here. You should rather have code similar to the following (disclaimer: I'm not a Ruby programmer):

# coding: utf-8
str = "This is a string containing 中文 characters. Some more characters - 中华人民共和国 ";

str.each_char { |c|
  if c.ord >= 0x4E00 && c.ord <= 0x9FFF
    # found a chinese character - process it somehow.
    puts c
  end   
}

You're esentially checking for if the character is in the range of common Chinese characters in Unicode. This is not the complete range of hanzi (Chinese characters). If you need to detect rare or historic characters you'll simply have to add the ranges listed here to the boolean check.

0

精彩评论

暂无评论...
验证码 换一张
取 消