Coming from the land of Perl, I can do something like the following to test the membership of a string in a particular Unicode block:
# test if string has any katakana script characters
my $japanese = "カタカナ";
if ($japanese =~ /\p{InKatakana}/) {
print "string has katakana"
}
I've read that Python does not support Unicode blocks (is that true?), so what's the best way to implement this manually? For example, the Unicode block range for {InKatakana} above should be U+30A0…U+30FF. How can I test for that range in Python? Are there any other recommended solutions?
I would prefer not to go with an external wrapper like Ponyguruma to limit the number of dependencies for roll-out/maintenance.
>>> re.search(u'[\u30a0-\u30ff]', u'カタカナ')
<_sre.SRE_Match object at 0x7fa0dbb62578>
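If you would rather not use a regular expression at all, the same range test can be done by comparing code points directly. This is only a minimal sketch: the helper name `has_katakana` and the two constants are made up for illustration, and the range is the U+30A0…U+30FF block quoted in the question.

```python
# Hypothetical helper, not part of any library: the name and constants
# are illustrative. The bounds are the Katakana block from the question.
KATAKANA_START = 0x30A0
KATAKANA_END = 0x30FF

def has_katakana(text):
    """Return True if any character of text lies in U+30A0..U+30FF."""
    return any(KATAKANA_START <= ord(ch) <= KATAKANA_END for ch in text)

print(has_katakana(u"カタカナ"))
print(has_katakana(u"hello"))
```

This avoids the re import entirely, though for several ranges or repeated searches a compiled pattern is probably easier to maintain.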
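As Ignacio said, a regular expression with an explicit character range works well. Don't forget to import re first. Note that this range only matches full-width katakana; half-width katakana live in a different block and are not found.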
import re
re.search(u'[\u30a0-\u30ff]', u'カタカナ')
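Or you might already have a UTF-8 encoded byte string on hand, in which case decode it to Unicode before searching.
import re
x = u"カタカナ".encode('utf-8')  # a UTF-8 encoded byte string
re.search(u'[\u30a0-\u30ff]', x.decode('utf-8'))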
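Since the range U+30A0…U+30FF covers only full-width katakana, one possible extension is to add the other katakana ranges to the character class. The sketch below assumes you also want Katakana Phonetic Extensions (U+31F0…U+31FF) and the half-width katakana in U+FF65…U+FF9F; the names `KATAKANA_RE` and `contains_katakana` are hypothetical.

```python
import re

# Hypothetical names; the extra ranges are an assumption about what
# "katakana" should include beyond the block given in the question.
KATAKANA_RE = re.compile(
    u'[\u30a0-\u30ff'   # Katakana block (full-width)
    u'\u31f0-\u31ff'    # Katakana Phonetic Extensions
    u'\uff65-\uff9f]'   # half-width katakana
)

def contains_katakana(text):
    """Return True if text has at least one character in the ranges above."""
    return KATAKANA_RE.search(text) is not None

print(contains_katakana(u"カタカナ"))
print(contains_katakana(u"ｶﾀｶﾅ"))
```

Compiling the pattern once keeps the ranges in a single place, so they can be adjusted if a narrower or wider definition turns out to be needed.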