Is there a native gem (so it can be used for JRuby 1.8.2) which implements UTF-8 normalizers (NFD, NFKD, NFC, NFKC)?
Ruby v1.8 is really flaky on Unicode. I find v1.9 the minimal Ruby version for sane processing. Even then, the unicode_utils gem (for v1.9.1 or better) is absolutely indispensable. It has things like full casemapping and normalization functions. You really do need it.
Unfortunately, it doesn’t include collation, so you can’t do alphabetic sorts in Ruby the way you can in Perl or languages with access to the ICU libraries. Collation is the hardest to get right so it is not surprising that it is missing. But it is critical because it underlies nearly everything we ever do with text. It’s not just about sorting; it’s about simple string comparisons. Most people don’t realize this.
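To make the comparison point concrete, here is a minimal sketch. It assumes Ruby 2.2 or later, where `String#unicode_normalize` is built in; that method is not what the answer above is about, and on v1.9 the unicode_utils gem is what provides the normalization functions instead. The variable names are only for illustration.

```ruby
# Assumption: Ruby 2.2+, which ships String#unicode_normalize.
# On Ruby 1.9 the unicode_utils gem fills this role instead.

composed   = "\u00E9"   # "é" as a single code point
decomposed = "e\u0301"  # "e" followed by COMBINING ACUTE ACCENT

# Both render the same, but a plain comparison sees different code points.
raw_equal = (composed == decomposed)

# Normalizing both sides to the same form makes the comparison meaningful.
nfc_equal = (composed.unicode_normalize(:nfc) == decomposed.unicode_normalize(:nfc))
nfd_equal = (composed.unicode_normalize(:nfd) == decomposed.unicode_normalize(:nfd))

# The compatibility forms (NFKC, NFKD) also fold characters such as ligatures.
ligature_folded = "\uFB01".unicode_normalize(:nfkc)  # LATIN SMALL LIGATURE FI

# Without collation, sort compares raw code units, not alphabetic order:
# uppercase sorts before lowercase, and accented letters land after "z".
words = ["zebra", "apple", "Zulu", "\u00E9clair"]
code_point_order = words.sort

puts raw_equal, nfc_equal, nfd_equal, ligature_folded
puts code_point_order.inspect
```

Normalization fixes the equality problem only. The sort at the end still comes out in code point order, which is exactly the gap that real collation would have to fill.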
I talk a little about Ruby’s Unicode support, and what you can do to make your life easier there, in my third OSCON talk from a couple weeks ago. I confess that I gave up on Ruby v1.8; it was just too frustrating.
That’s not a knock against Ruby, because the same thing can be said for most languages today that aren’t the latest versions.
- You will not be happy with Ruby and Unicode unless you’re running v1.9.
- If you aren’t running Python v3 (and preferably v3.2 or probably v3.3) with a wide build, you will be unhappy in Python with Unicode.
- If you aren’t running Java v1.7, you will be unhappy in Java with Unicode — and maybe even then. :(
- If you aren’t running Perl v5.14 or better, you will be arguably unhappy in Perl with Unicode.
The situation with those four is therefore quite unlike the one(s) with PHP, JavaScript, and Go. With those latter three languages, it doesn’t matter what version you run, because
- With the first two you will always be unhappy with their Unicode support. This is really really terrible because the people using them can almost never switch to a real language with real Unicode support. The niche is too special-purpose.
- Whereas with Go you will never be unhappy with its Unicode support, unless you’re in a hurry: the normalization module is very close to ready and may be out already, while the collation module is being worked on, but that really is a great deal harder.
Is there any possible way for you to use Ruby v1.9?