Java Unicode strings sorting

Developer https://www.devze.com 2023-02-19 13:12 Source: Internet

In Java, how do Unicode strings get compared?

What I mean is, if I have a few, say, Japanese strings, and I do the following:

java.util.Arrays.sort(arrayOfJapaneseStrings);

how do those strings get compared and sorted?

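By default, Strings sort lexicographically by the numeric value of their UTF-16 code units (char values). For characters in the BMP this is the same as Unicode code point order. Characters outside the BMP are stored as surrogate pairs, so they sort before BMP characters in the range U+E000 to U+FFFF, which might not be exactly what you want. Almost all Japanese characters in common use are in the BMP, so you are unlikely to have a problem with these; only some rare kanji live in the supplementary planes.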
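To make the surrogate-pair caveat concrete, here is a minimal sketch. The class name Utf16OrderDemo, the BY_CODE_POINT comparator, and the two sample characters are assumptions chosen for illustration and are not part of the question. It sorts the same two strings once with the default order and once by code point.

```java
import java.util.Arrays;
import java.util.Comparator;

public class Utf16OrderDemo {
    // Hypothetical helper: orders strings by Unicode code point
    // instead of by UTF-16 code unit.
    static final Comparator<String> BY_CODE_POINT = (a, b) -> {
        int i = 0, j = 0;
        while (i < a.length() && j < b.length()) {
            int ca = a.codePointAt(i);
            int cb = b.codePointAt(j);
            if (ca != cb) return Integer.compare(ca, cb);
            i += Character.charCount(ca);
            j += Character.charCount(cb);
        }
        // The string that runs out first sorts first.
        return Integer.compare(a.length() - i, b.length() - j);
    };

    // Sorts a copy with String.compareTo (UTF-16 code unit order).
    static String[] sortedDefault(String... in) {
        String[] copy = in.clone();
        Arrays.sort(copy);
        return copy;
    }

    // Sorts a copy by code point order.
    static String[] sortedByCodePoint(String... in) {
        String[] copy = in.clone();
        Arrays.sort(copy, BY_CODE_POINT);
        return copy;
    }

    public static void main(String[] args) {
        String halfwidthKa = "\uFF76";       // U+FF76, in the BMP
        String rareKanji = "\uD842\uDFB7";   // U+20BB7, outside the BMP
        // Default order puts the surrogate pair first (0xD842 < 0xFF76).
        System.out.println(Arrays.toString(sortedDefault(halfwidthKa, rareKanji)));
        // Code point order puts U+FF76 first (0xFF76 < 0x20BB7).
        System.out.println(Arrays.toString(sortedByCodePoint(halfwidthKa, rareKanji)));
    }
}
```

The two orders only disagree when a supplementary character meets a BMP character at or above U+E000, so for typical Japanese text the default sort and the code point sort give the same result.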

If you would like a different sort order, you can use the java.text.Collator classes to define a different sort order.
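As a rough sketch of the Collator approach, the following assumes a hypothetical helper sortForLocale and sample data that are not from the original question. Collator.getInstance(Locale) and passing the collator to Arrays.sort as a Comparator are standard java.text and java.util usage; the exact ordering a collator produces for Japanese depends on the locale data shipped with your JDK, so no output is claimed for that case.

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

public class CollatorSortDemo {
    // Hypothetical helper: returns a copy of the input sorted with the
    // locale's collation rules instead of String.compareTo.
    static String[] sortForLocale(Locale locale, String... in) {
        Collator collator = Collator.getInstance(locale);
        String[] copy = in.clone();
        Arrays.sort(copy, collator); // Collator implements Comparator<Object>
        return copy;
    }

    public static void main(String[] args) {
        // English example: compareTo would put "Banana" before "apple",
        // because 'B' (0x42) is less than 'a' (0x61).
        System.out.println(Arrays.toString(
                sortForLocale(Locale.ENGLISH, "cherry", "Banana", "apple")));

        // Japanese example: hiragana a, hiragana i, katakana a.
        // The result depends on the JDK's collation data.
        System.out.println(Arrays.toString(
                sortForLocale(Locale.JAPANESE, "\u3042", "\u3044", "\u30A2")));
    }
}
```

A Collator is noticeably slower than compareTo, so if you sort large arrays repeatedly it may be worth looking at Collator.getCollationKey to precompute keys.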

By default the comparison is done on UTF-16 code units. This is the fastest way, and hence perfect if all you need is some consistent order (e.g. if you are going to use a binary search later, the strings need to be in order, but exactly what "in order" means doesn't matter, so the faster the better).
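The binary search case might look like the sketch below. The class name, the contains helper, and the city names are assumptions for illustration. The key point is that Arrays.binarySearch without a comparator uses the same natural ordering as Arrays.sort without a comparator, so the two stay consistent.

```java
import java.util.Arrays;

public class BinarySearchDemo {
    // Hypothetical helper: the array must already be sorted with the
    // natural String order (Arrays.sort without a comparator).
    static boolean contains(String[] sorted, String key) {
        return Arrays.binarySearch(sorted, key) >= 0;
    }

    public static void main(String[] args) {
        String[] cities = {
            "\u6771\u4EAC", // Tokyo
            "\u5927\u962A", // Osaka
            "\u4EAC\u90FD"  // Kyoto
        };
        Arrays.sort(cities); // UTF-16 code unit order, not dictionary order

        System.out.println(contains(cities, "\u5927\u962A"));       // Osaka: true
        System.out.println(contains(cities, "\u540D\u53E4\u5C4B")); // Nagoya: false
    }
}
```

If you sort with a Collator instead, pass the same Collator to Arrays.binarySearch, otherwise the search may miss elements that are present.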

If you need an ordering that is sensible to a user in a given locale, use the java.text.Collator class.

They are compared according to the compareTo method of the String class. See the Javadoc:

Compares two strings lexicographically. The comparison is based on the Unicode value of each character in the strings. The character sequence represented by this String object is compared lexicographically to the character sequence represented by the argument string. The result is a negative integer if this String object lexicographically precedes the argument string. The result is a positive integer if this String object lexicographically follows the argument string. The result is zero if the strings are equal; compareTo returns 0 exactly when the equals(Object) method would return true.
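A small sketch of the contract quoted above, using a hypothetical sign helper and sample strings that are not from the Javadoc. Only the sign of the result is meaningful, so the helper normalizes it with Integer.signum.

```java
public class CompareToDemo {
    // Hypothetical helper: reduces compareTo's result to -1, 0, or 1.
    static int sign(String a, String b) {
        return Integer.signum(a.compareTo(b));
    }

    public static void main(String[] args) {
        String a = "\u3042"; // hiragana a, U+3042
        String i = "\u3044"; // hiragana i, U+3044

        System.out.println(sign(a, i));          // -1: U+3042 precedes U+3044
        System.out.println(sign(i, a));          // 1
        System.out.println(sign("abc", "abc"));  // 0, and "abc".equals("abc") is true
        System.out.println(sign("abc", "abcd")); // -1: a prefix sorts first
    }
}
```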