
Java sort strings in codepoint (UTF-32) order

Developer (https://www.devze.com), 2022-12-18 00:56, source: web
Other than converting to UTF-8 bytes, or writing a compare function that iterates and compares, is there some method I'm missing in JDK 1.6 that compares two strings in full Unicode code point order instead of in UCS-2 (UTF-16 code unit) order?
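The UTF-8 route mentioned in the question could look roughly like this. It is a minimal sketch, assuming Java 7+ for StandardCharsets (on JDK 1.6, Charset.forName("UTF-8") can be passed to the same String.getBytes overload). It relies on the property that comparing UTF-8 byte sequences as unsigned bytes gives the same result as comparing code points. The class name Utf8Order is hypothetical, and unpaired surrogates are not compared faithfully, because getBytes replaces them.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: compares strings by their UTF-8 encodings.
// Unsigned byte-wise lexicographic order of UTF-8 equals code point order
// for well-formed strings.
public final class Utf8Order {
    private Utf8Order() {}

    public static int compare(String a, String b) {
        // Caveat: getBytes replaces unpaired surrogates with '?',
        // so malformed UTF-16 input is not compared faithfully.
        byte[] x = a.getBytes(StandardCharsets.UTF_8);
        byte[] y = b.getBytes(StandardCharsets.UTF_8);
        int n = Math.min(x.length, y.length);
        for (int i = 0; i < n; i++) {
            // Mask to 0..255: Java bytes are signed, but UTF-8 lead bytes are >= 0x80
            int d = (x[i] & 0xFF) - (y[i] & 0xFF);
            if (d != 0) {
                return d;
            }
        }
        // One encoding is a prefix of the other: the shorter one sorts first
        return x.length - y.length;
    }
}
```

The per-call encoding allocates two byte arrays, which is the main cost of this approach compared with iterating over code points directly.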

I appreciate that this is not a hard thing to code. I was puzzled, however, that 1.6 has the various 'codepoint' APIs in java.lang.String as well as the Collation system, but apparently nothing to simply compare two strings without hiccuping on the surrogates.
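To make the surrogate "hiccup" concrete, here is a small sketch comparing a BMP character above the surrogate range (U+FF61, chosen only because it lies above D800 to DFFF) with a supplementary character (U+1F600, stored as a surrogate pair). String.compareTo compares UTF-16 chars, so the high surrogate 0xD83D is what gets compared against 0xFF61. The class name is hypothetical.

```java
// Hypothetical demo: String.compareTo (UTF-16 code unit order) and
// code point order disagree once a supplementary character is involved.
public final class SurrogateHiccup {
    public static void main(String[] args) {
        String bmp = "\uFF61";           // U+FF61, above the surrogate range
        String emoji = "\uD83D\uDE00";   // U+1F600 as a surrogate pair

        // compareTo sees 0xFF61 vs the high surrogate 0xD83D
        System.out.println(bmp.compareTo(emoji) > 0); // true: bmp sorts AFTER emoji

        // Comparing the decoded code points gives the opposite answer
        int cpBmp = bmp.codePointAt(0);     // 0xFF61
        int cpEmoji = emoji.codePointAt(0); // 0x1F600
        System.out.println(cpBmp < cpEmoji); // true: bmp sorts BEFORE emoji
    }
}
```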

For the benefit of a commenter, I have to feed some data to a tool that wants the strings in this order.


AFAIK, the API has no such method, but it should be trivial to implement yourself. Just out of curiosity: what do you need something like that for?
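One possible shape of that "trivial" implementation is sketched below, using only String.codePointAt and Character.charCount, which exist in JDK 1.6. The class name CodePointOrder is hypothetical; unpaired surrogates are simply compared by their own char value, which is how codePointAt returns them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical comparator: compares two strings code point by code point.
public final class CodePointOrder implements Comparator<String> {
    public static final CodePointOrder INSTANCE = new CodePointOrder();

    @Override
    public int compare(String a, String b) {
        int i = 0, j = 0;
        while (i < a.length() && j < b.length()) {
            int ca = a.codePointAt(i); // a surrogate pair decodes to one code point;
            int cb = b.codePointAt(j); // an unpaired surrogate is returned as-is
            if (ca != cb) {
                return ca < cb ? -1 : 1;
            }
            i += Character.charCount(ca); // advance 1 or 2 chars
            j += Character.charCount(cb);
        }
        // One string is a prefix of the other: the shorter one sorts first
        return (a.length() - i) - (b.length() - j);
    }

    public static void main(String[] args) {
        List<String> words = new ArrayList<String>(Arrays.asList("\uFF61", "\uD83D\uDE00", "a"));
        Collections.sort(words, INSTANCE);
        // Prints U+0061, U+FF61, U+1F600 (String.compareTo would put U+FF61 last)
        for (String w : words) {
            System.out.printf("U+%04X%n", w.codePointAt(0));
        }
    }
}
```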


For the sake of completeness, here is my solution to the problem (it needs Java 8 or later for String.codePoints()). Maybe there is a better solution:

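   import java.util.stream.Collectors;

   // Sort the Unicode code points of text; surrogate pairs are never split
   String sortedText = text
      .codePoints()                                             // IntStream of code points
      .sorted()                                                 // ascending = code point order
      .mapToObj(cp -> String.valueOf(Character.toChars(cp)))    // 1 or 2 chars each
      .collect(Collectors.joining());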
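As a usage sketch of the snippet above, the hypothetical class below wraps it in a method and contrasts it with sorting the raw UTF-16 chars. With two supplementary characters (U+1F601 then U+1F600), sorting chars puts both high surrogates first and tears the pairs apart, while sorting code points keeps each pair intact.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

// Hypothetical demo of the code point sort vs. a naive char sort.
public final class SortCodePoints {
    static String sortByCodePoint(String text) {
        return text.codePoints()
                .sorted()
                .mapToObj(cp -> String.valueOf(Character.toChars(cp)))
                .collect(Collectors.joining());
    }

    static String sortByChar(String text) {
        char[] cs = text.toCharArray();
        Arrays.sort(cs); // sorts UTF-16 code units, which can split surrogate pairs
        return new String(cs);
    }

    public static void main(String[] args) {
        String text = "\uD83D\uDE01\uD83D\uDE00"; // U+1F601 then U+1F600
        // Code point sort: U+1F600, U+1F601, each pair intact
        System.out.println(sortByCodePoint(text).equals("\uD83D\uDE00\uD83D\uDE01")); // true
        // Char sort: high, high, low, low (no longer valid pairs)
        System.out.println(sortByChar(text).equals("\uD83D\uD83D\uDE00\uDE01"));      // true
    }
}
```

Note that this sorts the characters inside one string; ordering a list of strings needs a string comparator instead.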