Does any significant interchange take place in formats other than ASCII/UTF-8? Are there any fields where UTF-16xx and UTF-32xx are used heavily? I ask as a writer of multiple libraries that work on Unicode text, and the burden of supporting all five major variants is quite high compared to the perceived utility.
Windows and Java both treat Unicode as UTF-16 internally, and Python (before 3.3) used UTF-16 or UTF-32 depending on how the interpreter was built. So more than just UTF-8 matters for these. These are just the cases I'm most familiar with; I'm sure there are others.
So, in my opinion, if you have a Unicode library, you should support UTF-16 and UTF-32. (I can't believe UTF-32 is too difficult, since there's no special processing involved besides byte ordering. Although, I'm not a Unicode library author :) )
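To illustrate why UTF-32 looks like the easy case, here is a minimal sketch of a hand-written decoder in Python. The function name `decode_utf32` and its `byteorder` parameter are my own, not from any particular library, and the sketch assumes the caller already knows the byte order (no BOM handling). Each code point is one 4-byte unit, so apart from byte order the only real work is range checking.

```python
def decode_utf32(data: bytes, byteorder: str = "little") -> str:
    """Decode UTF-32 bytes by hand; byteorder is "little" or "big"."""
    if len(data) % 4 != 0:
        raise ValueError("UTF-32 input length must be a multiple of 4")
    chars = []
    for i in range(0, len(data), 4):
        # One fixed-width unit per code point: no surrogates, no
        # multi-byte sequences, only the byte order to get right.
        cp = int.from_bytes(data[i:i + 4], byteorder)
        # Reject surrogate values and anything past the Unicode range.
        if cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"invalid code point 0x{cp:X} at offset {i}")
        chars.append(chr(cp))
    return "".join(chars)


if __name__ == "__main__":
    sample = "h\u00e9llo \U0001F600"
    print(decode_utf32(sample.encode("utf-32-le"), "little") == sample)
    print(decode_utf32(sample.encode("utf-32-be"), "big") == sample)
```

In practice you would probably just call the built-in codecs (`bytes.decode("utf-32-le")` and friends); the point is only that the per-unit logic is small compared to UTF-8 or UTF-16.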
One important point is XML: it can come in pretty much any encoding imaginable, but UTF-8 is by far the most common.
However, the XML spec says this:
All XML processors must accept the UTF-8 and UTF-16 encodings of Unicode
So if your application/library handles XML in any way, it must support UTF-16 at least in that portion. Note that a conforming parser that converts the data to UTF-8 for processing would be enough here.
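The "convert to UTF-8 at the boundary" idea could look roughly like the sketch below. `xml_bytes_to_utf8` is a hypothetical helper, not part of any XML library. It assumes the input is either UTF-8 or UTF-16, detects UTF-16 from a byte order mark or from the byte pattern of a leading `<?`, and ignores the `encoding` declaration and every other encoding (including UTF-32), so a real parser would need to do more.

```python
import codecs


def xml_bytes_to_utf8(data: bytes) -> bytes:
    """Hypothetical helper: normalize UTF-8 or UTF-16 XML input to UTF-8."""
    if data.startswith(codecs.BOM_UTF8):
        text = data[len(codecs.BOM_UTF8):].decode("utf-8")
    elif data.startswith(codecs.BOM_UTF16_BE):
        text = data[2:].decode("utf-16-be")
    elif data.startswith(codecs.BOM_UTF16_LE):
        text = data[2:].decode("utf-16-le")
    elif data.startswith(b"\x00<\x00?"):
        # No BOM, but "<?" appears as big-endian 16-bit units.
        text = data.decode("utf-16-be")
    elif data.startswith(b"<\x00?\x00"):
        # No BOM, but "<?" appears as little-endian 16-bit units.
        text = data.decode("utf-16-le")
    else:
        # Assumption for this sketch: everything else is UTF-8.
        text = data.decode("utf-8")
    return text.encode("utf-8")


if __name__ == "__main__":
    doc = '<?xml version="1.0"?><a>\u00e9</a>'
    utf16_input = codecs.BOM_UTF16_LE + doc.encode("utf-16-le")
    print(xml_bytes_to_utf8(utf16_input) == doc.encode("utf-8"))
```

With something like this in front, the rest of the library only ever sees UTF-8, which keeps the UTF-16 support confined to one small function.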
When it comes to interchange, I guess you are right that UTF-8 is prevalent. Some cases where UTF-16 is used are various binary protocols such as DCOM, Java RMI and possibly CORBA (I'm not sure about that last one).
As for UTF-32, I've never heard of a case where it is used for interchange.