Character Encoding Issue

Developer https://www.devze.com 2023-01-21 11:09 Source: Internet

Related topics: encoding utf-8

I'm using an API that processes my files and presents optimized output, but some special characters are not preserved, for example:

Input: äöü

Output: Ã¤Ã¶Ã¼

How do I fix this? What encoding should I use?

Many thanks for your help!

It really depends on what processing is being done to your data. But in general, one powerful technique is to convert the data to UTF-8 (with iconv, for example) and pass it through ASCII-capable APIs or functions. If those functions leave alone any bytes they don't recognize as ASCII, the UTF-8 is preserved. That's a nice property of UTF-8.
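To illustrate that property, here is a minimal Java sketch. The `upperCaseAscii` method is a made-up stand-in for an "ASCII-capable" function, not part of any real API: it only touches the bytes for a-z and copies everything else through. Because every byte of a multi-byte UTF-8 sequence is 0x80 or higher, the non-ASCII characters survive the round trip.

```java
import java.nio.charset.StandardCharsets;

class AsciiPassThrough {
    // Hypothetical stand-in for an "ASCII-capable" API: upper-cases the ASCII
    // letters a-z and copies every other byte through unchanged.
    static byte[] upperCaseAscii(byte[] input) {
        byte[] out = new byte[input.length];
        for (int i = 0; i < input.length; i++) {
            byte b = input[i];
            // Bytes of multi-byte UTF-8 sequences are >= 0x80 (negative as a
            // Java byte), so they never fall inside the a-z range.
            out[i] = (b >= 'a' && b <= 'z') ? (byte) (b - 32) : b;
        }
        return out;
    }

    public static void main(String[] args) {
        // "grüße", written with Unicode escapes so the source file's own
        // encoding cannot affect the demo.
        String text = "gr\u00fc\u00dfe";
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        String result = new String(upperCaseAscii(utf8), StandardCharsets.UTF_8);
        // The ASCII letters are upper-cased; ü and ß come back intact.
        System.out.println(result.equals("GR\u00fc\u00dfE")); // prints true
    }
}
```

This only holds if the function really does ignore bytes outside the ASCII range. A function that truncates at a byte count or re-encodes its input can still split or mangle a multi-byte sequence.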

I'm not sure what language you're using, but things like this occur when the encoding used when the content was written does not match the encoding used when it is read back in.
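The exact output in the question can be reproduced in Java by encoding the text as UTF-8 and then decoding those bytes as ISO-8859-1, so that is a plausible guess at what the API is doing. It is only a guess: windows-1252 would give the same result for these three characters. The class and method names below are invented for the demo.

```java
import java.nio.charset.StandardCharsets;

class MojibakeDemo {
    // The mismatch: text is written as UTF-8 but read back as ISO-8859-1,
    // so each two-byte sequence turns into two separate characters.
    static String garble(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // Undo the damage, assuming the wrong decoder really was ISO-8859-1.
    // ISO-8859-1 maps every byte to exactly one char, so nothing is lost.
    static String repair(String garbled) {
        return new String(garbled.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String input = "\u00e4\u00f6\u00fc";                      // äöü
        String expected = "\u00c3\u00a4\u00c3\u00b6\u00c3\u00bc"; // Ã¤Ã¶Ã¼
        String garbled = garble(input);
        System.out.println(garbled.equals(expected));     // prints true
        System.out.println(repair(garbled).equals(input)); // prints true
    }
}
```

Repairing after the fact is a workaround. The cleaner fix is to make the reader and the writer agree on one encoding in the first place.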

So you might want to specify exactly which encoding is used to read the data. You may have to experiment to find the encoding you actually need:

string.getBytes("UTF-8")
string.getBytes("UTF-16")
string.getBytes("UTF-16LE")
string.getBytes("UTF-16BE") 
etc...
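Note that `getBytes` goes from a string to bytes. For the reading direction you would pass the charset when decoding instead. Here is a rough sketch of trying several candidates against the same bytes, assuming you can get hold of the raw bytes the API returns. `CharsetProbe` and `decodeStrict` are names made up for this example; the strict decoder settings make invalid input fail instead of silently turning into replacement characters.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class CharsetProbe {
    // Decode strictly: return null if the bytes are not valid in this charset,
    // rather than substituting U+FFFD as new String(bytes, charset) would.
    static String decodeStrict(byte[] data, Charset charset) {
        try {
            return charset.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data))
                    .toString();
        } catch (CharacterCodingException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // The UTF-8 bytes of "äöü", standing in for bytes read from a file or socket.
        byte[] data = {(byte) 0xC3, (byte) 0xA4, (byte) 0xC3, (byte) 0xB6, (byte) 0xC3, (byte) 0xBC};
        Charset[] candidates = {
            StandardCharsets.UTF_8, StandardCharsets.UTF_16LE,
            StandardCharsets.UTF_16BE, StandardCharsets.ISO_8859_1
        };
        for (Charset cs : candidates) {
            String decoded = decodeStrict(data, cs);
            System.out.println(cs.name() + ": "
                    + (decoded == null ? "(invalid)" : decoded.length() + " chars"));
        }
    }
}
```

A successful strict decode does not prove the charset is right: these six bytes also decode without error as UTF-16 and ISO-8859-1, just to the wrong characters. So you still have to look at the result and compare it with what you expected.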

Also, do some research about the system where this data is coming from. For example, .NET's Encoding.Unicode is UTF-16LE, but Java's "UTF-16" charset writes big-endian when encoding and assumes big-endian when decoding data that has no byte order mark. When two such systems exchange extended characters without agreeing on the byte order, they may not interpret the bytes the same way.
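The byte order difference is easy to see on the Java side. This is a small sketch using only the standard charsets; the `hex` helper is just for display.

```java
import java.nio.charset.StandardCharsets;

class Utf16ByteOrder {
    // Format bytes as space-separated upper-case hex, e.g. "FE FF 00 E4".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "\u00e4"; // ä
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_16LE))); // E4 00
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_16BE))); // 00 E4
        // Plain "UTF-16" encodes big-endian and prepends a byte order mark.
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_16)));   // FE FF 00 E4

        // Reading little-endian bytes as big-endian swaps the two bytes and
        // produces a different character (U+E400 instead of U+00E4).
        String wrong = new String(s.getBytes(StandardCharsets.UTF_16LE), StandardCharsets.UTF_16BE);
        System.out.println(wrong.equals("\ue400")); // true
    }
}
```

If you control both ends, naming the byte order explicitly (UTF-16LE or UTF-16BE) or simply using UTF-8 everywhere avoids the ambiguity.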