Problem with iconv

Developer (开发者) https://www.devze.com 2022-12-24 14:26 Source: Internet

If you are on Mac OS X 10.6, and you are familiar with character encoding AND the terminal please do this:

Open a terminal and type the following commands:

echo sørensen > test.txt
iconv -f UTF8 -t ISO-8859-1 test.txt

You will see the output: "sørensen". Can somebody explain what is going on?


UTF-8 is a multibyte encoding. The character ø is encoded as two bytes: C3-B8. In the encoding of your terminal (ISO-8859-1), these bytes are decoded as Ã¸ (capital A with tilde, followed by a cedilla). Then you convert those bytes to the ISO-8859-1 code for ø. Any questions?
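The byte values mentioned above can be checked without relying on how the terminal itself is configured. The following is only a minimal sketch: the file names utf8.txt and latin1.txt are made up for illustration, and the octal escapes \303 \270 stand for the bytes C3 B8.

```shell
# Write the UTF-8 bytes of "ø" (C3 B8) using octal escapes, so the result
# does not depend on the terminal's own character encoding.
printf '\303\270' > utf8.txt

# Dump the raw bytes of the UTF-8 file (two bytes: c3 b8).
od -An -tx1 utf8.txt

# Convert from UTF-8 to ISO-8859-1 and dump again.
# In ISO-8859-1, ø is the single byte f8.
iconv -f UTF-8 -t ISO-8859-1 < utf8.txt > latin1.txt
od -An -tx1 latin1.txt
```

Looking at the bytes with od, rather than at what the terminal renders, separates what iconv wrote from how the terminal chose to display it.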


I tried the "iconv" command from one file to another, looking at the data with "od -txC" with the following results:

Input:  c3  83  c2  b8         [ 2 utf8-chars Capital A tilde; Cedilla ]

Command: iconv -f utf-8 -t ISO-8859-1 < in.txt > out.txt

Output:  c3  b8    [ 2 ISO-8859-1 characters, Capital A tilde; Cedilla ]

So, the iconv conversion is correct.

But, if you instead treat the converted data as utf-8 (which Terminal is apparently doing), C3-B8 is "ø" (o-slash).

If you change your character encoding in Terminal (Preferences // Advanced // Character Encoding) to "Western (ISO Latin 1)", you'll see C3-B8 as "Ã¸".
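The experiment in this answer can be reproduced from the byte values alone. This is a rough sketch that assumes the input file holds exactly the four bytes from the od dump; the last step (converting back to UTF-8) is an addition for illustration and was not part of the original test.

```shell
# Recreate the input from the od dump: "Ã¸" stored as UTF-8 (c3 83 c2 b8).
printf '\303\203\302\270' > in.txt

# Same conversion as in the answer; out.txt ends up holding c3 b8.
iconv -f utf-8 -t ISO-8859-1 < in.txt > out.txt
od -An -tx1 out.txt

# Interpreting out.txt as ISO-8859-1 gives "Ã¸" again, which is the
# original four bytes when written back out as UTF-8.
iconv -f ISO-8859-1 -t utf-8 < out.txt | od -An -tx1
```

The two bytes in out.txt are also the valid UTF-8 sequence for ø, which is why a terminal set to UTF-8 displays them as a single o-slash.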
