wprintf UTF16 (should be UTF8) on Linux?

Developer https://www.devze.com 2023-04-11 12:52 Source: Internet
1 It's really strange that wprintf shows 'Ω' as 3A9 (UTF-16), but wctomb converts the wchar to CEA9 (UTF-8). My locale is the default en_US.utf8. According to the man pages, both should conform to my locale, but wprintf uses UTF-16. Why?

excerpt from http://www.fileformat.info/info/unicode/char/3a9/index.htm

Ω in UTF

UTF-8 (hex) 0xCE 0xA9 (cea9)

UTF-16 (hex) 0x03A9 (03a9)

2 wprintf and printf cannot be used together in the same program; I have to choose one or the other. Why?


See my program:

#include <stdio.h>
#include <wchar.h>
#include <stdlib.h>
#include <locale.h>

int main() {
  setlocale(LC_ALL,""); // inherit locale setting from environment
  int r;
  char wc_char[4] = {0,0,0,0};
  wchar_t myChar1 = L'Ω'; //greek 

  // should comment out either wprintf or printf, they don't run together
  r = wprintf(L"char is %lc (%x)\n", myChar1, myChar1);//On Linux, to UTF16

  r = wctomb(wc_char, myChar1); // On Linux, to UTF8
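  // cast to unsigned char so bytes >= 0x80 are not sign-extended by %x
  r = printf("r:%d, %x, %x, %x, %x\n", r, (unsigned char)wc_char[0], (unsigned char)wc_char[1], (unsigned char)wc_char[2], (unsigned char)wc_char[3]);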
}


The answer to your second question has to do with stream orientation. You cannot mix printf() and wprintf() on the same stream because they require different orientations.

When the process starts, the standard streams have no orientation yet. The first I/O function called on a stream sets it: printf() makes stdout byte-oriented, and wprintf() makes it wide-oriented.

It is undefined behavior to call a function that requires a different orientation than the one the stream currently has.


How exactly are you determining what the wprintf line is printing? Your comment below the question seems to imply that you're just examining the result of wprintf(L"%x", myChar1);, which prints the internal numeric value of myChar1 regardless of character encoding (but not regardless of character set; there's a difference). Assuming that your compiler uses Unicode for wchar_t values internally (a pretty safe bet, I believe), this simply prints the Unicode code point for 'Ω', which is 0x3a9, independently of any UTF-16 vs. UTF-8 distinction. To tell whether wprintf is printing UTF-16, you have to examine the raw bytes that are output (e.g., with hexdump(1)). For example, on my computer, the wprintf line prints the following:

63 68 61 72 20 69 73 20 ce a9 20 28 33 61 39 29 0a
c  h  a  r     i  s     Ω        (  3  a  9  )  \n

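Note that the omega is encoded in UTF-8 as the bytes CE A9, but the numeric value of the wchar_t is still 3A9. That value only looks like UTF-16 because, for characters in the Basic Multilingual Plane, the single UTF-16 code unit has the same value as the code point.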


Ahh, I may have found it. You need to execute

setlocale(LC_ALL, "")

first. Without it, the program stays in the default "C" locale, which is why the wchar I/O functions appear not to honor the LC_* environment variables.

See http://littletux.homelinux.org/knowhow.php?article=charsets/ar01s08 for more background.
