printf("%s\n", "ああ");
It outputs:
ã‚ã‚
What else should I do to print it correctly?
Assuming that's Unicode, compile this with a C99 compiler:
#include <locale.h>
#include <stdio.h>
#include <wchar.h>

int main(void) {
    wchar_t buff[3]; // = L"ああ";
    buff[0] = buff[1] = L'\U00003042';
    buff[2] = 0;
    setlocale(LC_ALL, "");
    wprintf(L"%ls\n", buff);
    return 0;
}
The absolutely correct version should look like this:
#include <stdio.h>
#include <wchar.h>
#include <locale.h>

int main(void)
{
    /* wide string literals should not be modified, so point to them as const */
    const wchar_t *s1 = L"♠♣♥♦";
    const wchar_t *s2 = L"Příšerně žluťoučký kůň";
    const wchar_t *s3 = L"ああ";
    setlocale(LC_ALL, ""); /* pull system locale for correct output */
    wprintf(L"%ls\n%ls\n%ls\n", s1, s2, s3); /* print all three strings */
    return 0;
}
Edit:
As pointed out in the comments by R.., you can actually use printf instead of wprintf. The only limitation is that the format string must be a const char* for printf instead of a const wchar_t* for wprintf. So no wide characters in the format string.
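As a rough sketch of that printf variant: the format string is an ordinary narrow string and only the argument is wide. The helper name print_wide_line is made up for this example, and the conversion depends on whatever locale is active when it runs.

```c
#include <assert.h>
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: print a wide string with the narrow printf.
   Returns the number of bytes written, or a negative value if the
   string cannot be converted in the current locale. */
int print_wide_line(const wchar_t *ws)
{
    assert(ws != NULL);
    /* "%ls" makes printf convert the wchar_t string to multibyte
       characters using the current locale. */
    return printf("%ls\n", ws);
}
```

You would call setlocale(LC_ALL, "") once at startup and then print_wide_line(L"ああ"). Don't mix printf and wprintf on the same stream, though: once stdout has been used by one family, using the other on it is undefined.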
I think you might have to use wprintf, the wide character version of printf.
Technically, C89 doesn't guarantee support for multi-byte encodings in string literals (only the basic character set, in practice ASCII, is portable), but the standard C functions can handle input/output in other encodings, provided the data can be treated as an opaque blob.
E.g., this one will be correct:
#include <stdio.h>

int main() {
    /* the UTF-8 bytes of "ああ", passed through untouched */
    printf("%s\n", "\xe3\x81\x82\xe3\x81\x82");
    return 0;
}
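This one may be wrong if you expect it to print the number of characters, because strlen counts bytes and reports 6, not 2:
#include <stdio.h>
#include <string.h>

int main() {
    /* strlen returns size_t; cast it so it matches %lu */
    printf("%lu\n", (unsigned long)strlen("\xe3\x81\x82\xe3\x81\x82"));
    return 0;
}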
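If the character count is what you want and you know the bytes are UTF-8, one way to get it is to skip the continuation bytes. This is only a sketch: utf8_count is a name invented here, and it assumes the input is already valid UTF-8 (it does no validation).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: count code points in a NUL-terminated string
   that is ASSUMED to be valid UTF-8. */
size_t utf8_count(const char *s)
{
    size_t n = 0;
    assert(s != NULL);
    for (; *s != '\0'; s++) {
        /* Continuation bytes look like 10xxxxxx; every other byte
           starts a new code point. */
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    }
    return n;
}
```

For "\xe3\x81\x82\xe3\x81\x82" this gives 2, where strlen gives 6. Note it counts code points, which is not always the same as what a user perceives as one character (combining marks, for example).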
The compiler may interpret source input as UTF-8, but it's not guaranteed. For example, GCC does seem to read UTF-8 source files correctly:
hexdump -Cv b.c
00000000  23 69 6e 63 6c 75 64 65  20 3c 73 74 64 69 6f 2e  |#include <stdio.|
00000010  68 3e 0a 69 6e 74 0a 6d  61 69 6e 28 29 0a 7b 0a  |h>.int.main().{.|
00000020  20 20 20 20 70 72 69 6e  74 66 28 22 25 73 5c 6e  |    printf("%s\n|
00000030  22 2c 20 22 e3 81 82 e3  81 82 22 29 3b 0a 7d 0a  |", "......");.}.|
00000040
Note that the literal is stored in the source as the bytes e3 81 82 e3 81 82, which is exactly the byte sequence that gets printed out:
./a.out | hexdump -Cv
00000000  e3 81 82 e3 81 82 0a                              |.......|
00000007
If your locale isn't UTF-8, or your editor saved the file with an encoding other than UTF-8, I suspect the result will be different.
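To check the locale side of this at run time, something like the following might help. It is a sketch that assumes a POSIX system: nl_langinfo and <langinfo.h> are not part of ISO C and are not available on Windows. The function names is_utf8_codeset and locale_is_utf8 are made up for this example, and the codeset string itself varies between platforms.

```c
#include <assert.h>
#include <ctype.h>
#include <langinfo.h> /* POSIX, not ISO C */
#include <stddef.h>

/* Hypothetical helper: return 1 if a codeset name means UTF-8
   ("UTF-8", "utf8", ...), otherwise 0. Hyphens and case are ignored. */
int is_utf8_codeset(const char *name)
{
    const char *want = "utf8";
    size_t i = 0;
    assert(name != NULL);
    for (; *name != '\0'; name++) {
        if (*name == '-')
            continue; /* treat "UTF-8" and "UTF8" alike */
        if (want[i] == '\0' || tolower((unsigned char)*name) != want[i])
            return 0;
        i++;
    }
    return want[i] == '\0';
}

/* Ask the current locale for its codeset; call setlocale first. */
int locale_is_utf8(void)
{
    return is_utf8_codeset(nl_langinfo(CODESET));
}
```

Call setlocale(LC_ALL, "") before locale_is_utf8(), otherwise the program is still in the default "C" locale and the answer says nothing about the user's terminal. This does not tell you how the source file was saved; that part is between your editor and your compiler.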