In this output, why am I getting extra newlines after printing non-ASCII Unicode characters?
Platform is Windows Vista and problem occurs after chcp 65001
but not after chcp 850
C:\>chcp 850 Active code page: 850 C:\>perl unicode_bug_1.pl Budweiser Budweiser Budweiser Bud─øjovick├¢ Budvar Bud─øjovick├¢ Budvar Bud─øjovick├¢ Budvar C:\>chcp 65001 Active code page: 65001 C:\>perl unicode_bug_1.pl Budweiser 开发者_如何转开发Budweiser Budweiser Budějovický Budvar Budějovický Budvar Budějovický Budvar
from this program
#!perl
use strict;
use warnings;
binmode (STDOUT, "encoding(UTF-8)"); # so no "Wide character in print" warning
print "Budweiser\n" for 1..3;
print "Bud\N{U+011B}jovick\N{U+00FD} Budvar\n" for 1..3;
This seems to be a bug in Perl. I had thought it was a bug in Windows code page 65001 not really being supported for the console but I finally made test programs in C and Perl and the problem does not happen in the C version. It happens no matter where the Unicode character occurs in the line but the line you're printing must be wider than the console supports.
Here is my C program:
#include "stdafx.h"
#include "Windows.h"
int _tmain(int argc, _TCHAR* argv[])
{
BOOL b = SetConsoleOutputCP(65001);
printf("set console output codepage returned %d\n", b);
printf("cαfe\n");
printf("1234567890 café\n");
printf("1234567890 1234567890 cαfe\n");
printf("1234567890 1234567890 1234567890 café\n");
printf("1234567890 1234567890 1234567890 1234567890 cαfe\n");
printf("1234567890 1234567890 1234567890 1234567890 1234567890 café\n");
printf("1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 cαfe\n");
printf("1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 café\n");
printf("1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 cαfe\n");
printf("1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 café\n");
printf("1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 cαfe\n");
printf("1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 café\n");
printf("1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 cαfe\n");
return 0;
}
And here is my Perl program:
#
use utf8;
binmode STDOUT, ':utf8';
printf STDOUT "cαfe\n";
printf STDOUT "1234567890 café\n";
printf STDOUT "1234567890 1234567890 cαfe\n";
printf STDOUT "1234567890 1234567890 1234567890 café\n";
printf STDOUT "1234567890 1234567890 1234567890 1234567890 cαfe\n";
printf STDOUT "1234567890 1234567890 1234567890 1234567890 1234567890 café\n";
printf STDOUT "1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 cαfe\n";
printf STDOUT "1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 café\n";
printf STDOUT "1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 cαfe\n";
printf STDOUT "1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 café\n";
printf STDOUT "1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 cαfe\n";
printf STDOUT "1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 café\n";
printf STDOUT "1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 cαfe\n";
UPDATE
No I was wrong, with the help of some of the guys at #perl on irc.perl.org it turns out to be a bug in the Microsoft API. WriteFile
is documented to return the number of bytes written but returns the number of characters written, which depends on the codepage. A bug was filed in March 2010.
There is more discussion in the MSDN forums.
UPDATE 2
I posted Michael Kaplan's blog, "Sorting it all out", about this problem and he responded with the article entitled "Hidden in plain site: a purloined letter kind of a bug report". He's a Microsoft internationalization expert so you will surely find some insights there...
I'm not getting any newlines. Is your command line wide enough to fit your output?
精彩评论