Why does the second loop (the one with the CHECK argument set) produce different output?
#!/usr/bin/env perl
use warnings;
use 5.012;
use Encode qw(encode);
my $s = 'a';
for my $encoding ( 'iso-8859-1', 'iso-8859-15', 'cp1252', 'cp850' ) {
    my $encoded = encode( $encoding, $s );
    my $c = unpack '(B8)*', $encoded;
    printf "%-12s:\t%8s\n", $encoding, $c;
}
say "-------------------";
for my $encoding ( 'iso-8859-1', 'iso-8859-15', 'cp1252', 'cp850' ) {
    my $encoded = encode( $encoding, $s, Encode::FB_WARN );
    my $c = unpack '(B8)*', $encoded;
    printf "%-12s:\t%8s\n", $encoding, $c;
}
# iso-8859-1 : 01100001
# iso-8859-15 : 01100001
# cp1252 : 01100001
# cp850 : 01100001
# -------------------
# iso-8859-1 : 01100001
# Use of uninitialized value $c in printf at ./perl1.pl line 20.
# iso-8859-15 :
# Use of uninitialized value $c in printf at ./perl1.pl line 20.
# cp1252 :
# Use of uninitialized value $c in printf at ./perl1.pl line 20.
# cp850 :
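This behavior is described in the documentation (see the excerpt below): when CHECK is set, encode modifies the data argument and leaves only the unprocessed portion in $s. Since no error occurs, the whole string is consumed and $s ends up empty. Every later iteration therefore encodes an empty string, unpack has nothing to return, and $c is undefined.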
*CHECK* = Encode::FB_QUIET
If *CHECK* is set to Encode::FB_QUIET, (en|de)code will immediately
return the portion of the data that has been processed so far when an
error occurs. The data argument will be overwritten with everything
after that point (that is, the unprocessed part of data). This is
handy when you have to call decode repeatedly in the case where your
source data may contain partial multi-byte character sequences, (i.e.
you are reading with a fixed-width buffer). Here is a sample code that
does exactly this:
my $buffer = ''; my $string = '';
while(read $fh, $buffer, 256, length($buffer)){
    $string .= decode($encoding, $buffer, Encode::FB_QUIET);
    # $buffer now contains the unprocessed partial character
}
*CHECK* = Encode::FB_WARN
This is the same as above, except that it warns on error. Handy when
you are debugging the mode above.
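To see the documented behaviour in the terms of the question, here is a minimal sketch that calls encode twice on the same variable with Encode::FB_WARN. The variable names ($first, $second, $c) are only illustrative, and the two encodings are simply the first two from the question's list.

```perl
use strict;
use warnings;
use Encode qw(encode);

my $s = 'a';

# First call: the whole string is consumed, so nothing is left in $s.
my $first = encode('iso-8859-1', $s, Encode::FB_WARN);
printf "first call:  encoded %d byte(s), %d char(s) left in \$s\n",
    length $first, length $s;

# Second call: encode() now receives an empty string and returns one.
my $second = encode('iso-8859-15', $s, Encode::FB_WARN);
printf "second call: encoded %d byte(s), %d char(s) left in \$s\n",
    length $second, length $s;

# unpack in scalar context returns the first value; empty input yields none.
my $c = unpack '(B8)*', $second;
print "bit string is ", ( defined $c ? "'$c'" : 'undef' ), "\n";
```

The second call is what every iteration after the first amounts to in the question's second loop, which is where the "uninitialized value $c" warnings come from.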
When CHECK is set to Encode::FB_QUIET or Encode::FB_WARN, the data argument is overwritten:
perl -MEncode -Mutf8 -E '$s="a"; encode("utf-8", $s, Encode::FB_WARN); say $s'
You can prevent the overwriting by OR-ing in Encode::LEAVE_SRC:
my $encoded = encode( $encoding, $s, Encode::FB_WARN | Encode::LEAVE_SRC);
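Put back into the loop from the question, the fix might look like the sketch below. The %bits hash and the $copy / $from_copy variables are additions for illustration only. The last few lines show an alternative that should work equally well: hand encode a throwaway copy and let it consume that instead.

```perl
use strict;
use warnings;
use Encode qw(encode);

my $s = 'a';
my %bits;    # illustrative: collects the bit string per encoding

for my $encoding ( 'iso-8859-1', 'iso-8859-15', 'cp1252', 'cp850' ) {
    # LEAVE_SRC keeps $s intact, so every iteration encodes the full string.
    my $encoded = encode( $encoding, $s, Encode::FB_WARN | Encode::LEAVE_SRC );
    $bits{$encoding} = join ' ', unpack '(B8)*', $encoded;
    printf "%-12s:\t%8s\n", $encoding, $bits{$encoding};
}

# Alternative without LEAVE_SRC: let encode() consume a copy instead of $s.
my $copy      = $s;
my $from_copy = encode( 'cp850', $copy, Encode::FB_WARN );
printf "copy has %d char(s) left, original has %d\n", length $copy, length $s;
```

LEAVE_SRC is the more direct choice when the source string is needed again; the copy approach is mainly useful if the CHECK value comes from elsewhere and cannot easily be changed.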