开发者

How to upload binary files in mod_perl with CGI.pm?

开发者 https://www.devze.com 2023-03-14 02:04 出处:网络
I have a big piece of production code, that works. But after I setup a new environment in virtual machine I have one issue -- everytime I need to upload a binary file it become messed up with unicode

I have a big piece of production code, that works. But after I setup a new environment in virtual machine I have one issue -- everytime I need to upload a binary file it become messed up with unicode conversions.

So there is a sub, where issue is:

sub save_uploaded_file
{
    # $file is obtained by param(zip) 
    my ($file) = @_;
    my ($fh, $fname) = tmpnam;
    my ($br, $buffer);
    # commenting out next 2 lines doesn't help either
    binmode $file, ':raw';
    binmode $fh, ':raw';
    while ($br = sysread($file, $buffer, 16384))
    {
        syswrite($fh, $buffer, $br);
    }
    close $fh;
    return $fname;
}

Its used to upload zip archives, but they are uploaded as malformed (their size is always bigger than in original) and I looked inside of them with hex editor and found that there are lots unicode replacement charaters, encoded in utf-8, inside (EF BF BD).

I figured out that the total sum of bytes read is bigger than original file. So the problem starts at sysread.

Text files uploads well.

Update: There is a binary representation of first few bytes of transfered file:

0000000: 504b 0304 1400 0000 0800 efbf bd1开发者_JAVA技巧c efbf  PK..............
0000010: bd3e efbf bd1d 3aef bfbd efbf bd02 0000  .>....:.........
0000020: efbf bd05 0000 0500 1c00 422e 786d 6c55  ..........B.xmlU
0000030: 5409 0003 5cef bfbd efbf bd4d 18ef bfbd  T...\......M....
0000040: efbf bd4d 7578 0b00 0104 efbf bd03 0000  ...Mux..........
0000050: 0404 0000 00ef bfbd efbf bdef bfbd 6bef  ..............k.

And the original one:

0000000: 504b 0304 1400 0000 0800 b81c d33e df1d  PK...........>..
0000010: 3aa0 8102 0000 a405 0000 0500 1c00 422e  :.............B.
0000020: 786d 6c55 5409 0003 5cd4 fc4d 18c7 fc4d  xmlUT...\..M...M
0000030: 7578 0b00 0104 e803 0000 0404 0000 008d  ux..............
0000040: 94df 6bdb 3010 c7df 03f9 1f0e e1bd 254e  ..k.0.........%N
0000050: ec74 6c85 d825 2bac 9442 379a c25e ca8a  .tl..%+..B7..^..

Update2 The running software is centos 5.6, perl 5.8.8, apache 2.2.3


Does tmpnam returns a filehandle marked as utf8? I think not!

try binmode $fh, ":utf8" ;


sysread is reading the file as utf8, but the file is not utf8! the first ten bytes are in "the basic latin range" (00-7F) so they are interpreted as the same byte. The next byte 'b8' is not in the valid range and its being replaced by 'efbfbd' <=> \x{FFFD} (a special char to indicate a decoding error). All the bytes greater than 7F are being replaced by \x{FFFD}.

What perl version and OS are you using? There is a report (perl bug 75106) with title binmode $fh, ":raw" doesn't undo :utf8 on win32!


As far as I know, Perl 5 doesn't swap in the replacement character in any of its io layers. They only conversions I am aware of are newline conversions (i.e. the text layer). Are you certain the source file does not contain those byte sequences?

This code works for me, does it work for you?

#!/usr/bin/perl

use strict;
use warnings;

use File::Temp qw/:POSIX/;

sub save_uploaded_file {
    # $file is obtained by param(zip) 
    my ($file) = @_;
    my ($fh, $fname) = tmpnam;
    my ($br, $buffer);
    # commenting out next 2 lines doesn't help either
    binmode $file, ':raw'
        or die "could not change input file to raw: $!";
    binmode $fh, ':raw'
        or die "could not change tempfile to raw: $!";
    while ($br = sysread($file, $buffer, 16384)) {
        syswrite($fh, $buffer, $br);
    }
    close $fh
        or die "could not close tempfile: $!";
    return $fname;
}

sub check {
    my $input_file = shift;

    print "$input_file is ", -s $input_file, " bytes long\n"; 

    open my $fh, "<:raw", $input_file
        or die "could not open $input_file for reading: $!";

    my $bytes = sysread $fh, my $buf, 4096;

    print "read $bytes bytes: ", 
        join(", ", map { sprintf "%02x", $_ } unpack "C*", $buf),
        "\n";
}

my $input_file = "test.bin";

open my $fh, ">:raw", $input_file
    or die "could not open $input_file for writing: $!";

print $fh pack "CC", 0xFF, 0xFD
    or die "could not write to $input_file: $!";

close $fh
    or die "could not close $input_file: $!";

check $input_file;

open my $newfh, "<", $input_file
    or die "could not open $input_file: $!";
my $new_file = save_uploaded_file $newfh;

check $new_file;


I had what I think is the same problem. The error seemed to be occurring very early, because none of my code ever executed when client attempted to load a binary file. I fixed it by setting STDIN to "raw" (binary), at the top of the script…

binmode(STDIN, ':raw') ;

0

精彩评论

暂无评论...
验证码 换一张
取 消