How can I open a Unicode file with Perl?

Developer https://www.devze.com 2022-12-23 18:36 Source: Internet

I'm using osql to run several sql scripts against a database and then I need to look at the results file to check if any errors occurred. The problem is that Perl doesn't seem to like the fact that the results files are Unicode.

I wrote a little test script to test it, and the output comes out all garbled:

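use strict;
use warnings;

my $file = shift;
my @invalids;

# Opened without an encoding layer, so each line is read as raw bytes.
open my $output, '<', $file or die "Can't open $file: $!\n";
while (<$output>) {
    print $_;
    if (/Invalid|invalid|Cannot|cannot/) {
        push(@invalids, $file);
        print "invalid file - $file - schedule for retry\n";
        last;
    }
}
close $output;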

Any ideas? I've tried decoding using decode_utf8 but it makes no difference. I've also tried to set the encoding when opening the file.

I think the problem might be that osql writes the results file in UTF-16, but I'm not sure. When I open the file in TextPad, it just tells me 'Unicode'.

Edit: Using perl v5.8.8

Edit: Hex dump:

file name: Admin_CI.User.sql.results
mime type: 

0000-0010:  ff fe 31 00-3e 00 20 00-32 00 3e 00-20 00 4d 00  ..1.>... 2.>...M.
0000-0020:  73 00 67 00-20 00 31 00-35 00 30 00-30 00 37 00  s.g...1. 5.0.0.7.
0000-0030:  2c 00 20 00-4c 00 65 00-76 00 65 00-6c 00 20 00  ,...L.e. v.e.l...
0000-0032:  31 00                                            1.
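One way to test the UTF-16 guess is to decode the dumped bytes directly. This sketch copies the 50 bytes shown above into a Perl string (the hex literal is transcribed from the dump, one piece per row), assumes the leading ff fe is a UTF-16LE byte order mark, and decodes the rest with Encode's decode.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Bytes transcribed from the hex dump above, one string per dump row.
my $hex = 'fffe31003e00200032003e0020004d00'
        . '73006700200031003500300030003700'
        . '2c0020004c006500760065006c002000'
        . '3100';
my $bytes = pack 'H*', $hex;

# Skip the 2-byte BOM (ff fe) and decode the remainder as UTF-16LE.
my $text = decode('UTF-16LE', substr($bytes, 2));
print "$text\n";    # prints "1> 2> Msg 15007, Level 1"
```

If the result reads as ordinary osql output, the file is UTF-16LE and only needs the right layer when it is opened.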


The file is presumably in UCS-2LE (or UTF-16LE) format.

C:\Temp> notepad test.txt

C:\Temp> xxd test.txt
0000000: fffe 5400 6800 6900 7300 2000 6900 7300  ..T.h.i.s. .i.s.
0000010: 2000 6100 2000 6600 6900 6c00 6500 2e00   .a. .f.i.l.e...
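As a cross-check, the same 32 bytes can be produced from Perl: U+FEFF followed by the text, encoded as UTF-16LE, which is where the leading fffe comes from. The formatting loop below only imitates xxd's layout; it is a sketch, not part of either tool.

```perl
use strict;
use warnings;
use Encode qw(encode);

# U+FEFF encodes to ff fe in UTF-16LE; the rest is one 2-byte unit per character.
my $bytes = encode('UTF-16LE', "\x{FEFF}This is a file.");

# Print 16 bytes per row as groups of 4 hex digits, like xxd's default output.
for (my $off = 0; $off < length $bytes; $off += 16) {
    my $chunk = substr($bytes, $off, 16);
    printf "%07x: %s\n", $off, join ' ', unpack '(H4)*', $chunk;
}
```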

When opening such a file for reading, you need to specify the encoding:

#!/usr/bin/perl

use strict; use warnings;

my ($infile) = @ARGV;

open my $in, '<:encoding(UCS-2le)', $infile
    or die "Cannot open '$infile': $!";

Note that the fffe at the beginning is the byte order mark (BOM); ff fe marks the file as little-endian.
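Applied to the question's script, the fix is the layer on open. A minimal sketch, assuming the results files are UTF-16LE with a BOM as the dump suggests; `results_have_errors` is a name introduced here for illustration, not something from the question. With an explicit LE/BE name the BOM is not consumed, so it arrives as a U+FEFF character at the start of the first line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if the results file mentions "invalid" or
# "cannot" (case-insensitively, like the question's regex), else 0.
sub results_have_errors {
    my ($file) = @_;
    open my $in, '<:encoding(UTF-16LE)', $file
        or die "Cannot open '$file': $!";
    while (my $line = <$in>) {
        # UTF-16LE keeps the BOM as a character; drop it from the first line.
        $line =~ s/^\x{FEFF}// if $. == 1;
        if ($line =~ /invalid|cannot/i) {
            close $in;
            return 1;
        }
    }
    close $in;
    return 0;
}

my @invalids;
for my $file (@ARGV) {
    if (results_have_errors($file)) {
        push @invalids, $file;
        print "invalid file - $file - schedule for retry\n";
    }
}
```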


The answer is in the documentation for open, which also points you to perluniintro. :)

open my $fh, '<:encoding(UTF-16LE)', $file or die ...;

You can get a list of the names of the encodings that your perl supports:

% perl -MEncode -le "print for Encode->encodings(':all')"
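To check a single name rather than scan the whole list, Encode's find_encoding returns an encoding object for a known name or alias and undef otherwise. A small sketch; the names tried here are only examples.

```perl
use strict;
use warnings;
use Encode qw(find_encoding);

# Report whether each name is known to this perl's Encode, and its canonical name.
for my $name ('UTF-16LE', 'UCS-2LE', 'UTF-16', 'no-such-encoding') {
    my $enc = find_encoding($name);
    printf "%-16s %s\n", $name,
        $enc ? 'canonical name: ' . $enc->name : 'not supported';
}
```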

After that, it's up to you to find out what the file's encoding is. This is the same way you'd open any file whose encoding differs from the default, whether or not it's one defined by Unicode.
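When the file starts with a BOM, the encoding can be guessed from it. This is a sketch of one common heuristic, not a general detector: `open_by_bom` is a hypothetical helper, files without a BOM fall back to a caller-supplied default, and a UTF-16LE file whose first character is U+0000 would be mistaken for UTF-32LE.

```perl
use strict;
use warnings;

# Known BOMs, longest first: the UTF-32LE mark begins with the UTF-16LE
# mark (ff fe), so it must be tested before it.
my @BOMS = (
    [ "\xFF\xFE\x00\x00" => 'UTF-32LE' ],
    [ "\x00\x00\xFE\xFF" => 'UTF-32BE' ],
    [ "\xEF\xBB\xBF"     => 'UTF-8'    ],
    [ "\xFF\xFE"         => 'UTF-16LE' ],
    [ "\xFE\xFF"         => 'UTF-16BE' ],
);

# Hypothetical helper: opens $file for reading with the :encoding layer
# implied by its BOM, positioned just past the BOM. Without a BOM,
# $default is used. Returns the handle and the encoding name chosen.
sub open_by_bom {
    my ($file, $default) = @_;
    open my $fh, '<:raw', $file or die "Cannot open '$file': $!";
    defined(read($fh, my $head, 4)) or die "Cannot read '$file': $!";

    my ($enc, $skip) = ($default, 0);
    for my $bom (@BOMS) {
        my ($mark, $name) = @$bom;
        if (substr($head, 0, length $mark) eq $mark) {
            ($enc, $skip) = ($name, length $mark);
            last;
        }
    }

    # Rewind to just after the BOM (or to the start), then add the decoder.
    seek($fh, $skip, 0)             or die "Cannot seek in '$file': $!";
    binmode($fh, ":encoding($enc)") or die "Cannot set :encoding($enc) on '$file'";
    return ($fh, $enc);
}
```

Usage would be along the lines of `my ($in, $enc) = open_by_bom($file, 'UTF-8');` followed by an ordinary `while (<$in>)` loop.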

We have a chapter in Effective Perl Programming that goes through the details.


Try opening the file with an I/O layer that matches its encoding, for example:

open OUTPUT,  "<:encoding(UTF-16LE)", $file or die "Can't open $file: $!\n";

See perldoc open for more on this.
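Layers work for writing too, and on STDOUT. The round trip below is a sketch under a few assumptions: the temporary file stands in for an osql results file, `:raw` is put first on the output handle so that a platform CRLF layer (as on Windows) cannot translate bytes inside the encoded UTF-16 stream, and, as we understand Encode, the plain 'UTF-16' name (no LE/BE) reads the byte order from the BOM and consumes it.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my ($tmp, $path) = tempfile(UNLINK => 1);
close $tmp;

# Write a UTF-16LE file; the BOM has to be written explicitly.
open my $out, '>:raw:encoding(UTF-16LE)', $path or die "Cannot write '$path': $!";
print {$out} "\x{FEFF}", "Msg 15007, Level 1\n";
close $out or die "Cannot close '$path': $!";

# Read it back; 'UTF-16' picks the byte order from the BOM.
open my $in, '<:encoding(UTF-16)', $path or die "Cannot read '$path': $!";
my $line = <$in>;
close $in;

# Give STDOUT a layer as well, so characters are encoded on the way out.
binmode STDOUT, ':encoding(UTF-8)';
print $line;
```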


    #
    # -----------------------------------------------------------------------------
    # Reads a file and returns its contents as a string; if the second
    # param is 'utf8', the file is decoded as UTF-8 and a character string
    # is returned.
    # usage:
    # ( $ret , $msg , $str_file )
    #         = $objFileHandler->doReadFileReturnString ( $file , 'utf8' ) ;
    # or
    # ( $ret , $msg , $str_file )
    #         = $objFileHandler->doReadFileReturnString ( $file ) ;
    # $ret is 0 on success and 1 on failure; $msg describes the outcome.
    # -----------------------------------------------------------------------------
    use Carp qw(cluck);

    sub doReadFileReturnString {

        my $self      = shift;
        my $file      = shift;
        my $mode      = shift ;

        my $ret        = 1 ;
        my $s          = q{} ;

        # Collect every failed check and warn about each with a stack trace.
        my @problems ;
        push @problems, "the file : $file does not exist !!!"         unless -e $file ;
        push @problems, "the file : $file is not actually a file !!!" unless -f $file ;
        push @problems, "the file : $file is not readable !!!"        unless -r $file ;

        if ( @problems ) {
            cluck ( $_ ) for @problems ;
            return ( $ret , "@problems can not read the file $file !!!" , undef ) ;
        }

        # Slurp the whole file inside eval, so an open or read failure is
        # returned to the caller instead of ending the program.
        $s = eval {
            # ':encoding(UTF-8)' decodes the bytes and warns about malformed
            # UTF-8; the bare ':utf8' layer only marks them, without checking.
            my $layer = ( defined $mode && $mode eq 'utf8' ) ? '<:encoding(UTF-8)' : '<' ;
            open my $fh, $layer, $file
                or die "failed to open \$file $file : $!" ;
            local $/ = undef ;    # slurp mode: one read returns the whole file
            my $string = <$fh> ;
            close $fh ;
            $string ;
        };

        my $msg ;
        if ( $@ ) {
            $msg = $@ ;
            $ret = 1 ;
            $s   = undef ;
        } else {
            $ret = 0 ;
            $msg = "ok for read file: $file" ;
        }
        return ( $ret , $msg , $s ) ;
    }
    #eof sub doReadFileReturnString