Perl Regex Error Help

Developer https://www.devze.com 2023-01-31 17:21 Source: Internet
I'm receiving a similar error in two completely unrelated places in our code that we can't seem to figure out how to resolve. The first error occurs when we try to parse XML using XML::Simple:

Malformed UTF-8 character (unexpected end of string) in substitution (s///) at /usr/local/lib/perl5/XML/LibXML/Error.pm line 217.

And the second is when we try to do simple string substitution:

Malformed UTF-8 character (unexpected non-continuation byte 0x78, immediately after start byte 0xe9) in substitution (s///) at /gold/content/var/www/alltrails.com/cgi-bin/API/Log.pm line 365.

The line in question in our Log.pm file is as follows where $message is a string:

$message =~ s/\s+$//g;

Our biggest problem in troubleshooting this is that we haven't found a way to identify the input that causes it. My hope is that someone else has run into this issue before and can provide advice or sample code that will help us resolve it.

Thanks in advance for your help!


I'm not sure what the cause is, but if you want to log the message that triggers this, you could add a __DIE__ signal handler (or a __WARN__ handler, if the message is being emitted as a warning rather than a fatal error) to make sure you capture it:

$SIG{__DIE__} = sub { 
  if ($_[0] =~ /Malformed UTF-8 character/) { 
    print STDERR "message = $message\n"; 
  } 
};

That should at least let you know what string is triggering these errors.
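
A slightly fuller sketch of the same idea follows. It assumes the string being processed is stored in a package variable ($current_message, a hypothetical name) before the substitution runs, so the handlers can see it, and it logs the bytes as hex so that unprintable input is still readable. It installs both a __WARN__ and a __DIE__ handler, since this message can show up as either. hex_bytes and report_malformed are illustrative helpers, not part of any module.

```perl
use strict;
use warnings;

# Hypothetical: set this just before running the substitution.
our $current_message;

# Illustrative helper: render a string as space-separated hex bytes.
sub hex_bytes {
    my ($str) = @_;
    my $copy = $str;
    # For a flagged (character) string, expose the underlying UTF-8 octets.
    utf8::encode($copy) if utf8::is_utf8($copy);
    return join ' ', map { sprintf '%02x', $_ } unpack 'C*', $copy;
}

# Build a log line if the diagnostic is the one we care about.
sub report_malformed {
    my ($kind, $text) = @_;
    return unless $text =~ /Malformed UTF-8 character/;
    my $dump = defined $current_message ? hex_bytes($current_message) : '(not set)';
    return "[$kind] offending bytes: $dump\n";
}

$SIG{__WARN__} = sub {
    my $line = report_malformed('warn', $_[0]);
    print STDERR $line if defined $line;
    warn $_[0];    # still emit the original warning
};

$SIG{__DIE__} = sub {
    my $line = report_malformed('die', $_[0]);
    print STDERR $line if defined $line;
};
```

For example, hex_bytes("caf\xe9x") gives "63 61 66 e9 78", which is the same pattern as in the second error: a 0xe9 start byte followed directly by 0x78.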


Can you do a hex dump of the source data to see what it looks like?

If you're reading this from a file, you can do this with a tool like "od".

Or you can do it inside the Perl script itself by passing the string to a function like this:

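# Print a dump of a string, 16 bytes per row, on three aligned lines:
# decimal values, hex values, and the printable characters.
sub DumpString {
    my @a = unpack('C*',$_[0]);              # numeric value of each byte (assumes a byte string)
    my $o = 0;                               # offset of the current row
    while (@a) {
        my @b = splice @a,0,16;              # take the next 16 bytes
        my @d = map sprintf("%03d",$_), @b;  # decimal form
        my @x = map sprintf("%02x",$_), @b;  # hex form
        my $c = substr($_[0],$o,16);
        $c =~ s/[[:^print:]]/ /g;            # blank out non-printable characters
        printf "%6d %s\n",$o,join(' ',@d);
        print " "x8,join('  ',@x),"\n";
        print " "x9,join('   ',split(//,$c)),"\n";
        $o += 16;
    }
}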
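
If the dump is long, it may help to know where to look first. The following is a minimal sketch, assuming the data is available as a raw byte string, that uses the core Encode module to find the byte offset of the first sequence that does not decode as UTF-8. first_bad_offset and hex_window are hypothetical helper names introduced here for illustration.

```perl
use strict;
use warnings;
use Encode ();

# Return the byte offset of the first sequence that is not valid UTF-8,
# or -1 if the whole byte string decodes cleanly.
sub first_bad_offset {
    my ($bytes) = @_;
    my $rest = $bytes;    # with FB_QUIET, decode() leaves the unprocessed part here
    Encode::decode('UTF-8', $rest, Encode::FB_QUIET);
    return length($rest) ? length($bytes) - length($rest) : -1;
}

# Return up to 16 bytes around $offset as space-separated hex.
sub hex_window {
    my ($bytes, $offset) = @_;
    my $start = $offset > 8 ? $offset - 8 : 0;
    return join ' ', map { sprintf '%02x', $_ } unpack 'C*', substr($bytes, $start, 16);
}
```

With the byte string "caf\xe9x" (a Latin-1 "é" followed by "x"), first_bad_offset returns 3, which points at the 0xe9 byte, and hex_window shows the neighbouring bytes.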


Sounds like you have an "XML" file that is expected to contain UTF-8 encoded characters but doesn't. Try just opening it and looking for high-bit characters (bytes above 0x7F).
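
Looking for high-bit characters by eye gets tedious on a large file, so here is a rough sketch that does the scan. It assumes the file path is passed as the first command-line argument, reads the file as raw bytes, and reports each line that contains bytes above 0x7F, along with whether that line is well-formed UTF-8. classify_line is a hypothetical helper; note that the core utf8::decode check is fairly permissive, so treat "utf8" as "probably fine" rather than as strict validation.

```perl
use strict;
use warnings;

# Classify one line of raw bytes:
#   'ascii'   - no bytes above 0x7F
#   'utf8'    - high-bit bytes that form well-formed UTF-8
#   'invalid' - high-bit bytes that do not decode as UTF-8
sub classify_line {
    my ($line) = @_;
    return 'ascii' unless $line =~ /[\x80-\xFF]/;
    my $copy = $line;    # utf8::decode works in place, so use a copy
    return utf8::decode($copy) ? 'utf8' : 'invalid';
}

# Only scan when a file name was given on the command line.
if (@ARGV) {
    my $path = shift @ARGV;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    while (my $line = <$fh>) {
        my $kind = classify_line($line);
        print "$path:$.: $kind\n" if $kind ne 'ascii';
    }
    close $fh;
}
```

Lines reported as "invalid" are the ones most likely to trigger the "Malformed UTF-8 character" message once something treats them as UTF-8.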
