How do I encode characters into numeric character reference format in Perl?

Developer https://www.devze.com 2023-02-05 15:15 Source: the web
I found this sample script in "How can I guess the encoding of a string in Perl?":

#!C:\perl\bin
use utf8;
use Encode qw(encode PERLQQ XMLCREF);
my $string = 'This year I went to 北京 Perl workshop.';
#print encode('ascii', $string, PERLQQ);
# This year I went to \x{5317}\x{4eac} Perl workshop.
print encode('ascii', $string, XMLCREF);
# This year I went to &#x5317;&#x4eac; Perl workshop.

When I tested it, the encoded output was:

This year I went to \x{71fa9} Perl workshop.
This year I went to &#x71fa9; Perl workshop.

The result differs from the one the author shows in the sample code above.

How can I encode a character string so that its output is in numeric character reference format (&#xHHHH;)? For example, given:

my $string = 'This year I went to 北京 Perl workshop.';

the encoded output would be:

This year I went to &#x5317;&#x4eac; Perl workshop.


I am the author of the answer linked in the question.


You made a simple mistake: you saved the Perl program in GB18030. Because it contains use utf8;, you must save it in UTF-8 instead.
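
A minimal sketch of why the encoding has to match, assuming Perl's core Encode module and its cp936 (GBK) table, which agrees with GB18030 for these two characters. The helper name to_ncr is hypothetical and not part of the original answer. The script simulates the bytes a GBK/GB18030 editor would store, decodes them with the encoding they are really in, and only then encodes to numeric character references. The source itself is pure ASCII, so it does not depend on how the file is saved:

```perl
use strict;
use warnings;
use Encode qw(encode decode FB_XMLCREF);

# Hypothetical helper (not from the original answer): return an ASCII string
# in which every non-ASCII character becomes a &#xHHHH; reference.
sub to_ncr {
    my ($chars) = @_;
    # FB_XMLCREF is XMLCREF plus LEAVE_SRC, so $chars is not modified.
    return encode('ascii', $chars, FB_XMLCREF);
}

# Bytes as an editor using GBK/GB18030 would store them (assumption: cp936
# matches GB18030 for these two characters).
my $legacy_bytes = encode('cp936', "\x{5317}\x{4eac}");

# Decode with the encoding the bytes are really in, then encode to references.
my $chars = decode('cp936', $legacy_bytes);
print to_ncr("This year I went to $chars Perl workshop."), "\n";
```

Once the characters are decoded correctly, this should print the &#x5317;&#x4eac; form asked for in the question. Telling Perl the source is UTF-8 while the bytes are in another encoding is what leads to a wrong code point such as \x{71fa9}.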


$string =~ s/[^\0-\377]/ sprintf '&#x%04x;', ord($&) /ge;

Find each character in $string not in the range 0-255 (i.e., any wide characters), and replace it with the value of the expression sprintf '&#x%04x;', ord($&), where $& is the wide character that was matched.

use utf8;
$string = "This year I went to \x{5317}\x{4eac} Perl workshop.";
$string =~ s/[^\0-\377]/ sprintf '&#x%04x;', ord($&) /ge;
print $string;

Produces:

This year I went to &#x5317;&#x4eac; Perl workshop.
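
Note that the substitution above leaves characters in the range 128-255 (for example é) untouched, which differs from what encode('ascii', ..., XMLCREF) does. Here is a variant sketch, assuming every non-ASCII character should be escaped and that the input is an already decoded character string. The name ncr_escape is hypothetical, and a capture group stands in for $&:

```perl
use strict;
use warnings;

# Hypothetical helper: replace every character above 0x7F with a
# zero-padded &#xHHHH; numeric character reference.
sub ncr_escape {
    my ($text) = @_;    # work on a copy so the caller's string is untouched
    $text =~ s/([^\x00-\x7F])/sprintf('&#x%04x;', ord($1))/ge;
    return $text;
}

print ncr_escape("This year I went to \x{5317}\x{4eac} Perl workshop."), "\n";
```

Because of the %04x format, a Latin-1 character such as é (U+00E9) comes out as &#x00e9;, whereas Encode's XMLCREF fallback does not zero-pad.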