
How can I encode a Perl string so I can put it into an XML document?

Source: https://www.devze.com, 2022-12-31 14:57 (reposted from the web)

I'm not too fluent with the Perl XML libraries (actually, I'm really bad at understanding encodings in general). All I'm doing is taking a string that possibly contains characters such as "à" and putting it in an XML file, but when I open the file, I get an encoding error at the line containing such a character.

So I just need a lightweight way to take a string and encode it for XML.


Your XML should specify UTF-8 encoding. For example:

<?xml version="1.0" encoding="UTF-8" ?>

There's a lot of good information at UTF-8 and Unicode Standards.

Your Perl program should also set its output filehandle to the UTF-8 encoding so that it writes the data correctly. See the Perl documentation for open, for instance.
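
As a minimal sketch of that advice, the three-argument form of open can attach an encoding layer to the output filehandle. The file name example.xml, the element name note, and the sample text are made up for illustration, and the sample text contains no XML reserved characters, so no escaping is needed here.

```perl
use strict;
use warnings;
use utf8;   # this source file itself contains a literal "à"

my $path = 'example.xml';   # hypothetical output file name
my $text = 'voilà';         # sample text, no XML reserved characters

# The :encoding(UTF-8) layer turns characters into UTF-8 bytes on write.
open my $fh, '>:encoding(UTF-8)', $path or die "Cannot open $path: $!";
print {$fh} qq{<?xml version="1.0" encoding="UTF-8"?>\n};
print {$fh} "<note>$text</note>\n";
close $fh or die "Cannot close $path: $!";
```

For a filehandle that is already open, such as STDOUT, binmode(STDOUT, ':encoding(UTF-8)') applies the same layer.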

The only XML-specific escaping you need is for the XML reserved characters. See Where can I get a list of the XML document escape characters? on Stackoverflow.

You can use Perl's XML::Code or a similar module to escape the XML-specific characters.
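
If pulling in a module feels too heavy, the escaping itself is small enough to sketch by hand. The helper name xml_escape below is hypothetical and is not taken from any module. It replaces the five characters that XML predefines entities for, in a single pass so that the ampersands it introduces are not escaped a second time.

```perl
use strict;
use warnings;

# Hypothetical helper: escape the five characters that have predefined XML entities.
sub xml_escape {
    my ($text) = @_;
    my %entity = (
        '&' => '&amp;',
        '<' => '&lt;',
        '>' => '&gt;',
        '"' => '&quot;',
        "'" => '&apos;',
    );
    # One pass over the string, so "&" produced by a replacement is never re-escaped.
    $text =~ s/([&<>"'])/$entity{$1}/g;
    return $text;
}

print xml_escape(q{a < b && "c"}), "\n";   # prints: a &lt; b &amp;&amp; &quot;c&quot;
```

Characters such as "à" pass through untouched. They need no escaping, only a correctly encoded output filehandle.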


Example using LibXML, which is the standard big hammer for XML. Not lightweight, but your problem really is a familiar nail and at least we're not spending much time writing code, either.

use XML::LibXML ();
XML::LibXML::Document->new('1.0', 'UTF-8')->createTextNode($text)->toString; # returns properly encoded fragment

See method toFile for writing into a file.
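
A rough sketch of the toFile route follows, assuming the text should end up inside a single root element. The element name note, the file name note.xml, and the sample string are invented for illustration. A bare text node is only a fragment, so this wraps it in a root element to get a complete document.

```perl
use strict;
use warnings;
use utf8;               # this source file itself contains a literal "à"
use XML::LibXML ();

my $text = 'à la carte & more';   # sample input with one reserved character

my $doc  = XML::LibXML::Document->new('1.0', 'UTF-8');
my $root = $doc->createElement('note');   # hypothetical element name
$root->appendText($text);                 # "&" is escaped when the document is serialized
$doc->setDocumentElement($root);

# Writes the XML declaration and the UTF-8 encoded document to disk.
$doc->toFile('note.xml');                 # hypothetical file name
```

Because toFile writes the encoded bytes itself, no encoding layer is needed on any filehandle in this variant.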


I couldn't get answer 2 to work. Try this; the XML it produces is rejected as "not well-formed (invalid token)":

#!/usr/bin/perl -wT

use XML::LibXML;
use HTML::Entities;

binmode(STDOUT, ':utf8');
my $string = 'foo &auml; bar';
$string = decode_entities($string);
print XML::LibXML::Document->new('1.0', 'UTF-8')->createTextNode($string)->toString();  