I'm not too fluent with the Perl XML libraries (actually, I really suck at understanding encoding in general). All I'm doing is taking a string that possibly has characters such as "à" and putting it in an XML file, but when I open the file, I get an encoding error at the line containing such a character.
So I just need a lightweight way to take a string and encode it for XML.
Your XML should specify UTF-8 encoding. For example:
<?xml version="1.0" encoding="UTF-8" ?>
There's a lot of good information at UTF-8 and Unicode Standards.
Your Perl program should also set its output filehandle to the UTF-8 encoding so that it writes the data correctly. See the Perl documentation for open, for instance.
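As a rough sketch of that advice, the snippet below opens the output file with a UTF-8 encoding layer and writes a declaration plus one element. The helper name write_utf8_xml, the <note> element and the temporary file are made up for illustration, and the text is assumed to contain no XML reserved characters.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: write $text inside a <note> element, encoded as UTF-8.
sub write_utf8_xml {
    my ($path, $text) = @_;
    # The :encoding(UTF-8) layer turns Perl characters into UTF-8 bytes on output.
    open(my $out, '>:encoding(UTF-8)', $path) or die "Cannot open $path: $!";
    print {$out} qq{<?xml version="1.0" encoding="UTF-8"?>\n};
    print {$out} "<note>$text</note>\n";
    close($out) or die "Cannot close $path: $!";
    return;
}

# Write to a throwaway temporary file (assumption: any writable path would do).
my ($tmp_fh, $tmp_path) = tempfile(SUFFIX => '.xml', UNLINK => 1);
close($tmp_fh);
write_utf8_xml($tmp_path, "voil\x{E0}");   # "\x{E0}" is the character "à"
```

If the string comes from a file or another program, it probably needs to be decoded into Perl characters first (for example with Encode::decode), otherwise the bytes can end up encoded twice.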
The only XML-specific escaping you need is for the XML reserved characters. See "Where can I get a list of the XML document escape characters?" on Stack Overflow.
You can use Perl's XML::Code or a similar module to escape the XML-specific characters.
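If pulling in a module feels like too much, the reserved characters can also be escaped by hand. This is only a minimal sketch: xml_escape is a hypothetical helper, not part of any module mentioned here, and it covers just the five predefined XML entities.

```perl
use strict;
use warnings;

# The five characters XML reserves, mapped to their predefined entities.
my %XML_ESCAPE = (
    '&' => '&amp;',
    '<' => '&lt;',
    '>' => '&gt;',
    '"' => '&quot;',
    "'" => '&apos;',
);

# Hypothetical helper: return a copy of $text that is safe to place in
# element content or in a quoted attribute value.
sub xml_escape {
    my ($text) = @_;
    $text =~ s/([&<>"'])/$XML_ESCAPE{$1}/g;
    return $text;
}

print xml_escape(q{AT&T <says> "hi"}), "\n";
# prints: AT&amp;T &lt;says&gt; &quot;hi&quot;
```

Characters such as "à" pass through untouched; getting those right is the job of the output encoding, not of the escaping.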
Example using LibXML, which is the standard big hammer for XML. Not lightweight, but your problem really is a familiar nail and at least we're not spending much time writing code, either.
use XML::LibXML ();
XML::LibXML::Document->new('1.0', 'UTF-8')->createTextNode($text)->toString; # returns properly encoded fragment
See the toFile method for writing into a file.
I couldn't get answer 2 to work. Try this; it produces XML that fails with "not well-formed (invalid token)":
#!/usr/bin/perl -wT
use XML::LibXML;
use HTML::Entities;
binmode(STDOUT, ':utf8');
my $string = 'foo ä bar';
$string = decode_entities($string);
print XML::LibXML::Document->new('1.0', 'UTF-8')->createTextNode($string)->toString();