For the life of me I cannot understand the XML::Twig documentation for entity handling.
I've got some XML I'm generating with HTML::Tidy. The call is as follows:
my $tidy = HTML::Tidy->new({
    'indent'          => 1,
    'break-before-br' => 1,
    'output-xhtml'    => 0,
    'output-xml'      => 1,
    'char-encoding'   => 'raw',
});
$str = "foo&nbsp;bar";
$xml = $tidy->clean("<xml>$str</xml>");
which produces:
<html>
<head>
<meta content="tidyp for Linux (v1.02), see www.w3.org" name="generator" />
<title></title>
</head>
<body>foo&nbsp;bar</body>
</html>
XML::Twig (understandably) barfs at the &nbsp;. I want to do some transformations, running it through XML::Twig:
my $twig = XML::Twig->new(
    twig_handlers => {... handlers ...}
);
$twig->parse($xml);
The $twig->parse line barfs on the &nbsp;, but I can't figure out how to add the &nbsp; entity programmatically. I tried things like:
my $entity = XML::Twig::Entity->new("nbsp", "&#160;");
$twig->entity_list->add($entity);
$twig->parse($xml);
... but no joy.
Please help =)
A dirty, but efficient, trick in a case like this would be to add a fake DTD declaration.
Then XML::Parser, which does the parsing, will assume that the entity is defined in the DTD and won't barf on it.
To get rid of the fake DTD declaration, you can output the root of the twig. If you need a different declaration, create it and replace the current one:
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;
my $fake_dtd= '<!DOCTYPE head SYSTEM "foo"[]>'; # foo may not even exist
my $xml='<html>
<head>
<meta content="tidyp for Linux (v1.02), see www.w3.org" name="generator" />
<title></title>
</head>
<body>foo&nbsp;bar</body>
</html>';
XML::Twig->new->parse( $fake_dtd . $xml)->root->print;
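For the original question, the same trick can be combined with twig_handlers. The following is only a sketch under a few assumptions: the body handler and the @seen array are hypothetical placeholders for the real transformations, and the $xml string stands in for the HTML::Tidy output.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# Same fake DOCTYPE as above: the external subset "foo" is never loaded,
# it only makes the parser tolerate entities it has no definition for.
my $fake_dtd = '<!DOCTYPE head SYSTEM "foo"[]>';

# Stand-in for the HTML::Tidy output from the question.
my $xml = '<html><head><title></title></head>'
        . '<body>foo&nbsp;bar</body></html>';

my @seen;    # hypothetical: records which elements the handler was called for

my $twig = XML::Twig->new(
    twig_handlers => {
        # hypothetical handler; the real transformations would go here
        body => sub {
            my ($t, $elt) = @_;
            push @seen, $elt->tag;
        },
    },
);

# safe_parse returns false and sets $@ instead of dying on malformed XML
$twig->safe_parse($fake_dtd . $xml)
    or die "Failure to parse XML: $@";

# Serializing only the root element leaves the fake DOCTYPE out.
my $out = $twig->root->sprint;
print $out, "\n";
```

Serializing from the root rather than from the twig itself is what keeps the fake declaration out of the result.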
use strict;
use XML::Twig;
my $doctype = '<?xml version="1.0" encoding="utf-8"?><!DOCTYPE html [<!ENTITY nbsp "&#160;">]>';
my $xml = '<html><head><meta content="tidyp for Linux (v1.02), see www.w3.org" name="generator" /><title></title></head><body>foo&nbsp;bar</body></html>';
my $xTwig = XML::Twig->new();
$xTwig->safe_parse($doctype . $xml) or die "Failure to parse XML : $@";
print $xTwig->sprint();
There may be a better way, but the code below worked for me:
my $filter = sub {
    my $text  = shift;
    my $ascii = "\x{a0}";    # non breaking space
    my $nbsp  = '&nbsp;';
    $text =~ s/$ascii/$nbsp/g;    # /g so every occurrence is replaced
    return $text;
};
XML::Twig->new( output_filter => $filter )
         ->parse_html( $xml )
         ->print;
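To check what such a filter does on its own, it can be run on a plain string without involving XML::Twig. This is a minimal sketch: nbsp_filter is a hypothetical name, and it assumes the goal is to turn every literal U+00A0 character back into the &nbsp; entity reference.

```perl
use strict;
use warnings;

# Replace every literal non-breaking space (U+00A0) with the entity reference.
sub nbsp_filter {
    my ($text) = @_;
    $text =~ s/\x{a0}/&nbsp;/g;    # global, so all occurrences are handled
    return $text;
}

my $filtered = nbsp_filter("foo\x{a0}bar\x{a0}baz");
print "$filtered\n";    # prints foo&nbsp;bar&nbsp;baz
```

Note that a document serialized this way only parses again as XML if &nbsp; is declared, for instance with one of the DOCTYPE approaches from the other answers.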