Using Perl's XML::SAX module, I'm parsing (X)HTML templates and, as a result, am simply echoing a lot of the input to the output. I have a SAX event handler that extends XML::SAX::Base and implements the usual methods: start_element, end_element and so on.
Now my question concerns elements that do not take a closing tag, e.g. <img />, <link />, and <input />. The parser will call start_element($element_name, %attribute_hash) and end_element for these tags, but how do I know that the element is self-contained?
In other words, I want to write out <img src="blah" /> unchanged, not as <img ...></img>, which I believe is invalid.
Short of maintaining a list of these elements, what can I do? Is there a way in SAX of directly echoing an element as opposed to reconstructing it from what's passed to the event handlers?
First, building off Quentin's comment, you're using an XML parser to handle HTML. There's nothing particularly wrong with that as long as the HTML is relatively clean. However, if you need to be in compliance with HTML (as opposed to XHTML), then perhaps an XML parser is the wrong tool.
If you want to hack around it, here's what you could do. Implement a characters() callback that sets a flag if any non-whitespace characters are present. The start_element() callback resets this flag. The end_element() callback considers the tag empty if the flag was not set and writes the syntax accordingly. Elements that contain only child elements need the same treatment (for example, by tracking the flag per open element); otherwise the <div> in <div><img /></div> would be treated as empty.
Note that this will also catch tags like <td></td>, transforming them to <td />.
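The flag idea above can be sketched roughly as follows. This is only an illustration, not the answerer's actual code: the EchoHandler class, its out/pending/ws fields and the _flush and _escape helpers are made-up names, and the events are fired by hand so the snippet runs without XML::SAX installed. The event hashes use the Name, Attributes and Data keys that XML::SAX passes to handlers, so the same methods should work in a subclass of XML::SAX::Base. Instead of a single flag, the start tag is held back until it is known whether content follows, which also copes with nested elements.

```perl
use strict;
use warnings;

package EchoHandler;    # hypothetical name; with XML::SAX, subclass XML::SAX::Base

sub new { return bless { out => '', pending => undef, ws => '' }, shift }

# Finish a held-back start tag because the element turned out to have content.
sub _flush {
    my ($self) = @_;
    return unless defined $self->{pending};
    $self->{out} .= $self->{pending} . '>' . $self->{ws};
    @$self{qw(pending ws)} = (undef, '');
}

sub start_element {
    my ($self, $el) = @_;
    $self->_flush;    # a child element counts as content of its parent
    my $tag = '<' . $el->{Name};
    for my $attr (sort { $a->{Name} cmp $b->{Name} } values %{ $el->{Attributes} || {} }) {
        $tag .= sprintf ' %s="%s"', $attr->{Name}, _escape($attr->{Value});
    }
    $self->{pending} = $tag;    # hold back ">" until we know whether content follows
}

sub characters {
    my ($self, $data) = @_;
    my $text = $data->{Data};
    if (defined $self->{pending} && $text !~ /\S/) {
        $self->{ws} .= $text;    # whitespace alone does not make the element non-empty
        return;
    }
    $self->_flush;
    $self->{out} .= _escape($text);
}

sub end_element {
    my ($self, $el) = @_;
    if (defined $self->{pending}) {
        # Nothing but (optional) whitespace since the start tag: write the empty form.
        $self->{out} .= $self->{pending} . ' />';
        @$self{qw(pending ws)} = (undef, '');
    }
    else {
        $self->{out} .= "</$el->{Name}>";
    }
}

sub _escape {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    return $s;
}

package main;

# Hand-fired events standing in for what a SAX parser would deliver.
my $h = EchoHandler->new;
$h->start_element({ Name => 'p', Attributes => {} });
$h->characters({ Data => 'Logo: ' });
$h->start_element({ Name => 'img',
    Attributes => { '{}src' => { Name => 'src', Value => 'blah' } } });
$h->end_element({ Name => 'img' });
$h->start_element({ Name => 'td', Attributes => {} });
$h->end_element({ Name => 'td' });
$h->end_element({ Name => 'p' });
print $h->{out}, "\n";    # <p>Logo: <img src="blah" /><td /></p>
```

As the answer warns, the <td></td> pair comes out as <td />, because the handler cannot tell the two source forms apart from the events alone.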
Short of maintaining a list of these elements, what can I do?
Nothing :/ Usually the DTD maintains this list, so you would ask the DTD object before emitting end tags, but XML::SAX doesn't appear to support such a thing, since it doesn't support validation.
The other option is keeping state, so you know when an element is empty and can omit the closing tag, but that is yucky too :) like maintaining your own list.
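For comparison, maintaining your own list might look something like the sketch below. The %VOID table and the open_tag/close_tag helpers are hypothetical names, and the list is an illustrative subset of elements declared EMPTY in the XHTML 1.0 DTDs, so check it against whichever DTD your templates use. Attribute values are written out unescaped to keep the example short.

```perl
use strict;
use warnings;

# Assumed list of elements that never take content (illustrative, not exhaustive).
my %VOID = map { $_ => 1 } qw(area base br col hr img input link meta param);

# Build the start tag; listed elements are closed immediately with " />".
sub open_tag {
    my ($name, $attrs) = @_;
    my $tag = "<$name";
    $tag .= qq{ $_="$attrs->{$_}"} for sort keys %$attrs;    # values not escaped here
    return $tag . ($VOID{ lc $name } ? ' />' : '>');
}

# Listed elements were already closed in open_tag, so emit nothing for them.
sub close_tag {
    my ($name) = @_;
    return $VOID{ lc $name } ? '' : "</$name>";
}

print open_tag('img', { src => 'blah' }), close_tag('img'), "\n";    # <img src="blah" />
print open_tag('td',  {}),                close_tag('td'),  "\n";    # <td></td>
```

Unlike the state-tracking approach, this keeps <td></td> as a start/end pair, at the cost of keeping the list up to date.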
Is there a way in SAX of directly echoing an element as opposed to reconstructing it from what's passed to the event handlers?
No, SAX doesn't specify such a thing; see the normative/reference implementation in "Echoing an XML File with the SAX Parser".
XML::Twig, on the other hand, does provide for this; see the docs for:
pretty_print => 'indented', # output will be nicely formatted
empty_tags => 'html', # outputs <empty_tag />
You want to use XML::Twig
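A minimal sketch of using the two XML::Twig options quoted above, assuming XML::Twig is installed from CPAN. The input string is made up for illustration, and the exact whitespace of the result depends on the pretty_print setting.

```perl
use strict;
use warnings;
use XML::Twig;

# Made-up input; in practice this would be the (X)HTML template.
my $xhtml = '<div><img src="blah"/><p>text</p></div>';

my $twig = XML::Twig->new(
    pretty_print => 'indented',    # output will be nicely formatted
    empty_tags   => 'html',        # empty elements are written as <empty_tag />
);
$twig->parse($xhtml);

my $out = $twig->sprint;           # serialise the whole document to a string
print $out;
```

Here the serialisation is left to XML::Twig rather than rebuilt tag by tag in SAX callbacks.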