I'm parsing an XML document using SAX in Java.
I'm working with XML that describes research publications in different fields. Among other elements, there is an "abstract" element that briefly describes what the research paper is about. Basic HTML formatting is allowed in that field, but I don't want SAX to treat the HTML tags (such as i, b, u, sub, sup and so on) as real XML tags and fire startElement() and endElement() events for those elements. Is there a way to tell SAX to ignore a predefined set of XML tags and pass their XML code as is to the characters() method?
I suspect not, without some work. I would perhaps slot in different SAX handlers as you encounter different elements, pushing and popping them on a stack. So when you encounter an <abstract> element, you slot in a new handler that the SAX parser delegates to, one that is smart enough to process your HTML elements as you require. Not a trivial solution, I'm afraid.
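A minimal sketch of that idea follows. To keep it short it is a simplified variant: instead of a full stack of handlers, one handler uses a depth counter to switch into an "inside abstract" mode, where it writes nested tags back out as text rather than treating them as structure. The class name AbstractHandler, the helper extractAbstract() and the field lastAbstract are made up for illustration, and it assumes the default, non-namespace-aware parser, so qName carries the plain tag name.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: rebuilds the inner markup of <abstract> as a string.
class AbstractHandler extends DefaultHandler {
    private final StringBuilder buffer = new StringBuilder();
    // 0 = outside <abstract>, 1 = directly inside it, >1 = inside nested markup
    private int depth = 0;
    String lastAbstract;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (depth > 0) {
            // Nested tag inside <abstract>: write it back out as text.
            buffer.append('<').append(qName);
            for (int i = 0; i < atts.getLength(); i++) {
                buffer.append(' ').append(atts.getQName(i)).append("=\"")
                      .append(escape(atts.getValue(i)).replace("\"", "&quot;"))
                      .append('"');
            }
            buffer.append('>');
            depth++;
        } else if ("abstract".equals(qName)) {
            buffer.setLength(0);
            depth = 1;
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (depth > 1) {
            buffer.append("</").append(qName).append('>');
            depth--;
        } else if (depth == 1) {
            // Closing </abstract>: the buffer now holds the raw inner markup.
            lastAbstract = buffer.toString();
            depth = 0;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (depth > 0) {
            // The parser has already decoded entities, so re-escape them.
            buffer.append(escape(new String(ch, start, length)));
        }
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;");
    }

    // Illustrative helper: parse a document and return its last abstract.
    static String extractAbstract(String xml) throws Exception {
        AbstractHandler handler = new AbstractHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.lastAbstract;
    }
}
```

Note that this rebuilds the markup from the events rather than passing through the original bytes, so an empty element such as <br/> comes back as <br></br>, and whitespace inside tags is not preserved. A real handler stack would be the better fit if other elements also need their own special treatment.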