SAXParser problem grabbing tag value with & character

Developer https://www.devze.com 2023-01-07 05:09 Source: web
I have a SAXParser with an XMLReader.

SAXParserFactory saxPF = SAXParserFactory.newInstance();
SAXParser sp = saxPF.newSAXParser();
XMLReader xmlR = sp.getXMLReader();
MyHandler myHandler = new MyHandler();
xmlR.setContentHandler(myHandler);

My handler code uses startElement and endElement to detect when it's inside a tag. It does this by setting a boolean and using characters() to grab the value.

public void startElement(String namespaceURI, 
    String localName, String qName, Attributes atts) throws SAXException {
    if (localName.equals("myTag")) this.in_myTag = true;
}

public void characters(char ch[], int start, int length) {
    if (in_myTag) {
        c.setMyTag(new String(ch, start, length));
    }
}

The problem is that I have a tag whose value is "A & B Value", and characters() is being called separately for "A", "&", "B" and "Value". So the final value passed to setMyTag is "Value".

<myTag>A & B value</myTag>

http://www.saxproject.org/apidoc/org/xml/sax/helpers/DefaultHandler.html


<myTag>A & B value</myTag>

(That's not XML. I assume you mean A &amp; B value, to be well-formed.)

In general you can't guarantee that your characters() handler will get called exactly once per element. If there is no text content in the element it won't get called at all; if there are entity references or the text is very long you are likely to get called more than once. Plus of course any comments, processing instructions or child elements in there will definitely split the text across multiple calls.

Whilst it is unusual for a predefined entity reference like &amp; to cause a separate callback to the content handler, there's nothing in the spec to say it can't happen at any time for any (or no) reason. In particular:

SAX parsers may return all contiguous character data in a single chunk, or they may split it into several chunks

Consequently, a SAX handler must collect every piece of text content sent to it and join them together when endElement occurs, rather than setting the content from a single characters callback.
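A minimal sketch of that approach, assuming a handler shaped like the one in the question. The class name AccumulatingHandlerDemo, the parseMyTag helper and the myTagValue field (standing in for c.setMyTag) are made up for illustration. It falls back to qName when localName is empty, since which of the two is populated depends on whether the parser is namespace-aware.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class AccumulatingHandlerDemo {

    // Hypothetical handler: buffers the text of <myTag> across characters() calls.
    static class MyHandler extends DefaultHandler {
        private final StringBuilder buffer = new StringBuilder();
        private boolean inMyTag = false;
        private String myTagValue; // stands in for c.setMyTag(...) from the question

        // localName is empty when the parser is not namespace-aware, so fall back to qName.
        private static String nameOf(String localName, String qName) {
            return (localName == null || localName.isEmpty()) ? qName : localName;
        }

        @Override
        public void startElement(String namespaceURI, String localName,
                                 String qName, Attributes atts) {
            if ("myTag".equals(nameOf(localName, qName))) {
                inMyTag = true;
                buffer.setLength(0); // start a fresh value for this element
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // May be called several times per element, so append instead of assigning.
            if (inMyTag) {
                buffer.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String namespaceURI, String localName, String qName) {
            if ("myTag".equals(nameOf(localName, qName))) {
                myTagValue = buffer.toString(); // the complete text is only known here
                inMyTag = false;
            }
        }

        String getMyTagValue() {
            return myTagValue;
        }
    }

    // Parses the given XML string and returns the joined text of <myTag>.
    static String parseMyTag(String xml) throws Exception {
        SAXParserFactory saxPF = SAXParserFactory.newInstance();
        SAXParser sp = saxPF.newSAXParser();
        XMLReader xmlR = sp.getXMLReader();
        MyHandler myHandler = new MyHandler();
        xmlR.setContentHandler(myHandler);
        xmlR.parse(new InputSource(new StringReader(xml)));
        return myHandler.getMyTagValue();
    }

    public static void main(String[] args) throws Exception {
        // prints "A & B value" regardless of how the parser chunks the text
        System.out.println(parseMyTag("<myTag>A &amp; B value</myTag>"));
    }
}
```

This sketch treats everything between the opening and closing myTag as one value; if myTag could contain child elements you would need a stack or a depth counter instead of a single boolean.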


Take a look at the question "Trouble parsing quotes with SAX parser (javax.xml.parsers.SAXParser) on Android API 1.5".

By the way, a bare & is not allowed in XML character data; it should be written as &amp;.
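A quick way to check this yourself, as a rough sketch: the class name AmpersandCheck and the isWellFormed helper are made up for illustration, and the result assumes a conforming parser such as the one bundled with the desktop JDK. DefaultHandler rethrows fatal errors, so a bare & surfaces as a SAXParseException.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class AmpersandCheck {

    // Returns true if the parser accepts the document, false if it reports a parse error.
    static boolean isWellFormed(String xml) throws Exception {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (SAXParseException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isWellFormed("<myTag>A & B value</myTag>"));     // bare ampersand
        System.out.println(isWellFormed("<myTag>A &amp; B value</myTag>")); // escaped ampersand
    }
}
```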
