
SAX: trouble parsing mixed-content text

Developer https://www.devze.com 2023-03-17 05:09 Source: Internet

I'm having trouble with part of an XML file. I'm parsing it with SAX in Java.

I can't manage to get all the parts of the text (beginning of the text, middle of the text, end of the text).

<sometag type="aType">  
     beginning of the text          
     <anothertag type="anotherType" target="aTarget">middle of the text</anothertag>
     end of the text
</sometag>


Everybody messes up implementing the ContentHandler characters method, because it's totally unintuitive. The trick is that the parser may call characters several times for a single text node, so you have to accumulate the passed-in fragments in a buffer. See the Java tutorial on SAX. With mixed content, you have to take the text out of the buffer (and clear it) at both startElement and endElement.
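Here is a minimal sketch of that buffering approach, run against the XML from the question. The class names (MixedContentDemo, MixedContentHandler), the parse helper and the "element: text" output format are made up for illustration; only the SAX calls themselves are standard JDK API. It assumes you want whitespace-only runs dropped.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class MixedContentDemo {

    /** Hypothetical handler: collects each text run, labelled with its enclosing element. */
    static class MixedContentHandler extends DefaultHandler {
        final List<String> pieces = new ArrayList<>();
        private final StringBuilder buffer = new StringBuilder();
        private final Deque<String> open = new ArrayDeque<>();

        @Override
        public void characters(char[] ch, int start, int length) {
            // May be called several times for one text node, so only accumulate here.
            buffer.append(ch, start, length);
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            flush();            // text seen so far belongs to the parent element
            open.push(qName);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            flush();            // text seen so far belongs to the element being closed
            open.pop();
        }

        /** Emits the buffered text for the innermost open element, then clears the buffer. */
        private void flush() {
            String text = buffer.toString().trim();
            buffer.setLength(0);
            if (!text.isEmpty() && !open.isEmpty()) {
                pieces.add(open.peek() + ": " + text);
            }
        }
    }

    static List<String> parse(String xml) throws Exception {
        MixedContentHandler handler = new MixedContentHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.pieces;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<sometag type=\"aType\">\n"
                + "     beginning of the text\n"
                + "     <anothertag type=\"anotherType\" target=\"aTarget\">middle of the text</anothertag>\n"
                + "     end of the text\n"
                + "</sometag>";
        parse(xml).forEach(System.out::println);
    }
}
```

The point of flushing in both callbacks is that "beginning of the text" is only complete when anothertag opens, and "end of the text" is only complete when sometag closes.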

If that doesn't answer your question, show us some code.


SAX is often surprising until you know from experience what to expect.

You probably want to temporarily put some console logging in the event handlers, or even just breakpoint them all, and set up a little test to see what you're getting. I prefer logging in a case like this because it gives me the "big picture" of what I can expect.
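Something along these lines would do for the temporary logging. It is only a sketch: SaxEventLogger, LoggingHandler and trace are names invented here, and the log format is arbitrary. How many characters events you see for a given piece of text depends on the parser, which is exactly what the log lets you find out.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxEventLogger {

    /** Hypothetical handler that prints every event and also keeps it in a list. */
    static class LoggingHandler extends DefaultHandler {
        final List<String> log = new ArrayList<>();

        private void record(String line) {
            log.add(line);
            System.out.println(line);
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            record("startElement " + qName);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            record("endElement " + qName);
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // Make newlines visible so whitespace-only chunks are easy to spot.
            String text = new String(ch, start, length).replace("\n", "\\n");
            record("characters [" + text + "]");
        }
    }

    /** Parses the given XML string and returns the recorded event log. */
    static List<String> trace(String xml) throws Exception {
        LoggingHandler handler = new LoggingHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.log;
    }

    public static void main(String[] args) throws Exception {
        trace("<sometag>\n  beginning\n  <anothertag>middle</anothertag>\n  end\n</sometag>");
    }
}
```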

FWIW, StAX is a little easier and similar in performance.
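For comparison, a rough StAX version of the same job, using the cursor API from javax.xml.stream. The class and method names (StaxMixedContentDemo, textRuns) are invented for this sketch. It assumes the implementation honours the IS_COALESCING property, so each run of text arrives as a single CHARACTERS event.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

public class StaxMixedContentDemo {

    /** Returns the non-blank text runs of the document, in document order. */
    static List<String> textRuns(String xml) throws Exception {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Ask the parser to merge adjacent character data into one event.
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        List<String> runs = new ArrayList<>();
        try {
            while (reader.hasNext()) {
                // You pull events yourself instead of being called back.
                if (reader.next() == XMLStreamConstants.CHARACTERS) {
                    String text = reader.getText().trim();
                    if (!text.isEmpty()) {
                        runs.add(text);
                    }
                }
            }
        } finally {
            reader.close();
        }
        return runs;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<sometag type=\"aType\">\n"
                + "     beginning of the text\n"
                + "     <anothertag type=\"anotherType\" target=\"aTarget\">middle of the text</anothertag>\n"
                + "     end of the text\n"
                + "</sometag>";
        textRuns(xml).forEach(System.out::println);
    }
}
```

Because you drive the loop, there is no handler state to keep in sync between callbacks, which is most of what makes it feel easier.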
