
Editing a BIG XML via DOM parser

Developer https://www.devze.com 2023-04-08 20:12 Source: Internet

I have a very large XML document that is parsed with a DOM parser. There is now a requirement to add and delete elements, i.e. to edit the XML. How can the XML be edited when the entire document cannot be loaded because of memory constraints? What strategy could solve this?


Consider using a SAX parser instead, which doesn't keep the whole document in memory. It is typically faster and uses much less memory.
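To make the SAX suggestion concrete for the delete case, here is a minimal sketch that streams the document through an `XMLFilterImpl` and an identity `Transformer`, so only the current event is held in memory. The class name `DropElementFilter`, the method `copyWithout`, and the element name passed in are my own illustrative choices, not something from the question. It filters element and character events only; comments and processing instructions inside a dropped element are not handled here.

```java
import java.io.Reader;
import java.io.Writer;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical helper: forwards all SAX events except those that belong to
// elements named dropName (matched on the qualified name) and their content.
class DropElementFilter extends XMLFilterImpl {
    private final String dropName;
    private int skipDepth = 0; // > 0 while inside an element being dropped

    DropElementFilter(String dropName) {
        this.dropName = dropName;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        if (skipDepth > 0 || qName.equals(dropName)) {
            skipDepth++; // swallow this start tag and remember the nesting level
            return;
        }
        super.startElement(uri, localName, qName, atts);
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if (skipDepth > 0) {
            skipDepth--; // swallow the matching end tag
            return;
        }
        super.endElement(uri, localName, qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (skipDepth == 0) {
            super.characters(ch, start, length);
        }
    }

    // Streams 'in' to 'out', leaving out every dropName element.
    static void copyWithout(Reader in, Writer out, String dropName) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        DropElementFilter filter = new DropElementFilter(dropName);
        filter.setParent(factory.newSAXParser().getXMLReader());

        // A Transformer created without a stylesheet copies its input unchanged.
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        identity.transform(new SAXSource(filter, new InputSource(in)), new StreamResult(out));
    }
}
```

For a real file you would pass a buffered file reader and writer instead of in-memory strings, and write to a new file rather than over the input.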


As two other answers mentioned already, a SAX parser will do the trick. Your other alternative to DOM is a StAX parser.

Traditionally, XML APIs are either:

  • DOM based - the entire document is read into memory as a tree structure for random access by the calling application
  • event based - the application registers to receive events as entities are encountered within the source document.

Both have advantages; the former (for example, DOM) allows for random access to the document, the latter (e.g. SAX) requires a small memory footprint and is typically much faster.

These two access metaphors can be thought of as polar opposites. A tree based API allows unlimited, random access and manipulation, while an event based API is a 'one shot' pass through the source document.

StAX was designed as a median between these two opposites. In the StAX metaphor, the programmatic entry point is a cursor that represents a point within the document. The application moves the cursor forward - 'pulling' the information from the parser as it needs. This is different from an event based API - such as SAX - which 'pushes' data to the application - requiring the application to maintain state between events as necessary to keep track of location within the document.
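As a rough illustration of the pull model applied to the original question, the sketch below copies a document event by event with the StAX event API, dropping one kind of element and appending a new child to another. The class name `StaxEdit`, the method `edit`, and all of its parameters are hypothetical names chosen for the example, and elements are matched on local name only, which is an assumption that ignores namespaces.

```java
import java.io.Reader;
import java.io.Writer;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical helper: streams 'in' to 'out' without building a tree.
class StaxEdit {
    // Drops every <dropName> subtree and appends <addName>addText</addName>
    // as the last child of every <parentName> element.
    static void edit(Reader in, Writer out, String dropName,
                     String parentName, String addName, String addText)
            throws XMLStreamException {
        XMLEventReader reader = XMLInputFactory.newInstance().createXMLEventReader(in);
        XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(out);
        XMLEventFactory events = XMLEventFactory.newInstance();

        int skipDepth = 0; // > 0 while inside a subtree being deleted
        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();

            if (skipDepth > 0) {
                // Track nesting so the whole subtree is skipped, then resume copying.
                if (event.isStartElement()) {
                    skipDepth++;
                } else if (event.isEndElement()) {
                    skipDepth--;
                }
                continue;
            }
            if (event.isStartElement()
                    && event.asStartElement().getName().getLocalPart().equals(dropName)) {
                skipDepth = 1;
                continue;
            }
            if (event.isEndElement()
                    && event.asEndElement().getName().getLocalPart().equals(parentName)) {
                // Insert the new element just before the parent's end tag.
                writer.add(events.createStartElement("", "", addName));
                writer.add(events.createCharacters(addText));
                writer.add(events.createEndElement("", "", addName));
            }
            writer.add(event); // everything else is copied through unchanged
        }
        writer.close();
        reader.close();
    }
}
```

Because the loop owns the cursor, the "where am I" state is just a local variable (`skipDepth`), which is the practical difference from the callback style of SAX. Memory use stays roughly constant regardless of document size, since each event is written out as soon as it is read.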


StAX is my preferred approach for handling large documents. If DOM is a requirement, check out DOM implementations like Xerces that support lazy construction of DOM nodes:

  • http://xerces.apache.org/xerces-j/faq-write.html#faq-4


Your assumption that memory constraints prevent loading the XML document may only apply to DOM. VTD-XML loads the entire XML into memory and does so efficiently (about 1.3x the size of the XML document), in terms of both memory and performance:

http://sdiwc.us/digitlib/journal_paper.php?paper=00000582.pdf

Another distinct benefit, which no other XML framework in existence has, is its incremental update capability:

http://www.devx.com/xml/Article/36379


As stivlo mentioned, you can use a SAX parser for reading the XML.

For writing, you can write the XML to a FileOutputStream as plain text. Presumably the requirement will specify after which tag, or under which tag, the new data should be inserted.
