
Processing an XML file with a huge amount of data

Developer (https://www.devze.com), 2022-12-24 01:40. Source: Internet

I am working on an application with the following requirements:

  1. Download a ZIP file from a server.
  2. Uncompress the ZIP file, get the content (which is in XML format) from this file into a String.
  3. Pass this content into another method for parsing and further processing.

My concern is that the XML file may be huge, say 100 MB, while my JVM has only 512 MB of memory. How can I read this content in chunks, pass it on for parsing, and then insert the data into PL/SQL tables?

Since multiple requests can be running at the same time, what is the best way to process this within 512 MB of memory?

How can I read the data in chunks and pass it as a stream for XML parsing?


Java's XMLReader is the SAX2 parser interface. A DOM parser reads the whole XML file in and creates an (often large) data structure, usually a tree, to represent its contents. A SAX parser instead lets you register a handler that is called as pieces of the XML document are recognized. In that callback code you can keep only enough data to do what you need. For example, you might collect all the fields that will end up as a single row in the database, insert that row, and then discard the data. With this type of design, your program's memory consumption depends less on the file size than on the complexity and size of a single logical data item (in your case, the data that will become one row in the database).
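As a rough illustration of that callback design, here is a minimal sketch using the JDK's built-in SAX parser. The element names (`rows`, `row`, `id`, `name`) and the class and method names are made up for the example, and the `sink` callback stands in for whatever code would insert one row into the database. It assumes each `<row>` contains only flat child elements.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Consumer;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class RowSaxExample {

    /** Collects the child elements of each <row> and hands the finished row to a sink. */
    static class RowHandler extends DefaultHandler {
        private final Consumer<Map<String, String>> sink;
        private final StringBuilder text = new StringBuilder();
        private Map<String, String> current; // non-null only while inside a <row>

        RowHandler(Consumer<Map<String, String>> sink) {
            this.sink = sink;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if ("row".equals(qName)) {
                current = new LinkedHashMap<>();
            }
            text.setLength(0); // start collecting text for the new element
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // May be called several times per element, so append rather than assign.
            if (current != null) {
                text.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (current == null) {
                return; // outside any <row>
            }
            if ("row".equals(qName)) {
                sink.accept(current); // e.g. insert into the database here
                current = null;       // drop the row so it can be garbage collected
            } else {
                current.put(qName, text.toString().trim());
            }
            text.setLength(0);
        }
    }

    /** Streams the XML through SAX; only one row is held in memory at a time. */
    static void parseRows(InputStream xml, Consumer<Map<String, String>> sink) throws Exception {
        SAXParserFactory.newInstance().newSAXParser().parse(xml, new RowHandler(sink));
    }

    public static void main(String[] args) throws Exception {
        String xml = "<rows><row><id>1</id><name>a</name></row>"
                   + "<row><id>2</id><name>b</name></row></rows>";
        parseRows(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                  row -> System.out.println(row));
    }
}
```

In a real application the sink would more likely add the row to a JDBC batch and flush every few hundred rows, but that part depends on the schema and is left out here.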

Even if you did use a DOM-style parser, things might not be quite as bad as you expect. XML is pretty verbose, so, depending on how it is structured, a 100 MB file will often represent only 10-20 MB of data, and as little as 5 MB of data would not be particularly rare or unbelievable.


Any SAX parser should work, since it won't load the entire XML file into memory the way a DOM parser does.
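One way to combine this with the ZIP step from the question is to skip the intermediate String entirely and feed the decompressed entry straight into the parser. The sketch below assumes the archive contains entries ending in `.xml`. The class name, the `countElements` and `zipOf` helpers, and the `row` element are hypothetical, and the handler only counts elements to keep the example short. In the real application the `InputStream` would come from the download (for instance an HTTP response body) rather than from a byte array.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class ZipSaxExample {

    /** Counts elements named elementName in every .xml entry of a zipped stream. */
    static int countElements(InputStream zipped, String elementName) throws Exception {
        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (elementName.equals(qName)) {
                    count[0]++;
                }
            }
        };
        try (ZipInputStream zin = new ZipInputStream(zipped)) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                if (entry.isDirectory() || !entry.getName().endsWith(".xml")) {
                    continue;
                }
                // The parser may close the stream it is given; shield the ZIP stream
                // so that any following entries can still be read.
                InputStream entryStream = new FilterInputStream(zin) {
                    @Override
                    public void close() {
                        // intentionally left open; the try-with-resources closes zin
                    }
                };
                // Decompression and parsing happen incrementally, buffer by buffer.
                SAXParserFactory.newInstance().newSAXParser().parse(entryStream, handler);
            }
        }
        return count[0];
    }

    /** Test helper: builds a one-entry ZIP archive in memory. */
    static byte[] zipOf(String entryName, String xml) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ZipOutputStream zout = new ZipOutputStream(buf)) {
            zout.putNextEntry(new ZipEntry(entryName));
            zout.write(xml.getBytes(StandardCharsets.UTF_8));
            zout.closeEntry();
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        byte[] zip = zipOf("data.xml", "<rows><row/><row/><row/></rows>");
        System.out.println(countElements(new ByteArrayInputStream(zip), "row"));
    }
}
```

Because nothing here holds the whole document, memory use per request stays roughly at the size of the stream buffers plus whatever the handler keeps, which matters when several requests share a 512 MB heap.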
