Out of memory error in StAX

Developer https://www.devze.com 2023-03-15 16:26 Source: the web
I am using the following simple StAX code to iterate through all the tags in XML. Size of input.xml > 100 MB

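        // Requires: java.io.FileInputStream, javax.xml.stream.XMLInputFactory,
        //           javax.xml.stream.XMLStreamReader
        XMLInputFactory xif = XMLInputFactory.newInstance();
        FileInputStream in = new FileInputStream("input.xml");
        // Reuse the factory created above instead of building a second one
        XMLStreamReader xsr = xif.createXMLStreamReader(in);

        xsr.next();
        while (xsr.hasNext()) {
            xsr.next();
            // Only element events have a local name
            if (xsr.isStartElement() || xsr.isEndElement()) {
                System.out.println(xsr.getLocalName());
            }
        }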

I am getting this error:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space

Please tell me how to get around this. I read that StAX handles huge XML files well, but I am getting the same error as with DOM parsers.


Set the heap size when starting the JVM:

-Xms    initial java heap size
-Xmx    maximum java heap size
-Xmn    the size of the heap for the young generation

Example:

bin/java.exe -Xmn100M -Xms500M -Xmx500M


Increase the maximum heap size of your JVM using the -Xmx option.

java -Xmx512m ....
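
To confirm that a heap option actually reached the JVM, a small check like the following can help. This is only a sketch: the class name HeapCheck and the helper maxHeapMiB are made up for illustration; Runtime.maxMemory() is the standard call that reports the upper limit the JVM will try to use.

```java
public class HeapCheck {
    /** Maximum amount of memory the JVM will attempt to use, in mebibytes. */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // The reported value is usually close to, but not always exactly, the -Xmx setting
        System.out.println("Max heap: " + maxHeapMiB() + " MiB");
    }
}
```

Run it with the same options as the real program, for example java -Xmx512m HeapCheck, and compare the printed value with what you expected.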


From Wikipedia: Traditionally, XML APIs are either:

tree based - the entire document is read into memory as a tree structure for random access by the calling application
event based - the application registers to receive events as entities are encountered within the source document.

StAX was designed as a median between these two opposites. In the StAX metaphor,
the programmatic entry point is a cursor that represents a point within the
document. The application moves the cursor forward - 'pulling' the information from
the parser as it needs. This is different from an event based API - such as SAX -
which 'pushes' data to the application - requiring the application to maintain state
between events as necessary to keep track of location within the document.

So for files of 100 MB and more I prefer SAX; if possible, use it instead of StAX.
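
For comparison, here is a rough SAX version of the tag-listing loop from the question. It is a sketch under assumptions, not a drop-in replacement: the class name SaxTagLister and the method forEachTag are invented for this example, and parser errors are wrapped in an unchecked exception only to keep the sketch short. The parser pushes start and end element events to the handler, and each name is passed straight to a callback so nothing accumulates in memory.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.function.Consumer;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class SaxTagLister {
    /** Streams the document and passes every start and end tag name to the sink. */
    static void forEachTag(InputStream in, Consumer<String> sink) {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // needed so that localName is filled in
        try {
            factory.newSAXParser().parse(in, new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes atts) {
                    sink.accept(localName);
                }

                @Override
                public void endElement(String uri, String localName, String qName) {
                    sink.accept(localName);
                }
            });
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("SAX parsing failed", e);
        }
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path to the XML file, e.g. input.xml
        try (InputStream in = new FileInputStream(args[0])) {
            forEachTag(in, System.out::println);
        }
    }
}
```

Like the StAX loop, this keeps only the current event in memory, so by itself it should not need a heap proportional to the file size.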

However, I tried your code with a 2.6 GB file on a 64-bit JVM without any problem. So I suspect the problem is not the size of the file but possibly the data itself.
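
If the data is the suspect, one thing that might be worth checking is whether the file contains a single very large text node, since a parser may need to buffer such content. That is a guess, not a diagnosis of this particular file. The sketch below (class name TextRunProbe and method longestTextRun are hypothetical) turns coalescing off so the parser is free to deliver text in pieces, adds up consecutive text events, and reports the longest run together with the line number the parser reported for it.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class TextRunProbe {
    /** Returns {length in chars of the longest text run, line number reported for it}. */
    static long[] longestTextRun(InputStream in) {
        XMLInputFactory xif = XMLInputFactory.newInstance();
        // Do not ask the parser to merge adjacent text into one big event
        xif.setProperty(XMLInputFactory.IS_COALESCING, Boolean.FALSE);
        long longest = 0;
        long longestLine = -1;
        long run = 0; // length of the current run of consecutive text events
        try {
            XMLStreamReader xsr = xif.createXMLStreamReader(in);
            try {
                while (xsr.hasNext()) {
                    int event = xsr.next();
                    if (event == XMLStreamConstants.CHARACTERS
                            || event == XMLStreamConstants.CDATA
                            || event == XMLStreamConstants.SPACE) {
                        run += xsr.getTextLength();
                        if (run > longest) {
                            longest = run;
                            longestLine = xsr.getLocation().getLineNumber();
                        }
                    } else {
                        run = 0; // any non-text event ends the run
                    }
                }
            } finally {
                xsr.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("StAX parsing failed", e);
        }
        return new long[] {longest, longestLine};
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path to the XML file, e.g. input.xml
        try (InputStream in = new FileInputStream(args[0])) {
            long[] result = longestTextRun(in);
            System.out.println("Longest text run: " + result[0]
                    + " chars, around line " + result[1]);
        }
    }
}
```

A run of hundreds of megabytes would point at the data rather than at StAX itself; a small value suggests looking elsewhere.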
