I am using the following simple StAX code to iterate through all the tags in an XML file. The input file, input.xml, is larger than 100 MB.
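import java.io.FileInputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamReader;

XMLInputFactory xif = XMLInputFactory.newInstance();
FileInputStream in = new FileInputStream("input.xml");
// Reuse the factory created above instead of instantiating a second one.
XMLStreamReader xsr = xif.createXMLStreamReader(in);
// The reader starts at START_DOCUMENT; each next() advances to the following event.
while (xsr.hasNext()) {
    xsr.next();
    // getLocalName() is only valid on start and end element events.
    if (xsr.isStartElement() || xsr.isEndElement()) {
        System.out.println(xsr.getLocalName());
    }
}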
I am getting this error:
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
Please tell me how to get around this. I read that StAX handles huge XML files well, but I am getting the same error as with DOM parsers.
Set the heap size when starting the JVM:
-Xms  initial Java heap size
-Xmx  maximum Java heap size
-Xmn  size of the heap for the young generation
Example:
bin/java.exe -Xmn100M -Xms500M -Xmx500M
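To check that the flags above actually took effect, you can ask the running JVM for its heap limits. This is only a minimal sketch; the class name HeapInfo and the helper toMiB are made up for illustration, and the reported values are approximate (the JVM may round them).

```java
class HeapInfo {
    /** Converts a byte count to whole mebibytes. */
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory(): the most heap the JVM will attempt to use (governed by -Xmx).
        System.out.println("max heap:   " + toMiB(rt.maxMemory()) + " MiB");
        // totalMemory(): the heap currently reserved (starts near -Xms and can grow).
        System.out.println("total heap: " + toMiB(rt.totalMemory()) + " MiB");
        // freeMemory(): the unused part of the currently reserved heap.
        System.out.println("free heap:  " + toMiB(rt.freeMemory()) + " MiB");
    }
}
```

Run it with the same options as your application, for example java -Xms500M -Xmx500M HeapInfo, and compare the numbers with what you passed.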
Increase the maximum heap size of your VM using the -Xmx parameter:
java -Xmx512m ....
From Wikipedia: Traditionally, XML APIs are either:
tree based - the entire document is read into memory as a tree structure for random
access by the calling application
event based - the application registers to receive events as entities are encountered
within the source document.
StAX was designed as a median between these two opposites. In the StAX metaphor,
the programmatic entry point is a cursor that represents a point within the
document. The application moves the cursor forward - 'pulling' the information from
the parser as it needs. This is different from an event based API - such as SAX -
which 'pushes' data to the application - requiring the application to maintain state
between events as necessary to keep track of location within the document.
So for files of 100 MB and more I prefer SAX; if possible, use it instead of StAX.
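For comparison, here is roughly what the same tag listing could look like with SAX. This is a sketch, not a drop-in replacement: the class name SaxTagLister and the forEachTag helper are invented for this example, and it parses a small in-memory document so it can be run as-is. For the real file you would pass a FileInputStream for input.xml instead.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class SaxTagLister {
    /** Streams the document and hands every start/end tag name to the sink. */
    static void forEachTag(InputStream in, Consumer<String> sink) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        // Namespace awareness is required for localName to be filled in.
        factory.setNamespaceAware(true);
        SAXParser parser = factory.newSAXParser();
        // The parser "pushes" events into these callbacks; nothing is kept in memory.
        parser.parse(in, new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                sink.accept(localName);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                sink.accept(localName);
            }
        });
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root><item>a</item><item>b</item></root>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        forEachTag(in, System.out::println);
    }
}
```

The handler does the printing itself, so memory use should stay flat regardless of file size, as long as the sink does not accumulate the names.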
However, I tried your code with a 2.6 GB file on a 64-bit JVM without any problem. So I suspect the problem is not the size of the file but possibly the data in it.
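If you want to look at the data itself, one option is to stream through the file once and record a few numbers about its shape. The sketch below is only a guess at what might be worth measuring (element count, nesting depth, and the longest text chunk the parser reports); the class XmlShapeReport and its fields are hypothetical names, and it reads a small in-memory document so it runs standalone.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

class XmlShapeReport {
    long elements;        // number of start tags seen
    int maxDepth;         // deepest element nesting
    int longestTextChunk; // longest text chunk reported by a single event

    /** Streams the document once and collects simple statistics. */
    static XmlShapeReport scan(InputStream in) throws Exception {
        XmlShapeReport report = new XmlShapeReport();
        XMLStreamReader xsr = XMLInputFactory.newInstance().createXMLStreamReader(in);
        int depth = 0;
        try {
            while (xsr.hasNext()) {
                switch (xsr.next()) {
                    case XMLStreamConstants.START_ELEMENT:
                        report.elements++;
                        depth++;
                        report.maxDepth = Math.max(report.maxDepth, depth);
                        break;
                    case XMLStreamConstants.END_ELEMENT:
                        depth--;
                        break;
                    case XMLStreamConstants.CHARACTERS:
                    case XMLStreamConstants.CDATA:
                        // A parser may split long text into several chunks.
                        report.longestTextChunk =
                                Math.max(report.longestTextChunk, xsr.getTextLength());
                        break;
                    default:
                        break;
                }
            }
        } finally {
            xsr.close();
        }
        return report;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<a><b><c/></b><b>hello</b></a>";
        XmlShapeReport r = scan(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        System.out.println("elements=" + r.elements + " maxDepth=" + r.maxDepth
                + " longestTextChunk=" + r.longestTextChunk);
    }
}
```

If this scan also runs out of memory on your file, the point at which it fails may hint at which part of the document the parser struggles with.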