I am writing some code to parse a very large flat text file into objects which are persisted to a database. This works on sections of the file (i.e. if I 'top' the first 2000 lines), but I run into a java.lang.OutOfMemoryError: Java heap space error when I try to process the full file.
I am using a BufferedReader to read the file line by line, and I was under the impression that this negates the requirement to load the entire text file into memory. Hopefully my code is fairly self-explanatory. I have run my code through the Eclipse Memory Analyser, which informs me that:
The thread java.lang.Thread @ 0x27ee0478 main keeps local variables with total size 69,668,888 (98.76%) bytes.
The memory is accumulated in one instance of "char[]" loaded by "<system class loader>"
Helpful comments greatly appreciated!
Jonathan
public ArrayList<Statement> parseGMIFile(String filePath)
throws IOException {
ArrayList<Statement> statements = new ArrayList<Statement>();
// Statement Properties
String sAccount = "";
String sOffice = "";
String sFirm = "";
String sDate1 = "";
String sDate2 = "";
Date date = new Date();
StringBuffer sData = new StringBuffer();
BufferedReader in = new BufferedReader(new FileReader(filePath));
String line;
String prevCode = "";
int lineCounter = 1;
int globalLineCounter = 1;
while ((line = in.readLine()) != null) {
// We extract the GMI code from the end of the first line
String newCode = line.substring(GMICODE_START_POS).trim();
// Extract date
if (newCode.equals(prevCode)) {
if (lineCounter == DATE_LINE) {
sDate1 = line.substring(DATE_START_POS, DATE_END_POS).trim();}
if (lineCounter == DATE_LINE2) {
sDate2 = line.substring(DATE_START_POS, DATE_END_POS).trim();}
if (sDate1.equals("")){
sDate1 = sDate2;}
SimpleDateFormat formatter=new SimpleDateFormat("MMM dd, yyyy");
try {
date=formatter.parse(sDate1);
} catch (ParseException e) {
e.printStackTrace();
}
sFirm = line.substring(FIRM_START_POS, FIRM_END_POS);
sOffice = line.substring(OFFICE_START_POS, OFFICE_END_POS);
sAccount = line.substring(ACCOUNT_START_POS,
ACCOUNT_END_POS);
lineCounter++;
globalLineCounter++;
sData.append(line.substring(0, END_OF_DATA)).append("\n");
} else {
// Instantiate New Statement Object
Statement stmt = new Statement(sAccount, sOffice, sFirm,
date, sData.toString());
// Add to collection
statements.add(stmt);
// log.info("-----------NEW STATEMENT--------------");
sData.setLength(0);
lineCounter = 1;
}
prevCode = newCode;
}
return statements;
}
STACKTRACE:
Exception in thread "main" org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'dbPopulator' defined in class path resource [app-context.xml]: Invocation of init method failed; nested exception is java.lang.OutOfMemoryError: Java heap space
    at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.initializeBean(AbstractAutowireCapableBeanFactory.java:1401)
    at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:512)
    at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:450)
    at org.springframework.beans.factory.support.AbstractBeanFactory$1.getObject(AbstractBeanFactory.java:290)
    at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:222)
    at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:287)
    at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:189)
    at org.springframework.beans.factory.support.DefaultListableBeanFactory.preInstantiateSingletons(DefaultListableBeanFactory.java:557)
    at org.springframework.context.support.AbstractApplicationContext.finishBeanFactoryInitialization(AbstractApplicationContext.java:842)
    at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:416)
    at org.springframework.context.support.ClassPathXmlApplicationContext.<init>(ClassPathXmlApplicationContext.java:139)
    at org.springframework.context.support.ClassPathXmlApplicationContext.<init>(ClassPathXmlApplicationContext.java:93)
    at Main.main(Main.java:11)
Caused by: java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:2882)
    at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100)
    at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:390)
    at java.lang.StringBuffer.append(StringBuffer.java:224)
    at services.GMILogParser.parseGMIFile(GMILogParser.java:133)
    at services.DBPopulator.init(DBPopulator.java:27)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.invokeCustomInitMethod(AbstractAutowireCapableBeanFactory.java:1529)
    at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.invokeInitMethods(AbstractAutowireCapableBeanFactory.java:1468)
    at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.initializeBean(AbstractAutowireCapableBeanFactory.java:1398)
    ... 12 more
Adding more memory in the start parameters is IMHO a mistake. Those parameters are application-wide, and a larger heap may penalize you with longer GC times. Moreover, you might not know the required size in advance.
You could use memory-mapped files instead; look at java.nio.* for that. That way the data is loaded as you read it, and the mapped memory is not placed in the ordinary Java heap.
By reading at a low level you can process the file in chunks of variable length. Speed matters too: if your file is large, it may take too long to read. And the more objects you keep in the JVM, the harder the GC has to work, which slows the application down.
From the Java reference:
A byte buffer can be allocated as a direct buffer, in which case the Java virtual machine will make a best effort to perform native I/O operations directly upon it.
A byte buffer can be created by mapping a region of a file directly into memory, in which case a few additional file-related operations defined in the MappedByteBuffer class are available.
A byte buffer provides access to its content as either a heterogeneous or homogeneous sequence of binary data of any non-boolean primitive type, in either big-endian or little-endian byte order.
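As a rough illustration of the memory-mapped approach, here is a minimal sketch. FileChannel.map and MappedByteBuffer are standard java.nio APIs; the MappedLineReader class and its forEachLine method are made-up names for this example. It assumes Java 8 or later, UTF-8 (or another ASCII-compatible) text, and a file under 2 GB so a single mapping is enough; a larger file would have to be mapped in windows.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.function.Consumer;

// Hypothetical helper, not part of the original question's code.
class MappedLineReader {

    // Maps the file read-only and passes each line to the handler.
    // Returns the number of lines seen.
    static long forEachLine(Path file, Consumer<String> handler) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = channel.size();
            if (size > Integer.MAX_VALUE) {
                // Assumption of this sketch: one mapping covers the whole file.
                throw new IOException("File too large for a single mapping");
            }
            MappedByteBuffer buffer = channel.map(FileChannel.MapMode.READ_ONLY, 0, size);
            long count = 0;
            int lineStart = 0;
            for (int i = 0; i < buffer.limit(); i++) {
                if (buffer.get(i) == '\n') {
                    handler.accept(decode(buffer, lineStart, i));
                    count++;
                    lineStart = i + 1;
                }
            }
            // Last line when the file does not end with a newline.
            if (lineStart < buffer.limit()) {
                handler.accept(decode(buffer, lineStart, buffer.limit()));
                count++;
            }
            return count;
        }
    }

    // Copies the bytes in [start, end) onto the heap and decodes them as one line.
    private static String decode(MappedByteBuffer buffer, int start, int end) {
        byte[] bytes = new byte[end - start];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = buffer.get(start + i);
        }
        String line = new String(bytes, StandardCharsets.UTF_8);
        // Strip the carriage return of Windows-style line endings.
        return line.endsWith("\r") ? line.substring(0, line.length() - 1) : line;
    }
}
```

Note that only the mapped bytes live outside the heap. Every String the handler keeps a reference to is still ordinary heap memory, so the handler should process each line and let it go.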
Maybe it is the statements object that is growing too large? If so, maybe you should persist it to the database in batches instead of all at once?
Another thing that can happen here: if your file is bigger than half your heap and does not contain any line breaks, in.readLine() will try to read the whole file into memory and fail.
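Persisting in batches could look roughly like the sketch below. BatchingSink is a made-up class for illustration, and the persister callback stands in for whatever DAO or repository call actually writes to the database. It assumes Java 8 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper: collects items and hands them over in fixed-size batches,
// so that at most batchSize items are held in memory at a time.
class BatchingSink<T> {
    private final int batchSize;
    private final Consumer<List<T>> persister; // assumed to write one batch to the database
    private final List<T> buffer;

    BatchingSink(int batchSize, Consumer<List<T>> persister) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.persister = persister;
        this.buffer = new ArrayList<>(batchSize);
    }

    // Buffers one item and flushes as soon as a full batch has accumulated.
    void add(T item) {
        buffer.add(item);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    // Hands any buffered items to the persister; call once more after the last add.
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        persister.accept(new ArrayList<>(buffer)); // copy, so the callback may keep the list
        buffer.clear();
    }
}
```

In the parser, statements.add(stmt) would become sink.add(stmt), followed by a final sink.flush() once the loop has finished, and the method would no longer return the complete list.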
It seems your application is using the default memory allocated by the VM (about 64 MB if I remember correctly). Since your application is a special-purpose one, I'd suggest increasing the memory available to it (e.g. running the app using java -Xmx256m would allow it to use up to 256 MB of heap). You could also try running it using the server VM (java -server yourapp), which will try to optimize things a bit.
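To see which heap limit the JVM is actually running with, something like the following can help. Runtime.maxMemory() is a standard API; the HeapInfo class and its method name are invented for this example.

```java
// Hypothetical helper for checking the effective heap limit.
class HeapInfo {

    // Maximum heap the JVM will attempt to use, in megabytes.
    // This reflects the -Xmx setting, or the VM default when -Xmx is not given.
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }
}
```

Printing HeapInfo.maxHeapMegabytes() at startup, once without and once with -Xmx256m, shows whether the option is reaching the JVM that runs the parser.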
-Xmx1024M -XX:MaxPermSize=256M solved the java.lang.OutOfMemoryError: Java heap space error for me.
Hope this works for you.
The code seems right to me, though maybe I would have used StringBuffer in place of String.
Strings are pretty nasty in Java: they are immutable, so each modification you perform on them creates a new object, and references can remain anywhere in the code.
I usually read file lines inside a private method using local variables, just to be sure that no references to the Strings are left around.
Is the list you're getting back a list of beans with String properties? If so, change them to StringBuffer and rerun the profiling.
Let me know if this helps.
Regards,
M.
It seems that sData causes the overflow. There are probably many (millions of?) lines in the file with the same GMI code, so sData keeps growing without ever being reset.
An accumulation in char[] means either a String or a StringBuilder/StringBuffer. Since the failure happens while resizing the StringBuffer (see the stack trace), that should be the cause.
Try printing sData to stdout for debugging and see what happens.
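A cheaper check than dumping sData is to measure how many consecutive lines share the same code, since that is what determines how large sData gets. The sketch below is only an illustration: RunLengthProbe is a made-up class, and codeStartPos stands in for the GMICODE_START_POS constant from the question.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;

// Hypothetical diagnostic helper, not part of the original parser.
class RunLengthProbe {

    // Returns the largest number of consecutive lines whose trailing code
    // (everything from codeStartPos onwards, trimmed) is identical.
    static int longestRun(Reader source, int codeStartPos) throws IOException {
        BufferedReader in = new BufferedReader(source);
        String prevCode = null;
        int current = 0;
        int longest = 0;
        String line;
        while ((line = in.readLine()) != null) {
            // Lines shorter than codeStartPos are treated as having an empty code.
            String code = line.length() > codeStartPos
                    ? line.substring(codeStartPos).trim()
                    : "";
            current = code.equals(prevCode) ? current + 1 : 1;
            longest = Math.max(longest, current);
            prevCode = code;
        }
        return longest;
    }
}
```

If this reports a run of millions of lines, the buffer is simply accumulating most of the file, and the grouping logic (or the position the code is read from) is the thing to look at.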
I encountered the same problem a few months back. I used the Scanner class:
Scanner scanner = new Scanner(file);
instead of:
BufferedReader in = new BufferedReader(new FileReader(filePath));
Why don't you try replacing this line (if you're using JDK 6; the substring memory problem was fixed in JDK 7, as of update 6):
String newCode = line.substring(GMICODE_START_POS).trim();
with this one:
String newCode = new String(line.substring(GMICODE_START_POS));