I'm getting a java.lang.OutOfMemoryError: Java heap space.
I'm parsing an XML file, storing the data, and writing an XML file once parsing is complete.
I'm a bit surprised to get this error, because the original XML file is not long at all.
Code: http://d.pr/RSzp File: http://d.pr/PjrE
Short answer on why you get an OutOfMemoryError: for every centroid found in the file, you loop over the already "registered" centroids to check whether it is already known (to either add a new one or update the registered one). But for every failed comparison you add a new copy of the new centroid. So each new centroid is added as many times as there are centroids already in the list; then the loop reaches the first copy you just added, updates it, and exits. The list therefore roughly doubles with every new centroid.
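To make the growth concrete, here is a small sketch of that pattern. It is a hypothetical reconstruction, not the original code (which is only linked, not quoted here): the class and method names are made up, and plain strings stand in for centroids. It compares the "add inside the loop" version with a map lookup.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class DuplicateGrowthDemo {

    // Hypothetical version of the buggy pattern: the add sits inside the
    // comparison loop, so it runs once per failed comparison.
    static int buggyCount(List<String> events) {
        List<String> registered = new ArrayList<>();
        for (String event : events) {
            if (registered.isEmpty()) {
                registered.add(event);
                continue;
            }
            // Index-based loop, because the list grows while we walk it.
            for (int i = 0; i < registered.size(); i++) {
                if (registered.get(i).equals(event)) {
                    break; // known entry: "update" it and leave the loop
                } else {
                    registered.add(event); // bug: one extra copy per mismatch
                }
            }
        }
        return registered.size();
    }

    // Map-based lookup: at most one entry per distinct event.
    static int mapCount(List<String> events) {
        Map<String, Integer> registered = new HashMap<>();
        for (String event : events) {
            registered.merge(event, 1, Integer::sum);
        }
        return registered.size();
    }

    public static void main(String[] args) {
        List<String> events = new ArrayList<>();
        for (int i = 0; i < 20; i++) {
            events.add("event" + i);
        }
        System.out.println(buggyCount(events)); // prints 524288 (2^19)
        System.out.println(mapCount(events));   // prints 20
    }
}
```

With only distinct events, the list size doubles on every new event, so a few dozen distinct events are enough to exhaust any realistic heap.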
Here is some refactored code:
public class CentroidGenerator {
final Map<String, Centroid> centroids = new HashMap<String, Centroid>();
public Collection<Centroid> getCentroids() {
return centroids.values();
}
public void nextItem(FlickrDoc flickrDoc) {
final String event = flickrDoc.getEvent();
final Centroid existingCentroid = centroids.get(event);
if (existingCentroid != null) {
existingCentroid.update(flickrDoc);
} else {
final Centroid newCentroid = new Centroid(flickrDoc);
centroids.put(event, newCentroid);
}
}
public static void main(String[] args) throws IOException, SAXException {
// instantiate Digester and disable XML validation
[...]
// now that rules and actions are configured, start the parsing process
CentroidGenerator abp = (CentroidGenerator) digester.parse(new File("PjrE.data.xml"));
Writer writer = null;
try {
File fileOutput = new File("centroids.xml");
writer = new BufferedWriter(new FileWriter(fileOutput));
writeOuput(writer, abp.getCentroids());
} catch (FileNotFoundException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
} finally {
try {
if (writer != null) {
writer.close();
}
} catch (IOException e) {
e.printStackTrace();
}
}
}
private static void writeOuput(Writer writer, Collection<Centroid> centroids) throws IOException {
writer.append("<?xml version='1.0' encoding='utf-8'?>" + System.getProperty("line.separator"));
writer.append("<collection>").append(System.getProperty("line.separator"));
for (Centroid centroid : centroids) {
writer.append("<doc>" + System.getProperty("line.separator"));
writer.append("<title>" + System.getProperty("line.separator"));
writer.append(centroid.getTitle());
writer.append("</title>" + System.getProperty("line.separator"));
writer.append("<description>" + System.getProperty("line.separator"));
writer.append(centroid.getDescription());
writer.append("</description>" + System.getProperty("line.separator"));
writer.append("<time>" + System.getProperty("line.separator"));
writer.append(centroid.getTime());
writer.append("</time>" + System.getProperty("line.separator"));
writer.append("<tags>" + System.getProperty("line.separator"));
writer.append(centroid.getTags());
writer.append("</tags>" + System.getProperty("line.separator"));
writer.append("<geo>" + System.getProperty("line.separator"));
writer.append("<lat>" + System.getProperty("line.separator"));
writer.append(centroid.getLat());
writer.append("</lat>" + System.getProperty("line.separator"));
writer.append("<lng>" + System.getProperty("line.separator"));
writer.append(centroid.getLng());
writer.append("</lng>" + System.getProperty("line.separator"));
writer.append("</geo>" + System.getProperty("line.separator"));
writer.append("</doc>" + System.getProperty("line.separator"));
}
writer.append("</collection>" + System.getProperty("line.separator") + System.getProperty("line.separator"));
}
/**
* JavaBean class that holds properties of each Document entry. It is important that this class be public and
* static, in order for Digester to be able to instantiate it.
*/
public static class FlickrDoc {
private String id;
private String title;
private String description;
private String time;
private String tags;
private String latitude;
private String longitude;
private String event;
public void setId(String newId) {
id = newId;
}
public String getId() {
return id;
}
public void setTitle(String newTitle) {
title = newTitle;
}
public String getTitle() {
return title;
}
public void setDescription(String newDescription) {
description = newDescription;
}
public String getDescription() {
return description;
}
public void setTime(String newTime) {
time = newTime;
}
public String getTime() {
return time;
}
public void setTags(String newTags) {
tags = newTags;
}
public String getTags() {
return tags;
}
public void setLatitude(String newLatitude) {
latitude = newLatitude;
}
public String getLatitude() {
return latitude;
}
public void setLongitude(String newLongitude) {
longitude = newLongitude;
}
public String getLongitude() {
return longitude;
}
public void setEvent(String newEvent) {
event = newEvent;
}
public String getEvent() {
return event;
}
}
public static class Centroid {
private final String event;
private String title;
private String description;
private String tags;
private Integer time;
private int nbTimeValues = 0; // needed to calculate the average later
private Float latitude;
private int nbLatitudeValues = 0; // needed to calculate the average later
private Float longitude;
private int nbLongitudeValues = 0; // needed to calculate the average later
public Centroid(FlickrDoc flickrDoc) {
event = flickrDoc.event;
title = flickrDoc.title;
description = flickrDoc.description;
tags = flickrDoc.tags;
if (flickrDoc.time != null) {
time = Integer.valueOf(flickrDoc.time.trim());
nbTimeValues = 1; // time is the sum of one value
}
if (flickrDoc.latitude != null) {
latitude = Float.valueOf(flickrDoc.latitude.trim());
nbLatitudeValues = 1; // latitude is the sum of one value
}
if (flickrDoc.longitude != null) {
longitude = Float.valueOf(flickrDoc.longitude.trim());
nbLongitudeValues = 1; // longitude is the sum of one value
}
}
public void update(FlickrDoc newData) {
title = title + " " + newData.title;
description = description + " " + newData.description;
tags = tags + " " + newData.tags;
if (newData.time != null) {
nbTimeValues++;
if (time == null) {
time = 0;
}
time += Integer.valueOf(newData.time.trim());
}
if (newData.latitude != null) {
nbLatitudeValues++;
if (latitude == null) {
latitude = 0F;
}
latitude += Float.valueOf(newData.latitude.trim());
}
if (newData.longitude != null) {
nbLongitudeValues++;
if (longitude == null) {
longitude = 0F;
}
longitude += Float.valueOf(newData.longitude.trim());
}
}
public String getTitle() {
return title;
}
public String getDescription() {
return description;
}
public String getTime() {
if (nbTimeValues == 0) {
return null;
} else {
return Integer.toString(time / nbTimeValues);
}
}
public String getTags() {
return tags;
}
public String getLat() {
if (nbLatitudeValues == 0) {
return null;
} else {
return Float.toString(latitude / nbLatitudeValues);
}
}
public String getLng() {
if (nbLongitudeValues == 0) {
return null;
} else {
return Float.toString(longitude / nbLongitudeValues);
}
}
public String getEvent() {
return event;
}
}
}
You could try setting higher -Xms and -Xmx values in your eclipse.ini file (I'm assuming you're using Eclipse).
For example:
-vmargs
-Xms128m //(initial heap size)
-Xmx256m //(max heap size)
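If you change these values, it may be worth confirming which limit your program actually runs with, since eclipse.ini configures the JVM that runs Eclipse itself, and a program launched from Eclipse takes its VM arguments from its run configuration. A minimal sketch (the class name HeapInfo is made up for illustration):

```java
final class HeapInfo {

    // Maximum heap the current JVM will try to use, in megabytes.
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMegabytes() + " MB");
    }
}
```

Run it the same way you run the parser (same run configuration or same command line) so the number reflects the settings that matter.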
If this is a one-off thing that you just want to get done, I'd try Jason's advice of increasing the memory available to Java.
You are building a very large list of objects and then looping through that list to output a String, then writing that String to a file. The list and the String are probably the reasons for your high memory usage. You could reorganise your code in a more stream-oriented way. Open your file output at the start, then write the XML for each Centroid as they are parsed. Then you wouldn't need to keep a big list of them, and you wouldn't need to hold a big String representing all the XML.
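A rough sketch of that stream-oriented idea is below. The class name StreamingDocWriter and the two-field doc are assumptions for illustration only; the real output has more fields. Each doc is written the moment it is handed over, so neither a list of docs nor one big String is kept. This only helps when an item is final once it has been parsed.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

final class StreamingDocWriter implements AutoCloseable {
    private static final String NL = System.lineSeparator();
    private final Writer out;

    // Writes the XML prolog and the opening root element immediately.
    StreamingDocWriter(Writer out) throws IOException {
        this.out = out;
        out.write("<?xml version='1.0' encoding='utf-8'?>" + NL);
        out.write("<collection>" + NL);
    }

    // Writes one <doc> element right away; nothing is buffered in a list.
    void writeDoc(String title, String description) throws IOException {
        out.write("<doc>" + NL);
        out.write("<title>" + escape(title) + "</title>" + NL);
        out.write("<description>" + escape(description) + "</description>" + NL);
        out.write("</doc>" + NL);
    }

    // Minimal escaping of XML text content; null becomes an empty string.
    static String escape(String text) {
        if (text == null) {
            return "";
        }
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Closes the root element and the underlying writer.
    @Override
    public void close() throws IOException {
        out.write("</collection>" + NL);
        out.close();
    }

    // Convenience for trying it out in memory: each row is {title, description}.
    static String render(String[][] docs) {
        StringWriter buffer = new StringWriter();
        try (StreamingDocWriter writer = new StreamingDocWriter(buffer)) {
            for (String[] doc : docs) {
                writer.writeDoc(doc[0], doc[1]);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        System.out.print(render(new String[][] {{"A & B", "first"}, {"C", "second"}}));
    }
}
```

In real use you would pass a BufferedWriter over a FileWriter instead of a StringWriter and call writeDoc from the parser callback.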
Dump the heap and analyze it. You can configure an automatic heap dump on an out-of-memory error with the -XX:+HeapDumpOnOutOfMemoryError JVM option.
http://www.oracle.com/technetwork/java/javase/index-137495.html
https://www.infoq.com/news/2015/12/OpenJDK-9-removal-of-HPROF-jhat
http://blogs.oracle.com/alanb/entry/heap_dumps_are_back_with
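If you are not sure the option actually reached the JVM (for instance when launching from an IDE), a small check like the following may help. The class name HeapDumpFlagCheck is made up for illustration, and it only inspects the arguments the runtime reports, so treat it as a sanity check rather than proof.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

final class HeapDumpFlagCheck {
    static final String FLAG = "-XX:+HeapDumpOnOutOfMemoryError";

    // True if the given JVM argument list contains the heap-dump option.
    static boolean isEnabled(List<String> jvmArguments) {
        return jvmArguments.contains(FLAG);
    }

    public static void main(String[] args) {
        // Arguments passed to the JVM itself, not to the main method.
        List<String> jvmArguments = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("Heap dump on OOM requested: " + isEnabled(jvmArguments));
    }
}
```

For example, launch it with java -XX:+HeapDumpOnOutOfMemoryError HeapDumpFlagCheck; the resulting .hprof file from a real failure can then be opened in a heap analyzer.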
Answering the question "How to Debug"
It starts with gathering the information that's missing from your post. Information that could potentially help future people having the same problem.
First, the complete stack trace. An out-of-memory exception that's thrown from within the XML parser is very different from one thrown from your code.
Second, the size of the XML file, because "not long at all" is completely useless. Is it 1K, 1M, or 1G? How many elements?
Third, how are you parsing? SAX, DOM, StAX, something completely different?
Fourth, how are you using the data? Are you processing one file or multiple files? Are you accidentally holding onto data after parsing? A code sample would help here (and a link to some 3rd-party site isn't terribly useful for future SO users).
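For the second point, the numbers are cheap to collect. Here is a minimal sketch that reports the file size and counts elements with StAX, which does not build the document in memory. The class name XmlElementCounter is made up, and "doc" as the element name is only an assumption based on the output format discussed in this thread.

```java
import java.io.Reader;
import java.io.StringReader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class XmlElementCounter {

    // Counts start tags with the given local name while streaming through the input.
    static int countElements(Reader xml, String elementName) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(xml);
        int count = 0;
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && reader.getLocalName().equals(elementName)) {
                    count++;
                }
            }
        } finally {
            reader.close();
        }
        return count;
    }

    // Convenience overload for small in-memory inputs.
    static int countInString(String xml, String elementName) {
        try {
            return countElements(new StringReader(xml), elementName);
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Not well-formed XML", e);
        }
    }

    // Usage: java XmlElementCounter <file> [elementName]
    public static void main(String[] args) throws Exception {
        Path file = Paths.get(args[0]);
        String elementName = args.length > 1 ? args[1] : "doc"; // assumed element name
        System.out.println("Size in bytes: " + Files.size(file));
        try (Reader reader = Files.newBufferedReader(file)) { // reads as UTF-8
            System.out.println("<" + elementName + "> elements: " + countElements(reader, elementName));
        }
    }
}
```

Posting those two numbers along with the full stack trace usually narrows things down quickly.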
Ok, I'll admit I'm avoiding your direct question with a possible alternative. You might want to consider parsing with XStream instead and let it do the bulk of the work with less code. My rough example below parses your XML with a 64MB heap. Note that it also requires Apache Commons IO, just to read the input easily so that the hack of turning the <collection> into a <list> is possible.
import java.io.File;
import java.io.IOException;
import java.util.List;
import org.apache.commons.io.FileUtils;
import com.thoughtworks.xstream.XStream;
import com.thoughtworks.xstream.annotations.XStreamAlias;
public class CentroidGenerator {
public static void main(String[] args) throws IOException {
for (Centroid centroid : getCentroids(new File("PjrE.data.xml"))) {
System.out.println(centroid.title + " - " + centroid.description);
}
}
@SuppressWarnings("unchecked")
public static List<Centroid> getCentroids(File file) throws IOException {
String input = FileUtils.readFileToString(file, "UTF-8");
input = input.replaceAll("collection>", "list>");
XStream xstream = new XStream();
xstream.processAnnotations(Centroid.class);
Object output = xstream.fromXML(input);
return (List<Centroid>) output;
}
@XStreamAlias("doc")
@SuppressWarnings("unused")
public static class Centroid {
private String id;
private String title;
private String description;
private String time;
private String tags;
private String latitude;
private String longitude;
private String event;
private String geo;
}
}
I downloaded your code, something that I almost never do. And I can say with 99% certainty that the bug is in your code: an incorrect "if" inside a loop. It has nothing whatsoever to do with Digester or XML. Either you've made a logic error or you didn't fully think through just how many objects you'd create.
But guess what: I'm not going to tell you what your bug is.
If you can't figure it out from the few hints that I've given above, too bad. It's the same situation that you put all of the other respondents through by not providing enough information -- in the original post -- to actually start debugging.
Perhaps you should read -- actually read -- my former post, and update your question with the information it requests. Or, if you can't be bothered to do that, accept your F.