
Groovy: Handling large amounts of data with StreamingMarkupBuilder

Developer https://www.devze.com 2023-01-20 18:46 Source: web

The scenario is the following. I have a plain text file which contains 2,000,000 lines, each with an ID. This list of IDs needs to be converted to a simple XML file. The following code works fine as long as there are only a few thousand entries in the input file.

import groovy.xml.StreamingMarkupBuilder

def xmlBuilder = new StreamingMarkupBuilder()
def f = new File(inputFile)
def input = f.readLines()
def xmlDoc = {
  Documents {
    input.each {
      Document(myAttribute: it)
    }
  }
}

def xml = xmlBuilder.bind(xmlDoc)
new File(outputFile).write(xml.toString())

When the full 2,000,000 entries are processed, I get an OutOfMemoryError for the Java heap (set to 1024M). Is there a way to improve the above code so that it can handle large amounts of data?

Cheers, Robert


The issue with that solution is that it loads everything into memory before writing anything out.

This might be a better solution, as I believe it should be writing the data out to the file output.xml as it processes input.txt.

import groovy.xml.MarkupBuilder

new File( 'output.xml' ).withWriter { writer ->
  def builder = new MarkupBuilder( writer )
  builder.Documents {
    new File( 'input.txt' ).eachLine { line ->
      Document( attr: line )
    }
  }
}
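If you want to stay with StreamingMarkupBuilder from the question, the same streaming idea should work there too: read the input lazily with eachLine inside the binding closure and use writeTo to push the markup straight into a writer, so no full list of IDs is ever held in memory. A minimal sketch (the five-line input.txt it creates is just a stand-in for the real 2,000,000-line file):

```groovy
import groovy.xml.StreamingMarkupBuilder

// Sample input: one ID per line (stand-in for the real 2,000,000-line file).
new File('input.txt').text = (1..5).join('\n')

// eachLine hands the closure one line at a time, so the whole
// ID list is never materialised in memory.
def xmlDoc = {
  Documents {
    new File('input.txt').eachLine { line ->
      Document(myAttribute: line)
    }
  }
}

// writeTo streams the generated markup directly into the file writer.
new File('output.xml').withWriter { writer ->
  new StreamingMarkupBuilder().bind(xmlDoc).writeTo(writer)
}
```

The key difference from the original code is that binding and writing happen together: bind(xmlDoc).writeTo(writer) evaluates the closure while writing, instead of building the whole document first.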


Here's your problem: def input = f.readLines() ;-) That one call pulls all 2,000,000 lines into memory at once.
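The difference between the two approaches in a nutshell, using a hypothetical three-line file for illustration:

```groovy
def f = new File('ids.txt')
f.text = (1..3).join('\n')

// readLines() materialises every line as a String in one List --
// for 2,000,000 IDs that list alone can exhaust the heap.
def all = f.readLines()
assert all.size() == 3

// eachLine passes one line at a time to the closure; only the
// current line needs to fit in memory.
def count = 0
f.eachLine { line -> count++ }
assert count == 3
```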

