开发者

Could this Scala code use less memory?

开发者 https://www.devze.com 2023-03-29 06:49 出处:网络
Consider the following Set benchmark: import scala.collection.immutable._ object SetTest extends App { def time[a](f: => a): (a,Double) = {

Consider the following Set benchmark:

import scala.collection.immutable._

object SetTest extends App {
  def time[a](f: => a): (a,Double) = {
    val start = System.nanoTime()
    val result: a = f
    val end = System.nanoTime()
    (result, 1e-9*(end-start))
  }

  for (n <- List(1000000,10000000)) {
    println("n = %d".format(n))
    val (s2,t2) = time((Set() ++ (1 to n)).sum)
    prin开发者_如何学编程tln("sum %d, time %g".format(s2,t2))
  }
}

Compiling and running produces

tile:scalafab% scala SetTest
n = 1000000
sum 1784293664, time 0.982045
n = 10000000
Exception in thread "Poller SunPKCS11-Darwin" java.lang.OutOfMemoryError: Java heap space
...

I.e., Scala is unable to represent a set of 10 million Ints on a machine with 8 GB of memory. Is this expected behavior? Is there some way to reduce the memory footprint?


Generic immutable sets do take a lot of memory. The default is for only 256M heap, which leaves only 26 bytes per object. The hash trie for immutable sets generally takes one to two hundred bytes per object an extra 60 or so bytes per element. If you add -J-Xmx2G on your command line to increase the heap space to 2G, you should be fine.

(This level of overhead is one reason why there are bitsets, for example.)


I'm not that familiar with Scala, but here's what I think is happening:

First off, the integers are being stored on the heap (as the must be, since the data structure is stored on the heap). So we are talking about available heap memory, not stack memory at all (just to clarify the validity of what I'm about to say next).

The real kicker is that Java's default heap size is pretty small - I believe its only 128 megabytes (this is probably an really old number, but the point is that the number exists, and it's quite small).

So it's not that your program uses too much memory - it's more like Java just doesn't give you enough in the first place. There is a solution, though: the minimum and maximum heap sizes can be set with the -Xms and -Xmx command line options. They can be used like:

java -Xms32m -Xmx128m MyClass   (starts MyClass with a minimum heap of 32 megabytes, maximum of 128 megabytes)

java -Xms1g -Xmx3g MyClass (executes MyClass with a minimum heap of 1 gigabytes, maximum of 3 gigabytes)

If you use an IDE, there are probably options in there to change the heap size as well.


This should always overflow. Holding such large values is not required in this case. If you want to sum use an iterator or a range.

val (s2,t2) = time( (1 to n).sum)

The above line completes in a second with no overflow.

You can always increase memory allocation using other answers.

0

精彩评论

暂无评论...
验证码 换一张
取 消