开发者

Why stream fold operation throws Out of memory exception?

开发者 https://www.devze.com 2023-04-03 04:49 出处:网络
I have following simple code def fib(i:Long,j:Long):Stream[Long] = i #:: fib(j, i+j) (0l /: fib(1,1).take(10000000)) (_+_)

I have following simple code

 def fib(i:Long,j:Long):Stream[Long] = i #:: fib(j, i+j)
 (0l /: fib(1,1).take(10000000)) (_+_)

And it throws 开发者_如何学JAVAOutOfMemmoryError exception. I can not understand why, because I think all the parts use constant memmory i.e. lazy evaluation streams and foldLeft...

Those code also don't work

fib(1,1).take(10000000).sum or max, min e.t.c.

How to correctly implement infinite streams and do iterative operations upon it?

Scala version: 2.9.0

Also scala javadoc said, that foldLeft operation is memmory safe for streams

  /** Stream specialization of foldLeft which allows GC to collect
   *  along the way.
   */
  @tailrec
  override final def foldLeft[B](z: B)(op: (B, A) => B): B = {
    if (this.isEmpty) z
    else tail.foldLeft(op(z, head))(op)
  }

EDIT:

Implementation with iterators still not useful, since it throws ${domainName} exception

   def fib(i:Long,j:Long): Iterator[Long] = Iterator(i) ++  fib(j, i + j)

How to define correctly infinite stream/iterator in Scala?

EDIT2: I don't care about int overflow, I just want to understand how to create infinite stream/iterator etc in scala without side effects .


The reason to use Stream instead of Iterator is so that you don't have to calculate all the small terms in the series over again. But this means that you need to store ten million stream nodes. These are pretty large, unfortunately, so that could be enough to overflow the default memory. The only realistic way to overcome this is to start with more memory (e.g. scala -J-Xmx2G). (Also, note that you're going to overflow Long by an enormous margin; the Fibonacci series increases pretty quickly.)

P.S. The iterator implementation I have in mind is completely different; you don't build it out of concatenated singleton Iterators:

def fib(i: Long, j: Long) = Iterator.iterate((i,j)){ case (a,b) => (b,a+b) }.map(_._1)

Now when you fold, past results can be discarded.


The OutOfMemoryError happens indenpendently from the fact that you use Stream. As Rex Kerr mentioned above, Stream -- unlike Iterator -- stores everything in memory. The difference with List is that the elements of Stream are calculated lazily, but once you reach 10000000, there will be 10000000 elements, just like List.

Try with new Array[Int](10000000), you will have the same problem.

To calculate the fibonacci number as above you may want to use different approach. You can take into account the fact that you only need to have two numbers, instead of the whole fibonacci numbers discovered so far.

For example:

scala> def fib(i:Long,j:Long): Iterator[Long] = Iterator(i) ++  fib(j, i + j)
fib: (i: Long,j: Long)Iterator[Long]

And to get, for example, the index of the first fibonacci number exceeding 1000000:

scala> fib(1, 1).indexWhere(_ > 1000000)
res12: Int = 30

Edit: I added the following lines to cope with the StackOverflow

If you really want to work with 1 millionth fibonacci number, the iterator definition above will not work either for StackOverflowError. The following is the best I have in mind at the moment:

  class FibIterator extends Iterator[BigDecimal] {
       var i: BigDecimal = 1
       var j: BigDecimal = 1
       def next = {val temp = i 
                   i = i + j
                   j = temp  
                   j }
       def hasNext = true
    }
scala> new FibIterator().take(1000000).foldLeft(0:BigDecimal)(_ + _)
res49: BigDecimal = 82742358764415552005488531917024390424162251704439978804028473661823057748584031
0652444660067860068576582339667553466723534958196114093963106431270812950808725232290398073106383520
9370070837993419439389400053162345760603732435980206131237515815087375786729469542122086546698588361
1918333940290120089979292470743729680266332315132001038214604422938050077278662240891771323175496710
6543809955073045938575199742538064756142664237279428808177636434609546136862690895665103636058513818
5599492335097606599062280930533577747023889877591518250849190138449610994983754112730003192861138966
1418736269315695488126272680440194742866966916767696600932919528743675517065891097024715258730309025
7920682881137637647091134870921415447854373518256370737719553266719856028732647721347048627996967...


@yura's problem:

def fib(i:Long,j:Long):Stream[Long] = i #:: fib(j, i+j)
(0l /: fib(1,1).take(10000000)) (_+_)

besides using a Long which can't possibly hold the Fibonacci of 10,000,000, it does work. That is, if the foldLeft is written as:

fib(1,1).take(10000000).foldLeft(0L)(_+_)

Looking at the Streams.scala source, foldLeft() is clearly designed for Garbage Collection, but /: is not def'd.

The other answers alluded to another problem. The Fibonacci of 10 million is a big number and if BigInt is used, instead of just overflowing like with a Long, absolutely enormous numbers are being added to each over and over again.

Since Stream.foldLeft is optimized for GC it does look like the way to solve for really big Fibonacci numbers, rather than using a zip or tail recursion.

// Fibonacci using BigInt
def fib(i:BigInt,j:BigInt):Stream[BigInt] = i #:: fib(j, i+j)
fib(1,0).take(10000000).foldLeft(BigInt("0"))(_+_)

Results of the above code: 10,000,000 is a 8-figure number. How many figures in fib(10000000)? 2,089,877


fib(1,1).take(10000000) is the "this" of the method /:, it is likely that the JVM will consider the reference alive as long as the method runs, even if in this case, it might get rid of it. So you keep a reference on the head of the stream all along, hence on the whole stream as you build it to 10M elements.


You could just use recursion, which is about as simple:

def fibSum(terms: Int, i: Long = 1, j: Long = 1, total: Long = 2): Long = {
  if (terms == 2) total
  else fibSum(terms - 1, j, i + j, total + i + j)
}

With this, you can "fold" a billion elements in only a couple of seconds, but as Rex points out, summing the Fibbonaci sequence overflows Long very quickly.

If you really wanted to know the answer to your original problem and don't mind sacrificing some accuracy you could do this:

def fibSum(terms: Int, i: Double = 1, j: Double = 1, tot: Double = 2,
                                                     exp: Int = 0): String = {
  if (terms == 2) "%.6f".format(tot) + " E+" + exp
  else {
    val (i1, j1, tot1, exp1) =
      if (tot + i + j > 10) (i/10, j/10, tot/10, exp + 1) 
      else                  (i, j, tot, exp)
    fibSum(terms - 1, j1, i1 + j1, tot1 + i1 + j1, exp1)
  }
}

scala> fibSum(10000000)
res54: String = 2.957945 E+2089876
0

精彩评论

暂无评论...
验证码 换一张
取 消