开发者

What is the fastest way to sum a collection in Scala

开发者 https://www.devze.com 2023-01-04 23:41 出处:网络
I\'ve tried different colle开发者_Go百科ctions in Scala to sum it\'s elements and they are much slower than Java sums it\'s arrays (with for cycle). Is there a way for Scala to be as fast as Java arra

I've tried different colle开发者_Go百科ctions in Scala to sum it's elements and they are much slower than Java sums it's arrays (with for cycle). Is there a way for Scala to be as fast as Java arrays?

I've heard that in scala 2.8 arrays will be same as in java, but they are much slower in practice


Indexing into arrays in a while loop is as fast in Scala as in Java. (Scala's "for" loop is not the low-level construct that Java's is, so that won't work the way you want.)

Thus if in Java you see

for (int i=0 ; i < array.length ; i++) sum += array(i)

in Scala you should write

var i=0
while (i < array.length) {
  sum += array(i)
  i += 1
}

and if you do your benchmarks appropriately, you'll find no difference in speed.

If you have iterators anyway, then Scala is as fast as Java in most things. For example, if you have an ArrayList of doubles and in Java you add them using

for (double d : arraylist) { sum += d }

then in Scala you'll be approximately as fast--if using an equivalent data structure like ArrayBuffer--with

arraybuffer.foreach( sum += _ )

and not too far off the mark with either of

sum = (0 /: arraybuffer)(_ + _)
sum = arraybuffer.sum  // 2.8 only

Keep in mind, though, that there's a penalty to mixing high-level and low-level constructs. For example, if you decide to start with an array but then use "foreach" on it instead of indexing into it, Scala has to wrap it in a collection (ArrayOps in 2.8) to get it to work, and often will have to box the primitives as well.

Anyway, for benchmark testing, these two functions are your friends:

def time[F](f: => F) = {
  val t0 = System.nanoTime
  val ans = f
  printf("Elapsed: %.3f\n",1e-9*(System.nanoTime-t0))
  ans
}

def lots[F](n: Int, f: => F): F = if (n <= 1) f else { f; lots(n-1,f) }

For example:

val a = Array.tabulate(1000000)(_.toDouble)
val ab = new collection.mutable.ArrayBuffer[Double] ++ a
def adSum(ad: Array[Double]) = {
  var sum = 0.0
  var i = 0
  while (i<ad.length) { sum += ad(i); i += 1 }
  sum
}

// Mixed array + high-level; convenient, not so fast
scala> lots(3, time( lots(100,(0.0 /: a)(_ + _)) ) )
Elapsed: 2.434
Elapsed: 2.085
Elapsed: 2.081
res4: Double = 4.999995E11

// High-level container and operations, somewhat better
scala> lots(3, time( lots(100,(0.0 /: ab)(_ + _)) ) )    
Elapsed: 1.694
Elapsed: 1.679
Elapsed: 1.635
res5: Double = 4.999995E11

// High-level collection with simpler operation
scala> lots(3, time( lots(100,{var s=0.0;ab.foreach(s += _);s}) ) )
Elapsed: 1.171
Elapsed: 1.166
Elapsed: 1.162
res7: Double = 4.999995E11

// All low level operations with primitives, no boxing, fast!
scala> lots(3, time( lots(100,adSum(a)) ) )              
Elapsed: 0.185
Elapsed: 0.183
Elapsed: 0.186
res6: Double = 4.999995E11


You can now simply use sum.

val values = Array.fill[Double](numValues)(0)

val sumOfValues = values.sum


The proper scala or functional was to do this would be:

val numbers = Array(1, 2, 3, 4, 5)
val sum = numbers.reduceLeft[Int](_+_)

Check out this link for the full explanation of the syntax: http://www.codecommit.com/blog/scala/quick-explanation-of-scalas-syntax

I doubt this would be faster than doing it in the ways described in the other answers but I haven't tested it so I'm not sure. In my opinion this is the proper way to do it though since Scala is a functional language.


It is very difficult to explain why some code you haven't shown performs worse than some other code you haven't shown in some benchmark you haven't shown.

You may be interested in this question and its accepted answer, for one thing. But benchmarking JVM code is hard, because the JIT will optimize code in ways that are difficult to predict (which is why JIT beats traditional optimization at compile time).


Scala 2.8 Array are JVM / Java arrays and as such have identical performance characteristics. But that means they cannot directly have extra methods that unify them with the rest of the Scala collections. To provide the illusion that arrays have these methods, there are implicit conversions to wrapper classes that add those capabilities. If you are not careful you'll incur inordinate overhead using those features.

In those cases where iteration overhead is critical, you can explicitly get an iterator (or maintain an integer index, for indexed sequential structures like Array or other IndexedSeq) and use a while loop, which is a language-level construct that need not operate on functions (literals or otherwise) but can compile in-line code blocks.

val l1 = List(...) // or any Iteralbe
val i1 = l1.iterator
while (i1.hasNext) {
  val e = i1.next
  // Do stuff with e
}

Such code will execute essentially as fast as a Java counterpart.


Timing is not the only concern. With sum you might find an overflow issue:

scala> Array(2147483647,2147483647).sum
res0: Int = -2

in this case seeding foldLeft with a Long is preferable

scala> Array(2147483647,2147483647).foldLeft(0L)(_+_)
res1: Long = 4294967294

EDIT: Long can be used from beginning:

scala> Array(2147483647L,2147483647L).sum
res1: Long = 4294967294
0

精彩评论

暂无评论...
验证码 换一张
取 消