I am trying to work more with scalas immutable collection since this is easy to parallelize, but i struggle with some newbie problems. I am looking for a way to create (efficiently) a new Vector from an operation. To be precise I want something like
val v : Vector[Double] = RandomVector(10000)
val w : Vector[Double] = RandomVector(10000)
val r = v + w
I tested the following:
// 1)
val r : Vector[Double] = (v.zip(w)).map{ t:(Double,Double) => t._1 + t._2 }
// 2)
val vb = new VectorBuilder[Double]()
var i=0
while(i<v.len开发者_如何学Gogth){
vb += v(i) + w(i)
i = i + 1
}
val r = vb.result
}
Both take really long compared to the work with Array:
[Vector Zip/Map ] Elapsed time 0.409 msecs
[Vector While Loop] Elapsed time 0.374 msecs
[Array While Loop ] Elapsed time 0.056 msecs
// with warm-up (10000) and avg. over 10000 runs
Is there a better way to do it? I think the work with zip/map/reduce has the advantage that it can run in parallel as soon as the collections have support for this.
Thanks
Vector
is not specialized for Double
, so you're going to pay a sizable performance penalty for using it. If you are doing a simple operation, you're probably better off using an array on a single core than a Vector
or other generic collection on the entire machine (unless you have 12+ cores). If you still need parallelization, there are other mechanisms you can use, such as using scala.actors.Futures.future
to create instances that each do the work on part of the range:
val a = Array(1,2,3,4,5,6,7,8)
(0 to 4).map(_ * (a.length/4)).sliding(2).map(i => scala.actors.Futures.future {
var s = 0
var j = i(0)
while (j < i(1)) {
s += a(j)
j += 1
}
s
}).map(_()).sum // _() applies the future--blocks until it's done
Of course, you'd need to use this on a much longer array (and on a machine with four cores) for the parallelization to improve things.
You should use lazily built collections when you use more than one higher-order methods:
v1.view zip v2 map { case (a,b) => a+b }
If you don't use a view or an iterator each method will create a new immutable collection even when they are not needed.
Probably immutable code won't be as fast as mutable but the lazy collection will improve execution time of your code a lot.
Arrays are not type-erased, Vectors are. Basically, JVM gives Array
an advantage over other collections when handling primitives that cannot be overcome. Scala's specialization
might decrease that advantage, but, given their cost in code size, they can't be used everywhere.
精彩评论