
Scala: read and save all elements of an Iterable

Source: https://www.devze.com, 2023-03-22 15:22 (reposted from the web)
I have an Iterable[T] that is really a stream of unknown length, and want to read it all and save it into something that is still an instance of Iterable. I really do have to read it and save it; I can't do it in a lazy way. The original Iterable can have a few thousand elements, at least. What's the most efficient/best/canonical way? Should I use an ArrayBuffer, a List, a Vector?

Suppose xs is my Iterable. I can think of doing these possibilities:

xs.toArray.toIterable     // Ugh?
xs.toList                 // Fast?
xs.copyToBuffer(anArrayBuffer)
Vector(xs: _*)            // There's no toVector, sadly. Is this construct as efficient?

EDIT: I see by the questions I should be more specific. Here's a strawman example:

def f(xs: Iterable[SomeType]) {    // xs might be a stream, though I can't be sure
    val allOfXS = <xs all read in at once>
    g(allOfXS)
    h(allOfXS)    // Both g() and h() take an Iterable[SomeType]
}


This is easy. A few thousand elements is nothing, so it hardly matters unless it's a really tight loop. So the flippant answer is: use whatever you feel is most elegant.

But, okay, let's suppose that this is actually in some tight loop, and you can predict or have benchmarked your code enough to know that this is performance-limiting.

Your best performance for an immutable solution will likely be a Vector, used like so:

Vector() ++ xs

In my hands, this can copy a 10k iterable about 4k-5k times per second. List is about half the speed.
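As a rough sketch of how this fits the question's f, the following fills in the placeholder with Vector() ++ xs. The bodies of g and h, the Int element type, and the Counting class are hypothetical stand-ins invented for illustration; Counting exists only to make it observable that the source is not traversed again once the copy has been made.

```scala
object SnapshotDemo {
  // Hypothetical stand-ins for the question's g() and h().
  def g(xs: Iterable[Int]): Int = xs.sum
  def h(xs: Iterable[Int]): Int = xs.size

  // Reads xs fully into an immutable Vector, then hands the same copy to both consumers.
  def f(xs: Iterable[Int]): (Int, Int) = {
    val allOfXs: Vector[Int] = Vector() ++ xs
    (g(allOfXs), h(allOfXs))
  }

  // Hypothetical source that counts how often an iterator is requested from it.
  final class Counting(n: Int) extends Iterable[Int] {
    var traversals = 0
    def iterator: Iterator[Int] = {
      traversals += 1
      Iterator.range(0, n)
    }
  }
}
```

Because g and h only ever see the Vector, the number of passes over the original xs does not depend on how many consumers there are.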

If you're willing to try a mutable solution under the hood, xs.toArray.toIterable usually takes the cake with about 10k copies per second. ArrayBuffer is about the same speed as List.

If you actually know the size of the target (i.e. size is O(1) or you know it from somewhere else), you can shave another 20-30% off the execution time by allocating exactly the right size up front and filling it with a while loop.
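The known-size case might look roughly like the sketch below. The copyKnownSize name and the n parameter are assumptions made for this example, and ClassTag requires Scala 2.10 or later; no timing claim is attached to this particular code.

```scala
import scala.reflect.ClassTag

object SizedCopy {
  // `n` is assumed to be the exact element count, supplied by the caller.
  def copyKnownSize[A: ClassTag](xs: Iterable[A], n: Int): Array[A] = {
    val out = new Array[A](n) // allocated once, at its final size
    val it = xs.iterator
    var i = 0
    // Stop at n elements or when the source runs dry, whichever comes first.
    while (i < n && it.hasNext) {
      out(i) = it.next()
      i += 1
    }
    out
  }
}
```

If the source yields fewer than n elements, the remaining slots keep the default value for A, so n has to be trustworthy for this to be a faithful copy.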

If it's actually primitives, you can gain a factor of 10 by writing your own specialized Iterable-like-thing that acts on arrays and converts to regular collections via the underlying array.

Bottom line: for a great blend of power, speed, and flexibility, use Vector() ++ xs in most situations. xs.toIndexedSeq defaults to the same thing. Its benefit is that if xs is already a Vector, it takes no time at all (and it chains nicely without parentheses); its drawback is that you are relying on a convention rather than on specified behavior (and it takes 1-3 more characters to type).
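One way to check the toIndexedSeq convention on your own Scala version is a small probe like the one below. The snapshot helper is a name made up for this sketch, and the printed values are deliberately not stated here, since they depend on library behavior that is not guaranteed. (From Scala 2.10 onward there is also a toVector method, which did not exist when the question was asked.)

```scala
object IndexedSeqDemo {
  // Strictly reads xs into an immutable indexed sequence chosen by the library.
  def snapshot[A](xs: Iterable[A]): IndexedSeq[A] = xs.toIndexedSeq

  def main(args: Array[String]): Unit = {
    val fromList = snapshot(List(1, 2, 3))
    val v = Vector(1, 2, 3)
    val fromVector = snapshot(v)

    println(fromList.getClass.getName) // concrete class the library picked
    println(fromVector eq v)           // whether the existing Vector was reused
  }
}
```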


How about Stream.force?

Forces evaluation of the whole stream and returns it.
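Note that force is defined on Stream, not on Iterable, so with a parameter typed as Iterable[T] it is only reachable after a type check. A minimal sketch, assuming a fallback to Vector() ++ xs for anything that is not a Stream (the readAll name is made up here; on Scala 2.13 and later Stream is deprecated in favor of LazyList, which also has force, so this compiles there with a deprecation warning):

```scala
object ForceDemo {
  def readAll[A](xs: Iterable[A]): Iterable[A] = xs match {
    // A Stream memoizes its elements, so forcing it evaluates everything in place.
    case s: Stream[A] => s.force
    // Anything else is copied strictly into a Vector.
    case other =>
      val copy: Vector[A] = Vector() ++ other
      copy
  }
}
```

This only helps when the lazy source really is a Stream; other lazy Iterable implementations fall through to the copying branch.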


This is hard. An Iterable's methods are defined in terms of its iterator, but that gets overridden by subtraits. For instance, IndexedSeq methods are usually defined in terms of apply.

There is the question of why you want to copy the Iterable, but I suppose you might be guarding against the possibility of it being mutable. If you do not want to copy it, then you need to rephrase your question.

If you are going to copy it, and you want to be sure all elements are copied in a strict manner, you could use .toList. That will not copy a List, but a List does not need to be copied. For anything else, it will produce a new copy.
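A short sketch of that defensive copy, assuming the mutable source is an ArrayBuffer (the snapshot helper is a hypothetical name used only here):

```scala
import scala.collection.mutable.ArrayBuffer

object DefensiveCopy {
  // Strict copy into an immutable List; a List argument is returned as-is.
  def snapshot[A](xs: Iterable[A]): List[A] = xs.toList

  def main(args: Array[String]): Unit = {
    val buf = ArrayBuffer(1, 2, 3)
    val snap = snapshot(buf)
    buf += 4      // later mutation of the source
    println(snap) // the snapshot still holds only the original three elements
  }
}
```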
