In the Scala Collections framework, I think there are some behaviors that are counterintuitive when using map()
.
We can distinguish two kinds of transformations on (immutable) collections. Those whose implementation calls newBuilder
to recreate the resulting collection, and those who go though an implicit CanBuildFrom
to obtain the builder.
The first category contains all transformations where the type of the contained elements does not change. They are, for example, filter
, partition
, drop
, take
, span
, etc. These transformations are free to call newBuilder
and to recreate the same collection type as the one they are called on, no matter how specific: filtering a List[Int]
can always return a List[Int]
; filtering a BitSet
(or the RNA
example structure described in this article on the architecture of the collections framework) can always return another BitSet
(or RNA
). Let's call them the filtering transformations.
The second category of transformations need CanBuildFrom
s to be more flexible, as the type of the contained elements may change, and as a result of this, the type of the collection itself maybe cannot be reused: a BitSet
cannot contain String
s; an RNA
contains only Base
s. Examples of such transformations are map
, flatMap
, collect
, scanLeft
, ++
, etc. Let's call them the mapping transformations.
Now here's the main issue to discuss. No matter what the static type of the collection is, all filtering transformations will return the same collection type, while the collection type returned by a mapping operation can vary depending on the static type.
scala> import collection.immutable.TreeSet
import collection.immutable.TreeSet
scala> val treeset = TreeSet(1,2,3,4,5) // static type == dynamic type
treeset: scala.collection.immutable.TreeSet[Int] = TreeSet(1, 2, 3, 4, 5)
scala> val set: Set[Int] = TreeSet(1,2,3,4,5) // static type != dynamic type
set: Set[Int] = TreeSet(1, 2, 3, 4, 5)
scala> treeset.filter(_ % 2 == 0)
res0: scala.collection.immutable.TreeSet[Int] = TreeSet(2, 4) // fine, a TreeSet again
scala> set.filter(_ % 2 == 0)
res1: scala.collection.immutable.Set[Int] = TreeSet(2, 4) // fine
scala> treeset.map(_ + 1)
res2: scala.collection.immutable.SortedSet[Int] = TreeSet(2, 3, 4, 5, 6) // still fine
scala> set.map(_ + 1)
res3: scala.collection.immutable.Set[Int] = Set(4, 5, 6, 2, 3) // uh?!
Now, I understand why this works like this. It is explained there and there. In short: the implicit CanBuildFrom
is inserted based on the static type, and, depending on the implementation of its def apply(from: Coll)
method, may or may not be able to recreate the same collection type.
Now my only point is, when we know that we are using a mapping operation yielding a collection with the same element type (which the compiler can statically determine), we could mimic the way the filtering transformations work and use the collection's native builder. We can reuse BitSet
when mapping to Int
s, create a new TreeSet
with the same ordering, etc.
Then we would avoid cases where
for (i <- set) {
val x = i + 1
println(x)
}
does not print the incremented elements of the TreeSet
in the same order as
for (i <- set; x = i + 1)
println(x)
So:
- Do you think this would be a good idea to change the behavior of the mapping transformations as described?
- What are the inevitable caveats I have grossly overlooked?
- How could it be implemented?
I was thinking about something like an implicit sameTypeEvidence: A =:= B
parameter, maybe with a default value of null
(or rather an implicit canReuseCalleeBuilderEvidence: B <:< A = null
), which could be used at runtime 开发者_如何学Cto give more information to the CanBuildFrom
, which in turn could be used to determine the type of builder to return.
I looked again at it, and I think your problem doesn't arise from a particular deficiency of Scala collections, but rather a missing builder for TreeSet
. Because the following does work as intended:
val list = List(1,2,3,4,5)
val seq1: Seq[Int] = list
seq1.map( _ + 1 ) // yields List
val vector = Vector(1,2,3,4,5)
val seq2: Seq[Int] = vector
seq2.map( _ + 1 ) // yields Vector
So the reason is that TreeSet
is missing a specialised companion object/builder:
seq1.companion.newBuilder[Int] // ListBuffer
seq2.companion.newBuilder[Int] // VectorBuilder
treeset.companion.newBuilder[Int] // Set (oops!)
So my guess is, if you take proper provision for such a companion for your RNA
class, you may find that both map
and filter
work as you wish...?
精彩评论