开发者

subsetOf versus forall contains

开发者 https://www.devze.com 2023-02-19 13:37 出处:网络
Consider I have: case class X(...) val xs: Seq[X] = ... // some method result val ys: Seq[X] = ... // some other method result

Consider I have:

case class X(...)
val xs: Seq[X] = ... // some method result
val ys: Seq[X] = ... // some other method result

While the following holds:

xs.distinct.sameElements(xs) // true
ys.distinct.sameElements(ys) // true

I am facing:

xs forall(ys contains _)    // true
xs.toSet subsetOf ys.toSet  // false

Why? I mean, it´s clear that making a Set out of a Seq chooses random elements in case of duplicates, but there are no duplicates because of "(...).distinct.sameElements(...)".

I certainly need a deeper understanding of the kind of equality check...

EDIT:

After a long search, I found the problem and conden开发者_JAVA百科sed it to the following:

My elements are not the same, however I must take a closer look why distinct.sameElements isn´t complaining. But meanwhile a new question arose:

Consider this:

val rnd = scala.util.Random
def int2Label(i: Int) = "[%4s]".format(Seq.fill(rnd.nextInt(4))(i).mkString)
val s = Seq(1,2,3,4)

// as expected :
val m1: Map[Int,String] = s.map(i => (i,int2Label(i))).toMap
println(m1) // Map(5 -> [ 555], 1 -> [    ], 2 -> [  22], 3 -> [    ])
println(m1) // Map(5 -> [ 555], 1 -> [    ], 2 -> [  22], 3 -> [    ])

// but accessing m2 several times yields different results. Why?
val m2: Map[Int,String] = s.map(i => (i,i)).toMap.mapValues { int2Label(_) }
println(m2) // Map(5 -> [   5], 1 -> [  11], 2 -> [  22], 3 -> [ 333])
println(m2) // Map(5 -> [  55], 1 -> [  11], 2 -> [    ], 3 -> [    ])

So my elements in my first to sequences aren´t the same because they depend on a m2-construct and so each time a accessing them they are different.

My new question is, why does m2 behave like a function in contrast to m1 although both are immutable maps. That isn´t intuitively for me.


The most common reasons for problems in this area--testing set equality and the like--are

  1. hashCode does not agree with equals
  2. Your values are not stable (so previous hashCode does not agree with current equals)

The reason is that this matters is that distinct and toSet use hash codes to build sets, whereas contains simply runs over the collection with an exists:

xs forall(ys contains _) == xs forall (x => ys exists (y => x==y) )

This is made more complicated by the fact that many sets don't start using hash codes until they're larger than some minimal size (usually 4), so you don't always notice this with testing. But let's prove it to ourselves:

class Liar(s: String) {
  override def equals(o: Any) = o match {
    case l: Liar => s == l.s
    case _ => _
  }
  // No hashCode override!
}
val strings = List("Many","song","lyrics","go","na","na","na","na")
val lies = strings.map(s => new Liar(s))
val truly_distinct = lies.take(5)
lies.length          // 8
lies.distinct.length // 8!
lies.toSet.size      // 8!
lies forall( truly_distinct contains _ )   // True, because it's true
lies.toSet subsetOf truly_distinct.toSet   // False--not even the same size!

Okay, so now we know that for most of these operations, matching up hashCode and equals is a Good Thing.

Warning: in Java, mismatches happens frequently even with primitives:

new java.lang.Float(1.0) == new java.lang.Integer(1)                       // True
(new java.lang.Float(1.0)).hashCode == (new java.lang.Integer(1)).hashCode // Uh-oh

but Scala now at least catches that (hopefully every time):

(new java.lang.Float(1.0)).## == (new java.lang.Integer(1)).##   // Whew

Case classes also do this properly, so we're left with three possibilities

  1. You overrode equals but not hashCode to match
  2. Your values are not stable
  3. There is a bug and Java wrapped primitive hashCode mismatch is coming back to bite you

The first one is easy enough.

The second one seems to be your problem, and it arises from the fact that mapValues actually creates a view of the original collection, not a new collection. (filterKeys does this also.) Personally, I think this is a questionable choice of design, since normally when you have a view and you want to make a single concrete instance of it, you .force it. But default maps don't have a .force because they don't realize that they might be views. So you have to resort to things like

myMap.map{ case (k,v) => (k, /* something that produces a new v */) }
myMap.mapValues(v => /* something that produces a new v */).view.force
Map() ++ myMap.mapValues(v => /* something that produces a new v */)

This is really important if you're doing things like file IO to map your values (e.g. if your values are filenames and you're mapping to their contents) and you don't want to read the file over and over again.

But your case--where you're assigning random values--is another where it is important to pick a single copy, not recreate the values over and over.

0

精彩评论

暂无评论...
验证码 换一张
取 消