开发者

How to groupBy while discarding some elements of a collection

开发者 https://www.devze.com 2023-03-06 06:15 出处:网络
I\'d like to group a sequence into a map of sequences, based on a discriminator of type Option, similar to the result of the groupBy method but where values resulting in None are discarded. Or perhaps

I'd like to group a sequence into a map of sequences, based on a discriminator of type Option, similar to the result of the groupBy method but where values resulting in None are discarded. Or perhaps, grouping by a PartialFunction discriminator and discarding those for which the partial function is not defined.

Here is a concrete example:

I have a collection of names, and a collection of namespaces. Some, but not all, of the names belong in a valid namespace, and I want to group those that do into a Map, discarding those that don't.

Currently my solution is equivalent to:

val names = List("ns1.foo", "ns2.bar", "ns2.baz", "froznit")
val namespaces = List("ns1", "ns2")

def findNamespace(n: String): Option[String] = namespaces.find(n.startsWith)

val groupedNames = names.groupBy(findNamespace).collect {
  case (Some(ns), name) => (ns, name)
}
// Map((ns1,List(ns1.foo)), (ns2,List(ns2.bar, ns2.baz)))

My concern about this solution is that, using names.groupBy(findNamespace), I'm creating an intermediate Map which contains all the names I don't care about, under the key None. If the number of names which I discard becomes large, this solution becomes less attractive.

My attempt to avoid this is a bit of a trainwreck, though:

val groupedNames =
  names.
    map(n => (findNamespace(n), n)).
    collect({ case (Some(ns), n) => (ns, n) }).
    groupBy(_._1).
    map({ case (ns, names) => (ns, names.map开发者_运维问答(_._2)) })

If you were to solve this in a smarter way, what would it be?


Edit: ideally, the solution should only call findNamespace(name) once for each name, and build the Map using only the Option[String] values, without calling a separate hasNamespace(name) predicate.


One way to avoid collecting the discarded names is to use flatMap:

names.flatMap(n => findNamespace(n) map (ns => (ns, n)))
   .groupBy(_._1) 
   .map { case (ns, pairs) => (ns, pairs map (_._2)) }

You can achieve the same thing with a for-comprehension:

(for (n <- names; ns <- findNamespace(n)) yield (ns, n))
   .groupBy(_._1)
   .map { case (ns, pairs) => (ns, pairs map (_._2)) }


You can use a foldLeft:

val gn = names.foldLeft(Map[String, List[String]]()){ case (acc, name) =>
  findNamespace(name) match { 
    case Some(ns) => acc + (ns -> (name :: acc.get(ns).getOrElse(Nil)))
    case _ => acc
  }
}

Assuming order does not matter otherwise you can reverse the values with gn.mapValues(_.reverse).


I'm not sure how efficient toMap is, but putting the option in a for-comprehension at least avoids the gathering of None results:

scala> val m = (for { n <- names; ns <- findNamespace(n) } yield n -> ns).toMap
m: scala.collection.immutable.Map[java.lang.String,String] = Map(ns1.foo -> ns1, ns2.bar -> ns2, ns2.baz -> ns2)

scala> val groupedNames = m.keys.groupBy(m)
groupedNames: scala.collection.immutable.Map[String,Iterable[java.lang.String]] = Map(ns1 -> Set(ns1.foo), ns2 -> Set(ns2.bar, ns2.baz))


I came up with a variation on huynhjl's answer, replacing the match with a map:

val gn = (Map[String, List[String]]() /: names) { (acc, name) =>
  acc ++ findNamespace(name).map(ns => ns -> (name :: acc.getOrElse(ns, Nil)))
}


I would suggest "filter first, then groupBy" method, like this:

scala> val names = List("ns1.foo", "ns2.bar", "ns2.baz", "froznit")
names: List[java.lang.String] = List(ns1.foo, ns2.bar, ns2.baz, froznit)

scala> val namespaces = List("ns1", "ns2")
namespaces: List[java.lang.String] = List(ns1, ns2)

scala> names filter { n => namespaces exists { n startsWith _ } } groupBy { _ take 3 }
res1: scala.collection.immutable.Map[String,List[java.lang.String]] = Map(ns1 -> List(ns1.foo), ns2 -> List(ns2.bar, ns2.baz))
0

精彩评论

暂无评论...
验证码 换一张
取 消