How can I group nested collections based on column values, which are dynamically given? For example, suppose we have the following nested collections; how can I group it by the values in first and second columns?
[ ["A" 2011 "Dan"]
["A" 2011 "Jon"]
["A" 2010 "Tim"]
["B" 2009 "Tom"] ]
The desired resulting map is:
{ A {
2011 [['A', 2011, 'Dan'] ['A'开发者_JS百科, 2011, 'Joe']]
2010 [['A', 2010, 'Tim']]
}
B { 2009 [['B', 2009, 'Tom']] }
}
Following is my solution, which almost works:
(defn nest [data criteria]
(if (empty? criteria)
data
(for [[k v] (group-by #(nth % (-> criteria vals first)) data)]
(hash-map k (nest v (rest criteria))))))
I came up with the following:
user=> (def a [["A" 2011 "Dan"]
["A" 2011 "Jon"]
["A" 2010 "Tim"]
["B" 2009 "Tom"] ])
user=> (into {} (for [[k v] (group-by first a)]
[k (group-by second v)]))
{"A" {2011 [["A" 2011 "Dan"]
["A" 2011 "Jon"]],
2010 [["A" 2010 "Tim"]]},
"B" {2009 [["B" 2009 "Tom"]]}}
Generalization of group-by
I needed a generalization of group-by
that’d produce more than 2-nested map-of-maps. I wanted to be able to give such a function a list of arbitrary functions to run recursively through group-by
. Here’s what I came up with:
(defn map-function-on-map-vals
"Take a map and apply a function on its values. From [1].
[1] http://stackoverflow.com/a/1677069/500207"
[m f]
(zipmap (keys m) (map f (vals m))))
(defn nested-group-by
"Like group-by but instead of a single function, this is given a list or vec
of functions to apply recursively via group-by. An optional `final` argument
(defaults to identity) may be given to run on the vector result of the final
group-by."
[fs coll & [final-fn]]
(if (empty? fs)
((or final-fn identity) coll)
(map-function-on-map-vals (group-by (first fs) coll)
#(nested-group-by (rest fs) % final-fn))))
Your example
Applied to your dataset:
cljs.user=> (def foo [ ["A" 2011 "Dan"]
#_=> ["A" 2011 "Jon"]
#_=> ["A" 2010 "Tim"]
#_=> ["B" 2009 "Tom"] ])
cljs.user=> (require '[cljs.pprint :refer [pprint]])
nil
cljs.user=> (pprint (nested-group-by [first second] foo))
{"A"
{2011 [["A" 2011 "Dan"] ["A" 2011 "Jon"]], 2010 [["A" 2010 "Tim"]]},
"B" {2009 [["B" 2009 "Tom"]]}}
Produces exactly the desired output. nested-group-by
could take three or four or more functions and produces that many nestings of hash-maps. Perhaps this will be helpful to others.
Handy feature
nested-group-by
also has a handy extra feature: final-fn
, which defaults to identity
so if you don’t provide one, the deepest nesting returns a vector of values, but if you do provide a final-fn
, that is run on the innermost vectors. To illustrate: if you just wanted to know how many rows of the original dataset appeared in each category and year:
cljs.user=> (nested-group-by [first second] foo count)
#^^^^^ this is final-fn
{"A" {2011 2, 2010 1}, "B" {2009 1}}
Caveat
This function doesn’t use recur
so deeply-recursive calls could blow the stack. However, for the anticipated use-case, with only a small handful of functions, this shouldn’t be a problem.
Here's the solution I came up with. It works, but I'm sure it can be improved.
(defn nest [data criteria]
(if (empty? criteria)
data
(into {} (for [[k v] (group-by #(nth % (-> criteria vals first)) data)]
(hash-map k (nest v (rest criteria)))))))
I suspect the most idiomatic version of this is:
(defn nest-by
[ks coll]
(let [keyfn (apply juxt ks)]
(reduce (fn [m x] (update-in m (keyfn x) (fnil conj []) x)) {} coll)))
This takes advantage of the fact that update-in
already does most of what you want. In your particular case you'd then just go:
(nest-by [first second] [["A" 2011 "Dan"]
["A" 2011 "Jon"]
["A" 2010 "Tim"]
["B" 2009 "Tom"] ])
{"A" {2011 [["A" 2011 "Dan"] ["A" 2011 "Jon"]], 2010 [["A" 2010 "Tim"]]}, "B" {2009 [["B" 2009 "Tom"]]}}
This gets you pretty close
(defn my-group [coll]
(let [m (group-by
#(-> % val first first)
(group-by #(second %) coll))]
(into {} (for [[k v] m] [k (#(into {} %) v)]))))
(my-group [["A" 2011 "Dan"] ["A" 2011 "Jon"] ["A" 2010 "Tim"] ["B" 2009 "Tom"]])
{"A" {
2011 [["A" 2011 "Dan"] ["A" 2011 "Jon"]],
2010 [["A" 2010 "Tim"]]
},
"B" {2009 [["B" 2009 "Tom"]]}
}
As usual with Clojure, you can probably find something that is a little less verbose.
精彩评论