I have several hash maps I need to generate combinations of:
A: [x->1, y->2,...]
B: [x->1, a->4,...]
C: [x->1, b->5,...]
...
some possible combinations:
A+B; A; A+C; A+B+C...
For each combination I need to produce the joint hashmap and perform an operation of key-value pairs with the same key in both hash maps.
All I could come up with was using a binary counter and mapping the digits to the respective hash map:
001 -> A
101 -> A,C
...
Although this solution works, the modulo operations are time consuming w开发者_Python百科hen I have more than 100 hashmaps. I'm new to Scala but I believe that there must be a better way to achieve this?
Scala sequences have a combinations
function. This gives you combinations for choosing a certain number from the total. From you question it looks like you want to choose all different numbers, so your code in theory could be something like:
val elements = List('a, 'b, 'c, 'd)
(1 to elements.size).flatMap(elements.combinations).toList
/* List[List[Symbol]] = List(List('a), List('b), List('c), List('d), List('a, 'b),
List('a, 'c), List('a, 'd), List('b, 'c), List('b, 'd), List('c, 'd),
List('a, 'b, 'c), List('a, 'b, 'd), List('a, 'c, 'd), List('b, 'c, 'd),
List('a, 'b, 'c, 'd)) */
But as pointed out, all combinations will be too many. With 100 elements, choosing 2 from 100 will give you 4950 combinations, 3 will give you 161700, 4 will give you 3921225, and 5 will likely give you an overflow error. So if you just keep the argument for combinations
to 2 or 3 you should be OK.
Well, think of how many combinations there are of your maps: suppose you have N maps.
(the maps individually) + (pairs of maps) + (triples of maps) + ... + (all the maps)
Which is of course
(N choose 1) + (N choose 2) + ... + (N choose N-1)
Where N choose M
is defined as:
N! / (M! * (N-M)!)
For N=100
and M=50
, N choose M
is over 100,000,000,000,000,000,000,000,000,000 so "time consuming" really doesn't do justice to the problem!
Oh, and that assumes that ordering is irrelevant - that is that A + B
is equal to B + A
. If that assumption is wrong, you are faced with significantly more permutations than there are particles in the visible universe
Why scala might help with this problem: its parallel collections framework!
Following up on your idea to use integers to represent bitsets. Are you using the actual modulo operator? You can also use bitmasks to check whether some number is in a bitset. (Note that on the JVM they are both one instruction operations, so who knows what's happening there.)
Another potential major improvement is that, since your operation on the range of the maps is associative, you can save computations by reusing the previous ones. For example, if you combine A,B,C
but have already combined, say, A,C
into AC
, for instance, you can just combine B
with AC
.
The following code implements both ideas:
type MapT = Map[String,Int] // for conciseness later
@scala.annotation.tailrec
def pow2(i : Int, acc : Int = 1) : Int = {
// for reasonably sized ints...
if(i <= 0) acc else pow2(i - 1, 2 * acc)
}
// initial set of maps
val maps = List(
Map("x" -> 1, "y" -> 2),
Map("x" -> 1, "a" -> 4),
Map("x" -> 1, "b" -> 5)
)
val num = maps.size
// any 'op' that's commutative will do
def combine(m1 : MapT, m2 : MapT)(op : (Int,Int)=>Int) : MapT =
((m1.keySet intersect m2.keySet).map(k => (k -> op(m1(k), m2(k))))).toMap
val numCombs = pow2(num)
// precomputes all required powers of two
val masks : Array[Int] = (0 until num).map(pow2(_)).toArray
// this array will be filled, à la Dynamic Algorithm
val results : Array[MapT] = Array.fill(numCombs)(Map.empty)
// fill in the results for "combinations" of one map
for((m,i) <- maps.zipWithIndex) { results(masks(i)) = m }
val zeroUntilNum = (0 until num).toList
for(n <- 2 to num; (x :: xs) <- zeroUntilNum.combinations(n)) {
// The trick here is that we already know the result of combining the maps
// indexed by xs, we just need to compute the corresponding bitmask and get
// the result from the array later.
val known = xs.foldLeft(0)((a,i) => a | masks(i))
val xm = masks(x)
results(known | xm) = combine(results(known), results(xm))(_ + _)
}
If you print the resulting array, you get:
0 -> Map()
1 -> Map(x -> 1, y -> 2)
2 -> Map(x -> 1, a -> 4)
3 -> Map(x -> 2)
4 -> Map(x -> 1, b -> 5)
5 -> Map(x -> 2)
6 -> Map(x -> 2)
7 -> Map(x -> 3)
Of course, like everyone else pointed out, it will blow up eventually as the number of input maps increases.
精彩评论