I need to find all sub-arrays which share any mutual element and merge them into one sub-array. (Implementing in Python but any algorithmic idea would be helpful)
Multidimensional array structure:
categories = {'car': ['automobile', 'auto'],
              'bike': ['vehicle', 'motorcycle', 'motorbike', 'automobile'],
              'software': ['computer', 'macbook', 'apple', 'microsoft', 'mozilla'],
              'firefox': ['internet', 'mozilla', 'browser'],
              'bicycle': ['vehicle']}
I'd like to have 'car', 'bike' and 'bicycle' merged into one list (the new list's key could be any of the relevant keys) and 'software' and 'firefox' merged into one list as well.
Performance is crucial.
Best solution I could come up with so far is to maintain a flat, one-dimensional mapping of element => list_key (e.g. 'automobile' => 'car') and then run the following recursive function for each list in the multidimensional array (pseudocode):

function merge_similar(list_key):
    for each element in categories[list_key]:
        if flatten_array.has_key(element):
            list_to_merge = flatten_array[element]
            /* merge other lists which share an element with our newly found similar list */
            merge_similar(list_to_merge)
            categories[list_key] = merge(categories[list_key], categories[list_to_merge])
            delete categories[list_to_merge]
Any idea how to improve its performance?
Thanks!
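In runnable form, the idea looks roughly like this (the flatten mapping is built as lists are visited, and the merge() helper from the pseudocode is replaced by a duplicate-skipping extend):

```python
# Runnable sketch of the pseudocode above. `flatten` plays the role of
# flatten_array (element -> key of the list that currently owns it).
categories = {'car': ['automobile', 'auto'],
              'bike': ['vehicle', 'motorcycle', 'motorbike', 'automobile'],
              'software': ['computer', 'macbook', 'apple', 'microsoft', 'mozilla'],
              'firefox': ['internet', 'mozilla', 'browser'],
              'bicycle': ['vehicle']}

flatten = {}

def merge_similar(list_key):
    for element in list(categories[list_key]):  # copy: the list grows while merging
        owner = flatten.get(element)
        if owner is not None and owner != list_key and owner in categories:
            # merge the other list which shares an element with ours
            merge_similar(owner)
            categories[list_key].extend(
                x for x in categories[owner] if x not in categories[list_key])
            del categories[owner]
            for x in categories[list_key]:
                flatten[x] = list_key
        else:
            flatten[element] = list_key

for key in list(categories):
    if key in categories:  # keys can disappear as lists get merged
        merge_similar(key)

print(categories)
```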
Note that there is no "first key" -- dicts don't keep order (at least not before CPython 3.7, which guarantees insertion order), so if you need some order preserved you'll need to start from a different data structure.
Apart from order-related issues, I'd start with something like:
def merged(dictoflists):
    result = {}
    rev = {}  # maps each element to the key of the merged set it ended up in
    for k, l in dictoflists.items():
        # keys of existing merged sets that share an element with this list
        intersecting = {rev.get(w) for w in l} - {None}
        if intersecting:
            pickone = intersecting.pop()
            into = result[pickone]
        else:
            pickone = k
            into = result[k] = set()
        for ok in intersecting:
            into.update(result.pop(ok))
        into.update(l)
        for w in into:
            rev[w] = pickone
    return {k: sorted(l) for k, l in result.items()}
If order is important to you, the uses of set will be problematic and you'll need more complicated (and slower) data structures -- however, if that's the case, you should first specify in complete detail exactly what ordering constraints you need to respect in the various possible cases that can occur.
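A quick usage sketch against the question's data (a lightly modernized Python 3 copy of the merged function, repeated here so the snippet runs on its own):

```python
def merged(dictoflists):
    result = {}
    rev = {}  # element -> key of the merged set it currently lives in
    for k, l in dictoflists.items():
        intersecting = {rev.get(w) for w in l} - {None}
        if intersecting:
            pickone = intersecting.pop()
            into = result[pickone]
        else:
            pickone = k
            into = result[k] = set()
        for ok in intersecting:
            into.update(result.pop(ok))
        into.update(l)
        for w in into:
            rev[w] = pickone
    return {k: sorted(l) for k, l in result.items()}

categories = {'car': ['automobile', 'auto'],
              'bike': ['vehicle', 'motorcycle', 'motorbike', 'automobile'],
              'software': ['computer', 'macbook', 'apple', 'microsoft', 'mozilla'],
              'firefox': ['internet', 'mozilla', 'browser'],
              'bicycle': ['vehicle']}

out = merged(categories)
print(out)  # two groups: car/bike/bicycle words and software/firefox words
```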
I can't imagine that a recursive solution would be speedy.
Is using list.extend() too slow?
You could do something like this:
categories['car'].extend(categories['bike'])
categories['car'].extend(categories['bicycle'])
Or to be more general, if you pass in a list of keys you want to merge:
first_key = None
for key in keys_whose_lists_I_want_to_merge:
    if first_key is None:
        first_key = key
    else:
        categories[first_key].extend(categories[key])
If you're merging a ton of lists, you can optimize that loop to not perform the None check after the first time. See the tip entitled 'Re-map Functions at runtime' on the Python Performance Tips page.
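One way to drop the per-iteration None check is to pull the first key out of an iterator before the loop starts (a sketch; `keys_to_merge` is a hypothetical stand-in for whatever key list you pass in):

```python
categories = {'car': ['automobile', 'auto'],
              'bike': ['vehicle', 'motorcycle', 'motorbike', 'automobile'],
              'bicycle': ['vehicle']}
keys_to_merge = ['car', 'bike', 'bicycle']  # hypothetical input

# consume the first key up front so the loop body is a bare extend()
keys = iter(keys_to_merge)
first_key = next(keys)
for key in keys:
    categories[first_key].extend(categories[key])

print(categories[first_key])  # note: duplicates are kept
```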
>>> categories = {'car': ['automobile', 'auto'],
...               'bike': ['vehicle', 'motorcycle', 'motorbike', 'automobile'],
...               'software': ['computer', 'macbook', 'apple', 'microsoft', 'mozilla'],
...               'firefox': ['internet', 'mozilla', 'browser'],
...               'bicycle': ['vehicle']}
>>> # Use sets for values
>>> for k, v in categories.items(): categories[k] = set(v)
...
>>> # Accumulate
>>> for k1, v1 in categories.items():
...     if v1:
...         for k2, v2 in categories.items():
...             if v2 and k1 != k2 and v1 & v2:
...                 v1 |= v2
...                 categories[k2] = None
...         categories[k1] = v1
...
>>> # Print
>>> for k1, v1 in categories.items():
...     if v1: print('%s: %r' % (k1, v1))
...
bicycle: {'motorbike', 'vehicle', 'auto', 'automobile', 'motorcycle'}
firefox: {'apple', 'mozilla', 'macbook', 'computer', 'internet', 'microsoft', 'browser'}
>>>
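Note that a single pass like this can miss merges that only become visible after a later union (whether it does depends on the dict's iteration order), and it is O(n²) in the number of keys. A sketch of the same merge built on a small union-find (disjoint-set) structure instead, which handles transitive overlaps with one linear scan of the elements:

```python
def merge_categories(categories):
    # union-find over the category keys
    parent = {k: k for k in categories}

    def find(k):
        while parent[k] != k:
            parent[k] = parent[parent[k]]  # path halving
            k = parent[k]
        return k

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    # any two keys sharing an element get unioned
    owner = {}  # element -> some key that contains it
    for k, words in categories.items():
        for w in words:
            if w in owner:
                union(k, owner[w])
            else:
                owner[w] = k

    # collect the merged groups under their root keys
    merged = {}
    for k, words in categories.items():
        merged.setdefault(find(k), set()).update(words)
    return merged

categories = {'car': ['automobile', 'auto'],
              'bike': ['vehicle', 'motorcycle', 'motorbike', 'automobile'],
              'software': ['computer', 'macbook', 'apple', 'microsoft', 'mozilla'],
              'firefox': ['internet', 'mozilla', 'browser'],
              'bicycle': ['vehicle']}

result = merge_categories(categories)
for k, v in result.items():
    print(k, sorted(v))
```

Which key survives as the group's label depends on the union order, so if a specific key matters you'd pick it explicitly from each group afterwards.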