
Finding the intersection between two collections in MongoDB

Source: https://www.devze.com, 2023-04-04 02:16 (reposted from the web)

I have two very large (30000+ documents) collections: one contains words extracted from a text file (collection name 'word') and the other contains words from a dictionary (collection name 'dictionary').

How can I get the words that exist in both collections?

(I've simplified the situation: documents inside the 'word' collection contain metadata about the words, so each word has to be a separate document.)


Copy both collections into a single collection (include a discriminator field if necessary so you can tell what kind of document you have in each instance).

Run map-reduce on that collection.

In Map, emit the word as the key and a value, say {instance:1, dict:0} or {instance:0, dict:1}, depending on whether the document being mapped is an instance or a dictionary entry. (You could add more fields to the values as necessary.)

In Reduce, sum the instance and dict counts for each word (as usual).

Now query the result for instance > 0 and dict > 0, and you have all of the words that are in both.
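The steps above can be sketched in plain JavaScript. This is only an in-memory illustration of the map, reduce, and filter logic, not a MongoDB call: the array `merged` stands in for the combined collection, and the discriminator field name `kind` and the sample words are assumptions made for the example.

```javascript
// In-memory sketch of the map-reduce idea; `merged` stands in for the
// combined collection. The `kind` discriminator field is hypothetical.
const merged = [
  { kind: "instance", word: "apple" },
  { kind: "instance", word: "zzyzx" },
  { kind: "instance", word: "apple" },
  { kind: "dict", word: "apple" },
  { kind: "dict", word: "banana" },
];

// Map: emit the word as the key and a per-source counter as the value.
function map(doc) {
  const value = doc.kind === "instance"
    ? { instance: 1, dict: 0 }
    : { instance: 0, dict: 1 };
  return [doc.word, value];
}

// Reduce: sum the counters emitted for a single key.
function reduce(values) {
  return values.reduce(
    (acc, v) => ({ instance: acc.instance + v.instance, dict: acc.dict + v.dict }),
    { instance: 0, dict: 0 }
  );
}

// Group the emitted values by key, reduce each group, then keep the
// keys that were seen in both sources.
function wordsInBoth(docs) {
  const grouped = new Map();
  for (const doc of docs) {
    const [key, value] = map(doc);
    if (!grouped.has(key)) grouped.set(key, []);
    grouped.get(key).push(value);
  }
  const result = [];
  for (const [key, values] of grouped) {
    const totals = reduce(values);
    if (totals.instance > 0 && totals.dict > 0) result.push(key);
  }
  return result;
}

console.log(wordsInBoth(merged));
```

With this sample data only "apple" appears on both sides, so it is the only word returned. In a real deployment the same map and reduce functions would be handed to the server rather than run over an array.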


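Suppose the documents look like this (the dictionary collection is called 'dict' here):

 db.word.findOne()   // { word:'a_word', ... }

 db.dict.findOne()   // { word:'a_word', def:'def_of_a_word', ... }

Get the distinct words in the word collection:

 db.word.distinct('word')

Then, for each of those words, check whether it exists in the dict collection:

 db.dict.count({word:'a_word'})  // 0 = does not exist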
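As a rough illustration of this second approach, the sketch below does the same two steps in plain JavaScript over arrays. The sample documents and the helper names `distinctWords` and `intersectWords` are hypothetical; `distinctWords` plays the role of db.word.distinct('word'), and the membership test replaces the per-word count query with a single lookup set.

```javascript
// Hypothetical sample data standing in for the two collections.
const wordDocs = [
  { word: "apple", line: 1 },
  { word: "zzyzx", line: 2 },
  { word: "apple", line: 7 },
];
const dictDocs = [
  { word: "apple", def: "a fruit" },
  { word: "banana", def: "another fruit" },
];

// Plays the role of db.word.distinct('word'): unique words, first-seen order.
function distinctWords(docs) {
  return [...new Set(docs.map((d) => d.word))];
}

// Keeps the distinct words that also appear in the dictionary.
// Building the Set once avoids issuing one count query per word.
function intersectWords(words, dictionary) {
  const dictSet = new Set(dictionary.map((d) => d.word));
  return distinctWords(words).filter((w) => dictSet.has(w));
}

console.log(intersectWords(wordDocs, dictDocs));
```

Against a live database, one count query per distinct word means 30000+ round trips, so it may be worth passing the distinct list to a single query with the $in operator instead, assuming the list fits within the document size limit.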
