I am working on a pet project where I store various text files. I have set up my app to save the tags as a single string in one of my collections, for example:
tags: "Linux Apache WSGI"
Storing them and searching for them works just fine, but my question comes up when I want to build something like a tag cloud, count all the various tags, or make a dynamic selection system based on tags: what is the best way to break them up to work with? Or should I be storing them some other way?
Logically, I could scan through every record, collect all the tags, split them on spaces, then cache the result somehow. Maybe that's the right answer, but I wanted to ask for the community's wisdom.
I'm using pymongo to interact with my database.
Or should I be storing them some other way?
The standard way to store tags is to store them as an array. In your case, the DB would look something like:
tags: ['linux', 'apache', 'wsgi']
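A minimal sketch of converting the existing space-separated strings into that array form. The helper name `split_tags`, the lowercasing, and the duplicate removal are assumptions for illustration, not something the question specifies.

```python
def split_tags(tag_string):
    """Turn "Linux Apache WSGI" into ['linux', 'apache', 'wsgi']."""
    seen = []
    for tag in tag_string.lower().split():
        if tag not in seen:  # drop duplicates while keeping the original order
            seen.append(tag)
    return seen


print(split_tags("Linux Apache WSGI"))  # ['linux', 'apache', 'wsgi']
```

Once the field holds a list, a MongoDB query such as `{"tags": "linux"}` matches any document whose array contains that value, so searching by a single tag should keep working without string matching.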
... what is the best way to break them up to work with?
This is what Map/Reduce is designed for. It effectively "scans every record" for you, and its output is another collection that you can query.
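To make the idea concrete, here is a rough in-memory sketch of what a tag-counting Map/Reduce job computes. It runs on plain Python dicts rather than on the database, and the sample documents and function names are made up for illustration.

```python
from collections import defaultdict


def map_tags(doc):
    """Map step: emit a (tag, 1) pair for every tag on one document."""
    for tag in doc.get("tags", []):
        yield tag, 1


def reduce_counts(pairs):
    """Reduce step: sum the emitted values per tag."""
    totals = defaultdict(int)
    for tag, value in pairs:
        totals[tag] += value
    return dict(totals)


def count_tags(docs):
    """Scan every document and return {tag: number of times it appears}."""
    return reduce_counts(pair for doc in docs for pair in map_tags(doc))


# Hypothetical sample documents, with tags stored as arrays.
sample_docs = [
    {"name": "a.txt", "tags": ["linux", "apache", "wsgi"]},
    {"name": "b.txt", "tags": ["linux", "wsgi"]},
    {"name": "c.txt", "tags": ["linux"]},
]

print(count_tags(sample_docs))  # {'linux': 3, 'apache': 1, 'wsgi': 2}
```

On a real collection, the same totals could also be expressed as an aggregation pipeline (`$unwind` on `tags`, then `$group` with `$sum`), which may be worth considering alongside Map/Reduce.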
However, there is another way to do this: keep "counters" and update them. When you save a new document, you also increment the counter for each tag on that document.
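The counter approach might look roughly like the sketch below. It assumes a separate counter collection with one document per tag, shaped like `{"_id": "linux", "count": 3}`. That layout and the names `counter_updates`, `save_with_counters`, `documents`, and `tag_counts` are hypothetical. The two collection arguments are expected to be pymongo collections, and the code only relies on their `insert_one` and `update_one` methods.

```python
def counter_updates(tags):
    """Return (filter, update) pairs that bump one counter document per tag."""
    # sorted(set(...)) counts each tag once per document, in a stable order.
    return [({"_id": tag}, {"$inc": {"count": 1}}) for tag in sorted(set(tags))]


def save_with_counters(documents, tag_counts, doc):
    """Insert doc, then increment the counter for each of its tags."""
    documents.insert_one(doc)
    for query, update in counter_updates(doc["tags"]):
        # upsert=True creates the counter document the first time a tag is seen.
        tag_counts.update_one(query, update, upsert=True)


# Usage, assuming `db` is a pymongo database handle (not defined here):
# save_with_counters(db.files, db.tag_counts,
#                    {"name": "a.txt", "tags": ["linux", "apache", "wsgi"]})
```

The insert and the increments are separate writes, so the counters can drift if one of them fails. Occasionally rebuilding the counts from a full scan is one way to correct that. Removing or retagging a document would need matching decrements.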