I am attempting to create a nice interface to access a data set where each value has several possible keys. For example, suppose that I have both a number and a name for each value in the data set. I want to be able to access each value using either the number OR the name.
I have considered several possible implementations:
Using two separate dictionaries, one for the data values organized by number, and one for the data values organized by name.
Simply assigning two keys to the same value in a dictionary.
Creating dictionaries mapping each name to the corresponding number, and vice versa.
Attempting to create a hash function that maps each name to a number, etc. (related to the above)
Creating an object to encapsulate all three pieces of data, then using one key to map dictionary keys to the objects and simply searching the dictionary to map the other key to the object.
None of these seem ideal. The first seems ugly and unmaintainable. The second also seems fragile. The third/fourth seem plausible, but seem to require either much manual specification or an overly complex implementation. Finally, the fifth loses constant-time performance for one of the lookups.
In C/C++, I believe that I would use pointers to reference the same piece of data from different keys.
I know that the problem is rather similar to a database lookup by a non-key column. However, I would like, if possible, to maintain the approximately O(1) performance of Python dictionaries.
What is the most Pythonic way to achieve this data structure?
In C/C++, I believe that I would use pointers to reference the same piece of
data from different keys.
This would correspond with option number 2. In Python, dictionaries really store pointers to objects. That means that having two keys point to the same object will not create the object twice.
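A minimal sketch of that point, assuming a small mutable dict as the stored value (the keys and the field name here are made up for illustration):

```python
# One value object, created once.
value = {"mass": 1.5}

data = {}
data[7] = value          # keyed by number
data["seven"] = value    # keyed by name

# Both keys refer to the very same object, not to two copies.
same_object = data[7] is data["seven"]

# A change made through one key is visible through the other.
data["seven"]["mass"] = 2.0
mass_via_number = data[7]["mass"]
```

Note that this sharing only matters for mutable values. Rebinding one key to a new object (data[7] = something_else) does not affect the other key.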
Are both the names and the numbers unique? If so, using one to look up the other first isn't so bad.
Two dictionaries pointing to the same data, as in C, won't duplicate the data, so that is fine too.
Encapsulating the two dictionaries in a self-contained object with add(name, number, value), findByName(name), and findByNumber(number) will let you centralize the maintenance, keep it testable, and so forth.
(Pardon my camelCase :)
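One way the wrapper described above might look. This is only a sketch: the class name DualKeyStore is hypothetical, the method names are snake_case versions of the ones suggested, and rejecting duplicate keys is an assumption about the desired behaviour.

```python
class DualKeyStore:
    """Hypothetical wrapper that keeps two dictionaries in sync."""

    def __init__(self):
        self._by_name = {}
        self._by_number = {}

    def add(self, name, number, value):
        # Assumption: names and numbers are each unique, so duplicates are rejected.
        if name in self._by_name or number in self._by_number:
            raise KeyError(f"duplicate key: {name!r} / {number!r}")
        # Both dictionaries store a reference to the same value object.
        self._by_name[name] = value
        self._by_number[number] = value

    def find_by_name(self, name):
        return self._by_name[name]

    def find_by_number(self, number):
        return self._by_number[number]


store = DualKeyStore()
store.add("hydrogen", 1, {"symbol": "H"})
```

Both lookups are plain dictionary accesses, so each stays roughly O(1), and all the bookkeeping lives in add().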
Look at it this way: in essence, you want a three-column database where two columns are indexed, with the simplification that you don't need to be able to look up the indexed values themselves.
Option 5 is, in practice, an attempt to build such a simplified database. When you build such a database in memory, you end up with a mapping from a UID to the values you have (here only one, since you have only one value "column"), plus indexes that map from the indexed values to UIDs.
In your case you already have a number you can use as a UID, so you don't need a "column" for that.
That means you end up with two dictionaries: One mapping number to value, and one mapping the name to the number.
So this is what you should do, IMO.
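A rough sketch of that layout, with made-up sample data and hypothetical helper names. The number acts as the UID, and the name index maps to it:

```python
# Primary mapping: number (the UID) -> value.
value_by_number = {1: "H", 2: "He"}

# Index: name -> number.
number_by_name = {"hydrogen": 1, "helium": 2}


def get_by_number(number):
    # One dictionary lookup.
    return value_by_number[number]


def get_by_name(name):
    # Two dictionary lookups: name -> number, then number -> value.
    return value_by_number[number_by_name[name]]
```

A lookup by name costs two dictionary accesses instead of one, which is still constant time on average.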
In C/C++, I believe that I would use pointers to reference the same piece of data from different keys.
Almost anything in Python qualifies as a "C/C++ pointer".
Use your option #1, two dictionaries, and test it for performance. If you define a class for the content, then constructors and destructors can manage the dictionaries and the class can define functions for the lookups.
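As a sketch of the class-based variant, assuming a hypothetical Item class whose constructor registers each object in two class-level dictionaries. Removal is an explicit method here rather than a destructor, because the dictionaries hold strong references, so __del__ would not run while an object is still registered.

```python
class Item:
    # Class-level indexes shared by all instances (hypothetical names).
    by_number = {}
    by_name = {}

    def __init__(self, number, name, value):
        self.number = number
        self.name = name
        self.value = value
        # The constructor registers the new object under both keys.
        Item.by_number[number] = self
        Item.by_name[name] = self

    def unregister(self):
        # Explicit removal from both indexes.
        del Item.by_number[self.number]
        del Item.by_name[self.name]

    @classmethod
    def from_number(cls, number):
        return cls.by_number[number]

    @classmethod
    def from_name(cls, name):
        return cls.by_name[name]


carbon = Item(6, "carbon", 12.011)
```

Item.from_name("carbon") and Item.from_number(6) then return the same object.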