I am a beginner to C++. Can some one tell me a best data structure in C++ to store all words in a dictionary and find if a word is present in the dictionary. I know hash tables are the best but I dont know which data structure uses them ?
Thank you very much in开发者_开发百科 advance.
Your C++ implementation's standard library may have unordered_set
or hash_set
. They are essentially the same thing; the former is part of the forthcoming C++0x standard and is supported by some of the latest compilers, the latter is from the original SGI STL and is included in many standard library implementations.
Hashes are pretty good, but the best structure is a trie. You can get a trie from <ext/pb_ds/assoc_container.hpp>
in GCC. See the online reference.
#include <ext/pb_ds/assoc_container.hpp>
#include <string>
#include <iostream>
int main() {
pb_ds::trie< std::string, int > dict;
dict.insert( std::make_pair( "hello", 3 ) );
std::cerr << ( dict.find( "hello" ) != dict.end() ) << std::endl;
std::cerr << ( dict.find( "goodbye" ) != dict.end() ) << std::endl;
}
Only map
-like functionality, not a pure set
, is provided. In the above sample I added a dummy int
as data to map to… it shouldn't really hurt much.
What does hurt is that this won't work outside GCC.
On the other hand, a non-standard hash table (not std::
or ext::
anything) would allow you to only find approximate matches, i.e. to search among checksums of words instead of the words themselves. That would be the fastest, most compact solution. Dictionaries based on Bloom filters can contain many thousands of words in a few kilobytes.
hash_map, if you have it in your C++'s compiler library (e.g., GNU C++ or Microsoft Visual C++). If you're using some other, less widespread compiler, I suspect you can find a decent third party implementation of hash_map
anyway.
The forthcoming C++ standard calls this same data structure std::unordered_map
instead.
If you don't want to associate any information at all with words in your dictionary, just record whether a word is present in it or not, you can use the _set
(instead of _map
) variations of the above data structure type names.
Of course, they're all templates (as all containers in the C++ standard library), so you'll need to instantiate them appropriately with typical template syntax.
I would prefer using a Trie. A Trie will be a good data structure for building a memory-efficient dictionary with fast lookups, and yes, autocompletion.
Think of it as a hashtable, providing fast lookup of key-value pairs (or just lookup of keys), but unlike a hashtable it allows you to iterate over the keys in sorted order.
Please refer Trie - Wiki for further information/reference.
If the only requirement is to decide whether a word is contained in a never-ever-changing dictionary, without needing any other kind of information about the word, (for example, a spell-checker) then Bloom filter is an efficient data structure for this task.
If there are other data to be associated with each word that needs to be looked up, std::map
is a good, general-purpose starting point.
If auto-completion is needed (when a partial word has been entered), a Prefix tree (trie) may be used.
If you're willing to roll your own solution and your dictionary is fixed, a perfect hash is a good way to go. It guarantees a constant lookup time.
精彩评论