开发者

Create a Hash to store and retrieve verbs in english language

开发者 https://www.devze.com 2023-03-23 07:40 出处:网络
I\'m working on a product which has the need for a hash to store and retrieve verbs in a sentence. Could I get some sample code which can start things off for me. The concern here for me is speed of r

I'm working on a product which has the need for a hash to store and retrieve verbs in a sentence. Could I get some sample code which can start things off for me. The concern here for me is speed of retrieval as well as storage on an not so frequent basis.

Update : Looking for

a) Constant time retrieval O(1) b) Interesting has functions fo开发者_如何学Cr strings(sample code)


Ideally I would like to store all of the [verb] forms as 1 hash index

You might think that is almost possible to do with so-called regular verbs using some chunk they all have in common:

             happen, happens, happened, happened, happening

but it is certainly not going to be possible for so-called irregular verbs:

             eat, eats, ate, eaten, eating
             sing, sings, sang, sung, singing
             go, goes, went, gone, going
             bring, brings, brought, brought, bringing
             speak, speaks, spoke, spoken, speaking

And there are also orthographic substitution variations to deal with:

             try, tries, tried, tried, trying
             cry, cries, cried, cried, crying

And other kinds of variation:

             miss, misses, missed, missed, missing

What I would suggest is to create a hash-table like this for every verb-form, pointing at the infintive form; the infinitive form points at itself:

           verb form  
           infinitive form

for example:

          happening
          happen


          went
          go


         happen
         happen

         go
         go


        ate
        eat

Then, given a verb form, you could very quickly find its infinitive by doing a hashedkey lookup, and you could store the definition, if you wanted to do so, in another table using the infintive form as the (hashed) key.


From our point of view this is probably (college-) homework so if it is you should tag it as "homework".

In C++0B there is the new official standard Unordered Map: http://en.wikipedia.org/wiki/Unordered_map_%28C%2B%2B%29

But if this is homework then you may be required to implement this yourself! Make an array, think about what a good hash-function might be and fire away.


Try creating your own hash map by defining a function that generates a unique value for a given verb. Use the value as either the index to an array or as the key to a map.

Also search the internet for word lists construction and dictionaries. A lot of programs that use word lists and dictionaries break up their data structures by word length or the word length is involved in the hash calculation.


One problem is that many English words can be both verbs and nouns and only the context determines which it is. For example, "What is your take on the situation?". "Take" here is a noun, not a verb. Are you willing to accept a brute-force approach that results in many false-positives?

Also what do you mean by "store and retrieve verbs in a sentence"? Identify verbs in the sentence, extract them, and then store them in a database of some kind? Maybe I misunderstand your requirement?


Since storage sounds extremely infrequent and retrieval sounds like the overwhelmingly predominant case with an extreme need for performance, I recommend perfect hashing. It's not convenient at all for storage since you need to recreate the entire hash, but for retrieval the results will be guaranteed O(1). Search for "perfect hashing" on Google and you'll see Bob Jenkin's website as the second listed.

There you'll find his implementation of perfect hashing and it works rather well. You may be able to use his code as a reference to understand how you can implement perfect hashing in your product. (I've had success with this in the past, but for researching, not for production.)

0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号