Confusion in finding the order of the data structure_问答_开发者

Today I attended a written test conducted by a company. The overall test was focussed on data structures. I got a problem which I thought I solved. But I had a tough time in calculating the Big O function for the data structure. I will provide the question and the answer I came up with.

Given a document you need to store and the words in the document and should be able to return the count when any word is entered. You are provided with char* GetNextWord().

What Data structure will you choose

Give the Algorithm

What will be the order of your algorithm

For question 1, I wrote I will go for TRIE data structure. For question 2, I gave a brief algo开发者_如何学Gorithm. I wrote I would construct the TRIE data structure as following.

struct TRIE{
 boolean isWord;
 int count;
 Node* myList;
}

struct Node{
 char* character;
 Node *next;
 TRIE *child;
}

I have methods constructTrie() which will do a addToTrie() for each word.

I wrote the order of addToTrie() would be O(k) where k is the length. And the order of constructTrie() would be N*O(k) where N would be the number of words.

Now my question is this: Whether the orders I've mentioned is correct or not? If not, how to attack problems like this in the future (given a ds finding the order). I got really confused after using O(k). It makes me assume O(1).

Hints/Tips/Advise are wide open!!

Edit : Corrected the question clearly mentioning that the word count should be stored for all unique words.

Comparing two generic strings take Θ(k) (k = min strlen), and the number of words is N which you have to look through, so Ω(Nk) should be the most efficient complexity you can get.

If you really want to use a trie, then addToTrie() would indeed be O(k) where k is the length of the word you are adding. constructTrie() would take O(Nk) where N is the number of words, if you just call addToTrie() for every word. However, you don't need to call the addToTrie() function for every word. Once you have finished adding a word, just reset a trie pointer to the trie's root, then move the pointer as you are moving over your current word, adding the characters as you go along. Pseudocode:

trieNode *curr = trieRoot;
for each character c in document
  if it's a word terminator (space etc)
    add a character at curr signaling the end of the current word ('\0' maybe);
    curr = trieRoot;
  else if character is not a separator
    add character c at curr->next->character[c];
    curr = curr->next;

This will give you O(C) running time for constructing the trie, where C is the number of characters in your document.

Now, this begs the question: why do you need the trie at all? Obviously you figured out a way to detect when a word has ended, so why must you add your words to a trie? It's overkill. The only datastructure you need is a few variables: one to keep track of the current character, one to keep track of the previous character and one to count the words. This is easily done in O(C) like this:

char prev = '\0';
char curr;
int count = 0;

for each character curr
  if curr is a word separator and prev isn't 
    ++count;
  prev = curr;

I think it makes no sense to use a trie for this problem, it's only complicating things. I think if they wanted to test your knowledge of tries they would have given you a problem where a trie made more sense.

Even if they gave you a getNextWord() function (did you HAVE to use it? because you can do better without it), I'm guessing it returns "\0" or something when there are no more words? So why can't you just call it until it returns "\0" and count the words like that? Either way, a trie doesn't really make sense here.