What is the optimal way (algorithm) to search for the word that has the maximum number of occurrences in a document?
Finding the word that occurs most often in a document can be done in O(n) time with a simple hash-based histogram:
    histogram <- new map<String,int>
    for each word in document:
        if word in histogram:
            histogram[word] <- histogram[word] + 1
        else:
            histogram[word] <- 1
    max <- 0
    maxWord <- ""
    for each word in histogram:
        if histogram[word] > max:
            max <- histogram[word]
            maxWord <- word
    return maxWord
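This is an O(n) solution (expected time, assuming the hash map gives O(1) average-case lookups and updates). Since every word has to be read at least once, the problem is clearly Omega(n), so this is optimal in terms of big O notation.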
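For anyone who wants something runnable, here is a minimal Python sketch of the same two-pass histogram idea. The function name `most_frequent_word` and the tokenization rule (lowercased runs of letters, digits and apostrophes) are my own assumptions, not part of the question, so adjust them to whatever "word" means for your document.

```python
import re
from collections import Counter


def most_frequent_word(text):
    """Return (word, count) for the most frequent word, or None if text has no words."""
    # Assumed tokenization: lowercase the text, then take runs of
    # letters, digits and apostrophes as words.
    words = re.findall(r"[a-z0-9']+", text.lower())

    # First pass: build the histogram in expected O(n) time.
    histogram = Counter(words)
    if not histogram:
        return None

    # Second pass: scan the unique words for the largest count.
    # On ties, max() keeps the word that was seen first.
    return max(histogram.items(), key=lambda item: item[1])
```

Calling `most_frequent_word("the cat and the hat")` gives `("the", 2)`. Memory use is proportional to the number of distinct words, not to the document length.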
- Scan the document once, keeping a count of how many times you have seen every unique word (perhaps using a hashtable or a tree to do this).
- While scanning, keep track of the word that has the highest count of all words seen so far; when the scan finishes, that word is the answer.
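The two steps above can be folded into a single loop. This is only a sketch, assuming the document has already been split into words; the name `most_frequent_single_pass` is made up for illustration, and a plain `dict` stands in for the hashtable.

```python
def most_frequent_single_pass(words):
    """Return (word, count) for the most frequent word in one scan.

    Returns (None, 0) when the input is empty.
    """
    counts = {}  # hashtable: word -> number of times seen so far
    best_word, best_count = None, 0

    for word in words:
        count = counts.get(word, 0) + 1
        counts[word] = count
        # Update the running maximum as soon as a word overtakes it.
        if count > best_count:
            best_word, best_count = word, count

    return best_word, best_count
```

For example, `most_frequent_single_pass("a b a c b a".split())` returns `("a", 3)`. With ties, the word that reaches the top count first wins. This saves the second pass over the table, although the asymptotic cost is still O(n) either way.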