开发者

Something like HashMap but sorted?

开发者 https://www.devze.com 2022-12-12 20:05 出处:网络
I\'m writing a Java program that parses all the words from a text file and then adds them to a HashMap. I need to count how many dis开发者_StackOverflowtinct words are contained in the file. I also ne

I'm writing a Java program that parses all the words from a text file and then adds them to a HashMap. I need to count how many dis开发者_StackOverflowtinct words are contained in the file. I also need to figure out the highest counted words. The HashMap is comprised of each word mapped to an integer which represents how many times the word occurs.

Is there something like HashMap that will help me sort this?


The Manual way to do it is as follows:

  • Create a composite WordCount class with word and count fields.
  • Create a Comparator for that class that sorts by count.
  • When you're done filling your HashMap, create a new List of WordCount objects created from values in the HashMap.
  • Sort the List using your comparator.


You could use a HashMultiset from google-collections:

import com.google.common.collect.*;
import com.google.common.collect.Multiset.Entry;

...

  final Multiset<String> words = HashMultiset.create();
  words.addAll(...);

  Ordering<Entry<String>> byIncreasingCount = new Ordering<Entry<String>>() {
    @Override public int compare(Entry<String> a, Entry<String> b) {
      // safe because count is never negative
      return left.getCount() - right.getCount();
    }
  });

  Entry<String> maxEntry = byIncreasingCount.max(words.entrySet())
  return maxEntry.getElement();

EDIT: oops, I thought you wanted only the single most common word. But it sounds like you want the several most common -- so, you could replace max with sortedCopy and now you have a list of all the entries in order.

To find the number of distinct words: words.elementSet().size()


If you want to sort the Map by word then TreeMap is the Java built-in answer. You can either make sure your Word objects are Comparable or supply a custom Comparator.

SortedMap<Word,Integer> map = new TreeMap<Word,Integer>();
...
for all words {
    Integer count = map.get(word);
    if (count == null ) count = 0;
    map.put(word, count+1);
}

If you want to sort by frequency then you will be better off doing this after all of the words have been counted. Sorted collections don't take kindly to having their ordering messed up through external changes. Sorting by frequency requires a composite word + count object as others have posted.


Here is a Groovy version of the most popular answer to this question:

List leastCommon(Multiset myMultiset, Integer quantity)
{

    Ordering<Multiset.Entry<String>> byIncreasingCount = new Ordering<Multiset.Entry<String>>() {
      @Override public int compare(Multiset.Entry<String> a, Multiset.Entry<String> b) {
          return a.getCount() - b.getCount() }
    }

    maxIndex = Math.min(quantity, myMultiset.entrySet().size() - 1)
    return byIncreasingCount.sortedCopy(myMultiset.entrySet()).subList(0, maxIndex)

}

List mostCommon(Multiset myMultiset, Integer quantity)
{

    Ordering<Multiset.Entry<String>> byDecreasingCount = new Ordering<Multiset.Entry<String>>() {
      @Override public int compare(Multiset.Entry<String> a, Multiset.Entry<String> b) {
          return b.getCount() - a.getCount() }
    }

    maxIndex = Math.min(quantity, myMultiset.entrySet().size() - 1)
    return byDecreasingCount.sortedCopy(myMultiset.entrySet()).subList(0, maxIndex)

}


It looks like the TreeBag class from the commons collections library might do what you want. It keeps track of how many copies of an object are added to the bag, and sorts them in ascending order of count. To get the highest count item just call the last() method. One thing to be aware of is that the commons collections stuff hasn't been updated to using generics yet, so you might get a ton of compiler warnings using it.


For the count, stuff the words in a Set and count the size when done.

For the highest, iterate over all Entries and hold on to the key with the highest value.


Have you checked out java.util.PriorityQueue? A PriorityQueue is basically a list with a priority mapped to each element (implemented by a non-synchronized priority heap). Every time you read in a new string you could add it in or increase it's priority by 1 if it is already present (logarithmic time). The is present check is in linear time, and in the end this will be really easy to use. To get the numbers that appear with the most frequency just poll() for each one when you are done!

edit The standard PriorityQueue doesn't allow you to edit priority directly since it requires a comparator. You'd be better off with a simple Hash implementation or something like this


  • YourBean implements Comparable<YourBean>
  • method compareTo : order by number of word
  • TreeMap instead of hashmap
0

精彩评论

暂无评论...
验证码 换一张
取 消