开发者

Java: Equalator? (removing duplicates from a collection of objects)

开发者 https://www.devze.com 2022-12-15 18:15 出处:网络
I have a bunch of objects of a class Puzzle. I have overridden equals() and hashCode(). When it comes time to present the solutions to the user, I\'d like to filter out all the Puzzles that are \"simi

I have a bunch of objects of a class Puzzle. I have overridden equals() and hashCode(). When it comes time to present the solutions to the user, I'd like to filter out all the Puzzles that are "similar" (by the standard I have defined), so the user only sees one of each.

Similarity is transitive.

Example:

Result of computations:
A    (similar to A)
B    (similar to C)
C
D

In this case, only A or D and B or C would be presented to the user - but not two similar Puzzles. Two similar puzzles are equally valid. It is only important that they are not both shown to the user.

To accomplish this, I wanted to use an ADT that prohibits duplicates. However, I don't want to change the equals() and hashCode() methods to return a value about similarity instead. Is there some Equalator, like Comparator, that I can use in this case? Or is there another way I should be doing this?

The class I'm working on is a Puzzle that maintains a grid of letters. (Like Scrabble.) If a Puzzle contains the same words, but is in a different orientation, it is considered to be similar. So the following to puzzle:

                                    (2, 2): A           
                                    (2, 1): C           
                                    (2, 0): T

Would be similar to:

                    (1, 2): A           
                    (1, 1): C           
                 开发者_开发问答   (1, 0): T      


Okay you have a way of measuring similarity between objects. That means they form a Metric Space.

The question is, is your space also a Euclidean space like normal three dimensional space, or integers or something like that? If it is, then you could use a binary space partition in however many dimensions you've got.

(The question is, basically: is there a homomorphism between your objects and an n-dimensional real number vector? If so, then you can use techniques for measuring closeness of points in n-dimensional space.)

Now, if it's not a euclidean space then you've got a bigger problem. An example of a non-euclidean space that programers might be most familiar with would be the Levenshtein Distance between to strings.

If your problem is similar to seeing how similar a string is to a list of already existing strings then I don't know of any algorithms that would do that without O(n2) time. Maybe there are some out there.


But another important question is: how much time do you have? How many objects? If you have time or if your data set is small enough that an O(n2) algorithm is practical, then you just have to iterate through your list of objects to see if it's below a certain threshold. If so, reject it.

Just overload AbstractCollection and replace the Add function. Use an ArrayList or whatever. Your code would look kind of like this

class SimilarityRejector<T> extends AbstractCollection<T>{
     ArrayList<T> base;
     double threshold;

    public SimilarityRejector(double threshold){
        base = new ArrayList<T>();
        this.threshold = threshold;
    }

    public void add(T t){
       boolean failed = false;
       for(T compare : base){
          if(similarityComparison(t,compare) < threshold) faled = true;
       }
       if(!failed) base.add(t);
     }

    public Iterator<T> iterator() {
        return base.iterator();
    }

    public int size() {
        return base.size();
    }
}

etc. Obviously T would need to be a subclass of some class that you can perform a comparison on. If you have a euclidean metric, then you can use a space partition, rather then going through every other item.


I'd use a wrapper class that overrides equals and hashCode accordingly.

private static class Wrapper {
    public static final Puzzle puzzle;
    public Wrapper(Puzzle puzzle) { 
        this.puzzle = puzzle; 
    }
    @Override 
    public boolean equals(Object object) {
        // ...
    }
    @Override 
    public int hashCode() {
        // ...
    }
}

and then you wrap all your puzzles, put them in a map, and get them out again…

public Collection<Collection<Puzzle>> method(Collection<Puzzles> puzzles) {
    Map<Wrapper,<Collection<Puzzle>> map = new HashMap<Wrapper,<Collection<Puzzle>>();
    for (Puzzle each: puzzles) {
        Wrapper wrapper = new Wrapper(each);
        Collection<Puzzle> coll = map.get(wrapper);
        if (coll == null) map.put(wrapper, coll = new ArrayList<Puzzle>());
        coll.add(puzzle);
    }
    return map.values();
}


  1. Create a TreeSet using your Comparator
  2. Adds all elements into the set
  3. All duplicates are stripped out


Normally "similarity" is not a transitive relationship. So the first step would be to think of this in terms of equivalence rather than similarity. Equivalence is reflexive, symmetric and transitive.

Easy approach here is to define a puzzle wrapper whose equals() and hashCode() methods are implemented according to the equivalence relation in question.

Once you have that, drop the wrapped objects into a java.util.Set and that filters out duplicates.


IMHO, most elegant way was described by Gili (TreeSet with custom Comparator).

But if you like to make it by yourself, seems this easiest and clearest solution:

/**
 * Distinct input list values (cuts duplications)
 * @param items items to process
 * @param comparator comparator to recognize equal items
 * @return new collection with unique values
 */
public static <T> Collection<T> distinctItems(List<T> items, Comparator<T> comparator) {
    List<T> result = new ArrayList<>();

    for (int i = 0; i < items.size(); i++) {
        T item = items.get(i);

        boolean exists = false;
        for (int j = 0; j < result.size(); j++) {
            if (comparator.compare(result.get(j), item) == 0) {
                exists = true;
                break;
            }
        }

        if (!exists) {
            result.add(item);
        }
    }

    return result;
}
0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号