How to de-dupe a List of Objects?

开发者 (https://www.devze.com), 2023-01-23 12:58. Source: web
A Rec object has a member variable called tag which is a String.

If I have a List of Recs, how could I de-dupe the list based on the tag member variable?

I just need to make sure that the List contains only one Rec with each tag value.

Something like the following, but I'm not sure what the best algorithm is for keeping track of counts, etc.:

private List<Rec> deDupe(List<Rec> recs) {

    for(Rec rec : recs) {

         // How to check whether rec.tag exists in another Rec in this List
         // and delete any duplicates from the List before returning it to
         // the calling method?

    }

    return recs;
}


Store it temporarily in a HashMap<String,Rec>.

Create a HashMap<String,Rec>. Loop through all of your Rec objects. For each one, if the tag already exists as a key in the HashMap, then compare the two and decide which one to keep. If not, then put it in.

When you're done, the HashMap.values() method will give you all of your unique Rec objects.
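
A minimal sketch of the HashMap approach described above. The Rec class here is a stand-in: only the tag field comes from the question, while the version field and the "keep the higher version" rule are assumptions made purely to show where the keep-or-replace decision goes. LinkedHashMap is used instead of a plain HashMap so the result keeps first-seen tag order.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class MapDeDupe {

    // Stand-in for the asker's Rec; only the tag field comes from the question.
    static class Rec {
        final String tag;
        final int version; // hypothetical field, used only to pick a winner

        Rec(String tag, int version) {
            this.tag = tag;
            this.version = version;
        }
    }

    static List<Rec> deDupe(List<Rec> recs) {
        // LinkedHashMap keeps first-seen tag order; a plain HashMap also works
        // when the order of the result does not matter.
        Map<String, Rec> byTag = new LinkedHashMap<String, Rec>();
        for (Rec rec : recs) {
            Rec existing = byTag.get(rec.tag);
            // Assumed policy: keep whichever Rec has the higher version.
            if (existing == null || rec.version > existing.version) {
                byTag.put(rec.tag, rec);
            }
        }
        return new ArrayList<Rec>(byTag.values());
    }
}
```

If any Rec with a given tag will do, the comparison can be dropped and the first one kept by checking only for existing == null.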


Try this:

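import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

private List<Rec> deDupe(List<Rec> recs) {

    Set<String> tags = new HashSet<String>();   // tags seen so far
    List<Rec> result = new ArrayList<Rec>();

    for(Rec rec : recs) {
        // keep only the first Rec seen for each tag
        if(!tags.contains(rec.tag)) {
            result.add(rec);
            tags.add(rec.tag);
        }
    }

    return result;
}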

This checks each Rec against a Set of tags. If the set contains the tag already, it is a duplicate and we skip it. Otherwise we add the Rec to our result and add the tag to the set.


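This becomes easier if Rec overrides equals() to compare by its tag value. Then you could write something like the following (note that List.contains is a linear scan, so this is best kept to small lists):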

private List<Rec> deDupe( List<Rec> recs )
{
    List<Rec> retList = new ArrayList<Rec>( recs.size() );
    for ( Rec rec : recs )
    {
        if (!retList.contains(rec))
        {
            retList.add(rec);
        }
    }
    return retList;
}
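
For the loop above to work, Rec has to define equality by tag. A minimal sketch of such a class follows; the constructor and the getTag() accessor are assumptions, since the question only says that Rec has a String member called tag. hashCode() is overridden alongside equals() so the class also behaves correctly in hash-based collections.

```java
import java.util.Objects;

// Sketch of a Rec whose equality is defined by its tag alone.
class Rec {
    private final String tag;

    Rec(String tag) {
        this.tag = tag;
    }

    String getTag() {
        return tag;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof Rec)) {
            return false;
        }
        // Two Recs are equal when their tags are equal (null-safe).
        return Objects.equals(tag, ((Rec) other).tag);
    }

    @Override
    public int hashCode() {
        // Must agree with equals(): equal tags give equal hash codes.
        return Objects.hashCode(tag);
    }
}
```

With this in place, retList.contains(rec) compares by tag rather than by object identity.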


I would do that with Google Collections (now part of Guava). You can use the filter function with a predicate that remembers previous tags and filters out any Rec whose tag has been seen before. Something like this:

private Iterable<Rec> deDupe(List<Rec> recs) 
{
    Predicate<Rec> filterDuplicatesByTagPredicate = new FilterDuplicatesByTagPredicate();
    return Iterables.filter(recs, filterDuplicatesByTagPredicate);
}

private static class FilterDuplicatesByTagPredicate implements Predicate<Rec>
{
    private Set<String> existingTags = Sets.newHashSet();

    @Override
    public boolean apply(Rec input)
    {
        String tag = input.getTag();
        return existingTags.add(tag);
    }
}

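I changed the method slightly to return an Iterable instead of a List, but of course you can change that if it's important. Keep in mind that Iterables.filter returns a lazy view, so with a stateful predicate like this one the result should be iterated only once; copy it into a List (for example with Lists.newArrayList) if you need to reuse it.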
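
The same "predicate that remembers tags" idea can be sketched without Guava, assuming Java 8 or later. This swaps the Guava Predicate for a stream filter and is not the answer's original code. The Rec stand-in and its getTag() accessor are assumptions. The filter relies on Set.add returning false for a tag that is already present, and it is only safe on a sequential stream because the lambda carries state.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

class StreamDeDupe {

    // Stand-in for the asker's Rec; getTag() is an assumed accessor.
    static class Rec {
        private final String tag;

        Rec(String tag) {
            this.tag = tag;
        }

        String getTag() {
            return tag;
        }
    }

    static List<Rec> deDupe(List<Rec> recs) {
        Set<String> seenTags = new HashSet<String>();
        return recs.stream()
                // Set.add returns true only the first time a tag is seen,
                // so later Recs with the same tag are filtered out.
                .filter(rec -> seenTags.add(rec.getTag()))
                // Collecting eagerly avoids the iterate-once caveat of a lazy view.
                .collect(Collectors.toList());
    }
}
```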


If you don't mind copying the data around (i.e., you have a small list of small objects), you can do this:

// <T> declares the type parameter so the method compiles outside a generic class
private <T> List<T> deDupe(List<T> thisListHasDupes){
    Set<T> tempSet = new HashSet<T>();
    for(T t:thisListHasDupes){
        tempSet.add(t);
    }
    List<T> deDupedList = new ArrayList<T>();
    deDupedList.addAll(tempSet);
    return deDupedList;
}

Remember that implementations of Set rely on consistent and valid equals() and hashCode() methods, so if you have a custom object, make sure both are overridden. Note also that a HashSet does not preserve the original order of the list.
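
If the original order matters, one possible variation is to use a LinkedHashSet, which iterates in insertion order. This is a sketch rather than part of the answer above; the class name OrderedDeDupe is made up for illustration, and the same equals()/hashCode() requirement applies to the element type.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

class OrderedDeDupe {

    static <T> List<T> deDupe(List<T> items) {
        // LinkedHashSet drops duplicates (as judged by equals/hashCode)
        // and remembers the order in which elements were first added.
        return new ArrayList<T>(new LinkedHashSet<T>(items));
    }
}
```

Called with a list of Strings such as "b", "a", "b", "c", "a", this keeps the first occurrence of each value in its original position.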
