Which is faster in .NET, .Contains() or .Count()?_问答_开发者

I want to compare an array of modified records against a list of records pulled from the database, and delete those records from the database that do not exist in the incoming array. The modified array comes from a client app that maintains the database, and this code runs in a WCF service app, so if the client deletes a record from the array, that record should be deleted from the database. Here's the sample code snippet:

public void UpdateRecords(Record[] recs)
{
    // look for deleted records
    foreach (Record rec in UnitOfWork.Records.ToList())
    {
        var copy = rec;
        if (!recs.Contains(rec))         开发者_如何学C             // use this one?
        if (0 == recs.Count(p => p.Id == copy.Id))    // or this one?
        {
            // if not in the new collection, remove from database
            Record deleted = UnitOfWork.Records.Single(p => p.Id == copy.Id);
            UnitOfWork.Remove(deleted);
        }
    }
    // rest of method code deleted
}

My question: is there a speed advantage (or other advantage) to using the Count method over the Contains method? the Id property is guaranteed to be unique and to identify that particular record, so you don't need to do a bitwise compare, as I assume Contains might do.

Anyone? Thanks, Dave

This would be faster:

if (!recs.Any(p => p.Id == copy.Id))

This has the same advantages as using Count() - but it also stops after it finds the first match unlike Count()

You should not even consider Count since you are only checking for the existence of a record. You should use Any instead.

Using Count forces to iterate the entire enumerable to get the correct count, Any stops enumerating as soon as you found the first element.

As for the use of Contains you need to take in consideration if for the specified type reference equality is equivalent to the Id comparison you are performing. Which by default it is not.

Assuming Record implements both GetHashCode and Equals properly, I'd use a different approach altogether:

// I'm assuming it's appropriate to pull down all the records from the database
// to start with, as you're already doing it.
foreach (Record recordToDelete in UnitOfWork.Records.ToList().Except(recs))
{
    UnitOfWork.Remove(recordToDelete);
}

Basically there's no need to have an N * M lookup time - the above code will end up building a set of records from recs based on their hash code, and find non-matches rather more efficiently than the original code.

If you've actually got more to do, you could use:

HashSet<Record> recordSet = new HashSet<Record>(recs);

foreach (Record recordFromDb in UnitOfWork.Records.ToList())
{
    if (!recordSet.Contains(recordFromDb))
    {
        UnitOfWork.Remove(recordFromDb);
    }
    else
    {
        // Do other stuff
    }
}

(I'm not quite sure why your original code is refetching the record from the database using Single when you've already got it as rec...)

Contains() is going to use Equals() against your objects. If you have not overridden this method, it's even possible Contains() is returning incorrect results. If you have overridden it to use the object's Id to determine identity, then in that case Count() and Contains() are almost doing the exact same thing. Except Contains() will short circuit as soon as it hits a match, where as Count() will keep on counting. Any() might be a better choice than both of them.

Do you know for certain this is a bottleneck in your app? It feels like premature optimization to me. Which is the root of all evil, you know :)

Since you're guarenteed that there will be 1 and only 1, Any might be faster. Because as soon as it finds a record that matches it will return true.

Count will traverse the entire list counting each occurrence. So if the item is #1 in the list of 1000 items, it's going to check each of the 1000.

EDIT

Also, this might be a time to mention not doing a premature optimization.

Wire up both your methods, put a stopwatch before and after each one. Create a sufficiently large list (1000 items or more, depending on your domain.) And see which one is faster.

My guess is that we're talking on the order of ms here.

I'm all for writing efficient code, just make sure you're not taking hours to save 5 ms on a method that gets called twice a day.

It would be so:

UnitOfWork.Records.RemoveAll(r => !recs.Any(rec => rec.Id == r.Id));

May I suggest an alternative approach that should be faster I believe since count would continue even after the first match.

public void UpdateRecords(Record[] recs)
{
    // look for deleted records
    foreach (Record rec in UnitOfWork.Records.ToList())
    {
        var copy = rec;
        if (!recs.Any(x => x.Id == copy.Id)
        {
            // if not in the new collection, remove from database
            Record deleted = UnitOfWork.Records.Single(p => p.Id == copy.Id);
            UnitOfWork.Remove(deleted);
        }
    }
    // rest of method code deleted
}

That way you are sure to break on the first match instead of continue to count.

If you need to know the actual number of elements, use Count(); it's the only way. If you are checking for the existence of a matching record, use Any() or Contains(). Both are MUCH faster than Count(), and both will perform about the same, but Contains will do an equality check on the entire object while Any() will evaluate a lambda predicate based on the object.