开发者

Linq to Objects: filtering performance question

开发者 https://www.devze.com 2023-01-19 23:37 出处:网络
I was thinking about the way linq computes and it made me wonder: If I write var count = collection.Count(o => o.Category == 3);

I was thinking about the way linq computes and it made me wonder:

If I write

var count = collection.Count(o => o.Category == 3);

Will that perform any differently than:

var count = collection.Where(o => o.Category开发者_运维技巧 == 3).Count();

After all, IEnumerable<T>.Where() will return IEnumerable<T> which doesn't implement Count property, so a subsequent Count() would actually have to walk through the items to determine the count which should cause extra time being spent on this.

I wrote some quick test code to get some metrics but they seem to beat each other at random. I won't put in the test code here initially, but if anyone requests, I'll get it in.

So, am I missing something?


There won't be a lot in it, really - both forms will iterate over the collection, check the predicate against each item, and count the matches. Both approaches will stream the data - it's not like Where is actually building an in-memory list of all matches, for example.

The first form has one fewer (thin) layer of indirection in, that's all. The main reason for using it (IMO) is for readability/simplicity, rather than performance.


As Jon Skeet says, both techniques will have to essentially do the same thing - enumerate the sequence while conditionally incrementing a counter when the predicate is matched. Any performance differences between the two should be slight: insignificant for almost all use-cases. If there is a token winner though, I would think it should be the first one, since from reflector it appears that the overload ofCountthat takes a predicate uses its ownforeachto enumerate rather than the more obvious way of offloading the work to a streaming aWhereinto a parameterlessCountas in your second example. This means technique #1 is likely to have two minor performance benefits:

  1. Fewer argument validation (null-tests etc.) checks. Technique #2's Count will also check if its (piped) input is an ICollection or ICollection<T> , which it can't possibly be.
  2. A single constructed enumerator vs two enumerators piped together (an additional state-machine has costs).

There is one minor in favour of technique #2 point though:Whereis slightly more sophisticated in constructing an enumerator for the source-sequence; it uses a different one for lists and arrays. This may make it more performant in certain scenarios.

Of course, I should reiterate that I might be plain wrong about my analysis - reasoning about performance through static code analysis, especially when the differences are likely to be slight, is not a good idea. There is only one way to find out - measuring the execution times for your specific setup.

FYI, the source I reflected was from .NET 3.5 SP1.


I know what you are thinking here. At least, I think I do; Count() will look to see if Count is available as a property, and will simply return that if so. Otherwise, it has to enumerate the items to get its return value.

The version of Count() which accepts the predicate, though, will always cause the collection to be iterated, since it has to do it to see which ones match.


Above answers make good points, consider also that if you break away into any Linq-To-X implementations that deferred execution (Linq to Sql being the primary), the Expression parameters used in these methods may cause different results.

0

精彩评论

暂无评论...
验证码 换一张
取 消