开发者

C# LINQ First() faster than ToArray()[0]?

开发者 https://www.devze.com 2023-03-07 07:25 出处:网络
I am running a test. It looks like: method 1) List<int> = new List<int>{1,2,4, .....} //assume 1000k

I am running a test.

It looks like:

method 1)

List<int> = new List<int>{1,2,4, .....} //assume 1000k
var  result ErrorCodes.Where(x => ReturnedErrorCodes.Contains(x)).First();

method 2)

List<int> = new List<int>{1,2,4, .....} //assume 1000k
var  r开发者_开发知识库esult = ErrorCodes.Where(x => ReturnedErrorCodes.Contains(x)).ToArray()[0];

Why is method 2 is so slow compared to method 1?


You have a jar containing a thousand coins, many of which are dimes. You want a dime. Here are two methods for solving your problem:

  1. Pull coins out of the jar, one at a time, until you get a dime. Now you've got a dime.

  2. Pull coins out of the jar, one at a time, putting the dimes in another jar. If that jar turns out to be too small, move them, one at a time, to a larger jar. Keep on doing that until you have all of the dimes in the final jar. That jar is probably too big. Manufacture a jar that is exactly big enough to hold that many dimes, and then move the dimes, one at a time, to the new jar. Now start taking dimes out of that jar. Take out the first one. Now you've got a dime.

Is it now clear why method 1 is a whole lot faster than method 2?


Erm... because you are creating an extra array (rather than just using the iterator). The first approach stops after the first match (Where is a non-buffered streaming API). The second loads all the matches into an array (presumably with several re-sizes), then takes the first item.

As a side note; you can create infinite sequences; the first approach would still work, the second would run forever (or explode).

It could also be:

var  result ErrorCodes.First(x => ReturnedErrorCodes.Contains(x));

(that won't make it any faster, but is perhaps easier to read)


Because of deferred execution.

The code ErrorCodes.Where(x => ReturnedErrorCodes.Contains(x)) doesn't return a collection of integers, instead it returns an expression that is capable of returning a stream of integers. It doesn't do any actual work until you start reading integers from it.

The ToArray method will consume the entire stream and put all the integers in an array. This means that every item in the entire list has to be compared to the error codes.

The First method on the other hand will only get the first item from the stream, and then stop reading from the stream. This will make it a lot faster, because it will stop comparing items from the list to the error codes as soon as it finds a match.


Because ToArray() copies the entire sequence to an array.

Method 2 has to iterate over the whole sequence to build an array, and then returns the first element.

Method 1 just iterates over enough of the sequence to find the first matching element.


ToArray() walks through the whole sequence it has been given and creates and array out of it.

If you don't callt ToArray(), First() lets Where() return just the first item that matches and immediatelly returns.


First() is complexity of O(1)

ToArray()[0] is complexity O(n)+1


var @e = array.GetEnumerator();

// First
@e.MoveNext();
return @e.Current;

// ToArray (with yield [0] should as fast as First...)
while (@e.MoveNext() {
    yield return @e.Current;
}


Because in the second example, you are actually converting the IEnumerable<T> to an array, whereas in the first example, no conversion is taking place.


In method 2 the entire array must be converted to an array first. Also, it seems awkward to mix array access when First() is so much more readable.


This makes sense, ToArray probably involves a copy, which is always going to be more expensive, since Linq can't make any guarantees about how you're going to use your array, while First() can just return the single element at the beginning.

0

精彩评论

暂无评论...
验证码 换一张
取 消