I was going through Jon Skeet's Reimplementing LINQ to Objects series. In the article on implementing Where, I found the following snippets, but I don't understand what advantage we get by splitting the original method into two.
Original Method:
// Naive validation - broken!
public static IEnumerable<TSource> Where<TSource>(
    this IEnumerable<TSource> source,
    Func<TSource, bool> predicate)
{
    if (source == null)
    {
        throw new ArgumentNullException("source");
    }
    if (predicate == null)
    {
        throw new ArgumentNullException("predicate");
    }
    foreach (TSource item in source)
    {
        if (predicate(item))
        {
            yield return item;
        }
    }
}
Refactored Method:
public static IEnumerable<TSource> Where<TSource>(
    this IEnumerable<TSource> source,
    Func<TSource, bool> predicate)
{
    if (source == null)
    {
        throw new ArgumentNullException("source");
    }
    if (predicate == null)
    {
        throw new ArgumentNullException("predicate");
    }
    return WhereImpl(source, predicate);
}
private static IEnumerable<TSource> WhereImpl<TSource>(
    this IEnumerable<TSource> source,
    Func<TSource, bool> predicate)
{
    foreach (TSource item in source)
    {
        if (predicate(item))
        {
            yield return item;
        }
    }
}
Jon says it's for eager validation, while deferring the rest of the work. But I don't get it.
Could someone please explain in a little more detail what the difference between these two functions is, and why the validation is performed eagerly in one but not in the other?
Conclusion/Solution:
I got confused because I didn't understand how a method is determined to be an iterator-generator. I assumed it was based on the method's signature, such as a return type of IEnumerable<T>. But based on the answers, I now get it: a method is an iterator-generator if it uses yield statements.
The broken code is a single method that is really an iterator-generator. That means calling it initially just returns a state machine without running any of the method body. Only when the calling code calls MoveNext (typically as part of a foreach loop) does it execute everything from the beginning up to the first yield return.
With the correct code, Where is not an iterator-generator. That means it executes everything immediately, like normal. Only WhereImpl is. So the validation is executed right away, but the WhereImpl code up to and including the first yield return is deferred.
So if you have something like:
IEnumerable<int> evens = list.Where(null); // Correct code gives error here.
foreach(int i in evens) // Broken code gives it here.
the broken version won't give you an error until you start iterating.
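To see the difference in timing, here is a minimal sketch that can be run as a console program. The names WhereBroken, WhereFixed, and ThrowsArgumentNull are hypothetical, chosen for this illustration so they don't clash with System.Linq; they are plain static methods rather than extension methods to keep the calls unambiguous.

```csharp
using System;
using System.Collections.Generic;

public static class EagerVsDeferredDemo
{
    // Single iterator method: because it contains yield, the null checks
    // become part of the deferred body (hypothetical name).
    public static IEnumerable<T> WhereBroken<T>(IEnumerable<T> source, Func<T, bool> predicate)
    {
        if (source == null) throw new ArgumentNullException("source");
        if (predicate == null) throw new ArgumentNullException("predicate");
        foreach (T item in source)
        {
            if (predicate(item)) yield return item;
        }
    }

    // Ordinary method (no yield): the checks run as soon as it is called.
    public static IEnumerable<T> WhereFixed<T>(IEnumerable<T> source, Func<T, bool> predicate)
    {
        if (source == null) throw new ArgumentNullException("source");
        if (predicate == null) throw new ArgumentNullException("predicate");
        return WhereImpl(source, predicate);
    }

    // Only this method is an iterator; only its body is deferred.
    private static IEnumerable<T> WhereImpl<T>(IEnumerable<T> source, Func<T, bool> predicate)
    {
        foreach (T item in source)
        {
            if (predicate(item)) yield return item;
        }
    }

    // Helper for the demo: reports whether the action throws ArgumentNullException.
    public static bool ThrowsArgumentNull(Action action)
    {
        try { action(); return false; }
        catch (ArgumentNullException) { return true; }
    }

    public static void Main()
    {
        var list = new List<int> { 1, 2, 3, 4 };
        IEnumerable<int> broken = null;

        // Broken version: the call itself does not throw...
        Console.WriteLine(ThrowsArgumentNull(() => broken = WhereBroken<int>(list, null))); // False
        // ...the exception only appears once iteration starts.
        Console.WriteLine(ThrowsArgumentNull(() => { foreach (int i in broken) { } }));      // True

        // Fixed version: the call throws immediately.
        Console.WriteLine(ThrowsArgumentNull(() => WhereFixed<int>(list, null)));            // True
    }
}
```

In the broken case the exception can surface far away from the call that passed the bad argument, which is what makes it harder to diagnose.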
I think Jon explains it pretty well in his article, but the explanation relies on you understanding how the compiler generates code when there is a yield statement. Essentially what happens is the compiler generates an iterator that doesn't get invoked (deferred execution) until one of the items from the iteration is required. The initial method contains both the code that checks the arguments and the iteration code. The compiler bundles all of this up into the iterator which, remember, doesn't get invoked until the first item is needed. This means that validation doesn't happen until you try to access one of the items in the enumerable.
By separating it into two methods, one containing the validation and one containing the iterator block, it ensures that the validation code gets run when the iterator is constructed, not when it is executed. This is because the only code bundled into the iterator is the code in the second method; it's the only code whose execution is deferred. The validation code is executed at the time you create the iterator.
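As a rough illustration of the "bundling" described above, here is a hand-written stand-in for the kind of class the compiler produces for the single-method version. This is a simplified sketch, not the actual generated code: the class name BrokenWhereIterator and its fields are hypothetical, and the real state machine differs in detail (for example, in how it handles multiple enumerations and threads).

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical, simplified equivalent of the compiler-generated iterator
// for the single-method version of Where.
public sealed class BrokenWhereIterator<TSource> : IEnumerable<TSource>, IEnumerator<TSource>
{
    private readonly IEnumerable<TSource> source;
    private readonly Func<TSource, bool> predicate;
    private IEnumerator<TSource> inner;
    private bool started;

    public BrokenWhereIterator(IEnumerable<TSource> source, Func<TSource, bool> predicate)
    {
        // Creating the iterator only stores the arguments.
        // None of the original method body runs here.
        this.source = source;
        this.predicate = predicate;
    }

    public TSource Current { get; private set; }
    object IEnumerator.Current => Current;

    public bool MoveNext()
    {
        if (!started)
        {
            started = true;
            // The argument checks were written inside the iterator method,
            // so they end up here, on the first MoveNext.
            if (source == null) throw new ArgumentNullException("source");
            if (predicate == null) throw new ArgumentNullException("predicate");
            inner = source.GetEnumerator();
        }
        while (inner.MoveNext())
        {
            if (predicate(inner.Current))
            {
                Current = inner.Current; // corresponds to "yield return item"
                return true;
            }
        }
        return false;
    }

    // Simplification: the same object serves as enumerable and enumerator.
    public IEnumerator<TSource> GetEnumerator() => this;
    IEnumerator IEnumerable.GetEnumerator() => this;
    public void Reset() => throw new NotSupportedException();
    public void Dispose() => inner?.Dispose();
}

public static class StateMachineDemo
{
    public static void Main()
    {
        // Constructing the iterator with a null predicate does not throw.
        var it = new BrokenWhereIterator<int>(new[] { 1, 2, 3 }, null);
        try { it.MoveNext(); }
        catch (ArgumentNullException e) { Console.WriteLine(e.ParamName); } // prints "predicate"
    }
}
```

In the two-method version, the equivalent class would be built from WhereImpl only, so its MoveNext would contain no argument checks; those would stay in the ordinary Where method and run before the iterator object is even created.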