LINQ ToList().Take(10) vs Take(10).ToList(): which one generates the more efficient query?

Given the following LINQ Statement(s), which will be more efficient?

ONE:

public List<Log> GetLatestLogEntries()
{
    var logEntries = from entry in db.Logs
                 select entry;
    return logEntries.ToList().Take(10);
}

TWO:

public List<Log> GetLatestLogEntries()
{
    var logEntries = from entry in db.Logs
                 select entry;
    return logEntries.Take(10).ToList();
}

I am aware that .ToList() executes the query immediately.

The first version wouldn't even compile - because the return value of Take is an IEnumerable<T>, not a List<T>. So you'd need it to be:

public List<Log> GetLatestLogEntries()
{
    var logEntries = from entry in db.Logs
                 select entry;
    return logEntries.ToList().Take(10).ToList();
}

That would fetch all the data from the database and convert it to a list, then take the first 10 entries, then convert it to a list again.

Getting the Take(10) to occur in the database (i.e. the second form) certainly looks a heck of a lot cheaper to me...

Note that there's no Queryable.ToList() method - you'll end up calling Enumerable.ToList() which will fetch all the entries. In other words, the call to ToList doesn't participate in SQL translation, whereas Take does.
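
To make that concrete, here is a minimal sketch. It uses an in-memory AsQueryable() source as a stand-in for db.Logs (an assumption on my part; a real LINQ to SQL or Entity Framework provider would translate the expression tree into SQL rather than run it in memory). The class and method names are made up for illustration. It shows that Take on an IQueryable<T> is recorded in the expression tree, whereas after ToList() the following Take binds to Enumerable.Take and is no longer a queryable at all.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

public static class QueryShapeDemo
{
    // Returns the name of the outermost LINQ operator recorded in the
    // query's expression tree, or "(none)" if the root is not a method call.
    public static string OutermostOperator(IQueryable<int> query)
    {
        return query.Expression is MethodCallExpression call
            ? call.Method.Name
            : "(none)";
    }

    public static void Main()
    {
        // Stand-in for db.Logs: an in-memory sequence exposed as IQueryable<int>.
        IQueryable<int> source = Enumerable.Range(1, 1000).AsQueryable();

        // Queryable.Take: the call is captured in the expression tree,
        // so a database provider gets the chance to translate it.
        IQueryable<int> takeFirst = source.Take(10);
        Console.WriteLine(OutermostOperator(takeFirst)); // prints "Take"

        // Enumerable.ToList: executes the query and returns a plain List<int>.
        // The Take that follows is Enumerable.Take, i.e. LINQ to Objects.
        List<int> everything = source.ToList();
        IEnumerable<int> takeLast = everything.Take(10);
        Console.WriteLine(takeLast is IQueryable<int>); // prints "False"
    }
}
```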

Also note that using a query expression here doesn't make much sense either. I'd write it as:

public List<Log> GetLatestLogEntries()
{
    return db.Logs.Take(10).ToList();
}

Mind you, you may want an OrderBy call - otherwise it'll just take the first 10 entries it finds, which may not be the latest ones...
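
A sketch of what that could look like, assuming Log has a Timestamp property (hypothetical; the question doesn't show the columns of Log) and again using an in-memory stand-in for db.Logs:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity: the original post does not show the columns of Log.
public class Log
{
    public int Id { get; set; }
    public DateTime Timestamp { get; set; }
}

public static class LatestLogs
{
    // Sorts newest first, then keeps at most 'count' entries.
    // Both operators come before ToList(), so against a real provider
    // they would be part of the translated query.
    public static List<Log> GetLatest(IQueryable<Log> logs, int count)
    {
        return logs.OrderByDescending(entry => entry.Timestamp)
                   .Take(count)
                   .ToList();
    }

    public static void Main()
    {
        var start = new DateTime(2020, 1, 1);

        // In-memory stand-in for db.Logs, with one entry per day.
        IQueryable<Log> logs = Enumerable.Range(1, 100)
            .Select(i => new Log { Id = i, Timestamp = start.AddDays(i) })
            .AsQueryable();

        foreach (Log entry in GetLatest(logs, 10))
        {
            Console.WriteLine($"{entry.Id} {entry.Timestamp:yyyy-MM-dd}");
        }
    }
}
```

With a real context you would pass db.Logs as the logs argument.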

Your first option won't work, because .Take(10) returns an IEnumerable<Log>. Your return type is List<Log>, so you would have to do return logEntries.ToList().Take(10).ToList(), which is less efficient.

By doing .ToList().Take(10), you are forcing the .Take(10) to be LINQ to Objects, while the other way the limit can be passed on to the database or other underlying data source. In other words, if you first do .ToList(), ALL the objects have to be transferred from the database and allocated in memory. Only THEN do you cut the result down to the first 10. If you're talking about millions of database rows (and objects), you can imagine how this is VERY inefficient and not scalable.

The second one will also run immediately because you have .ToList(), so no difference there.

The second version will be more efficient (in both time and memory usage). For example, imagine that you have a sequence containing 1,000,000 items:

The first version iterates through all 1,000,000 items, adding them to a list as it goes. Then, finally, it will take the first 10 items from that large list.
The second version only needs to iterate the first 10 items, adding them to a list as it goes. (The remaining 999,990 items don't even need to be considered.)
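
You can check this with plain LINQ to Objects. The sketch below is only an illustration (the counter and the method names are made up): a lazy source counts how many items it actually hands out under each ordering.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EnumerationCountDemo
{
    // Number of items the source has produced since the last reset.
    public static int Produced;

    // Lazily yields 1..count, counting every item actually handed out.
    public static IEnumerable<int> Source(int count)
    {
        for (int i = 1; i <= count; i++)
        {
            Produced++;
            yield return i;
        }
    }

    // ToList() first: the whole source is drained before Take runs.
    public static int CountForToListThenTake(int total, int take)
    {
        Produced = 0;
        Source(total).ToList().Take(take).ToList();
        return Produced;
    }

    // Take() first: the source is only asked for the items that are kept.
    public static int CountForTakeThenToList(int total, int take)
    {
        Produced = 0;
        Source(total).Take(take).ToList();
        return Produced;
    }

    public static void Main()
    {
        Console.WriteLine(CountForToListThenTake(1_000_000, 10)); // 1000000
        Console.WriteLine(CountForTakeThenToList(1_000_000, 10)); // 10
    }
}
```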

How about this?

I have 5000 records in "items".

version 1:

  IQueryable<T> items = Items; // my items
  items = ApplyFilteringCriteria(items, filter); // my filter BL 
  items = ApplySortingCriteria(items, sortBy, sortDir); // my sorting BL
  items = items.Skip(0);
  items = items.Take(25);
  return items.ToList();

This took 20 seconds on the server.

version 2:

  IQueryable<T> items = Items; // my items
  items = ApplyFilteringCriteria(items, filter); // my filter BL 
  items = ApplySortingCriteria(items, sortBy, sortDir); // my sorting BL
  List<T> x = items.ToList(); // materialize the full filtered, sorted result
  x = x.Skip(0).ToList(); // from here on, paging runs in memory
  x = x.Take(25).ToList();
  return x;

This took 1 second on the server.

What do you think now? Any idea why?

The second option.

The first will evaluate the entire enumerable, slurping it into a List(); then you set up the iterator that will iterate through the first ten objects and then exit.

The second sets up the Take() iterator first, so whatever happens later than that, only 10 objects will be evaluated and sent to the "downstream" processing (in this case the ToList() which will take those ten elements and return them as the concrete List).