开发者

Grouping data in memory using Linq To Objects or using native SQL -- WHICH IS FASTER?

开发者 https://www.devze.com 2023-01-06 19:41 出处:网络
I not开发者_JAVA技巧iced that LINQ to Objects has a GroupBy method.In light of this, I was wondering if anyone can build a case for grouping a list of objects in memory using LINQ vs having SQL Server

I not开发者_JAVA技巧iced that LINQ to Objects has a GroupBy method. In light of this, I was wondering if anyone can build a case for grouping a list of objects in memory using LINQ vs having SQL Server perform the grouping?


Some reasons you might want to group using LINQ on the client are:

  1. Your objects are already in memory.
  2. You might want to group by different keys at different times during the lifecycle of the data, and return trips to the database might be costly.
  3. You might want to group by something that's inconvenient for SQL Server to calculate, such as data.GroupBy(d => DoSomethingComplicatedWith(d)).


On a large data set with properly indexed tables, SQL Server will be faster. Every time. Hands down. For smaller datasets, you might not even notice a difference.


If you are dealing with a small set of data, performance on the client is not a concern, and going back to the database again to get the data into the format you want is not an option (or an undesirable option), then doing it in memory is fine.

In all other cases, your best option would be to let the database do this work, as it is optimized for exactly this kind of operation.


There are (not too few) situations where your objects are not contained in a database.


As already stated there may be several reasons for choosing grouping client-side (C#) vs. server-side (SQL Server). I decided to make a small test - the following query should decide what pairs of articles and users are missing from a specific computation.

The query returns about 1.5M records grouped in more than 15000 groups.

Grouping in SQL

allArticleConcepts = DataAccess.ArticleConceptRepository.AllNoTracking
    .Join(DataAccess.ArticleAnalysisDataRepository.AllNoTracking.Where(aa => aa.CommentCount >= minCommentCount), 
        outer => outer.ArticleId, 
        inner => inner.ArticleId, 
        (outer, inner) => outer)
    .Where(ac => missingXData.Any(x => x.ArticleId == ac.ArticleId))
    .GroupBy(ac => ac.ArticleId)
    .ToDictionary(grp => grp.Key, grp => grp
        .Select(ac => new Concept { ContextSynLexemId = ac.LexemId, LexemId = ac.LexemId, Frequency = ac.Freq })
        .ToList());
  • SQL duration = 22 seconds
  • Total duration = 23 seconds

Generated query is very complex and thus taking a long time.

Grouping in .NET

allArticleConcepts = DataAccess.ArticleConceptRepository.AllNoTracking
    .Join(DataAccess.ArticleAnalysisDataRepository.AllNoTracking.Where(aa => aa.CommentCount >= minCommentCount), 
        outer => outer.ArticleId, 
        inner => inner.ArticleId, 
        (outer, inner) => outer)
    .Where(ac => missingXData.Any(x => x.ArticleId == ac.ArticleId))
    .ToList()
    .GroupBy(ac => ac.ArticleId)
    .ToDictionary(grp => grp.Key, grp => grp
        .Select(ac => new Concept { ContextSynLexemId = ac.LexemId, LexemId = ac.LexemId, Frequency = ac.Freq })
        .ToList());
  • SQL duration = 13 seconds
  • Total duration = 15 seconds

Generated SQL is much simpler and faster. However, C# computation is slightly slower.

Grouping in .NET with some parallel flavor

allArticleConcepts = DataAccess.ArticleConceptRepository.AllNoTracking
    .Join(DataAccess.ArticleAnalysisDataRepository.AllNoTracking.Where(aa => aa.CommentCount >= minCommentCount), 
        outer => outer.ArticleId, 
        inner => inner.ArticleId, 
        (outer, inner) => outer)
    .Where(ac => missingXData.Any(x => x.ArticleId == ac.ArticleId))
    .ToList()
    .AsParallel()
    .GroupBy(ac => ac.ArticleId)
    .ToDictionary(grp => grp.Key, grp => grp
        .Select(ac => new Concept { ContextSynLexemId = ac.LexemId, LexemId = ac.LexemId, Frequency = ac.Freq })
        .ToList());

This gives a minor improvement on the client side.

As a conclusion on how to perform grouping:

  • if logic cannot be translated to SQL: Linq2Objects is a must
  • if logic is quite complex and/or dealing with a large number of entities, Linq2Objects is most probably the way to go
  • if logic is quite simple and indexes can be used, Linq2SQL is most probabily the way to go. However, this also requires carefully writing of the LINQ to avoid generating inefficient queries.

Note: if query deals with large number of entities, but the result is relatively small, one can consider direct query execution (or stored procedure) and map the result to entities. This ensures maximum flexibility for writing the query and minimizing C# - SQL Server round-trip time.

0

精彩评论

暂无评论...
验证码 换一张
取 消