EF 4.1: Why does turning a constant into a variable result in extra sub query?_问答_开发者

Today I discovered that Entity Framework was adding an unnecessary sub query to the SQL it generates. I started digging my code trying to narrow down where it might come from. A (long) while later I pin-pointed what's causing it. But now I'm more confused than when I started, as I have no clue why it causes it.

Basically what I discovered is that on certain scenarios, simply converting a constant into a variable can alter the SQL that Entity Framework generates. I've shrunk everything to the bare minimum and packed it in a little console app:

using System;
using System.Data.Entity;
using System.Linq;

class Program
{
    private static readonly BlogContext _db = new BlogContext();

    static void Main(string[] args)
    {
        const string email = "foo@bar.com";

        var comments = from c in _db.Comments
                       where c.Email == email
                       select c;

        var result = (from p in _db.Posts
                      join c in comments on p.PostId equals c.PostId
                      orderby p.Title
                      select new { p.Title, c.Content });

        Console.WriteLine(result);
    }
}

public class BlogContext : DbContext
{
    public DbSet<Post> Posts { get; set; }
    public DbSet<Comment> Comments { get; set; }
}

public class Post
{
    public int PostId { get; set; }
    public string Title { get; set; }
}

public class Comment
{
    public int CommentId { get; set; }
    public int PostId { get; set; }
    public string Email { get; set; }
    public string Content { get; set; }
}

This shows the following output, which is perfect:

SELECT
[Extent1].[PostId] AS [PostId],
[Extent1].[Title] AS [Title],
[Extent2].[Content] AS [Content]
FROM  [dbo].[Posts] AS [Extent1]
INNER JOIN [dbo].[Comments] AS [Extent2] ON [Extent1].[PostId] = [Extent2].[PostId]
WHERE N'foo@bar.com' = [Extent2].[Email]
ORDER BY [Extent1].[Title] ASC

Now if I make email a variable:

/*const*/ string email = "foo@bar.com";

The output changes radically:

SELECT
[Project1].[PostId] AS [PostId],
[Project1].[Title] AS [Title],
[Project1].[Content] AS [Content]
FROM ( SELECT
        [Extent1].[PostId] AS [PostId],
        [Extent1].[Title] AS [Title],
        [Extent2].[Content] AS [Content]
        FROM  [dbo].[Posts] AS [Extent1]
        INNER JOIN [dbo].[Comments] AS [Extent2] ON [Extent1].[PostId] = [Extent2].[PostId]
        WHERE [Extent2].[Email] = @p__linq__0
)  AS [Project1]
ORDER BY [Project1].[Title] ASC

As a side note, LINQ to SQL does not seem to do this. I know it's probably okay to ignore this, as both commands return the same data. But I'm extremely curious as to wh开发者_如何学编程y this happens. Up until today I always had the (perhaps false?) impression that it is always safe to turn a constant into a variable, providing that the value remains the same (which in this case does). So I have to ask...

Why a seemingly insignificant change causes such a big difference in the generated SQL?

Update:

Just to be clear, my question isn't about the value of email being a hard-coded value in the first query and a variable in the second (which makes all the sense in the world). My question is about why the variable version results in the extra sub query.

Thanks!

The answer is fairly simple. Your LINQ query is expressed with expression trees. The difference with the const variable vs non const one are lies in ConstantExpression and ParameterExpression.

When you use const your LINQ query uses ConstExpression for this variable, and when you use non const it uses ParameterExpression which are interpreted differently by the EF Runtime.

Constant actually means that the value will never change and the value can be inlined into the query.

NOT an answer to the question - just context on using Parameters.

This is to do with creating a query such that it will re-use existing query plans.

If you inject the variable (as opposed to a reference to a parameter) into the generate SQL, then SQL Server (and probably other database engines) will not be able to re-use the same plan when the variable changes.

For constants this is not an issue, because you know the value is always the same, but for variables each time the query is executed the SQL and thus query plan would be slightly different.

This might not sound like much, but SQL only has a certain amount of space assigned for query plans, so having hundreds/thousands of minor variations in the cache is a real 'waste of space' so to speak!

Is this actually a big difference in the SQL? The inner query is the same as the original query, and the outer query is just a wrapper over the inner that doesn't change the result set.

Unless this is causing problems, I personally wouldn't worry about it. Do the query plans differ between the two flavours of query? My guess is that they're identical.

Like people said. Difference between both queries is minimal.

The reason is that the expression, that is created when you create your LINQ is different when you use variable and when constant. And EF will catch this and will generate your SQL acordingly. It knows it will never change, so it can be hard-coded into the query for (possible) performance gain.

Edit: I dont think there is answer to this question except "Thats how EF does it." But it is very well known that EF likes to creates many subselects. It can result for many subselects for more complex queries. Some even dismis even using EF for this fact. But this is simply price for using tool like EF. You loose fine-grained control over something, that can have big performance gain. Why do you use .NET, when you can use C and gain more performance? Why use C when you can use assembly, to gain more performance gain?

Only way to be safe and still be able to use high-abstraction layer EF is to use SQL profiller often and check if there are not queries that take too long on real data. And if your find some then either conver them to direct SQL or stored procedures.