I apologize for the long question in advance. Most online articles don't go over this, they just show a quick result set. For such an important and commonly used idea, I want to fully understand this. I've seen a lot of the post on here with specific examples, but none got the core idea in my head. My question is when you do a 3+ table join, how does this work in memory? The statement I'm currently using is:
select a.cust_id, a.[first name],a.[last name],a.[primary zip],c.jerseynum
from contact as a
join notes as b
on a.cust_id = b.cust_id
join jerseytable as c
on a.cust_id = c.cust_id
so after the first join between a and b we get a result set, we'll call it 1 I then do a join on a and c... this is were it gets fuzzy for me. This result set doesn't just take the place of my previ开发者_如何学运维ous join, does it only add records to 1 that fit just the join between a and c?
You're basically asking how a database does its query execution. There's a lot of theory and practice in this area, more than a single answer can give you.
The query engine has a lot of tools at its disposal, depending on the joins, the indexes, and other statistics it keeps. It can construct in-memory tables, reorder joins (in some cases) to better limit the number of returned rows. It might identify the results of the different joins and merge them together.
Read up on query plans to get started: http://en.wikipedia.org/wiki/Query_plan and the related section on query optimization.
JOIN
is a relational operator: it takes two relations as parameters and the result is another relation.
Relational operators can be strung together. Consider your query written in the relational language Tutorial D:
Assuming x
and y
are suitably declared relation variables (relvars):
x := a MATCHING b;
y := x JOIN c {jerseynum};
Alternatively:
y := a JOIN c {jerseynum};
x := y MATCHING b;
However, the above forces an order of execution on the optimizer: assigning the intermediate results to relvars is essentially telling the optimizer how to do it's job (i.e. not good). They can be strung together e.g. as follows:
a MATCHING b JOIN c {jerseynum};
The SQL FROM
clause works in a similar way i.e. no need to assign to intermediate (derived) tables. The optimizer is free to evaluate them in any order it sees fit. Trust the optimizer :)
a joined with b, then the result set is joined with c. (If you use MS SQL Server, you can see this process in query execution plan).
After it parsed your query, the database engine will generate a plan which describes the actual steps to be taken to get the results of the query. You should examine your actual plan to get an insight what is really going on. Basically the optimizer will choose the order of joins regardless of the order you wrote in the sql. The actual order of joins will depend among other things on the indexes and on the statistics kept on the data. see this article on query optimizer http://research.microsoft.com/pubs/76059/pods98-tutorial.pdf
精彩评论