I'm reading a paper on SCOPE that discusses SQL like query semantics for big data applications. It does not follow how S开发者_开发技巧QL deals with null values and discusses "null-extended" variables, which I have not encountered before. Consider the pseudo-query
SELECT * FROM DATA WHERE A != B
What does "the predicate A != B is satisfied only for rows that are null extended on B" mean?
The term "null extended" is used generally to refer to the set algebra in a modern DBMS. That is, it "extends" regular relational algebra by introducing NULL
values, or rather a single universal NULL
value. Every predicate involving a NULL
has a defined result that is logically consistent with the rest of the algebra.
I've also seen the term used to refer to outer joins. For example, this query:
SELECT A.id, A.name AS NameA, B.name AS NameB
FROM A
LEFT JOIN B
ON B.id = A.id
Might give you the following results:
id | NameA | NameB
------+-----------+-----------
10 | John | Smith
11 | Jane | Doe
12 | Bob | NULL
What's happening here is that for id 12, A is being "null extended" with the columns from B because there are no values from B available. In general, when you perform a join on two relations, A and B, and you want to include tuples in A that have no matching tuples in B (outer join), then A must be null-extended with the attributes of B in order to form a complete result set.
This specific line that you put in quotations:
the predicate A != B is satisfied only for rows that are null extended on B
...doesn't really make sense when taken out of context. You have to look at the whole thing:
SQ = SELECT DISTINCT Sc FROM S WHERE Sa < 20;
M1 = SELECT Ra, Rb, Rc FROM R WHERE Rb < 100;
M2 = SELECT Ra, Rb, Rc, Sc
FROM M1 LEFT OUTER JOIN SQ ON Rc == Sc;
Q = SELECT Ra, Rb FROM M2
WHERE Ra > 5 OR Rc != Sc;
And a little bit later:
M1 rows that do not join with any rows from SQ are null-extended on Sc. The fourth select (Q) computes the final result by applying the predicate referencing the subquery. Note that the predicate Rc != Sc is satisfied only for rows that are null-extended on Sc.
Now with some context, it's easier to understand what they're trying to say. Since the join condition in M2 is Rc == Sc
, it follows that the condition Rc != Sc
can only be true if Sc
is NULL
- otherwise, Sc
would be equal to Rc
because that's how it was joined. In other words, the condition Rc != Sc
can only be true for the rows in M2
where M1
was null-extended with the columns from SQ
because it did not match any rows in SQ
.
Hopefully that clears up some of the confusion.
精彩评论