Why does the comparison of value
to null
return false, except when using a NOT IN
, where it returns true?
Given a query to find all stackoverflow users who have a post:
SELECT * FROM Users
WHERE UserID IN (SELECT UserID FROM Posts)
This works as expected; i get a list of all users who have a post.
Now query for the inverse; find all stackoverflow users who don't have a post:
SELECT * F开发者_Go百科ROM Users
WHERE UserID NOT IN (SELECT UserID FROM Posts)
This returns no records, which is incorrect.
Given hypothetical data1
Users Posts
================ ===============================
UserID Username PostID UserID Subject
------ -------- ------- ------ ----------------
1 atkins 1 1 Welcome to stack ov...
2 joels 2 2 Welcome all!
... ... ... ...
399573 gt6989b ... ...
... ... ... ...
10592 null (deleted by nsl&fbi...
... ...
And assume the rules of NULLs:
NULL = NULL
evaluates to unknownNULL <> NULL
evaluates to unknownvalue = NULL
evaluates unknown
If we look at the 2nd query, we're interested in finding all rows where the Users.UserID is not found in the Posts.UserID column. i would proceed logically as follows:
Check UserID 1
1 = 1
returns true. So we conclude that this user has some posts, and do not include them in the output list
Now check UserID 2:
2 = 1
returns false, so we keep looking2 = 2
returns true, so we conclude that this user has some posts, and do not include them in the output list
Now check UserID 399573
399573 = 1
returns false, so we keep looking399573 = 2
returns false, so we keep looking- ...
399573 = null
returns unknown, so we keep looking- ...
We found no posts by UserID 399573, so we would include him in the output list.
Except SQL Server doesn't do this. If you have a NULL in your in
list, then suddenly it finds a match. It suddenly finds a match. Suddenly 399573 = null
evaluates to true.
Why does the comparison of value
to null
return unknown, except when it returns true?
Edit: i know that i can workaround this nonsensical behavior by specifically excluding the nulls:
SELECT * FROM Users
WHERE UserID NOT IN (
SELECT UserID FROM Posts
WHERE UserID IS NOT NULL)
But i shouldn't have to, as far as i can tell the boolean logic should be fine without it - hence my question.
Footnotes
- 1 hypothetical data; if you don't like it: make up your down.
- celko now has his own tag
Common problem, canned answer:
The behavior of NOT IN clause may be confusing and as such it needs some explanations. Consider the following query:
SELECT LastName, FirstName FROM Person.Contact WHERE LastName NOT IN('Hedlund', 'Holloway', NULL)
Although there are more than a thousand distinct last names in AdventureWorks.Person.Contact, the query returns nothing. This may look counterintuitive to a beginner database programmer, but it actually makes perfect sense. The explanation consist of several simple steps. First of all, consider the following two queries, which are clearly equivalent:
SELECT LastName, FirstName FROM Person.Contact
WHERE LastName IN('Hedlund', 'Holloway', NULL)
SELECT LastName, FirstName FROM Person.Contact
WHERE LastName='Hedlund' OR LastName='Holloway' OR LastName=NULL
Note that both queries return expected results. Now, let us recall DeMorgan's theorem, which states that:
not (P and Q) = (not P) or (not Q)
not (P or Q) = (not P) and (not Q)
I am cutting and pasting from Wikipedia (http://en.wikipedia.org/wiki/De_Morgan_duality). Applying DeMorgan's theorem to this queries, it follows that these two queries are also equivalent:
SELECT LastName, FirstName FROM Person.Contact WHERE LastName NOT IN('Hedlund', 'Holloway', NULL)
SELECT LastName, FirstName FROM Person.Contact
WHERE LastName<>'Hedlund' AND LastName<>'Holloway' AND LastName<>NULL
This last LastName<>NULL can never be true
The assumption in your first sentence isn't right:
Why does the comparison of value to null return false, except when using a NOT IN, where it returns true?
But comparison of a value to null does not return false
; it returns unknown
. And unknown
has its own logic:
unknown AND true = unknown
unknown OR true = true
unknown OR false = unknown
One example of how this works out:
where 1 not in (2, null)
--> where 1 <> 2 and 1 <> null
--> where true and unknown
--> where unknown
The where
clause only matches on true
, so this filters out any row.
You can find the full glory of 3 value logic at Wikipedia.
精彩评论