So, let's say I have a users
table and a pages
table. Let's say that I want to allow users to hide/forget/ignore certain pages. At this point, I can think of two possible approaches:
An outer join with null
matching:
I can create a separate ignored_pages
table with columns user_id
and page_id
, writing to it INSERT INTO ignored_pages (user_id, page_id) VALUES (1,2);
when the user with ID of 1 ignores a page with ID of 2.
Then I can run something like SELECT pages.* FROM pages LEFT OUTER JOIN ignored_pages ON pages.id = ignored_pages.page_id WHERE ignored_pages.user_id = 1 AND ignored_pages.id IS NULL;
EDIT: Joe Stefanelli pointed on an error in my query. It should be SELECT pages.* FROM pages LEFT OUTER JOIN ignored_pages ON pages.id = ignored_pages.page_id AND ignored_pages.user_id = 1 WHERE ignored_pages.id IS NULL;
A subquery and NOT IN
:
I can use the same many-to-many table and then run something like SELECT pages.* FROM pages WHERE pag开发者_如何学Ce_id NOT IN (SELECT page_id FROM ignored_pages WHERE user_id = 1);
.
Is there a best practice or series of rules of thumb or (likely) a better approach to this problem than the one I'm taking?
Your best performance may actually be a NOT EXISTS
SELECT p.*
FROM pages p
WHERE NOT EXISTS(SELECT 1
FROM ignored_pages
WHERE user_id = 1
AND page_id = p.id);
If you decide to stick with your LEFT JOIN
option, you'll need to correct that query to test user_id
on the join condition rather than the WHERE
clause.
SELECT pages.*
FROM pages
LEFT OUTER JOIN ignored_pages
ON pages.id = ignored_pages.page_id
AND ignored_pages.user_id = 1
WHERE ignored_pages.id IS NULL;
For best performance you should use NOT EXISTS
SELECT pages.*
FROM pages
WHERE NOT EXISTS(
SELECT NULL
FROM ignored_pages
WHERE user_id = 1 AND ignored_pages.page_id = pages.page_id)
Also check the link http://explainextended.com/2009/09/16/not-in-vs-not-exists-vs-left-join-is-null-postgresql/
This page has a good comparison of using LEFT OUTER JOIN vs NOT EXISTS. A related link on that page shows that NOT EXISTS is either comparable, or faster than NOT IN, at least for the example case on that blog. The first link shows that NOT EXISTS performs almost two times better (cpu cycles, and execution time) than LEFT OUTER JOIN, so long as you have an index on all of the columns being joined/matched on.
The index for ignored_pages would likely look something like this:
CREATE UNIQUE CLUSTERED INDEX IX_Ignored_Pages ON ignored_pages (user_id, page_id);
Adapting to your code, the NOT EXISTS syntax would look something like this:
SELECT p.*
FROM pages p
WHERE NOT EXISTS (
SELECT 1
FROM ignored_pages i
WHERE i.user_id = @user_id
AND i.page_id = p.page_id
);
精彩评论