I have two tables orders, and line_items with following structure:
Orders (id = PK, indexes on user_id)
-------------------------------------
id user_id
== ======
1 1
2 2
3 1
4 3
5 1
LineItems (id = PK, indexes on order_id and product_id)
id order_id product_id quantity
== ====== ======== ======
1 1 1 1
2 1 2 2
3 2 1 4
4 2 3 6
5 3 1 1
6 4 1 1
7 5 1 1
I am trying to find the most efficient way to solve the following requirements:
Given a
user
and aproduct
find theLineItems
belonging toOrders
where given product is the only product ordered. E.g: If user_id is 1 and product_id is 1, the query should return line items 5 and 7.Given a
user
and aproduct
find theOrders
where given product is the only product ordered. E.g: If user_id is 1 and product_id is 1, the query should return orders 3 and 5.
The Orders and LineItems table can have millions of rows.
I have a working solution that uses COUNT
and HAVING
. I am not certain that this is the most efficient solution.
Specifically, I am wondering if this can be addressed by using the technique outlined by Cletus
in this answer.
Note: I am using Orders and LineItems tables to describe the s开发者_如何学编程cenario. My actual table is quite different and it not related to order etc.
Edit 2
Is this query efficient than using GROUP BY
and HAVING
?
SELECT A.id
FROM LineItems A
JOIN Orders B ON B.id = A.order_id AND B.user_id = 1
LEFT OUTER JOIN LineItems C ON C.order_id = A.order_id AND
C.product_id != A.product_id
WHERE A.product_id = 1 AND C.id IS NULL
select o.id OrderID, MIN(i.id) LineItemID
from orders o
inner join lineitems i on i.order_id = o.id
where o.user_id= 1
group by o.id
having count(*)=1
GROUP BY, HAVING, COUNT is the most efficient for this type of query. Basically it will scan the required data fully, but only within the user's orders, but in that single pass will produce the result.
You can kill two birds with one stone, since for orders with a single line item, min(i.id) gives you the (only) LineItemID.
Indexes you NEED to have: orders.user_id
, lineitems.order_id
select
*
from
(
select
*
from
LineItems
group by
order_id
having count(*) = 1
) l
inner join Orders o on l.order_id = o.id and user_id =1 and product_id =1
If you have really huge project and really huge amount of data then it would be better to have "similar goods" to be precalculated, and refreshed by some scheduler (once a day, hour, week, ...) or some "trigger" (after new good was added).
It is impossible to make the queries that you mentioned (using COUNT + HAVING + GROUP BY) to be highly performant.
Count(*) =1
is special: you don't need to actually count to detect it
You could for instance use NOT EXISTS to pick the wanted tuples:
SELECT id
FROM lineitems li
WHERE NOT EXISTS (
SELECT *
FROM lineitems nx
WHERE nx.order_id = li.order_id
AND nx.id <> li.id
)
;
This (sub)query can be very fast (most codegenerators will detect it as an ANTI-join). The grouping (on order_id) will still be needed internally, but the counting can be omitted. (the subquery can return false once the first duplicate order_id is encountered)
精彩评论