开发者

Efficient query to find singular orders for a product

开发者 https://www.devze.com 2023-02-20 17:57 出处:网络
I have two tables orders, and line_items with following structure: Orders (id = PK, indexes on user_id)

I have two tables orders, and line_items with following structure:

Orders (id = PK, indexes on user_id)
-------------------------------------
id   user_id
==   ======
1     1
2     2
3     1
4     3
5     1

LineItems (id = PK, indexes on order_id and product_id)
id   order_id product_id quantity
==   ======   ========   ======
1      1         1       1
2      1         2       2
3      2         1       4
4      2         3       6
5      3         1       1
6      4         1       1
7      5         1       1

I am trying to find the most efficient way to solve the following requirements:

  • Given a user and a product find the LineItems belonging to Orders where given product is the only product ordered. E.g: If user_id is 1 and product_id is 1, the query should return line items 5 and 7.

  • Given a user and a product find the Orders where given product is the only product ordered. E.g: If user_id is 1 and product_id is 1, the query should return orders 3 and 5.

The Orders and LineItems table can have millions of rows.

I have a working solution that uses COUNT and HAVING. I am not certain that this is the most efficient solution.

Specifically, I am wondering if this can be addressed by using the technique outlined by Cletus in this answer.

Note: I am using Orders and LineItems tables to describe the s开发者_如何学编程cenario. My actual table is quite different and it not related to order etc.

Edit 2

Is this query efficient than using GROUP BY and HAVING?

SELECT A.id
FROM   LineItems A
JOIN   Orders B ON B.id = A.order_id AND B.user_id = 1
LEFT OUTER JOIN LineItems C ON C.order_id = A.order_id AND 
                               C.product_id != A.product_id   
WHERE  A.product_id = 1 AND C.id IS NULL


select o.id OrderID, MIN(i.id) LineItemID
from orders o
inner join lineitems i on i.order_id = o.id
where o.user_id= 1
group by o.id
having count(*)=1

GROUP BY, HAVING, COUNT is the most efficient for this type of query. Basically it will scan the required data fully, but only within the user's orders, but in that single pass will produce the result.

You can kill two birds with one stone, since for orders with a single line item, min(i.id) gives you the (only) LineItemID.

Indexes you NEED to have: orders.user_id, lineitems.order_id


select 
  * 
from 
  (
    select 
      *  
    from 
      LineItems   
    group by 
      order_id 
    having count(*) = 1 
  ) l 
    inner join Orders o on l.order_id = o.id and user_id =1 and product_id =1


If you have really huge project and really huge amount of data then it would be better to have "similar goods" to be precalculated, and refreshed by some scheduler (once a day, hour, week, ...) or some "trigger" (after new good was added).

It is impossible to make the queries that you mentioned (using COUNT + HAVING + GROUP BY) to be highly performant.


Count(*) =1 is special: you don't need to actually count to detect it You could for instance use NOT EXISTS to pick the wanted tuples:

SELECT id
FROM lineitems li
WHERE NOT EXISTS (
    SELECT *
    FROM lineitems nx
    WHERE nx.order_id = li.order_id
    AND nx.id <> li.id
    )
    ;

This (sub)query can be very fast (most codegenerators will detect it as an ANTI-join). The grouping (on order_id) will still be needed internally, but the counting can be omitted. (the subquery can return false once the first duplicate order_id is encountered)

0

精彩评论

暂无评论...
验证码 换一张
取 消