Suppose I have a grocery store application with a table of purchases:
customerId int
itemId int
Four customers come into the store:
Bob buys a banana, lemonade, and a cookie
Kevin buys a banana, lemonade, and a donut
Sam buys a banana, orange juice, and a cupcake
Susie buys a banana
I am trying to write a query which would return which combinations of items are most popular. In this case, the results of this query should be:
banana and lemonade - 2
I have already written a query which tells me a list of all items which were in a multi-item purchase (we exclude sales of one item - it cannot form a "combination"). It returns:
banana - 3
lemonade - 2
cookie - 1
donut - 1
cupcake - 1
orange juice - 1
Here is the query:
SELECT itemId, count( * )
FROM grocery_store
INNER JOIN (
SELECT customerId
FROM grocery_store
GROUP BY customerId
HAVING count( itemId ) > 1
)subQuery ON subQuery.customerId = grocery_store.customerId
GROUP BY itemId;
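For anyone who wants to reproduce the counts above, here is a minimal sketch that loads the four purchases into an in-memory SQLite database and runs the same query. The numeric item and customer ids are made up for illustration (the post only gives names), and the ORDER BY clause is added just to make the output order predictable.

```python
import sqlite3

# Hypothetical ids: the original post only names the items and customers.
ITEMS = {1: "banana", 2: "lemonade", 3: "cookie",
         4: "donut", 5: "orange juice", 6: "cupcake"}
# (customerId, itemId) with Bob=1, Kevin=2, Sam=3, Susie=4
PURCHASES = [(1, 1), (1, 2), (1, 3),
             (2, 1), (2, 2), (2, 4),
             (3, 1), (3, 5), (3, 6),
             (4, 1)]


def multi_item_counts():
    """Count each item across purchases that contain more than one item."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE grocery_store (customerId INT, itemId INT)")
    conn.executemany("INSERT INTO grocery_store VALUES (?, ?)", PURCHASES)
    rows = conn.execute("""
        SELECT gs.itemId, count(*) AS n
        FROM grocery_store gs
        INNER JOIN (
            SELECT customerId
            FROM grocery_store
            GROUP BY customerId
            HAVING count(itemId) > 1
        ) subQuery ON subQuery.customerId = gs.customerId
        GROUP BY gs.itemId
        ORDER BY n DESC, gs.itemId
    """).fetchall()
    conn.close()
    # Translate ids back to names for readability.
    return [(ITEMS[item_id], n) for item_id, n in rows]


if __name__ == "__main__":
    for name, n in multi_item_counts():
        print(f"{name} - {n}")
```

Susie's single banana is filtered out by the HAVING clause, which is why banana shows 3 rather than 4.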
Could I get a pointer about how to expand my existing query to get the desired output?
select a.itemId, b.itemId, COUNT(*) countForCombination
from grocery_store a
inner join grocery_store b
on a.customerId = b.customerId
and a.itemId < b.itemId
group by a.itemId, b.itemId
order by countForCombination desc
Assumed:
grocery_store = sales records
customerId = unique sale
- This query takes all the grocery_store records and, for each single sales transaction, creates all the possible two-item combinations (a.itemId, b.itemId) in a specific order (a.itemId < b.itemId).
- This specific order eliminates duplicates: (apple, orange) is kept, so (orange, apple) is not needed. It also prevents an item from being paired with itself.
- After producing all the combinations from all sales, a simple group by and a sort by count show the most popular combinations at the top.
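To check the self-join against the example in the question, here is a rough sketch using Python's built-in sqlite3 module. The numeric ids are invented for illustration (the question only gives names), the join uses the customerId column from the question's table definition, and the extra ORDER BY columns are only there to make ties come out in a fixed order.

```python
import sqlite3

# Hypothetical ids: the question only names the items and customers.
ITEMS = {1: "banana", 2: "lemonade", 3: "cookie",
         4: "donut", 5: "orange juice", 6: "cupcake"}
# (customerId, itemId) with Bob=1, Kevin=2, Sam=3, Susie=4
PURCHASES = [(1, 1), (1, 2), (1, 3),
             (2, 1), (2, 2), (2, 4),
             (3, 1), (3, 5), (3, 6),
             (4, 1)]


def popular_pairs():
    """Return (item, item, count) for every pair bought together."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE grocery_store (customerId INT, itemId INT)")
    conn.executemany("INSERT INTO grocery_store VALUES (?, ?)", PURCHASES)
    rows = conn.execute("""
        SELECT a.itemId, b.itemId, COUNT(*) AS countForCombination
        FROM grocery_store a
        INNER JOIN grocery_store b
            ON a.customerId = b.customerId
           AND a.itemId < b.itemId      -- keep each pair once, skip self-pairs
        GROUP BY a.itemId, b.itemId
        ORDER BY countForCombination DESC, a.itemId, b.itemId
    """).fetchall()
    conn.close()
    # Translate ids back to names for readability.
    return [(ITEMS[x], ITEMS[y], n) for x, y, n in rows]


if __name__ == "__main__":
    for first, second, n in popular_pairs():
        print(f"{first} and {second} - {n}")
```

With this data the first row is banana and lemonade with a count of 2, followed by seven pairs that each occur once. Susie's single-item purchase produces no pairs, so no separate filter for one-item sales is needed. If only repeated combinations are wanted, adding HAVING COUNT(*) > 1 after the GROUP BY should drop the pairs seen once.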