I am joining two tables (shipments and returns) and using group by to view totals for certain criteria. The two tables are related via shipment_id. This column is mostly unique, but contains a few duplicates because a shipment can contain more than one item, and each item gets its own row in the table.
I'm trying to count all the distinct shipments grouped by warehouse, seller, and size. count(distinct ...) works great on its own, but when it is combined with group by over a wide range of items, the per-group counts no longer add up to the correct total.
The query below returns 7 shipments (summed across the groups) and 4 returns (also summed). While the return count is correct for the small amount of test data I have, there are actually 6 distinct shipments, not 7. With this query I'm basically looking at all shipments and joining return information if an item in the shipment has been returned.
select s.warehouse, s.seller, s.size,
count(distinct s.shipment_id) as total_shipments,
count(distinct r.shipment_id) as total_returns
from shipments s
left join returns r
on s.shipment_id = r.shipment_id
group by s.warehouse, s.seller, s.size
I'm concerned that the report I generate won't be entirely accurate. Is there a workaround for this issue? I've seen similar questions, but none that really apply. I am using MySQL.
I see a potential problem. If a shipment has multiple items and therefore appears in more than one shipment record, those records may differ in warehouse, seller, or size. By grouping on those fields, you risk counting the same shipment more than once, since its shipment_id is technically distinct within each group.
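To make that concrete, here is a small reproduction. It is only a sketch: it uses Python's built-in sqlite3 as a stand-in for MySQL, and the sample rows (warehouses W1/W2, sellers acme/bolt, a single size column per row) are made up for illustration. Shipment 1 has two item rows with different sizes, so it falls into two groups.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE shipments (shipment_id INTEGER, warehouse TEXT, seller TEXT, size TEXT);
    CREATE TABLE returns (shipment_id INTEGER);
    -- Hypothetical data: shipment 1 has two item rows with different sizes.
    INSERT INTO shipments VALUES
        (1, 'W1', 'acme', 'S'),
        (1, 'W1', 'acme', 'L'),
        (2, 'W1', 'acme', 'S'),
        (3, 'W2', 'bolt', 'M');
    INSERT INTO returns VALUES (2);
""")

# Same shape as the query in the question.
grouped = conn.execute("""
    SELECT s.warehouse, s.seller, s.size,
           COUNT(DISTINCT s.shipment_id) AS total_shipments,
           COUNT(DISTINCT r.shipment_id) AS total_returns
    FROM shipments s
    LEFT JOIN returns r ON s.shipment_id = r.shipment_id
    GROUP BY s.warehouse, s.seller, s.size
    ORDER BY s.warehouse, s.seller, s.size
""").fetchall()

# Adding up the per-group counts counts shipment 1 twice (once for 'L', once for 'S').
summed_shipments = sum(row[3] for row in grouped)

# The true number of distinct shipments, ignoring the grouping.
distinct_shipments = conn.execute(
    "SELECT COUNT(DISTINCT shipment_id) FROM shipments"
).fetchone()[0]

print(summed_shipments)    # 4
print(distinct_shipments)  # 3
```

Each group's count is correct for that group. It is only the sum across groups that overshoots, because one shipment legitimately belongs to more than one group.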
You could try grouping by s.shipment_id instead of s.warehouse, s.seller, s.size. The problem here is that if the warehouse, seller, or size differs between a shipment's rows, you'll end up missing one row (for that warehouse/seller/size), but the totals will add up.
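One way to sketch the group-by-shipment idea is to collapse to one row per shipment first and then group the result by warehouse, seller, and size. The two-step query and the use of MIN() to pick a representative value are my own assumptions, not something from the question, and MIN() is applied per column, so it can produce a combination that no single item row actually has. The sample data is hypothetical and sqlite3 again stands in for MySQL.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE shipments (shipment_id INTEGER, warehouse TEXT, seller TEXT, size TEXT);
    CREATE TABLE returns (shipment_id INTEGER);
    -- Hypothetical data: shipment 1 has two item rows with different sizes.
    INSERT INTO shipments VALUES
        (1, 'W1', 'acme', 'S'),
        (1, 'W1', 'acme', 'L'),
        (2, 'W1', 'acme', 'S'),
        (3, 'W2', 'bolt', 'M');
    INSERT INTO returns VALUES (2);
""")

rows = conn.execute("""
    SELECT warehouse, seller, size,
           COUNT(*)        AS total_shipments,
           SUM(has_return) AS total_returns
    FROM (
        -- Inner query: exactly one row per shipment.
        -- MIN() picks an arbitrary but deterministic representative value.
        SELECT s.shipment_id,
               MIN(s.warehouse) AS warehouse,
               MIN(s.seller)    AS seller,
               MIN(s.size)      AS size,
               MAX(r.shipment_id IS NOT NULL) AS has_return
        FROM shipments s
        LEFT JOIN returns r ON s.shipment_id = r.shipment_id
        GROUP BY s.shipment_id
    ) AS per_shipment
    GROUP BY warehouse, seller, size
    ORDER BY warehouse, seller, size
""").fetchall()

# Every shipment now lands in exactly one group, so the counts add up.
total_shipments = sum(row[3] for row in rows)
print(total_shipments)  # 3
```

Here shipment 1 is counted only under size 'L' and no longer shows up under 'S', which is the missing-row trade-off described above: the totals are right, but one warehouse/seller/size combination is under-reported.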