This scenario is based upon a schema in another question and I'm not interested in any discussion about the validity of the schema!
I'm interested to know if there are any good techniques in SQL Server to perform an aggregation of one column (amount1
开发者_如何学C below) based on the distinct value of another column (id1
).
Plan1 below scans table1 twice, performs two aggregations by p_id then joins the result together. It seems as though this could be improved upon. Query 2 could return the wrong result in some circumstances and the plan is worse anyway!
Any ideas?
DDL
IF OBJECT_ID('tempdb..#table1') IS NOT NULL DROP TABLE #table1;
IF OBJECT_ID('tempdb..#table2') IS NOT NULL DROP TABLE #table2;
CREATE TABLE #table1 (id1 int primary key nonclustered, amount1 int, p_id int);
CREATE CLUSTERED INDEX ix ON #table1 (p_id,id1);
INSERT INTO #table1
SELECT 1,500,10 UNION ALL
SELECT 2,700,20 UNION ALL
SELECT 3,500,10 UNION ALL
SELECT 4,450,20 UNION ALL
SELECT 5,300,10;
CREATE TABLE #table2 (id2 int primary key, amount2 int, id1 int);
INSERT INTO #table2
SELECT 1,300,1 UNION ALL
SELECT 2,200,1 UNION ALL
SELECT 3,200,2 UNION ALL
SELECT 4,500,2 UNION ALL
SELECT 5,400,3 UNION ALL
SELECT 6,150,4 UNION ALL
SELECT 7,300,4 UNION ALL
SELECT 8,300,5;
Query 1
WITH t1
AS (SELECT p_id,SUM(amount1) AS total1
FROM #table1
GROUP BY p_id),
t2
AS (SELECT p_id,SUM(amount2) AS total2
FROM #table2 table2
JOIN #table1 table1
ON table1.id1 = table2.id1
GROUP BY p_id)
SELECT t1.p_id,total1,total2
FROM t1
JOIN t2
ON t1.p_id = t2.p_id
Plan 1
Query 2
SELECT table1.p_id,
FLOOR(SUM(DISTINCT amount1 + table1.id1/100000000.0)) AS total1,
SUM(amount2) AS total2
FROM #table1 table1 JOIN #table2 table2 ON table1.id1=table2.id1
GROUP BY table1.p_id
Plan 2
This one will scan each record in either table only once:
SELECT p_id, SUM(amount1) AS total1, SUM(s_amount2) AS total2
FROM #table1 t1
CROSS APPLY
(
SELECT SUM(amount2) AS s_amount2
FROM #table2 t2
WHERE t2.id1 = t1.id1
) t2
GROUP BY
p_id
|--Compute Scalar(DEFINE:([Expr1006]=CASE WHEN [Expr1026]=(0) THEN NULL ELSE [Expr1027] END, [Expr1007]=CASE WHEN [Expr1028]=(0) THEN NULL ELSE [Expr1029] END))
|--Stream Aggregate(GROUP BY:([t1].[p_id]) DEFINE:([Expr1026]=COUNT_BIG([tempdb].[dbo].[#table1].[amount1] as [t1].[amount1]), [Expr1027]=SUM([tempdb].[dbo].[#table1].[amount1] as [t1].[amount1]), [Expr1028]=COUNT_BIG([Expr1005]), [Expr1029]=SUM([Expr1005])))
|--Nested Loops(Left Outer Join, OUTER REFERENCES:([t1].[id1]))
|--Clustered Index Scan(OBJECT:([tempdb].[dbo].[#table1] AS [t1]), ORDERED FORWARD)
|--Compute Scalar(DEFINE:([Expr1005]=CASE WHEN [Expr1024]=(0) THEN NULL ELSE [Expr1025] END))
|--Stream Aggregate(DEFINE:([Expr1024]=COUNT_BIG([tempdb].[dbo].[#table2].[amount2] as [t2].[amount2]), [Expr1025]=SUM([tempdb].[dbo].[#table2].[amount2] as [t2].[amount2])))
|--Clustered Index Scan(OBJECT:([tempdb].[dbo].[#table2] AS [t2]), WHERE:([tempdb].[dbo].[#table2].[id1] as [t2].[id1]=[tempdb].[dbo].[#table1].[id1] as [t1].[id1]))
, though it's not necessarily more efficient.
This one:
SELECT p_id, SUM(amount1) AS total1, SUM(s_amount2) AS total2
FROM #table1 t1
JOIN (
SELECT id1, SUM(amount2) AS s_amount2
FROM #table2
GROUP BY
id1
) t2
ON t2.id1 = t1.id1
GROUP BY
p_id
will do the same with more options for the joins, however, an extra spool may be used in the plan if t2
will be chosen leading.
Well, @Quassnoi solution seems pretty good. In any case, for SQL Server 2005+, you can use the PARTITION BY
clause to try and make a simpler query, but the execution plan is not better, though it doesn't necessarily mean is more or less efficient.
SELECT A.p_id, MIN(amount1) total1, SUM(amount2) total2
FROM (SELECT p_id, id1, SUM(amount1) OVER(PARTITION BY p_id) amount1 FROM #table1) A
JOIN #table2 B
ON A.id1 = B.id1
GROUP BY A.p_id
精彩评论