开发者

Compute median of column in SQL common table expression

开发者 https://www.devze.com 2023-01-06 20:00 出处:网络
In MSSQL2008, I am trying to compute the median of a column of numbers from a common table expression using the classic median que开发者_JS百科ry as follows:

In MSSQL2008, I am trying to compute the median of a column of numbers from a common table expression using the classic median que开发者_JS百科ry as follows:

WITH cte AS
(
   SELECT number
   FROM table
) 

SELECT cte.*,
(SELECT 
  (SELECT (   
    (SELECT TOP 1 cte.number  
     FROM     
     (SELECT TOP 50 PERCENT cte.number     
      FROM cte
      ORDER BY cte.number) AS medianSubquery1   
    ORDER BY cte.number DESC)  
    +   
  (SELECT TOP 1 cte.number
   FROM     
    (SELECT TOP 50 PERCENT cte.number    
     FROM cte   
     ORDER BY cte.number DESC) AS medianSubquery2   
   ORDER BY cte.number ASC) ) / 2)) AS median

FROM cte
ORDER BY cte.number

The result set that I get is the following:

NUMBER    MEDIAN
x1        x1
x1        x1
x1        x1
x2        x2
x3        x3

In other words, the "median" column is the same as the "number" column when I would expect the median column to be "x1" all the way down. I use a similar expression to compute the mode and it works fine over the same common table expression.


Here's a slightly different way to do it:

WITH cte AS
(
   SELECT number
   FROM table1
)
SELECT T1.number, T3.median
FROM cte T1, 
(
    SELECT AVG(number) AS median
    FROM
    (
        SELECT number, ROW_NUMBER() OVER(ORDER BY number) AS rn
        FROM cte
    ) T2
    WHERE T2.rn = ((SELECT COUNT(*) FROM table1) + 1) / 2
    OR T2.rn = ((SELECT COUNT(*) FROM table1) + 2) / 2
) T3


The problem with your query is that you are doing

SELECT TOP 1 cte.number FROM...

but it isn't correlated with the sub query it is correlated with the Outer query so the subquery is irrelevant. Which explains why you simply end up with the same value all the way down. Removing the cte. (as below) gives the median of the CTE. Which is a constant value. What are you trying to do?

WITH cte AS
    ( SELECT NUMBER
    FROM master.dbo.spt_values
    WHERE TYPE='p'
    )

SELECT cte.*,
(SELECT 
  (SELECT (   
    (SELECT TOP 1 number  
     FROM     
     (SELECT TOP 50 PERCENT cte.number     
      FROM cte
      ORDER BY cte.number) AS medianSubquery1   
    ORDER BY number DESC)  
    +   
  (SELECT TOP 1 number
   FROM     
    (SELECT TOP 50 PERCENT cte.number    
     FROM cte   
     ORDER BY cte.number DESC) AS medianSubquery2   
   ORDER BY number ASC) ) / 2)) AS median
FROM cte
ORDER BY cte.number

Returns

NUMBER      median
----------- -----------
0           1023
1           1023
2           1023
3           1023
4           1023
5           1023
6           1023
7           1023


This is not an entirely new answer as it mostly expands on Mark Byer's answer, but there are a couple of options for simplifying the query even further.

The first thing is to really make use of CTE's. Not only can you have multiple CTE's, but they can refer to each other. With this in mind, we can create an additional CTE to compute the median based on the results of the first. This encapsulates the median computation and leaves the actual SELECT to do only what it needs to do. Note that the ROW_NUMBER() had to be moved into the first CTE.

;WITH cte AS
(
   SELECT number, ROW_NUMBER() OVER(ORDER BY number) AS rn
   FROM table1
),
med AS
(
    SELECT AVG(number) AS median
    FROM cte
    WHERE cte.rn = ((SELECT COUNT(*) FROM cte) + 1) / 2
    OR cte.rn = ((SELECT COUNT(*) FROM cte) + 2) / 2
)
SELECT cte.number, med.median
FROM cte
CROSS JOIN med

And to further reduce complexity, you "could" use a custom CLR Aggregate to handle the Median (such as the one provided in the free SQL# library at http://www.SQLsharp.com/ [which I am the author of]).

;WITH cte AS
(
   SELECT number
   FROM table1
),
med AS
(
    SELECT  SQL#.Agg_Median(cte.number) AS median
    FROM    cte
)
SELECT cte.number, med.median
FROM cte
CROSS JOIN med
0

精彩评论

暂无评论...
验证码 换一张
取 消