25th percentile for multiple columns

开发者 https://www.devze.com 2023-01-24 16:43 Source: web
Let's say I have a table with columns A, B, C, D, E, etc. where each column is an int. This table is populated with hundreds of rows with random values in each column. What is the optimal way to return the value corresponding to the 25th percentile for each column?

For example:

A B C D E
1 5 8 9 3
3 6 5 0 2
8 3 6 1 8
4 1 8 3 1
7 2 2 6 9

Here column A is composed of the values 1, 3, 4, 8, and 7. The 25th percentile for this set should be 3. Column B is composed of the values 1, 2, 3, 5, and 6. The 25th percentile for this set should be 2. C is 5, D is 1, and E is 2. Since there are 5 rows, the 25th percentile is simply the second smallest value. If there were 40 rows it would be the 10th smallest value. How can I efficiently write a query that returns:

A B C D E
3 2 5 1 2
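
To pin down the definition being used here, the following is a minimal Python sketch of the "k-th smallest value" rule described above (5 rows gives the 2nd smallest, 40 rows gives the 10th). The function name nearest_rank_percentile and the integer pct parameter are made up for illustration; this only checks the expected output and is not a replacement for the SQL query.

```python
def nearest_rank_percentile(values, pct=25):
    """Return the k-th smallest value, where k = ceil(pct / 100 * n).

    Assumes a non-empty list and an integer pct between 1 and 100.
    """
    ordered = sorted(values)
    # Integer form of ceil(pct * n / 100); avoids floating-point rounding.
    k = (pct * len(ordered) + 99) // 100
    return ordered[k - 1]  # k is a 1-based rank


# The sample table from the question, one tuple per row (A, B, C, D, E).
rows = [
    (1, 5, 8, 9, 3),
    (3, 6, 5, 0, 2),
    (8, 3, 6, 1, 8),
    (4, 1, 8, 3, 1),
    (7, 2, 2, 6, 9),
]

# Transpose rows into columns, then apply the rule to each column.
columns = list(zip(*rows))
print([nearest_rank_percentile(col) for col in columns])  # [3, 2, 5, 1, 2]
```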

What I've tried:

WITH 
    ATable as (
        SELECT A, CAST(NTILE(100) OVER (ORDER BY A) as int) as Percentile
        FROM MyTable
    ),
    BTable as (
        SELECT B, CAST(NTILE(100) OVER (ORDER BY B) as int) as Percentile
        FROM MyTable
    ),
    ...
SELECT
    (SELECT TOP 1 A FROM ATable where Percentile = 25) as A,
    (SELECT TOP 1 B FROM BTable where Percentile = 25) as B,
    ...

I have about 40 of these columns, and in my experience with my dataset, running NTILE over that many columns becomes a serious performance problem. Does anyone have a better idea?


Your solution is correct. I don't see a whizzier solution.

You require a different row per column, so you have to query each column separately. If you have 40 columns, then it's 40 CTEs combined in one query. Simple.

Normally, you'd expect a query to return a whole row, a group of rows, or a subset of columns for the row(s) that meet some criteria. You wouldn't expect each column to be queried separately to get values unrelated to the other values in the same row.
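
As a rough illustration of "one query per column", here is a sketch that generates the per-column queries in a loop instead of writing 40 CTEs by hand. It is not the asker's NTILE approach: it swaps in ORDER BY with an offset to fetch the k-th smallest row directly, and it runs against SQLite through Python's sqlite3 module only so that it is self-contained. The helper name percentile_per_column is hypothetical. On SQL Server 2012 or later the equivalent clause would be ORDER BY ... OFFSET ... ROWS FETCH NEXT 1 ROWS ONLY; whether that beats NTILE on a given dataset is untested here.

```python
import sqlite3


def percentile_per_column(conn, table, columns, pct=25):
    """Return {column: k-th smallest value}, issuing one small query per column.

    Assumes a non-empty table and an integer pct between 1 and 100.
    """
    (n,) = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()
    # k = ceil(pct * n / 100) as a 1-based rank; OFFSET is zero-based.
    offset = (pct * n + 99) // 100 - 1
    result = {}
    for col in columns:
        # Identifiers are interpolated for brevity; pass trusted names only.
        row = conn.execute(
            f"SELECT {col} FROM {table} ORDER BY {col} LIMIT 1 OFFSET ?",
            (offset,),
        ).fetchone()
        result[col] = row[0]
    return result


# Build the sample table from the question in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (A INT, B INT, C INT, D INT, E INT)")
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?, ?, ?)",
    [(1, 5, 8, 9, 3), (3, 6, 5, 0, 2), (8, 3, 6, 1, 8), (4, 1, 8, 3, 1), (7, 2, 2, 6, 9)],
)

print(percentile_per_column(conn, "MyTable", ["A", "B", "C", "D", "E"]))
```

Each column still needs its own sort, which matches the point above that the columns cannot share a single ordering.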


I am not sure whether the following article is still relevant for current versions of MS SQL Server, but you might check it out:

A Better NTILE
