开发者

PostgreSQL window function: grouping by column without ordering (~ Python itertools.groupby)

开发者 https://www.devze.com 2023-02-12 04:17 出处:网络
I need to partition a table in PostgreSQL based on a column without sorting & making the result unique; Basically what I am trying to achieve is to reproduce the itertools.groupby() behavior from

I need to partition a table in PostgreSQL based on a column without sorting & making the result unique; Basically what I am trying to achieve is to reproduce the itertools.groupby() behavior from Python in PostgreSQL.

Given the table containing two columns:

1   C
2   C
3   C
4   C
5   C
6   C
7   C
8   C
9   C
10  S
11  E
12  E
13  E
14  E
15  E
16  C
17  C
18  C
19  C
20  E
21  E
22  E
23  E
24  E

I want to partition it by the value in the second (whilst preserving the existing order), to end up with this:

1   C   1
2   C   1
3   C   1
4   C   1
5   C   1
6   C   1
7   C   1
8   C   1
9   C   1
10  S   2
11  E   3
12  E   3
13  E   3
14  E   3
15  E   3
16  C   4
17  C   4
18  C   4
19  C   4
20  E   5
21  E   5
22  E   5
23  E   5
24  E   5

I tried to achieve that w开发者_C百科ith window functions, using a combination of ROW_NUMBER() and LAG() to compare the current row with the previous to see if it has changed. The problem in that case was that I would also need a variable that increments every time the value changes.


Try this:

WITH T1 AS
(
    SELECT
        id,
        grp,
        LAG(grp) OVER (ORDER BY id) IS DISTINCT FROM grp AS changes
    FROM yourtable
)
SELECT id, grp, SUM(changes::int) OVER (ORDER BY id) FROM T1


Although you can come up with a solution based entirely on window functions, this kind of stateful processing of the result set seems best suited to an iterative approach. Here's a question that asked something a bit similar before: PostgreSQL function to iterate through/act on many rows with state

Basically the window-only approach comes down to producing an integer column with a "1" (or any other positive integer) at the start of each partition (so where col2<>LAG(col2)) and then using another window aggregation to sum those group indicators from the start of the result set up to the current row. It works, it's just ugly both in complexity and (I believe) in performance.

0

精彩评论

暂无评论...
验证码 换一张
取 消