I have a table like the one below.
id  session_id  start_time           answer_time
1   111         2022-12-06 13:40:50  2022-12-06 13:40:55
2   111         2022-12-06 13:40:51  Null
3   111         2022-12-06 13:40:57  Null
4   222         2022-12-06 13:40:58  Null
5   222         2022-12-06 13:41:10  Null
6   222         2022-12-06 13:41:10  Null
7   333         2022-12-06 13:46:10  2022-12-06 13:46:15
8   333         2022-12-06 13:46:18  2022-12-06 13:46:20
There are three sessions in the table, with session_ids 111, 222, and 333. Each session has multiple records, all sharing the same session_id. Whether a session is successful depends on whether answer_time is Null in the record with the smallest id for that session. In the sample table above, the records with id 1, id 4, and id 7 determine whether their sessions are successful or unsuccessful.
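For reference, the table roughly matches the following definition (the column types and constraints are my assumptions based on the sample data):

-- Approximate definition of the logs table; types are guesses from the sample rows
CREATE TABLE logs (
    id          bigserial PRIMARY KEY,
    session_id  bigint    NOT NULL,
    start_time  timestamp NOT NULL,
    answer_time timestamp          -- Null until the call/session is answered
);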
I have the SQL below to query this, and it works correctly.
WITH t AS (
    SELECT DISTINCT ON (session_id) start_time, answer_time
    FROM logs
    WHERE (SELECT NOW() AT TIME ZONE 'UTC') - start_time < interval '24 HOURS'
    ORDER BY logs.session_id, id
)
SELECT
    COUNT(*) FILTER (WHERE answer_time IS NOT NULL) AS success_count,
    COUNT(*) FILTER (WHERE answer_time IS NULL)     AS fail_count
FROM t;
But when the table has about 50M records, the query takes around 20 seconds, which is unacceptable in our production environment. How can I optimize it? My goal is under 1 second on those 50M records.
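The only idea I have so far is adding indexes along these lines, but I am not sure they are the right approach (the index names and column choices are just my guesses, not something I have validated on the 50M-row table):

-- Guess 1: let the 24-hour filter on start_time use an index instead of a full scan
CREATE INDEX IF NOT EXISTS logs_start_time_idx ON logs (start_time);

-- Guess 2: support the DISTINCT ON (session_id) ... ORDER BY session_id, id step
CREATE INDEX IF NOT EXISTS logs_session_id_id_idx ON logs (session_id, id);

Would either of these (or something else, e.g. a partial index or a pre-aggregated summary table) be the right direction?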