开发者

Replace selfjoin with analytic functions

开发者 https://www.devze.com 2022-12-23 05:19 出处:网络
How do I go about replacing the following self join using analytics: SELECT t1.col1 col1, t1.col2 col2,

How do I go about replacing the following self join using analytics:

SELECT 
t1.col1 col1,
t1.col2 col2,
SUM((extract(hour FROM (t1.times_stamp - t2.times_stamp)) * 3600 + extract(minute FROM ( t1.times_stamp - t2.times_stamp)) * 60 + extract(second FROM ( t1.times_stamp - t2.times_stamp)) ) ) div,
COUNT(*) tot_count
FROM tab1 t1,
tab1 t2
WHERE t2.col1      = t1.col1
AND t2.col2  = t1.col2
AND t2.col3        = t1开发者_开发知识库.sequence_num
AND t2.times_stamp     < t1.times_stamp
AND t2.col4         = 3
AND t1.col4         = 4
AND t2.col5 NOT IN(103,123)
AND t1.col5     != 549
GROUP BY t1.col1, t1.col2


I'm pretty sure you won't be able to replace the self-join with analytics because you are using inter-rows operations (t1.time_stamp - t2.time_stamp). Analytics can only access the values of the current row and the value of aggregate functions over a subset of rows (windowing clause).

See this article from Tom Kyte and this paper for further analysis of the limitations of analytics.


It almost looks like you could eliminate the self join on t2 and replace

t1.time_stamp - t2.time_stamp

with something like

t1.time_stamp - lag(t1.time_stamp) over (partition by col1, col2 order by time_stamp)

The different filters on t1 and t2 on col4 and col5 are what prevents you from doing this.
Analytic functions are applied after the where / group by on the main query, so you'd need to have a single filter on t1 in order to use lag/lead to specify following or preceding rows in a sequence.

Also, you'd need to push the sum/group by to an outer query to aggregate after the analytic function:

select col1, col2, sum(timestamp_diff) from (
  select col1, col2, timestamp - lag(timestamp) over(.....) as timestamp_diff
  where ....
) group by col1, col2
0

精彩评论

暂无评论...
验证码 换一张
取 消