I am trying to improve query readability by using a function in SQL Server Express 2008. Here is a sample of what I'm trying to do.
I have a table where we store a max temperatures reading per hour of the day, then I want to select all days where the max temperature between 8-10AM was greater than the max temp between 12-2PM
So here is how it's like:
DECLARE @TableA TABLE ([Date] DATE, [Time] TIME(0), HighTemp DECIMAL(6,2));
INSERT @TableA VALUES
('2011-09-10','08:00:00',38.15),
('2011-09-10','09:00:00',38.32),
('2011-09-10','10:00:00',38.17),
('2011-09-10','11:00:00',38.10),
('2011-09-10','12:00:00',38.05),
('2011-09-10','13:00:00',38.15),
('2011-09-10','14:00:00',38.30),
('2011-09-11','08:00:00',38.12),
('2011-09-11','09:00:00',38.09),
('2011-09-11','10:00:00',38.07),
('2011-09-11','11:00:00',38.15),
('2011-09-11','12:00:00',38.13),
('2011-09-11','13:开发者_如何学Go00:00',38.11),
('2011-09-11','14:00:00',38.10),
('2011-09-12','08:00:00',38.30),
('2011-09-12','09:00:00',38.33),
('2011-09-12','10:00:00',38.35),
('2011-09-12','11:00:00',38.30),
('2011-09-12','12:00:00',38.25),
('2011-09-12','13:00:00',38.23),
('2011-09-12','14:00:00',38.20)
select distinct [DATE] from @TableA maintbl
where
-- Select the high temp between 08:00:00-10:00:00
(select MAX(HighTemp) from @TableA tmptbl where tmptbl.Time >= '08:00:00' and tmptbl.Time <= '10:00:00' and maintbl.Date = tmptbl.Date)
>
-- Select the high between 12:00:00-14:00:00
(select MAX(HighTemp) from @TableA tmptbl where tmptbl.Time >= '12:00:00' and tmptbl.Time <= '14:00:00' and maintbl.Date = tmptbl.Date)
The query runs well (fast) and the result for the above query should be: 2011-09-10 2011-09-12
Now, I have tried to simplify the query by using a function that retrieves the max tempreture in a certain day and time period, so the query is easier to read, like so:
select distinct [DATE] from @TableA maintbl
where GetPeriodHigh(maintbl.Date, '08:00:00', '10:00:00') > GetPeriodHigh(maintbl.Date, '12:00:00', '14:00:00')
And the function is like so:
CREATE FUNCTION [dbo].[GetPeriodHigh]
(
@Date date,
@From time,
@To time
)
RETURNS decimal(6,2)
AS
BEGIN
declare @res decimal(6,2)
select @res = MAX(high) from MyTable
where Time >= @from and Time <= @to and Date = @Date
return @res
END
The problem is that running the query using the function takes LOOONG time, actually I never saw it finishes, looks like it is in some sort of infinite loop...
Any ideas why is that, and there is anything I can do to simplify my query?
Thx.
Scalar functions that do data access generally suck and are best avoided. They don't get expanded out by the optimiser and this basically enforces the function query as being the inner side of a nested loops join regardless of suitability.
Making things worse you may well not have correct indexing to evaluate the Time >= @from and Time <= @to and Date = @Date
predicate inside the function which means that for each row in the outer query you are enforcing 2 table scans via the function calls.
This lack of indexes is also the case in your source example and it can be seen with the inline version that the query optimiser is able to effectively rewrite this as two MAX
/ GROUP BY
queries with different WHERE
clauses then merge join the results together. When the logic is in scalar UDFs this kind of transformation is not currently considered.
Another approach you could try is
SELECT [Date]
FROM @TableA
WHERE Time BETWEEN '08:00:00' AND '10:00:00'
OR Time BETWEEN '12:00:00' AND '14:00:00'
GROUP BY [Date]
HAVING MAX(CASE
WHEN Time BETWEEN '08:00:00' AND '10:00:00' THEN HighTemp END) >
MAX(CASE
WHEN Time BETWEEN '12:00:00' AND '14:00:00' THEN HighTemp END)
To improve performance, try rewriting the scalar valued UDF as an inline table valued UDF.
Some links :
http://sqlblog.com/blogs/alexander_kuznetsov/archive/2008/05/23/reuse-your-code-with-cross-apply.aspx
http://sqlblog.com/blogs/alexander_kuznetsov/archive/2008/04/21/not-all-udfs-are-bad-for-performance.aspx
精彩评论