I've got a situation where I have a huge table, containing a huge number of rows, which looks like (for example):
id Timestamp Value
14574499 2011-09-28 08:33:32.020 99713.3000
14574521 2011-09-28 08:33:42.203 99713.3000
14574540 2011-09-28 08:33:47.017 99713.3000
14574559 2011-09-28 08:38:53.177 99720.3100
14574578 2011-09-28 08:38:58.713 99720.3100
14574597 2011-09-28 08:39:03.590 99720.3100
14574616 2011-09-28 08:39:08.950 99720.3100
14574635 2011-09-28 08:39:13.793 99720.3100
14574654 2011-09-28 08:39:19.063 99720.3100
14574673 2011-09-28 08:39:23.780 99720.3100
14574692 2011-09-28 08:39:29.167 99758.6400
14574711 2011-09-28 08:39:33.967 99758.6400
14574730 2011-09-28 08:39:40.803 99758.6400
14574749 2011-09-28 08:39:49.297 99758.6400
Ok, so the rules are: The timestamps can be any n number of seconds apart, 5s, 30s, 60s etc, it varies depending on how old the record is (archiving takes place).
I want to be able to query this table to select each nth row based on the timestamp.
So for example:
Select * from mytable where intervalBetweenTheRows = 30s
(for the purposes of this question, based on the presumption the interval requested is always to a higher precision than available in the database)
So, every nth row based on the time between each row
Any ideas?!
Karl
For those of you who are interested, recursive CTE was actually quite slow, I thought of a slightly different method:
SELECT TOP 500
MIN(pvh.[TimeStamp]) as [TimeStamp],
AVG(pvh.[Value]) as [Value]
FROM
PortfolioValueHistory pvh
WHERE
pvh.PortfolioID = @PortfolioID
AND pvh.[TimeStamp] >= @StartDate
AND pvh.[开发者_高级运维TimeStamp] <= @EndDate
GROUP BY
FLOOR(DateDiff(Second, '01/01/2011 00:00:00', pvh.[TimeStamp]) / @ResolutionInSeconds)
ORDER BY
[TimeStamp] ASC
I take the timestamp minus an arbitrary date to give a base int to work with, then floor and divide this by my desired resolution, I then group by this, taking the min timestamp (the first of that 'region' of stamps) and the average value for that 'period'.
This is used to plot a graph of historical data, so the average value does me fine.
This was the fastest execution based on the table size that I could come up with
Thanks for your help all.
Assuming that the requirement is that the determinant for whether a row is returned or not depends on the time elapsed from the previous returned row this needs a procedural approach. Recursive CTEs might be a bit more efficient than a cursor though.
WITH RecursiveCTE
AS (SELECT TOP 1 *
FROM @T
ORDER BY 1681210816
UNION ALL
SELECT id,
1681210816,
Value
FROM (
--Can't use TOP directly
SELECT T.*,
rn = ROW_NUMBER() OVER (ORDER BY T.1681210816)
FROM @T T
JOIN RecursiveCTE R
ON T.1681210816 >=
DATEADD(SECOND, 30, R.1681210816)) R
WHERE R.rn = 1)
SELECT *
FROM RecursiveCTE
This isn't as elegant as Martin S's CTE, but instead uses interpolation on predefined sample points to get the first sample in between each pair of sampling times. If there is no sample in a period then no record is returned.
DECLARE @SampleTime DATETIME
DECLARE @NumberSamples INT
DECLARE @SampleInterval INT
SET @SampleTime = '2011-09-28 08:33:32.020' -- Start time
SET @NumberSamples = 20 -- Or however many sample intervals you need to evaluate
SET @SampleInterval = 30 -- Seconds
CREATE TABLE #tmpTimesToSample
(
SampleID INT,
SampleTime DATETIME NULL
)
-- Works out the time intervals, 0 to 19
INSERT INTO #tmpTimesToSample(SampleID, SampleTime)
SELECT TOP (@NumberSamples)
sv.number,
DATEADD(ss, sv.number * @SampleInterval, @SampleTime)
FROM
master..spt_values sv
WHERE
type = 'p'
ORDER BY
sv.number ASC
-- Now interpolate these sample intervals back into the data table
SELECT ID, [TimeStamp], Value
FROM
(
SELECT mt.Id, mt.[TimeStamp], mt.Value, row_number() over (partition by tmp.SampleID order by tmp.SampleID) as RowNum
FROM #tmpTimesToSample tmp RIGHT OUTER JOIN MyTable mt
on mt.[TimeStamp] BETWEEN tmp.SampleTime and DATEADD(ss, @SampleInterval, tmp.SampleTime)
) x
WHERE x.RowNum = 1 -- Only want the first sample in each bin
DROP TABLE #tmpTimesToSample
Test data:
CREATE TABLE MyTable
(
ID BIGINT,
[TimeStamp] DATETIME,
[Value] DECIMAL(18,4)
)
GO
insert into MyTable values(14574499, '2011-09-28 08:33:32.020', 99713.3000)
insert into MyTable values(14574521 ,'2011-09-28 08:33:42.203', 99713.3000)
insert into MyTable values(14574540 ,'2011-09-28 08:33:47.017', 99713.3000)
insert into MyTable values(14574559 ,'2011-09-28 08:38:53.177', 99720.3100)
insert into MyTable values(14574578 ,'2011-09-28 08:38:58.713', 99720.3100)
insert into MyTable values(14574597 ,'2011-09-28 08:39:03.590', 99720.3100)
insert into MyTable values(14574616 ,'2011-09-28 08:39:08.950', 99720.3100)
insert into MyTable values(14574635 ,'2011-09-28 08:39:13.793', 99720.3100)
insert into MyTable values(14574654 ,'2011-09-28 08:39:19.063', 99720.3100)
insert into MyTable values(14574673 ,'2011-09-28 08:39:23.780', 99720.3100)
insert into MyTable values(14574692 ,'2011-09-28 08:39:29.167', 99758.6400)
insert into MyTable values(14574711 ,'2011-09-28 08:39:33.967', 99758.6400)
insert into MyTable values(14574730 ,'2011-09-28 08:39:40.803', 99758.6400)
insert into MyTable values(14574749 ,'2011-09-28 08:39:49.297', 99758.6400)
go
This will give you all rows that have a 30 millisecond interval to the next row. Both rows will be side by side.
Select T1.*, T2.*
From MyTable T1
Inner Join MyTable T2
On DateDiff (millisecond, T1.Value, T2.Value) = 30
精彩评论