开发者

SQL Optimize - From History table get value from two different dates

开发者 https://www.devze.com 2023-03-29 16:52 出处:网络
Not sure where to start...But basically I have a report table, an account table, and an account history table.The account history table will have zero or more records, where each record is the state o

Not sure where to start... But basically I have a report table, an account table, and an account history table. The account history table will have zero or more records, where each record is the state of the account cancelled flag after it changed. There is other stuff going on, but basically i am looking to return the account detail data, with the state of account cancelled bit on the start date and enddate as different columns.

What is the best way to do this?

I have the following working query below

(Idea) Should I do seperate joins on history table, 1 for each date?

I guess I could do it in three separate queries ( Get Begin Snapshot, End Snapshot, Normal Report query with a join to each snapshot)

something else?

Expected output:

AccountID, OtherData, StartDateCancelled, EndDateCancelled

Test Tables:

DECLARE @Report TABLE (ReportID INT, StartDate DATETIME, EndDate DATETIME)
DECLARE @ReportAcc开发者_JAVA百科ountDetail TABLE( ReportID INT, Accountid INT, Cancelled BIT )
DECLARE @AccountHistory TABLE( AccountID INT, ModifiedDate DATETIME, Cancelled BIT )

INSERT INTO @Report
SELECT 1,'1/1/2011', '2/1/2011'
--
INSERT INTO @ReportAccountDetail
SELECT 1 AS ReportID, 1 AS AccountID, 0 AS Cancelled
UNION
SELECT 1,2,0
UNION
SELECT 1,3,1
UNION
SELECT 1,4,1
--
INSERT INTO @AccountHistory
SELECT 2 AS CustomerID, '1/2/2010' AS ModifiedDate, 1 AS Cancelled
UNION--
SELECT 3, '2/1/2011', 1
UNION--
SELECT 4, '1/1/2010', 1
UNION
SELECT 4, '2/1/2010', 0
UNION
SELECT 4, '2/1/2011', 1

Current Query:

SELECT Accountid, OtherData,
 MAX(CASE WHEN BeginRank = 1 THEN CASE WHEN BeginHistoryExists = 1 THEN HistoryCancelled ELSE DefaultCancel END ELSE NULL END ) AS StartDateCancelled,
 MAX(CASE WHEN EndRank = 1 THEN CASE WHEN EndHistoryExists = 1 THEN HistoryCancelled ELSE DefaultCancel END ELSE NULL END ) AS EndDateCancelled
FROM
(
SELECT c.Accountid,
'OtherData' AS OtherData,
--lots of other data
ROW_NUMBER() OVER (PARTITION BY c.AccountID ORDER BY
    CASE WHEN ch.ModifiedDate <= Report.StartDate THEN 1 ELSE 0 END DESC, ch.ModifiedDate desc) AS BeginRank,
CASE WHEN ch.ModifiedDate <= Report.StartDate THEN 1 ELSE 0 END AS BeginHistoryExists,
ROW_NUMBER() OVER ( PARTITION BY c.AccountID ORDER BY
    CASE WHEN ch.ModifiedDate <= Report.EndDate THEN 1 ELSE 0 END DESC, ch.ModifiedDate desc) AS EndRank,
CASE WHEN ch.ModifiedDate <= Report.EndDate THEN 1 ELSE 0 END AS EndHistoryExists,
    CAST( ch.Cancelled AS INT) AS HistoryCancelled,
    0 AS DefaultCancel
FROM
@Report AS Report
INNER JOIN @ReportAccountDetail AS C ON Report.ReportID = C.ReportID
--Others joins related for data to return
LEFT JOIN @AccountHistory AS CH ON CH.AccountID = C.AccountID
WHERE Report.ReportID = 1
) AS x
GROUP BY AccountID, OtherData

Welcome input on writing stack overflow questions. Thanks!


ROW_NUMBER() often suprises me and out-performs my expectations. In this case, however, I'd be tempted to just use correlated sub-queries. At least, I'd test them against the alternatives.

Note: I would also use real tables, with real indexes, and a realistic volume of fake data. (If it's worth posting this question, I'm assuming that it's worth testing this realistically.)

SELECT
  [Report].ReportID,
  [Account].AccountID,
  [Account].OtherData,
  ISNULL((SELECT TOP 1 Cancelled FROM AccountHistory WHERE AccountID = [Account].AccountID AND ModifiedDate <= [Report].StartDate ORDER BY ModifiedDate DESC), 0) AS StartDateCancelled,
  ISNULL((SELECT TOP 1 Cancelled FROM AccountHistory WHERE AccountID = [Account].AccountID AND ModifiedDate <= [Report].EndDate   ORDER BY ModifiedDate DESC), 0) AS EndDateCancelled
FROM
  Report                 AS [Report]
LEFT JOIN
  ReportAccountDetail    AS [Account]
    ON [Account].ReportID = [Report].ReportID
ORDER BY
  [Report].ReportID,
  [Account].AccountID

Note: For whatever reason, I've found that TOP 1 and ORDER BY is faster than MAX().


In terms of your suggested answer, I'd modify it slightly to just use ISNULL instead of trying to make the Exists columns work.

I'd also join on the "other data" after all of the working out, rather than inside the inner-most query, so as to avoid having to group by all the "other data".

WITH
  HistoricData AS
(
  SELECT
    Report.ReportID,
    c.Accountid,
    c.OtherData,
    ROW_NUMBER() OVER (PARTITION BY c.ReportID, c.AccountID ORDER BY CASE WHEN ch.ModifiedDate <= Report.StartDate THEN 1 ELSE 0 END DESC, ch.ModifiedDate DESC) AS BeginRank,
    ROW_NUMBER() OVER (PARTITION BY c.ReportID, c.AccountID ORDER BY ch.ModifiedDate DESC) AS EndRank,
    CH.Cancelled
  FROM
    @Report AS Report
  INNER JOIN
    @ReportAccountDetail AS C
      ON Report.ReportID = C.ReportID
  LEFT JOIN
    @AccountHistory AS CH
      ON  CH.AccountID     = C.AccountID
      AND CH.ModifiedDate <= Report.EndDate
)
,
  FlattenedData AS
(
  SELECT
    ReportID,
    Accountid,
    OtherData,
    ISNULL(MAX(CASE WHEN BeginRank = 1 THEN Cancelled END), 0) AS StartDateCancelled,
    ISNULL(MAX(CASE WHEN EndRank   = 1 THEN Cancelled END), 0) AS EndDateCancelled
  FROM
    [HistoricData]
  GROUP BY
    ReportID,
    AccountID,
    OtherData
)
SELECT
  *
FROM
  [FlattenedData]
LEFT JOIN
  [OtherData]
    ON Whatever = YouLike
WHERE
  [FlattenedData].ReportID = 1


And a final possible version...

WITH
  ReportStartHistory AS
(
  SELECT
    *
  FROM
  (
    SELECT
      [Report].ReportID,
      ROW_NUMBER() OVER (PARTITION BY [Report].ReportID, [History].AccountID ORDER BY [History].ModifiedDate) AS SequenceID,
      [History].*
    FROM
      Report                 AS [Report]
    INNER JOIN
      AccountHistory         AS [History]
        ON [History].ModifiedDate <= [Report].StartDate
  )
    AS [data]
  WHERE
    SequenceID = 1
)
,
  ReportEndHistory AS
(
  SELECT
    *
  FROM
  (
    SELECT
      [Report].ReportID,
      ROW_NUMBER() OVER (PARTITION BY [Report].ReportID, [History].AccountID ORDER BY [History].ModifiedDate) AS SequenceID,
      [History].*
    FROM
      Report                 AS [Report]
    INNER JOIN
      AccountHistory         AS [History]
        ON [History].ModifiedDate <= [Report].EndDate
  )
    AS [data]
  WHERE
    SequenceID = 1
)
SELECT
  [Report].ReportID,
  [Account].*,
  ISNULL([ReportStartHistory].Cancelled, 0) AS StartDateCancelled,
  ISNULL([ReportEndHistory].Cancelled,   0) AS EndDateCancelled
FROM
  Report                     AS [Report]
INNER JOIN
  Account                    AS [Account]
LEFT JOIN
  [ReportStartHistory]
    ON  [ReportStartHistory].ReportID  = [Report].ReportID
    AND [ReportStartHistory].AccountID = [Account].AccountID
LEFT JOIN
  [ReportEndHistory]
    ON  [ReportEndHistory].ReportID    = [Report].ReportID
    AND [ReportEndHistory].AccountID   = [Account].AccountID
0

精彩评论

暂无评论...
验证码 换一张
取 消