开发者

Efficiently pull different columns using a common correlated subquery

开发者 https://www.devze.com 2023-03-09 01:57 出处:网络
I need to pull multiple columns from a subquery which also requires a WHERE filter referencing columns of the FROM table. I have a couple of questions about this:

I need to pull multiple columns from a subquery which also requires a WHERE filter referencing columns of the FROM table. I have a couple of questions about this:

  1. Is there another solution to this problem besides mine below?
  2. Is another solution even necessary or is this solution efficient enough?

Example:

In the following example I'm writing a view to present test scores, particularly to discover failures that may need to be addressed or retaken.

I cannot simply use JOIN because I need to filter my actual subquery first (notice I'm getting TOP 1 for the "examinee", sorted either by score or date descending)

My goal is to avoid writing (and executing) essentially the same subquery repeatedly.

SELECT ExamineeID, LastName, FirstName, Email,
   (SELECT COUNT(examineeTestID)
    FROM exam.ExamineeTest tests
    WHERE E.ExamineeID = ExamineeID AND TestRevisionID = 3 AND TestID = 2) Attempts,
   (SELECT TOP 1 ExamineeTestID
    FROM exam.ExamineeTest T
    WHERE E.ExamineeID = ExamineeID AND TestRevisionID = 3 AND TestID = 2
    ORDER BY Score DESC) bestExamineeTestID,
   (SELECT TOP 1 Score
    FROM exam.ExamineeTest T
    WHERE E.ExamineeID = ExamineeID AND TestRevisionID = 3 AND TestID = 2
    ORDER BY Score DESC) be开发者_开发技巧stScore,
   (SELECT TOP 1 DateDue
    FROM exam.ExamineeTest T
    WHERE E.ExamineeID = ExamineeID AND TestRevisionID = 3 AND TestID = 2
    ORDER BY Score DESC) bestDateDue,
   (SELECT TOP 1 TimeCommitted
    FROM exam.ExamineeTest T
    WHERE E.ExamineeID = ExamineeID AND TestRevisionID = 3 AND TestID = 2
    ORDER BY Score DESC) bestTimeCommitted,
   (SELECT TOP 1 ExamineeTestID
    FROM exam.ExamineeTest T
    WHERE E.ExamineeID = ExamineeID AND TestRevisionID = 3 AND TestID = 2
    ORDER BY DateDue DESC) currentExamineeTestID,
   (SELECT TOP 1 Score
    FROM exam.ExamineeTest T
    WHERE E.ExamineeID = ExamineeID AND TestRevisionID = 3 AND TestID = 2
    ORDER BY DateDue DESC) currentScore,
   (SELECT TOP 1 DateDue
    FROM exam.ExamineeTest T
    WHERE E.ExamineeID = ExamineeID AND TestRevisionID = 3 AND TestID = 2
    ORDER BY DateDue DESC) currentDateDue,
   (SELECT TOP 1 TimeCommitted
    FROM exam.ExamineeTest T
    WHERE E.ExamineeID = ExamineeID AND TestRevisionID = 3 AND TestID = 2
    ORDER BY DateDue DESC) currentTimeCommitted
FROM exam.Examinee E


To answer your second question first, yes, a better way is in order, because the query you're using is hard to understand, hard to maintain, and even if the performance is acceptable now, it's a shame to query the same table multiple times when you don't need to plus the performance may not always be acceptable if your application ever grows to an appreciable size.

To answer your first question, I have a few methods for you. These assume SQL 2005 or up unless where noted.

Note that you don't need BestExamineeID and CurrentExamineeID because they will always be the same as ExamineeID unless no tests were taken and they're NULL, which you can tell from the other columns being NULL.

You can think of OUTER/CROSS APPLY as an operator that lets you move correlated subqueries from the WHERE clause into the JOIN clause. They can have an outer reference to a previously-named table, and can return more than one column. This enables you to do the job only once per logical query rather than once for each column.

SELECT
   ExamineeID,
   LastName,
   FirstName,
   Email,
   B.Attempts,
   BestScore = B.Score,
   BestDateDue = B.DateDue,
   BestTimeCommitted = B.TimeCommitted,
   CurrentScore = C.Score,
   CurrentDateDue = C.DateDue,
   CurrentTimeCommitted = C.TimeCommitted
FROM
   exam.Examinee E
   OUTER APPLY ( -- change to CROSS APPLY if you only want examinees who've tested
      SELECT TOP 1
         Score, DateDue, TimeCommitted,
         Attempts = Count(*) OVER ()
      FROM exam.ExamineeTest T
      WHERE
         E.ExamineeID = T.ExamineeID
         AND T.TestRevisionID = 3
         AND T.TestID = 2
      ORDER BY Score DESC
   ) B
   OUTER APPLY ( -- change to CROSS APPLY if you only want examinees who've tested
      SELECT TOP 1
         Score, DateDue, TimeCommitted
      FROM exam.ExamineeTest T
      WHERE
         E.ExamineeID = T.ExamineeID
         AND T.TestRevisionID = 3
         AND T.TestID = 2
      ORDER BY DateDue DESC
   ) C

You should experiment to see if my Count(*) OVER () is better than having an additional OUTER APPLY that just gets the count. If you're not restricting the Examinee from the exam.Examinee table, it may be better to just do a normal aggregate in a derived table.

Here's another method that (sort of) goes and gets all the data in one swoop. It conceivably could perform better than other queries, except my experience is that windowing functions can get very and surprisingly expensive in some situations, so testing is in order.

WITH Data AS (
   SELECT
      *,
      Count(*) OVER (PARTITION BY ExamineeID) Cnt,
      Row_Number() OVER (PARTITION BY ExamineeID ORDER BY Score DESC) ScoreOrder,
      Row_Number() OVER (PARTITION BY ExamineeID ORDER BY DateDue DESC) DueOrder
   FROM
      exam.ExamineeTest
), Vals AS (
   SELECT
      ExamineeID,
      Max(Cnt) Attempts,
      Max(CASE WHEN ScoreOrder = 1 THEN Score ELSE NULL END) BestScore,
      Max(CASE WHEN ScoreOrder = 1 THEN DateDue ELSE NULL END) BestDateDue,
      Max(CASE WHEN ScoreOrder = 1 THEN TimeCommitted ELSE NULL END) BestTimeCommitted,
      Max(CASE WHEN DueOrder = 1 THEN Score ELSE NULL END) BestScore,
      Max(CASE WHEN DueOrder = 1 THEN DateDue ELSE NULL END) BestDateDue,
      Max(CASE WHEN DueOrder = 1 THEN TimeCommitted ELSE NULL END) BestTimeCommitted
   FROM Data
   GROUP BY
      ExamineeID
)
SELECT
   E.ExamineeID,
   E.LastName,
   E.FirstName,
   E.Email,
   V.Attempts,
   V.BestScore, V.BestDateDue, V.BestTimeCommitted,
   V.CurrentScore, V.CurrentDateDue, V.CurrentTimeCommitted
FROM
   exam.Examinee E
   LEFT JOIN Vals V ON E.ExamineeID = V.ExamineeID
   -- change join to INNER if you only want examinees who've tested

Finally, here's a SQL 2000 method:

SELECT
   E.ExamineeID,
   E.LastName,
   E.FirstName,
   E.Email,
   Y.Attempts,
   Y.BestScore, Y.BestDateDue, Y.BestTimeCommitted,
   Y.CurrentScore, Y.CurrentDateDue, Y.CurrentTimeCommitted
FROM
   exam.Examinee E
   LEFT JOIN ( -- change to inner if you only want examinees who've tested
      SELECT
         X.ExamineeID,
         X.Cnt Attempts,
         Max(CASE Y.Which WHEN 1 THEN T.Score ELSE NULL END) BestScore,
         Max(CASE Y.Which WHEN 1 THEN T.DateDue ELSE NULL END) BestDateDue,
         Max(CASE Y.Which WHEN 1 THEN T.TimeCommitted ELSE NULL END) BestTimeCommitted,
         Max(CASE Y.Which WHEN 2 THEN T.Score ELSE NULL END) CurrentScore,
         Max(CASE Y.Which WHEN 2 THEN T.DateDue ELSE NULL END) CurrentDateDue,
         Max(CASE Y.Which WHEN 2 THEN T.TimeCommitted ELSE NULL END) CurrentTimeCommitted
      FROM
         (
            SELECT ExamineeID, Max(Score) MaxScore, Max(DueDate) MaxDueDate, Count(*) Cnt
            FROM exam.ExamineeTest
            WHERE
               TestRevisionID = 3
               AND TestID = 2
            GROUP BY ExamineeID
         ) X
         CROSS JOIN (SELECT 1 UNION ALL SELECT 2) Y (Which)
         INNER JOIN exam.ExamineeTest T
            ON X.ExamineeID = T.ExamineeID
            AND (
               (Y.Which = 1 AND X.MaxScore = T.MaxScore)
               OR (Y.Which = 2 AND X.MaxDueDate = T.MaxDueDate)
            )
      WHERE
         T.TestRevisionID = 3
         AND T.TestID = 2
      GROUP BY
         X.ExamineeID,
         X.Cnt
   ) Y ON E.ExamineeID = Y.ExamineeID

This query will return unexpected extra rows if the combination of (ExamineeID, Score) or (ExamineeID, DueDate) can return multiple rows. That's probably not unlikely with Score. If neither is unique, then you need to use (or add) some additional column that can grant uniqueness so it can used to select one row. If only Score can be duplicated then an additional pre-query that gets the max Score first, then dovetailing in with the max DueDate would combine to pull the most recent score that was a tie for the highest at the same time as getting the most recent data. Let me know if you need more SQL 2000 help.

Note: The biggest thing that is going to control whether CROSS APPLY or a ROW_NUMBER() solution is better is whether you have an index on the columns that are being looked up and whether the data is dense or sparse.

  • Index + you're pulling only a few examinees with lots of tests each = CROSS APPLY wins.
  • Index + you're pulling a huge number of examines with only a few tests each = ROW_NUMBER() wins.
  • No index = string concatenation/value packing method wins (not shown here).

The group by solution that I gave for SQL 2000 will probably perform the worst, but not guaranteed. Like I said, testing is in order.

If any of my queries do give performance problems let me know and I'll see what I can do to help. I'm sure I probably have typos as I didn't work up any DDL to recreate your tables, but I did my best without trying it.

If performance really does become crucial, I would create ExamineeTestBest and ExamineeTestCurrent tables that get pushed to by a trigger on the ExamineeTest table that would always keep them updated. However, this is denormalization and probably not necessary or a good idea unless you've scaled so awfully big that retrieving results becomes unacceptably long.


It's not same subquery. It's three different subqueries.

  • count() on all
  • TOP (1) ORDER BY Score DESC
  • TOP (1) ORDER BY DateDue DESC

You can't avoid executing it less than 3 times.
The question is, how to make it execute no more than 3 times.


One option would be to write 3 inline table functions and use them with outer apply. Make sure they are actually inline, otherwise your performance will drop a hundred times. One of these three functions might be:

create function dbo.topexaminee_byscore(@ExamineeID int)
returns table
as
return (
  SELECT top (1)
    ExamineeTestID as bestExamineeTestID,
    Score as bestScore,
    DateDue as bestDateDue,
    TimeCommitted as bestTimeCommitted
  FROM exam.ExamineeTest
  WHERE (ExamineeID = @ExamineeID) AND (TestRevisionID = 3) AND (TestID = 2)
  ORDER BY Score DESC
)

Another option would be to do essentially the same, but with subqueries. Because you fetch data for all students anyway, there shouldn't be too much of a difference performance-wise. Create three subqueries, for example:

select bestExamineeTestID, bestScore, bestDateDue, bestTimeCommitted
from (
  SELECT
    ExamineeTestID as bestExamineeTestID,
    Score as bestScore,
    DateDue as bestDateDue,
    TimeCommitted as bestTimeCommitted,
    row_number() over (partition by ExamineeID order by Score DESC) as takeme
  FROM exam.ExamineeTest
  WHERE (TestRevisionID = 3) AND (TestID = 2)
) as foo
where foo.takeme = 1

Same for ORDER BY DateDue DESC and for all records, with respective columns being selected.

Join these three on the examineeid.

What is going to be better/more performant/more readable is up to you. Do some testing.


It looks like you can replace the three columns that are based on the alias "bestTest" with a view. All three of those subqueries have the same WHERE clause and the same ORDER BY clause.

Ditto for the subquery aliased "bestNewTest". Ditto ditto for the subquery aliased "currentTeest".

If I counted right, that would replace 8 subqueries with 3 views. You can join on the views. I think the joins would be faster, but if I were you, I'd check the execution plan of both versions.


You could use a CTE and OUTER APPLY.

;WITH testScores AS
(
    SELECT ExamineeID, ExamineeTestID, Score, DateDue, TimeCommitted
    FROM exam.ExamineeTest
    WHERE TestRevisionID = 3 AND TestID = 2
)
SELECT ExamineeID, LastName, FirstName, Email, total.Attempts,
       bestTest.*, currentTest.*
FROM exam.Examinee
LEFT OUTER JOIN
(
    SELECT ExamineeID, COUNT(ExamineeTestID) AS Attempts
    FROM testScores
    GROUP BY ExamineeID
) AS total ON exam.Examinee.ExamineeID = total.ExamineeID
OUTER APPLY
(
    SELECT TOP 1 ExamineeTestID, Score, DateDue, TimeCommitted
    FROM testScores
    WHERE exam.Examinee.ExamineeID = t.ExamineeID
    ORDER BY Score DESC
) AS bestTest (bestExamineeTestID, bestScore, bestDateDue, bestTimeCommitted)
OUTER APPLY
(
    SELECT TOP 1 ExamineeTestID, Score, DateDue, TimeCommitted
    FROM testScores
    WHERE exam.Examinee.ExamineeID = t.ExamineeID
    ORDER BY DateDue DESC
) AS currentTest (currentExamineeTestID, currentScore, currentDateDue, 
                  currentTimeCommitted)
0

精彩评论

暂无评论...
验证码 换一张
取 消