I am creating a stored procedure with Sql Server 2008 which will return 2 result sets. The first query returns a result set that I would like to resuse as in the second query as a subquery (see example below). However, since the first query and the subquery essentially re开发者_JAVA技巧turn the same data, I was wondering if there is some caching meachanism that I can use. Is it possible to do that? I am trying to optimize for performance.
SELECT *
FROM Employees
WHERE BossId = 1
SELECT *
FROM CostCenters
WHERE EmployeeId IN (
SELECT EmployeeId
FROM Employees
WHERE BossId = 1
)
P.S. The example is a simplified problem.
You can cache CTEs by reusing the query plan. This requires injecting the Eager Spool between the resultset produced by the function. Quassnoi makes use of it in this article, but I can't find a better example at this time. Here's another good read on Eager Spool.
Table Variables are your best option. You can also improve performance by using the exists
operator for the subquery rather than in
:
-- obviously the columns should match your Employees table
declare @results table (
employeeId int,
column1 varchar,
column2 int
)
insert into @results
select * from Employees
where BossId = 1
-- using exists/not exists performs much better than in
select * from CostCenters
where exists ( select 0
from @results as r
where CostCenters.employeeId = r.employeeId )
As far as I know you would need to use either a temp table or table variable for this. A comparison of the two is here.
The below uses the OUTPUT clause to fill the table variable and select from it in one statement.
declare @MatchingResults table
(
EmployeeId int primary key --Other Columns
)
INSERT INTO @MatchingResults
OUTPUT INSERTED.*
SELECT EmployeeId --Other Columns
FROM Employees
WHERE BossId = 1
SELECT *
FROM CostCenters
WHERE EmployeeId IN (
SELECT EmployeeId
@MatchingResults))
Caching the data of the first query will probably NOT result in better performance. When SQL Server receives the query it breaks it down to simple steps, chooses the proper indexes and operators and retrieves the data using those indexes. By storing the first query's data in a table variable or temporary table you are preventing SQL Server from using any indexes on the Employees table.
If you rewrite your query to its equivalent using JOIN it's easier to see what happens
SELECT c.*
FROM CostCenters c INNER JOIN Employees e on c.EmployeeId=e.EmployeeId
WHERE e.BossId=1
When SQL Server sees this query it will check the statistics of the tables. If BossId is a highly selective indexed column it may first try to filter by this. Otherwise it will use any indexes on the EmployeeId columns to limit rows from both tables to a minimum, then BossId to find the proper rows and return them.
Filtering operations on indexes are quite fast as the indexes contain only a subset of the row data, are easier to cache in memory and have a physical structure that allows quick searching.
You really shouldn't try to second-guess SQL Server's query optimizer before you encounter an actual performance problem. Most of the time you will prevent it from selecting the best execution plan and result in worse performance
The best solution I can think of is to go with CTE
http://msdn.microsoft.com/en-us/library/ms190766.aspx
精彩评论