Need help improving query performance_问答_开发者

I need help with improving the performance of the following SQL query. The database design of this application is based on OLD mainframe entity designs. All the query does is returns a list of clients based on some search criteria:

@Advisers: Only returns clients which was captured by this adviser.
@outlets: just ignore this one
@searchtext: (firstname, surname, suburb, policy number) any combination of that

What I'm doing is creating a temporary table, then query all the tables involved, creating my own dataset, and then insert that dataset into a easily understandable table (@clients)

This query takes 20 seconds to execute and currently only returns 7 rows!

Screenshot of all table count can be found here: Table Record Count

Any ideas where I can start to optimize this query?

ALTER PROCEDURE [dbo].[spOP_SearchDashboard] 
    @advisers varchar(1000),
    @outlets varchar(1000),
    @searchText varchar(1000)
AS
BEGIN
-- SET NOCOUNT ON added to prevent extra result sets from
-- interfering with SELECT statements.
SET NOCOUNT ON;

-- Set the prefixes to search for (firstname, surname, suburb, policy number)

DECLARE @splitSearchText varchar(1000)
SET     @splitSearchText = REPLACE(@searchText, ' ', ',')

DECLARE @AdvisersListing TABLE
(
    adviser varchar(200)
) 

DECLARE @SearchParts TABLE
(
    prefix varchar(200)
) 

DECLARE @OutletListing TABLE
(
    outlet varchar(200)
) 

INSERT INTO @AdvisersListing(adviser)
SELECT part as adviser FROM SplitString (@advisers, ',')

INSERT INTO @SearchParts(prefix)
SELECT part as prefix FROM SplitString (@splitSearchText, ',')

INSERT INTO @OutletListing(outlet)
SELECT part as outlet FROM SplitString (@outlets, ',')

DECLARE @Clients TABLE
(
    source varchar(2),
    adviserId bigint, 
    integratedId varchar(50), 
    rfClientId bigint, 
    ifClientId uniqueidentifier, 
    title varchar(30), 
    firstname varchar(100), 
    surname varchar(100), 
    address1 varchar(500), 
    address2 varchar(500), 
    suburb varchar(100), 
    state varchar(100), 
    postcode varchar(100), 
    policyNumber varchar(100), 
    lastAccess datetime,
    deleted bit
)

    INSERT INTO @Clients
       SELECT 
          source, adviserId, integratedId, rfClientId, ifClientId, title, 
          firstname, surname, address1, address2, suburb, state, postcode, 
          policyNumber, max(lastAccess) as lastAccess, deleted
       FROM 
          (SELECT DISTINCT
              'RF' as Source,
              advRel.SourceEntityId as adviserId,
              cast(pe.entityId as varchar(50)) AS IntegratedID,
              pe.entityId AS rfClientId, 
              cast(ifClient.Id as uniqueidentifier) as ifClientID,
              ISNULL(p.title, '') AS title,
              ISNULL(p.firstname, '') AS firstname, 
              ISNULL(p.surname, '') AS surname, 
              ISNULL(ct.address1, '') AS address1, 
              ISNULL(ct.address2, '') AS address2, 
              ISNULL(ct.suburb, '') AS suburb, 
              ISNULL(ct.state, '') AS state, 
              ISNULL(ct.postcode, '') AS postcode,
              ISNULL(contract.policyNumber,'') AS policyNumber,
              coalesce(pp.LastAccess, d_portfolio.dateCreated, pd.dateCreated) AS lastAccess,
              ISNULL(client.deleted, 0) as deleted
           FROM     
               tbOP_Entity pe 
           INNER JOIN tbOP_EntityRelationship advRel ON pe.EntityId = advRel.TargetEntityId  
                                                     AND advRel.RelationshipId = 39 
           LEFT OUTER JOIN tbOP_Data pd ON pe.EntityId = pd.entityId 
           LEFT OUTER JOIN tbOP__Person p ON pd.DataId = p.DataId
           LEFT OUTER JOIN tbOP_EntityRelationship ctr ON pe.EntityId = ctr.SourceEntityId 
                                                       AND ctr.RelationshipId = 79 
           LEFT OUTER JOIN tbOP_Data ctd ON ctr.TargetEntityId = ctd.entityId 
           LEFT OUTER JOIN tbOP__Contact ct ON ctd.DataId = ct.DataId 
           LEFT OUTER JOIN tbOP_EntityRelationship  ppr ON pe.EntityId = ppr.SourceEntityId 
                                                        AND ppr.RelationshipID = 113 
           LEFT OUTER JOIN tbOP_Data ppd ON ppr.TargetEntityId = ppd.EntityId 
           LEFT OUTER JOIN tbOP__Portfolio pp ON ppd.DataId = pp.DataId 
           LEFT OUTER JOIN tbOP_EntityRelationship er_policy ON ppd.EntityId = er_policy.SourceEntityId 
                                                             AND er_policy.Relationship开发者_StackOverflow社区Id = 3
           LEFT OUTER JOIN tbOP_EntityRelationship er_contract ON er_policy.TargetEntityId = er_contract.SourceEntityId AND er_contract.RelationshipId = 119
           LEFT OUTER JOIN tbOP_Data d_contract ON er_contract.TargetEntityId = d_contract.EntityId
           LEFT OUTER JOIN tbOP__Contract contract ON d_contract.DataId = contract.DataId
           LEFT JOIN tbOP_Data d_portfolio ON ppd.EntityId = d_portfolio.EntityId
           LEFT JOIN tbOP__Portfolio pt ON d_portfolio.DataId = pt.DataId
           LEFT JOIN tbIF_Clients ifClient on pe.entityId = ifClient.RFClientId
           LEFT JOIN tbOP__Client client on client.DataId = pd.DataId
        where 
           p.surname <> '' 
           AND (advRel.SourceEntityId IN (select adviser from @AdvisersListing)
                OR 
                pp.outlet COLLATE SQL_Latin1_General_CP1_CI_AS in (select outlet from @OutletListing)
               ) 
       ) as RFClients
group by 
        source, adviserId, integratedId, rfClientId, ifClientId, title, 
        firstname, surname, address1, address2, suburb, state, postcode, 
        policyNumber, deleted

SELECT * FROM @Clients --THIS ONLY RETURNS 10 RECORDS WITH MY CURRENT DATASET

END

Clarifying questions

What is the MAIN piece of data that you are querying on - advisers, search-text, outlets?

It feels like your criteria allows for users to search in many different ways. A sproc will always use exactly the SAME plan for every question you ask of it. You get better performance by using several sprocs - each tuned for a specific search scenario (i.e I bet you could write something blazingly fast for querying just by policy-number).

If you can separate your search-text into INDIVIDUAL parameters then you may be able to:

Search for adviser relationships matching your supplied list - store in temp table (or table variable).
IF ANY surnames have been specified then delete all records from temp which aren't for people with your supplied names.
Repeat for other criteria lists - all the time reducing your temp records.
THEN join to the outer-join stuff and return the results.

In your notes you say that outlets can be ignored. If this is true then taking them out would simplify your query. The "or" clause in your example means that SQL-Server needs to find ALL relationships for ALL portfolios before it can realistically get down to the business of filtering the results that you actually want.

Breaking the query up

Most of you query consists of outer-joins that are not involved in filtering. Try moving these joins into a separate select (i.e. AFTER you have applied all of your criteria). When SQL-Server sees lots of tables then it switches off some of its possible optimisations. So your first step (assuming that you always specify advisers) is just:

SELECT advRel.SourceEntityId as adviserId, 
    advRel.TargetEntityId AS rfClientId
INTO #temp1
FROM @AdvisersListing advisers
INNER JOIN tbOP_EntityRelationship advRel
ON advRel.SourceEntityId = advisers.adviser
AND advRel.RelationshipId = 39;

The link to tbOP_Entity (aliased as "pe") does not look like it is needed for its data. So you should be able to replace all references to "pe.EntityId" with "advRel.TargetEntityId".

The DISTINCT clause and the GROUP-BY are probably trying to achieve the same thing - and both of them are really expensive. Normally you find ONE of these used when a previous developer has not been able to get his results right. Get rid of them - check your results - if you get duplicates then try to filter the duplicates out. You may need ONE of them if you have temporal data - you definitely don't need both.

Indexes

Make sure that the @AdvisersListing.adviser column is same datetype as SourceEntityId and that SourceEntityId is indexed. If the column has a different datatype then SQL-Server won't want to use the index (so you would want to change the data-type on @AdvisersListing).

The tbOP_EntityRelationship tables sounds like it should have an index something like:

CREATE UNIQUE INDEX advRel_idx1 ON tbOP_EntityRelationship (SourceEntityId,
    RelationshipId, TargetEntityId);

If this exists then SQL-Server should be able to get everything it needs by ONLY going to the index pages (rather than to the table pages). This is known as a "covering" index.

There should be a slightly different index on tbOP_Data (assuming it has a clustered primary key on DataId):

CREATE INDEX tbOP_Data_idx1 ON tbOP_Data (entityId) INCLUDE (dateCreated);

SQL-Server will store the keys from the table's clustered index (which I assume will be DataId) along with the value of "dateCreated" in the index leaf pages. So again we have a "covering" index.

Most of the other tables (tbOP__Client, etc) should have indexes on DataId.

Query plan

Unfortunately I couldn't see the explain-plan picture (our firewall ate it). However 1 useful tip is to hover your mouse over some of the join lines. It tells you how many records be accessed.

Watch out for full-table-scans. If SQL-Server needs to use them then its pretty-much given up on your indexes.

Database structure

Its been designed as a transaction database. The level of normalization (and all of the EntityRelationship-this and Data-that are really painful for reporting). You really need to consider having a separate reporting database that unravels some of this information into a more usable structure.

If you are running reports directly against your production database then I would expect a bunch of locking problems and resource contention.

Hope this has been useful - its the first time I've posted here. Has been ages since I last tuned a query in my current company (they have a bunch of stern-faced DBAs for sorting this sort of thing out).

Looking at your execution plan... 97% of the cost of your query is in processing the DISTINCT clause. I'm not sure it is even necessary since you are taking all that data and doing a group by on it anyway. You might want to take it out and see how that affects the plan.

That kind of query is just going to take time, with that many joins and that many temp tables, there's just nothing easy or efficient about it. One trick I have been using is using local variables. It might not be an all out solution, bit if it shaves a few seconds, it's worth it.

DECLARE @Xadvisers varchar(1000)
DECLARE @Xoutlets varchar(1000)
DECLARE @XsearchText varchar(1000)

SET @Xadvisers = @advisers
SET @Xoutlets = @outlets
SET @XsearchText = @searchText

Believe me, I have tested it thoroughly, and it helps with complicated scripts. Something about the way SQL Server handles local variables. Good luck!