开发者

How can I optimize this query for time performance?

开发者 https://www.devze.com 2022-12-21 15:21 出处:网络
I am working with a large amount of data: 6 million rows. I need the query to run as fast as possible, but am at a loss for further optimization. I already removed 3 subqueries and moved it from 11+ h

I am working with a large amount of data: 6 million rows. I need the query to run as fast as possible, but am at a loss for further optimization. I already removed 3 subqueries and moved it from 11+ hours to just 35 minutes on a modest dataset of 100k rows. See below!

declare @UserId uniqueidentifier;
set @UserId = '936DA01F-9ABD-4d9d-80C7-02AF85C822A8';


select
    temp.Address_Line1,
    temp.Cell_Phone_Number,
    temp.City,
    temp.CPM_delt_acd,
    temp.CPM_delt_date,
    temp.Customer_Id,
    temp.Customer_Type,
    temp.Date_Birth,
    temp.Email_Business,
    temp.Email_Home,
    temp.First_Name,
    temp.Geo,
    temp.Home_Phone_Number,
    temp.Last_Name,
    temp.Link_Customer_Id,
    temp.Middle_Name,
    temp.Naics_Code,
    temp.Office_Phone_Number,
    temp.St,
    temp.Suffix,
    temp.Tin,
    temp.TIN_Indicator,
    temp.Zip_Code,

    crm_c.contactid as CrmRecordId, 
    crm_c.ownerid as OldOwnerId, 
    crm_c.ext_profiletype as old_profileType,
    coalesce(crm_fim.ownerid, @UserId) as OwnerId,
    2 as profileType,

    case 
        when
            (temp.Tin = crm_c.ext_retail_prime_taxid collate database_default 
            and temp.Last_Name = crm_c.lastname collate database_default)
        then
            ('Tin/LastName: '+temp.Tin + '/' + temp.Last_Name)
        when
            (temp.Customer_ID = crm_c.ext_customerid collate database_default)
        then
            ('Customer_ID: '+temp.Customer_ID)
        else
            ('New Customer: '+temp.Customer_ID)
    end as FriendlyName,

    case 
        when
            (temp.Customer_ID = crm_c.ext_customerid collate database_default)
        then
            0
        else
            1
    end as ForceFieldLock

from DailyProfile_Current temp

left join crm_contact crm_c 
    on (temp.Customer_ID = crm_c.ext_customerid collate database_default 
        or (temp.Tin = crm_c.ext_retail_prime_taxid collate database_default 
        and temp.Last_Name = crm_c.lastname collate database_default))
    and 0 = crm_c.deletionstatecode and 0 = crm_c.statecode    

left outer join crm_ext_ImportMapping crm_fim 
    on temp.Geo = crm_fim.ext_geocode collate database_default 
    and 0 = crm_fim.deletionstatecode and 0 = crm_fim.statecode

Where crm_contact is a synonym that points to a view in another database. That view pulls data from a contact table and a contactextension table. I need data from both. I could probably separate this into two joins, if necess开发者_运维问答ary. In general, columns that begin with "ext_" are from the extension part of the crm_contact view.

When I run this against 100k rows in the DailyProfile_Current table, it takes about 35 minutes. That table is a bunch of nvarchar(200) columns that had a flatfile dumped into it. It sucks, but it's what I inherited. I wonder if using real datatypes would help, but I'd like possible solutions that don't involve that as well.

If the DailyProfile_Current table is full of things that don't match the join conditions, this runs incredibly fast. If the table is full of things that do match the join conditions, it's incredibly slow.

There are indexes on Customer_ID and Geo from the temp table. There are also assorted indexes on the crm_contact tables. I don't know how much an index can help on an nvarchar(200) column, though.

In case it matters, I'm using Sql Server 2005.

Any ideas are appreciated.


I would definitely split it into 2 Queries as the or function can be slow at times. Also, put a non clustered index on these columns (group them by line):

DailyProfile_Current:
Customer_ID 
Tin, Last_Name
Geo 

crm_contact:
ext_customerid,deletionstatecode,statecode
ext_retail_prime_taxid, lastname ,deletionstatecode,statecode

crm_ext_ImportMapping:
ext_geocode,deletionstatecode,statecode


Why dont you try to run it through Query Profiler? It might give you few hints.
Or else include the execution plan in the query result and look through it.

From the look on the query I can only suggest to split it in two by moving the OR from JOIN clause and use UNION ALL to union the results. At least, it might give you and idea which of two types of JOINs is slow, and work from there.


Run it throught the query analyzer and allow it to create indexes for you. i'm guessing you have at least sql 2000. And why not break up some of the functionality in code. for example you can do the case statement in code. But that is assuming that you are writing a query for code. I find that splitting up queries and taking up some of the load in code provides a significant difference in run time.

0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号