I am working with a large amount of data: 6 million rows. I need the query to run as fast as possible, but am at a loss for further optimization. I already removed 3 subqueries and moved it from 11+ hours to just 35 minutes on a modest dataset of 100k rows. See below!
declare @UserId uniqueidentifier;
set @UserId = '936DA01F-9ABD-4d9d-80C7-02AF85C822A8';
select
temp.Address_Line1,
temp.Cell_Phone_Number,
temp.City,
temp.CPM_delt_acd,
temp.CPM_delt_date,
temp.Customer_Id,
temp.Customer_Type,
temp.Date_Birth,
temp.Email_Business,
temp.Email_Home,
temp.First_Name,
temp.Geo,
temp.Home_Phone_Number,
temp.Last_Name,
temp.Link_Customer_Id,
temp.Middle_Name,
temp.Naics_Code,
temp.Office_Phone_Number,
temp.St,
temp.Suffix,
temp.Tin,
temp.TIN_Indicator,
temp.Zip_Code,
crm_c.contactid as CrmRecordId,
crm_c.ownerid as OldOwnerId,
crm_c.ext_profiletype as old_profileType,
coalesce(crm_fim.ownerid, @UserId) as OwnerId,
2 as profileType,
case
when
(temp.Tin = crm_c.ext_retail_prime_taxid collate database_default
and temp.Last_Name = crm_c.lastname collate database_default)
then
('Tin/LastName: '+temp.Tin + '/' + temp.Last_Name)
when
(temp.Customer_ID = crm_c.ext_customerid collate database_default)
then
('Customer_ID: '+temp.Customer_ID)
else
('New Customer: '+temp.Customer_ID)
end as FriendlyName,
case
when
(temp.Customer_ID = crm_c.ext_customerid collate database_default)
then
0
else
1
end as ForceFieldLock
from DailyProfile_Current temp
left join crm_contact crm_c
on (temp.Customer_ID = crm_c.ext_customerid collate database_default
or (temp.Tin = crm_c.ext_retail_prime_taxid collate database_default
and temp.Last_Name = crm_c.lastname collate database_default))
and 0 = crm_c.deletionstatecode and 0 = crm_c.statecode
left outer join crm_ext_ImportMapping crm_fim
on temp.Geo = crm_fim.ext_geocode collate database_default
and 0 = crm_fim.deletionstatecode and 0 = crm_fim.statecode
Where crm_contact is a synonym that points to a view in another database. That view pulls data from a contact table and a contactextension table. I need data from both. I could probably separate this into two joins, if necess开发者_运维问答ary. In general, columns that begin with "ext_" are from the extension part of the crm_contact view.
When I run this against 100k rows in the DailyProfile_Current table, it takes about 35 minutes. That table is a bunch of nvarchar(200) columns that had a flatfile dumped into it. It sucks, but it's what I inherited. I wonder if using real datatypes would help, but I'd like possible solutions that don't involve that as well.
If the DailyProfile_Current table is full of things that don't match the join conditions, this runs incredibly fast. If the table is full of things that do match the join conditions, it's incredibly slow.
There are indexes on Customer_ID and Geo from the temp table. There are also assorted indexes on the crm_contact tables. I don't know how much an index can help on an nvarchar(200) column, though.
In case it matters, I'm using Sql Server 2005.
Any ideas are appreciated.
I would definitely split it into 2 Queries as the or function can be slow at times. Also, put a non clustered index on these columns (group them by line):
DailyProfile_Current:
Customer_ID
Tin, Last_Name
Geo
crm_contact:
ext_customerid,deletionstatecode,statecode
ext_retail_prime_taxid, lastname ,deletionstatecode,statecode
crm_ext_ImportMapping:
ext_geocode,deletionstatecode,statecode
Why dont you try to run it through Query Profiler? It might give you few hints.
Or else include the execution plan in the query result and look through it.
From the look on the query I can only suggest to split it in two by moving the OR
from JOIN
clause and use UNION ALL to union the results. At least, it might give you and idea which of two types of JOIN
s is slow, and work from there.
Run it throught the query analyzer and allow it to create indexes for you. i'm guessing you have at least sql 2000. And why not break up some of the functionality in code. for example you can do the case statement in code. But that is assuming that you are writing a query for code. I find that splitting up queries and taking up some of the load in code provides a significant difference in run time.
精彩评论