I have a performance problem on a query.
First table is a Customer table which has millions records in it. Customer table has a column of email address and some other information about customer.
Second table is a Com开发者_如何学运维municationInfo table which contains just Email addresses.
And What I want in here is; how many times the email address in CommunicationInfo table repeats in Customers table. What could be the the most performer query.
The basic query that I can explain this situation is;
Select ci.Email, count(*) from Customer c left join
CommunicationInfo ci on c.Email1 = ci.Email or c.Email2 = ci.Email
Group by ci.Email
But sure, it takes about 5, 6 minutes in execution.
Thanks in Advance.
this query is about as good as it gets if you have an index on Customer.Email and another on CommunicationInfo.Email
Select
c.Email, count(*)
from Customer c
left join CommunicationInfo ci on c.Email1 = ci.Email
left join CommunicationInfo ci2 on c.Email2 = ci2.Email
Group by c.Email
You mention:
And What I want in here is; how many times the email address in CommunicationInfo table repeats in Customers table. What could be the the most performer query.
To me, that sounds like you could easily use an INNER JOIN - this would most likely be a lot faster, since it will limit the search scope to just those customers who really do have an e-mail - anyone who doesn't have an e-mail at all (and thus a count(*) = 0) will not even be looked at - that might make a big difference even just in the number of rows SQL Server has to count and group.
So try this:
SELECT
ci.Email, COUNT(*)
FROM
dbo.Customer c
INNER JOIN dbo.CommunicationInfo ci
ON c.Email1 = ci.Email OR c.Email2 = ci.Email
GROUP BY
ci.Email
How does that perform in your case??
Using the OR
condition robs the optimizer of opportunity to use HASH JOIN
or MERGE JOIN
.
Use this:
SELECT ci.Email, SUM(cnt)
FROM (
SELECT ci.Email, COUNT(c.Email) AS cnt
FROM CommunicationInfo ci
LEFT JOIN
Customer c
ON c.Email1 = ci.Email
GROUP BY
ci.Email
UNION ALL
SELECT ci.Email, COUNT(c.Email) AS cnt
FROM CommunicationInfo ci
LEFT JOIN
Customer c
ON c.Email2 = ci.Email
GROUP BY
ci.Email
) q2
GROUP BY
ci.Email
or this:
SELECT ci.Email, COUNT(*)
FROM CommunicationInfo ci
LEFT JOIN
(
SELECT Email1 AS email
FROM Customer c
UNION ALL
SELECT Email2
FROM Customer
) q
ON q.Email = ci.Email
GROUP BY
ci.Email
Make sure that you have indexes on Customer(Email)
and Customer(Email2)
The first query will be more efficient if your emails are mostly not filled, the second one — if most emails are filled.
Depending on your environment there may not be much you can do to optimize this.
A couple of questions:
- How many records in CommunicationInfo?
- How often do you really need to run this query? Is it a one time analysis, or are multiple people going to be running this every 10 minutes?
- Are the fields indexed? I'll make a guess that neither Email1 nor Email2 field is indexed. However, I wouldn't suggest adding an index without taking the balance of the whole system into consideration.
- Why are you using a left join? Do you really need EVERYTHING from the Customer table? You're counting, so no harm in doing an INNER JOIN.
Suggestions:
- Run the query through the Query Optimization wizard to see if there is anything SQL Server would recommend.
- An extreme suggestion would be to dump the Email1 and Email2 columns into a temp table and join to that. I've seen queries run slowly because of a large amount of stress on a particular table, so sometimes copying the records into a temp table is faster, but this technique is very dependent on how much memory there is, how fast IO is, and the amount of stress on a particular table.
精彩评论