I asked a question earlier about a replacement to using SELECT DISTINCT as the query takes a lot longer to execute. I was recommended to use EXISTS and it worked A LOT better (0 second execution versus the previous 44 second execution. I'm not too familiar with query syntax but am learning. I was hoping someone might be able to rephrase the following query without using DISTINCT and preferably using EXISTS (as it worked so well previously). I appreciate any help.
select distinct EM.Employee,
rtrim(EM.First开发者_开发问答Name) + ' ' + rtrim(EM.LastName) as Name
from EM EM
inner join PR PR
on EM.Employee = PR.ProjMgr
where PR.WTS1 in (Select distinct WTS1
from TabFields
where custInclude = 'Y' and WTS2 = '')
and PR.WTS2 = ''
order by Name
When you JOIN, you get a partial Cartesian product. The partial comes from the INNER JOIN condition.
So for 2 rows in EM that have 3 rows and 4 rows in PR respectively, you'd get 7 rows in the output. As expected. You asked "give me all partial Cartesian product of EM and PR where matching rows"
However, you want "give me rows in EM where there is something in PR". So the INNER JOIN is the wrong construct.
You can use IN, EXISTS (or INTERSECT in other situations) and they are all semantically correct for this latter question.
In this case your use of IN is in the wrong place. As I said before, the whole "test" condition should be pushed into the subquery.
So, both of these are correct
- One table in the FROM clause
- All conditions in the subquery
- No DISTINCT
Stop JOINing!
select
EM.Employee, rtrim(EM.FirstName) + ' ' + rtrim(EM.LastName) as Name
from
EM EM
WHERE
EXISTS (SELECT *
FROM
PR PR
JOIN
TabFields TF ON PR.WTS1 = TF.WTS1
WHERE
PR.WTS2 = '' AND
TF.custInclude = 'Y' and TF.WBT2 = '' AND
EM.Employee = PR.ProjMgr
)
select
EM.Employee, rtrim(EM.FirstName) + ' ' + rtrim(EM.LastName) as Name
from
EM EM
WHERE
EM.Employee IN (SELECT PR.ProjMgr
FROM
PR PR
JOIN
TabFields TF ON PR.WTS1 = TF.WTS1
WHERE
PR.WTS2 = '' AND
TF.custInclude = 'Y' and TF.WBT2 = ''
)
Using 2 INs which is closer to your original query:
select EM.Employee,
rtrim(EM.FirstName) + ' ' + rtrim(EM.LastName) as Name
from EM EM
WHERE
EM.Employee IN (SELECT PR.ProjMgr
FROM
PR PR
where PR.WTS1 in (Select distinct WTS1
from TabFields
where custInclude = 'Y' and WTS2 = '')
and PR.WTS2 = ''
)
order by Name
The (new) Query optimizer will treat IN
the same as EXISTS
if you are evaluating a subquery.
Since you are using IN
, you don't need the DISTINCT
in your subquery. Just take it out!
It looks like someone is writing queries and always using DISTINCT
which is a pretty terrible habit to get into. DISTINCT
should only be used to avoid duplicates. There's no reason to eliminate dupes in your subquery since it will be just as accurate to say that 1
is in 1,2,1,3
as it is to say 1
is in 1,2,3
.
No distinct is needed in the code below
select EM.Employee,
rtrim(EM.FirstName) + ' ' + rtrim(EM.LastName) as Name
from EM
inner join PR
on EM.Employee = PR.ProjMgr
where exists (Select *
from TabFields
where custInclude = 'Y' and WTS2 = '' and PR.WTS1 = tabfields.WTS1 )
and PR.WTS2 = '' -- Comment: Check if this clause is covered by the subquery and can be ommited
order by Name
Try using GROUP BY instead of select DISTINCT. Also you can move your sub-query from the WHERE clause and use it as a derived table in a join.
SELECT EM.Employee,
RTRIM(EM.FirstName) + ' ' + RTRIM(EM.LastName) AS Name
FROM EM EM
INNER JOIN PR PR
ON EM.Employee = PR.ProjMgr
INNER JOIN (SELECT WTS1
FROM TabFields
WHERE custInclude = 'Y' AND WTS2 = ''
GROUP BY WTS1) x
ON PR.WTS1 = x.WTS1
AND PR.WTS2 = ''
GROUP BY EM.Employee, RTRIM(EM.FirstName) + ' ' + RTRIM(EM.LastName)
ORDER BY RTRIM(EM.FirstName) + ' ' + RTRIM(EM.LastName)
This query seems fine to me. Is is running slow? Do you really need to trim first and last name? If not you can ignore them.
select distinct E.Employee, rtrim(E.FirstName) + ' ' + rtrim(E.LastName) as Name
from EM E join PR P on E.Employee = P.ProjMgr
where P.WTS1 in (Select WTS1 from TabFields where custInclude = 'Y' and WTS2 = '')
and P.WTS2 = ''
order by Name
Exists() can be used to check something as a condition before some other sqls executed. If you need distinct rows, exists() will not help.
精彩评论