开发者

SQL Server - Select Distinct Alternative Query

开发者 https://www.devze.com 2023-03-14 09:03 出处:网络
I asked a question earlier about a replacement to using SELECT DISTINCT as the query takes a lot longer to execute. I was recommended to use EXISTS and it worked A LOT better (0 second execution versu

I asked a question earlier about a replacement to using SELECT DISTINCT as the query takes a lot longer to execute. I was recommended to use EXISTS and it worked A LOT better (0 second execution versus the previous 44 second execution. I'm not too familiar with query syntax but am learning. I was hoping someone might be able to rephrase the following query without using DISTINCT and preferably using EXISTS (as it worked so well previously). I appreciate any help.

select distinct EM.Employee, 
                rtrim(EM.First开发者_开发问答Name) + ' ' + rtrim(EM.LastName) as Name 
from EM EM 
inner join PR PR 
    on EM.Employee = PR.ProjMgr 
where PR.WTS1 in (Select distinct WTS1 
                  from TabFields 
                  where custInclude = 'Y' and WTS2 = '') 
    and PR.WTS2 = '' 
order by Name


When you JOIN, you get a partial Cartesian product. The partial comes from the INNER JOIN condition.

So for 2 rows in EM that have 3 rows and 4 rows in PR respectively, you'd get 7 rows in the output. As expected. You asked "give me all partial Cartesian product of EM and PR where matching rows"

However, you want "give me rows in EM where there is something in PR". So the INNER JOIN is the wrong construct.

You can use IN, EXISTS (or INTERSECT in other situations) and they are all semantically correct for this latter question.

In this case your use of IN is in the wrong place. As I said before, the whole "test" condition should be pushed into the subquery.

So, both of these are correct

  • One table in the FROM clause
  • All conditions in the subquery
  • No DISTINCT

Stop JOINing!

select
    EM.Employee, rtrim(EM.FirstName) + ' ' + rtrim(EM.LastName) as Name 
from
    EM EM 
WHERE
   EXISTS (SELECT *
       FROM
          PR PR 
          JOIN
          TabFields TF ON PR.WTS1 = TF.WTS1
       WHERE
          PR.WTS2 = '' AND
          TF.custInclude = 'Y' and TF.WBT2 = '' AND
          EM.Employee = PR.ProjMgr
          )

select
    EM.Employee, rtrim(EM.FirstName) + ' ' + rtrim(EM.LastName) as Name 
from
    EM EM 
WHERE
   EM.Employee IN (SELECT PR.ProjMgr
       FROM
          PR PR 
          JOIN
          TabFields TF ON PR.WTS1 = TF.WTS1
       WHERE
          PR.WTS2 = '' AND
          TF.custInclude = 'Y' and TF.WBT2 = ''
       )

Using 2 INs which is closer to your original query:

select EM.Employee, 
                rtrim(EM.FirstName) + ' ' + rtrim(EM.LastName) as Name 
from EM EM 
WHERE 
      EM.Employee IN (SELECT PR.ProjMgr
          FROM 
            PR PR 
          where PR.WTS1 in (Select distinct WTS1 
                  from TabFields 
                  where custInclude = 'Y' and WTS2 = '') 
            and PR.WTS2 = ''
       ) 
order by Name


The (new) Query optimizer will treat IN the same as EXISTS if you are evaluating a subquery.

Since you are using IN, you don't need the DISTINCT in your subquery. Just take it out!

It looks like someone is writing queries and always using DISTINCT which is a pretty terrible habit to get into. DISTINCT should only be used to avoid duplicates. There's no reason to eliminate dupes in your subquery since it will be just as accurate to say that 1 is in 1,2,1,3 as it is to say 1 is in 1,2,3.


No distinct is needed in the code below

select  EM.Employee, 
                rtrim(EM.FirstName) + ' ' + rtrim(EM.LastName) as Name 
from EM 
inner join PR 
    on EM.Employee = PR.ProjMgr 
where exists (Select * 
                  from TabFields 
                  where custInclude = 'Y' and WTS2 = '' and PR.WTS1 = tabfields.WTS1 ) 
    and PR.WTS2 = '' -- Comment: Check if this clause is covered by the subquery and can be ommited
order by Name


Try using GROUP BY instead of select DISTINCT. Also you can move your sub-query from the WHERE clause and use it as a derived table in a join.

SELECT EM.Employee, 
       RTRIM(EM.FirstName) + ' ' + RTRIM(EM.LastName) AS Name 
FROM EM EM 
INNER JOIN PR PR 
    ON EM.Employee = PR.ProjMgr 
INNER JOIN (SELECT WTS1 
              FROM TabFields 
             WHERE custInclude = 'Y' AND WTS2 = ''
             GROUP BY WTS1) x
    ON PR.WTS1 = x.WTS1
    AND PR.WTS2 = '' 
GROUP BY EM.Employee,    RTRIM(EM.FirstName) + ' ' + RTRIM(EM.LastName) 
ORDER BY RTRIM(EM.FirstName) + ' ' + RTRIM(EM.LastName)


This query seems fine to me. Is is running slow? Do you really need to trim first and last name? If not you can ignore them.

 select distinct E.Employee, rtrim(E.FirstName) + ' ' + rtrim(E.LastName) as Name 
 from EM E join PR P on E.Employee = P.ProjMgr 
 where P.WTS1 in (Select WTS1 from TabFields where custInclude = 'Y' and WTS2 = '') 
   and P.WTS2 = '' 
 order by Name

Exists() can be used to check something as a condition before some other sqls executed. If you need distinct rows, exists() will not help.

0

精彩评论

暂无评论...
验证码 换一张
取 消