set difference in SQL query_问答_开发者_运维开发者技术经验分享

开发者 https://www.devze.com 2022-12-25 13:38 出处：网络

I\'m trying to select records with a statement SELECT * FROM A WHERE LEFT(B, 5) IN (SELECT * FROM (SELECT LEFT(A.B,5), COUNT(DISTINCT A.C) c_count

I'm trying to select records with a statement

SELECT * 
FROM A 
WHERE 
  LEFT(B, 5) IN 
    (SELECT * FROM 
       (SELECT LEFT(A.B,5), COUNT(DISTINCT A.C) c_count 
        FROM A 
        GROUP BY LEFT(B,5)
       ) p1 
       WHERE p1.c_count = 1
     ) 
     AND C IN 
        (SELECT * FROM 
            (SELECT A.C , COUNT(DISTINCT LEFT(A.B,5)) b_count 
             FROM A 
             GROUP BY C
            ) p2 
    开发者_如何转开发      WHERE p2.b_count = 1)

which takes a long time to run ~15 sec.

Is there a better way of writing this SQL?

If you would like to represent Set Difference (A-B) in SQL, here is solution for you. Let's say you have two tables A and B, and you want to retrieve all records that exist only in A but not in B, where A and B have a relationship via an attribute named ID. An efficient query for this is:

# (A-B)
SELECT DISTINCT A.* FROM (A LEFT OUTER JOIN B on A.ID=B.ID) WHERE B.ID IS NULL

-from Jayaram Timsina's blog.

You don't need to return data from the nested subqueries. I'm not sure this will make a difference withiut indexing but it's easier to read.

And EXISTS/JOIN is probably nicer IMHO then using IN

SELECT * 
FROM
    A 
    JOIN
    (SELECT LEFT(B,5) AS b1
        FROM A 
        GROUP BY LEFT(B,5)
        HAVING COUNT(DISTINCT C) = 1
    ) t1 On LEFT(A.B, 5) = t1.b1
    JOIN
    (SELECT C AS C1
        FROM A 
        GROUP BY C
        HAVING COUNT(DISTINCT LEFT(B,5)) = 1
    ) t2 ON A.C = t2.c1

But you'll need a computed column as marc_s said at least

And 2 indexes: one on (computed, C) and another on (C, computed)

Well, not sure what you're really trying to do here - but obviously, that LEFT(B, 5) expression keeps popping up. Since you're using a function, you're giving up any chance to use an index.

What you could do in your SQL Server table is to create a computed, persisted column for that expression, and then put an index on that:

ALTER TABLE A
   ADD LeftB5 AS LEFT(B, 5) PERSISTED

CREATE NONCLUSTERED INDEX IX_LeftB5 ON dbo.A(LeftB5)

Now use the new computed column LeftB5 instead of LEFT(B, 5) anywhere in your query - that should help to speed up certain lookups and GROUP BY operations.

Also - you have a GROUP BY C in there - is that column C indexed?

If you are looking for just set difference between table1 and table2, the below query is simple that gives the rows that are in table1, but not in table2, such that both tables are instances of the same schema with column names as columnone, columntwo, ...

with 
col1 as (
  select columnone from table2
  ),
col2 as (
  select columntwo from table2
)
...
select * from table1
where (
  columnone not in col1 
  and columntwo not in col2
  ...
);