Can anyone explain why on earth these queries are not the same?_问答_开发者

Can anyone explain why on earth these queries are not the same?

开发者 https://www.devze.com 2022-12-18 22:36 出处：网络

I had am maintaining an query that is as follows: select field_1, field_2 from source_table minus select field_1, field_2

I had am maintaining an query that is as follows:

select field_1, field_2
from source_table
minus
select field_1, field_2
from source_table
where status_code in (3, 600);开发者_开发技巧

When I looked at this query, I immediately thought, "That's lame. Why not just use a 'NOT IN' and remove the MINUS business. So I re-wrote it as so:

select field_1, field_2
from source_table
where status_code not in (3, 600);

Just to double-check my sanity, I got counts of each query. To my surprise, the first query returned 789,089 records, and the second query returned 1,518,450 records!

I've looked at this from several angles but can't figure out how these two queries are different. Can anyone explain what is going on, or why I am an idiot this morning?

These queries are indeed different. field_1 and field2 do not equate to status_code 3 and 600. field_1 could be 'A' and field_2 could be 'B', so you would be eliminating records from the first SELECT that look like A, B. The original is probably the best way of achieving the correct result.

Edit: To give you a better idea of what's going on, you could get the same result, in a similar fashion to how you thought to write the query, by doing a subquery:

select distinct field_1, field_2
from source_table
where (field_1, field_2) not in (
    select field_1, field_2
    from source_table
    where status_code in (3, 600)
);

If there isn't a unique constraint on the combination of field_1 and field_2, the second query may contain duplicates while the first won't, as 'minus' will suppress them. Try the second query with a 'distinct' and see if the counts match.

The UNION, MINUS and INTERSECT operators return only unique values. If you have two rows with identical field_1 and field_2, the first query will count it once, while the second will count it twice:

SQL> insert into source_table values ('a', 'b', 10);

SQL> insert into source_table values ('a', 'b', 10);

SQL> select field_1, field_2
  2  from source_table
  3  minus
  4  select field_1, field_2
  5  from source_table
  6  where status_code in (3, 600);

FIELD_1    FIELD_2
---------- ----------
a          b

SQL> select field_1, field_2
  2  from source_table
  3  where status_code not in (3, 600);

FIELD_1    FIELD_2
---------- ----------
a          b
a          b

If you do not have a unique constraint on field_1, field_2 or both, Alison is probably right. Consider you have a table: A B 3 A B 10

The first query would eliminate both rows, the second only one. Alternatively, if you have NULL values in status_code column, you may get different results, (A or not A) does not work in SQL if you have NULL values in columns.