开发者

MySQL: SELECT(x) WHERE vs COUNT WHERE?

开发者 https://www.devze.com 2022-12-20 16:12 出处:网络
This is going to be one of those questions but I need to ask it. I have a large table which may or may not have one unique row. I therefore need a MySQL query that will just tell me TRUE or FALSE.

This is going to be one of those questions but I need to ask it.

I have a large table which may or may not have one unique row. I therefore need a MySQL query that will just tell me TRUE or FALSE.

With my current knowledge, I see two options开发者_开发问答 (pseudo code):

[id = primary key]

OPTION 1:

SELECT id FROM table WHERE x=1 LIMIT 1
... and then determine in PHP whether a result was returned.

OPTION 2:

SELECT COUNT(id) FROM table WHERE x=1
... and then just use the count.

Is either of these preferable for any reason, or is there perhaps an even better solution?

Thanks.


If the selection criterion is truly unique (i.e. yields at most one result), you are going to see massive performance improvement by having an index on the column (or columns) involved in that criterion.

create index my_unique_index on table(x)

If you want to enforce the uniqueness, that is not even an option, you must have

create unique index my_unique_index on table(x)

Having this index, querying on the unique criterion will perform very well, regardless of minor SQL tweaks like count(*), count(id), count(x), limit 1 and so on. For clarity, I would write

select count(*) from table where x = ?

I would avoid LIMIT 1 for two other reasons:

  • It is non-standard SQL. I am not religious about that, use the MySQL-specific stuff where necessary (i.e. for paging data), but it is not necessary here.
  • If for some reason, you have more than one row of data, that is probably a serious bug in your application. With LIMIT 1, you are never going to see the problem. This is like counting dinosaurs in Jurassic Park with the assumption that the number can only possibly go down.


AFAIK, if you have an index on your ID column both queries will be more or less equal performance. The second query will need 1 less line of code in your program but that's not going to make any performance impact either.


Personally I typically do the first one of selecting the id from the row and limiting to 1 row. I like this better from a coding perspective. Instead of having to actually retrieve the data, I just check the number of rows returned.

If I were to compare speeds, I would say not doing a count in MySQL would be faster. I don't have any proof, but my guess would be that MySQL has to get all of the rows and then count how many there are. Altough...on second thought, it would have to do that in the first option as well so the code will know how many rows there are as well. But since you have COUNT(id) vs COUNT(*), I would say it might be slightly slower.


Intuitively, the first one could be faster since it can abort the table(or index) scan when finds the first value. But you should retrieve x not id, since if the engine it's using an index on x, it doesn't need to go to the block where the row actually is.

Another option could be:

select exists(select 1 from mytable where x = ?) from dual

Which already returns a boolean.


Typically, you use group by having clause do determine if there are duplicate rows in a table. If you have a table with id and a name. (Assuming id is the primary key, and you want to know if name is unique or repeated). You would use

select name, count(*) as total from mytable group by name having total > 1;

The above will return the number of names which are repeated and the number of times.

If you just want one query to get your answer as true or false, you can use a nested query, e.g.

select if(count(*) >= 1, True, False) from (select name, count(*) as total from mytable group by name having total > 1) a;

The above should return true, if your table has duplicate rows, otherwise false.

0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号