sanity check: count(*) vs count(pkColumn)_问答_开发者

Today my boss told me to use count(pkColumn) instead of count(*), saying it will be faster. I'm pretty sure this isn't true, but I'm less a DBA than a Java developer so I wanted to check.

I ran some query plans:

EXPLAIN SELECT COUNT(*) FROM "Foo";
EXPLAIN SELECT COUNT("FooUID") FROM "Foo";
EXPLAIN SELECT COUNT(1) FROM "Foo";

With the results:

Aggregate  (cost=1421.01..1421.02 rows=1 width=0)
  ->  Seq Scan on "Foo"  (cost=0.00..1296.81 rows=49681 width=0)

Aggregate  (cost=1421.01..1421.02 rows=1 width=16)
  ->  Seq Scan on "Foo"  (cost=0.00..1296.81 rows开发者_运维技巧=49681 width=16)

Aggregate  (cost=1421.01..1421.02 rows=1 width=0)
  ->  Seq Scan on "Foo"  (cost=0.00..1296.81 rows=49681 width=0)

They seem the same. Can someone just confirm that I'm correct about this? I realize this has been discussed for other languages, but PostgreSQL handles COUNT differently as I understand, and I couldn't find a PostgreSQL equivalent question.

Note that "FooUID" will never be NULL.

EDIT: using version 9.0.4

Your boss is misguided (at best).

Yes, PostgreSql (at least the ones I've used, up to 8.1) does treat count slightly differently in that it has to go to the actual table to get some information rather than just using the index alone (with MVCC, it stores "soft" deletion information in the table so each row has to be retrieved to see if it's been deleted but not yet removed from an index).

But it has to do that regardless of whether you use * or pk_column for the count. The advantage of using * is that the DBMS is free to choose any non-null column for the count, which may result in less disk I/O for the index processing.

If your boss is worried about the performance, there are a couple of possible "tricks" you can use.

First, if you're only interested in the existence of rows rather then the number, you can use change code like:

select count(*) into myCount from myTable where myCondition;
if myCount > 0 then
    do something
end if;

into:

if exists ( select * from myTable where myCondition ) then
    do something
end if;

Secondly, you can use insert and delete triggers which maintain two counters in a separate table for rows added and rows deleted. You can then work out the rowcount by simply subtracting one from the other. The use of two separate counters prevents serialisation on the insert/delete operations but it's still a massive kludge and probably unnecessary for the vast number of cases.

You also have to periodically scale back the counters (for example, modify 1000 inserts + 200 deletes to be 800 inserts and 0 deletes).

Thirdly, you can get an estimate of the row count (based on the last analyse operation) by using:

select reltuples from pg_class where relname = 'myTable';

Theoretically, count(column) can be faster than count(*) because when the SQL parser parses the * it must look up in the catalog what the column names are for the table, which is what * represents. By using count(column) you potentially avoid this catalog look up.

This "advantage" is so insignificant compared to the reduced readability it causes, and denies the database the chance to optimize your intention by some other mechanism (eg counting an indexed column and thus doing an index-only search).

Use count(*).

In PostgreSQL:

count(*) equals count(PK)

As Bohemian has said, there is a little delay in translating the * into a field; however there is also a delay in legitimizing PK as a valid field. Regardless, they are negligible performance hits.

Additional Reading:

http://wiki.postgresql.org/wiki/Slow_Counting

PostgreSQL Docs:

Note: Users accustomed to working with other SQL database management systems might be disappointed by the performance of the count aggregate when it is applied to the entire table. A query like:

SELECT count(*) FROM sometable;

will be executed by PostgreSQL using a sequential scan of the entire table.