So I have a table:
CREATE TABLE TABLE_NAME 开发者_如何学Python(
COLUMN_1 char(12) NOT NULL,
COLUMN_2 char(2) NOT NULL,
COLUMN_3 char(1) NOT NULL,
COLUMN_4 int NOT NULL,
COLUMN_5 char(2) NOT NULL,
COLUMN_6 money NOT NULL,
COLUMN_7 smallint NOT NULL,
COLUMN_8 varchar(10) NOT NULL,
COLUMN_9 smallint NOT NULL,
COLUMN_10 datetime NOT NULL
Primary Key (COLUMN_1, COLUMN_2, COLUMN_3)
)
SELECT COUNT(*)
returns a different value than SELECT DISTINCT COUNT(*)
. How can this be possible?
I also tried
SELECT COUNT(*) FROM (
SELECT
COLUMN_1,
COLUMN_2,
COLUMN_3,
COLUMN_4,
COLUMN_5,
COLUMN_6,
COLUMN_7,
COLUMN_8,
COLUMN_9,
COLUMN_10
FROM TABLE_NAME
) TMP
which returned the same count as the distinct query.
I can't see how with a primary key and all fields being NOT NULL, there can be a different total count than the count of unique records.
BTW, this is on Sybase ASE 15.
The discrepancy is a hundred or so records out of a half million. I'm also seeing this problem in several other tables, but chose just one for the example.
Edit
I should mention for the sake of completeness that I discovered this problem when writing a simple job to completely copy this table to a remote database. My application was recording a certain number of read/write operations, but failed QA because the number of records in the source database differed from the number of records in the target database. Both values were obtained via COUNT(*)
; the count returned from the target (Oracle 10g) was the same as the number of read/write operations recorded by my app. As all fields on the source table are defined NOT NULL and a primary key is defined, I was at a loss to explain how my application was losing a tiny number of records.
This is when I started using the alternate queries listed above, both of which agreed with my apps read/write count, as well as the COUNT(*)
value returned from the target. In other words, the only value that did not match was the COUNT(*)
on the source database.
In most databases that support it, count(*)
doesn't actually retrieve all records and count them -- instead it fetches some metadata field that just tracks the number of rows (or approximate number of rows) presently stored in the table. On the other hand, when you do something that requires working with actual data, the dbms is going to fetch the rows anyway, and it will count them as you would expect it to.
Of course, it's reasonable to expect that, regardless of how it's implemented, the result of count(*)
would be the same as more a complex but equivalent query. That would suggest then, that (maybe) your table's metadata is corrupted somehow. (I'd say this one is a good bet -- I'm not familiar with sybase specifically, but most dbms have a way to force rebuild the table metrics... that might be worth a try here).
Another possible explanation is that the database's internal table row counter is actually not designed to be 100% accurate. (this second possibility is pure educated speculation... I don't actually know whether this is true of Sybase's row counter or not, but it might be worth further investigation).
good luck!
This is a Sybase ASE Answer for a Sybase ASE Question
It applies to any Standard SQL Compliant DBMS; it does not apply to others.
The answer you have chosen is not correct.
COUNT()
is an ISO/IEC/ANSI Standard SQL expression. It is required to physically count the rows, which means it will be slow on a large table. It is required to not use the internal memory resident tables or catalogue tables ("metadata"). This is a run-time value, so if the table is active, it will keep changing for every execution
What you place within the brackets is very important.
COUNT(*)
returns the number of rows including nullsCOUNT(column)
returns the number of rows wherecolumn
is Not NullYes, the placement of
DISTINCT
is also important. This forces the use of a work table. If you need to count a column that is not Unique, then you do not have a choice; but for unique columns, there is no need to useDISTINCT
DISTINCT
applies to a column or an expression derived from a column;
COUNT (DISTINCT *)
is meaningless ("distinct all columns"), and ASE 15 has a substantially improved parser, which catches such things (previous version woul dhave returned a less accurate error message).the rows actually read will depend on your ISOLATION LEVEL (the correct count will be returned for the level specified) and the current activity on the database
the cleanest method, that avoids the weird results you are getting, is to use
COUNT(PK_column)
(Since this is a CW) Never use any form of
COUNT()
for an existence check, as it physically counts the rows. Always useIF EXISTS
with the correctWHERE
clause, because it will use the index only.
- If you need an accurate count but do not want to read all the rows, there is a function to read catalogue table
systabstats
, which has a count of rows in each table. This returns instantaneously, regardless of table size. The currency of those values depends on how your server is configured for performance, flushing, checkpointing, etc.systabstats
is updated from the memory-resident tables by two commands: UPDATE STATISTICS and "flush stats". Try this:EXEC sp_flushstats SELECT ROW_COUNT (DB_ID(), OBJECT_ID("table_name") )
Response to Comments
This Section is Not Relevant to Sybase
I have provided a clean explanation re your problem, and the subject, without explaining why the other two answers are incorrect, or why your second query returns unpredictable results.
Here is the Sybase ASE 15.0 Reference/Building Blocks Manual,
COUNT()
is on page 121. Note that icyrock has misquoted it, and both of you have mis-interpreted it, inadvertently of course. Your starting point was confusion, lack of distinction re*
andDISTINCT
, hence my attempt at a clear answer.I made this a Community Wiki, therefore my answer is complete re the subject, normalised, so that it can stand alone as a complete answer to any question re
COUNT()
.In response to comments, for those people who have not heard, SQL is a Standard language, invented and progressed by IBM in the 1970's and accepted as a Standard in the 1980's by:
International Standards Organisation,
International Electrotechnical Commission and (Book Format),
and copied by American National Standards Institute (free publication thanks to Digital Equipment Corp) somewhat later.
None of the Open Source or freeware "SQL" comply with the Standard. They provide some components (language structures, facilities, commands) of the Standard. And of those they provide, they seldom provide the Standard Requirement (eg. ACID Transaction handling; security; etc).
The second difference is Architecture, and in the implementation of SQL as a genuine language (consistency, callable from any code segment). A single genuine multi-threaded process with no context switching cannot be compared with a herd of hundreds of programs using Unix to do all its "multi-threading" and context switching.
Therefore what SAP/Sybase and DB2 (rigid regarding Standards) do, and to a lesser extent what MS ("flexible" about implementations) do, because (a) they are Standard-Compliant (without arguing the small variations), and (b) they have a genuine Server Architecture, is light years away from the MySQLs and the PostgreSQLs of the world. The result is the hardware requirement that is 100 times that of an architected Server. You get what you pay for.
Oracle has a special place because it is commercial and common, however it is not SQL compliant (in language or in ACID Transactions, and it stretches the definitions). Further, it has no architecture, which places it in the freeware category.
Then there is non-Standard or anti-Standard category.
MySQL/MyISAM providesCOUNT()
in a manner that is specifically against the Standard (this is plainly evident from the MySQL Manual link provided by Lee; good for non-transactional apps).
MySQL/InnoDB & BDB provideCOUNT()
in the Standard-compliant manner.PostgreSQL is the worst, despite the heavy marketing. In addition to not being SQL compliant, it has no concept of ACID Transactions, and the "SQL" is not implemented as a language (Rule 4 in Codd's Twelve Rules). Unfortunately with each major release, as it implements some increment of SQL, you have to re-write your code.
All Standard-SQL vendors provide a large array of Extensions to the Standard
The
NOT NULL
in the table definition cannot be immediately trusted, because the table may well have had null columns in it before the current definition was implemented. Nevertheless, Sybase and DB2 will count throws physical, as per requirement for Standard compliance. You can prove with with a series of counts:SELECT COUNT(column_1) from table_name
, and then compare the counts.The second query will further confuse you, yes, because when the inner table is created, and populated, the count will be accurate. Since you created it with an expectation, it fulfilled your expectation. But that does not prove anything about that original table.
If I'm not mistaken, judging from this:
- http://manuals.sybase.com/onlinebooks/group-as/asg1250e/sqlug/@Generic__BookTextView/8007;pt=7889
you should use select count(distinct *)
. I'd expect select distinct count(*)
to always return 1, as it says "give me the distinct rows, each of which is a count(*)
", so there's always going to be one row, while select count(distinct *)
gives you a count of distinct rows.
FWIW, the above seems to be for v12.5 (though I don't see any differences), here are the 15.0 docs:
- http://infocenter.sybase.com/help/index.jsp?topic=/com.sybase.help.ase_15.0.blocks/html/blocks/blocks128.htm
It explicitly says the following:
count(*) finds the number of rows. count(*) does not take any arguments, and cannot be used with distinct. All rows are counted, regardless of the presence of null values.
You can use select count(distinct column_1)
or so, but not select count(distinct *)
.
精彩评论