开发者

SQL: Making COUNT(*) > 1 Efficient

开发者 https://www.devze.com 2023-02-09 13:59 出处:网络
if you want to know if COUNT(*) > 0 then you can use EXISTS to make the query more efficient. Is there a way I can make a query more efficient when I want to know if COUNT(*) > 1?

if you want to know if COUNT(*) > 0 then you can use EXISTS to make the query more efficient. Is there a way I can make a query more efficient when I want to know if COUNT(*) > 1?

(Needs to be compatible with both SQL Server and Oracle.)

Thanks, Jamie

Edit:

I am trying to improve the performance of some code. There are some lines similar to:

if (SQL('SELECT COUNT(*) FROM table WHERE a = b') > 0) then...

and

if (SQL('SELECT COUNT(*) FROM table WHERE a = b') > 1) then...

The first line is easy enough to switch to an EXISTS statement, but can I make the second line more efficient? From the comments and my own thoughts I have the following ideas, would any of them be more efficient?

if (SQLRecordCount('SELECT TOP 2 1 FROM table WHERE a = b') > 1) then...

(I can use ROWNUM for Oracle.)

if (SQL('SELECT 1 FROM table WHERE a = b HAVING COUNT(*) > 1') = 1) then...

The following doesn't doesn't work in SQL Server:

开发者_Go百科SELECT COUNT(*) FROM (SELECT TOP 2 FROM table WHERE a = b)

But this does with Oracle:

SELECT COUNT(*) FROM (SELECT 1 FROM table WHERE a = b AND ROWNUM < 3)

Thanks for all your help so far.


Something like this could work:

select myDate
from myTable
where myColumn = myCondition
group by myDate
having count(*) > 1

Although if I had your exact query, or a reasonable facsimilie, I could help ya out more.

As far as the actual keyword being more efficient, as far as I know, there's not much you can do as the SQL programmer about it. It's going to be a function as to how well your RDBMS handles the actual counting. If it sees that it's going to return the row if there are 2 occurrences and stops counting at 2, great. If it's not smart enough and keeps track of another 1,000 occurrences, not so great.

If you're using this in a join or sub-query you can control the number of rows returned at various points in your query or stored procedure. The earlier you can filter out rows that are destined to never be returned, the better.


Your question at the moment is a bit abstract. Can you provide a bit more context?

I'm thinking that if you have a composite index on foo, id then the below could be satisfied by two index seeks.

SELECT CASE WHEN MAX(id)= MIN(id) THEN 0 ELSE 1 END
FROM yourtable 
WHERE foo='bar'

Or maybe to force the plan a bit more explicitly

SELECT CASE WHEN COUNT(*) = 2 THEN 1 ELSE 0 END FROM 
(
    SELECT MAX(id)
    FROM yourtable 
    WHERE foo='bar'
    UNION
    SELECT MIN(id)
    FROM yourtable 
    WHERE foo='bar'
) AS T


It shouldn't matter too much if indexed

Example:

2 million row table, quite wide, 900MB on disk, virtual SQL Server 2005.

This gives 17,876 rows

SELECT COUNT(*), ThingID FROM dbo.TwoMillion IT GROUP BY ThingID HAVING COUNT(*) > 1

  |--Filter(WHERE:([Expr1002]>(1)))
       |--Compute Scalar(DEFINE:([Expr1002]=CONVERT_IMPLICIT(int,[Expr1005],0)))
            |--Hash Match(Aggregate, HASH:([IT].[ThingID]) DEFINE:([Expr1005]=COUNT(*)))
                 |--Index Scan(OBJECT:([MyDB].[dbo].[TwoMillion].[IX_Thing] AS [IT]))

Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0
Table 'TwoMillion'. Scan count 1, logical reads 8973, physical reads 3, read-ahead reads 8969... all zeroes

On 2nd run

 Table 'Worktable'. = same
 Table 'TwoMillion'. Scan count 1, logical reads 8973, ... all zeroes

 CPU time = 453 ms,  elapsed time = 564 ms.


Completely ignoring the cross compatibility requirement in SQL Server you can use TOP to explicitly limit the number of rows scanned. In some scenarios this may be of benefit as in the below (somewhat contrived) example.

USE tempdb

CREATE TABLE Orders
(
OrderId INT IDENTITY(1,1) PRIMARY KEY,
Blah VARCHAR(10)
)
INSERT INTO Orders 
SELECT TOP 10 LEFT(name,10)
FROM sys.objects

CREATE TABLE OrderItems
(
OrderItemId INT IDENTITY(1,1) PRIMARY KEY,
OrderId INT REFERENCES Orders(OrderId)
)
CREATE NONCLUSTERED INDEX ix ON OrderItems(OrderId)

INSERT INTO OrderItems (OrderId)
SELECT TOP 1000000 1+ ROW_NUMBER() OVER (ORDER BY (SELECT 0))% 10 
FROM sys.all_columns c1, sys.all_columns c2

SET STATISTICS IO ON
SET STATISTICS TIME ON

SELECT o.OrderId, o.Blah
FROM Orders o JOIN OrderItems oi ON o.OrderId = oi.OrderId
GROUP BY o.OrderId, o.Blah
HAVING COUNT(*) > 1

/*
Table 'Orders'. Scan count 0, logical reads 20
Table 'OrderItems'. Scan count 1, logical reads 1742
*/


SELECT o.OrderId, o.Blah
FROM Orders o 
CROSS APPLY 
(SELECT TOP 2 OrderItemId FROM 
OrderItems oi WHERE o.OrderId = oi.OrderId) CA
GROUP BY o.OrderId, o.Blah
HAVING COUNT(*) > 1

/*
Table 'OrderItems'. Scan count 10, logical reads 30
Table 'Orders'. Scan count 1, logical reads 2
*/

DROP TABLE OrderItems
DROP TABLE Orders

SQL: Making COUNT(*) > 1 Efficient


I found that the following line significantly improved the performance for SQL Server, going from about 40ms to about 5ms in my tests.

SELECT COUNT(*) FROM (SELECT TOP 2 1 AS x FROM table Where a = b) AS y

Note the aliases, they are necessary to get the query to work.

Unfortunately the following query does not seem to improve performance in Oracle:

SELECT COUNT(*) FROM table WHERE a = b AND ROWNUM < 3


First you should not use asterisks if you want to optimize your queries.

Maybe it would better to create a query with limit? You aren't interested in the count or something like this. You only want to know if there are more than one entries:

select id
from mytable
where ...
limit 2

This should be very fast. Than call countRows which are given to you to get the answer you need.

0

精彩评论

暂无评论...
验证码 换一张
取 消