I have a SQL Server database with a large amount of data (65 million rows mostly of text, 8Gb total). The data gets changed only once per week. I have an ASP.NET web application that will run several SQL queries on this data that will count the number of rows satisfying va开发者_运维知识库rious conditions. Since the data gets changed only once per week, what is the most efficient way to store both the SQL queries and their counts for the week? Should I store it in the database or in the application?
If the data is only modified once a week, as part of and at the end of that (ETL?) process, perform your "basic" counts and store the results in a table in the database. Thereafter, rather than lengthy queries on the big tables, you can just query those small summary tables.
If you do not need 100% up-to-the-minute accurate row counts, you could query SQL Server's internal info:
Select so.name as 'TableName', si.rowcnt as 'RowCount'
from sysobjects so
inner join sysindexes si on so.id = si.id
where so.type = 'u' and indid < 2
Very quick to execute and no extra tables required. Not accurate where many updates are occurring but might be accurate enough in your intended usage. [Thank you to commenters!]
Update: did a bit of digging and this does produce accurate counts (slower due to the sum, but still quick):
SELECT OBJECT_SCHEMA_NAME(ps.object_id) AS SchemaName,
OBJECT_NAME(ps.object_id) AS ObjectName,
SUM(ps.row_count) AS row_count
FROM sys.dm_db_partition_stats ps
JOIN sys.indexes i ON i.object_id = ps.object_id
AND i.index_id = ps.index_id
WHERE i.type_desc IN ('CLUSTERED','HEAP')
AND OBJECT_SCHEMA_NAME(ps.object_id) <> 'sys'
GROUP BY ps.object_id
ORDER BY OBJECT_NAME(ps.object_id), OBJECT_SCHEMA_NAME(ps.object_id)
Ref.
Remember that the stored count information was not always 100% accurate in SQL Server 2000. For a new table created on 2005 the counts will be accurate. But for a table that existed in 2000 and now resides on 2005 through a restore or update, you need to run (only once after the move to 2005) either sp_spaceused @updateusage = N'true' or DBCC UPDATEUSAGE with the COUNT_ROWS option.
The queries should be stored as stored procedures or views, depending on complexity.
For your situation I would look into indexed views.
They let you both store a query AND the result set for things like aggregation that otherwise cannot be indexed.
As a bonus, the query optimizer "knows" it has this data as well, so if you check for a count or something else stored in the view index in another query (even one not referencing the view directly) it can still use that stored data.
精彩评论