Most efficient way to store queries and counts of large SQL data_问答_开发者

Most efficient way to store queries and counts of large SQL data

开发者 https://www.devze.com 2023-03-26 11:31 出处：网络

I have a SQL Server database with a large amount of data (65 million rows mostly of text, 8Gb total). The data gets changed only once per week. I have an ASP.NET web application that will run several

I have a SQL Server database with a large amount of data (65 million rows mostly of text, 8Gb total). The data gets changed only once per week. I have an ASP.NET web application that will run several SQL queries on this data that will count the number of rows satisfying va开发者_运维知识库rious conditions. Since the data gets changed only once per week, what is the most efficient way to store both the SQL queries and their counts for the week? Should I store it in the database or in the application?

If the data is only modified once a week, as part of and at the end of that (ETL?) process, perform your "basic" counts and store the results in a table in the database. Thereafter, rather than lengthy queries on the big tables, you can just query those small summary tables.

If you do not need 100% up-to-the-minute accurate row counts, you could query SQL Server's internal info:

Select so.name as 'TableName', si.rowcnt as 'RowCount'
from sysobjects so
inner join sysindexes si on so.id = si.id 
where so.type = 'u' and indid < 2

Very quick to execute and no extra tables required. Not accurate where many updates are occurring but might be accurate enough in your intended usage. [Thank you to commenters!]

Update: did a bit of digging and this does produce accurate counts (slower due to the sum, but still quick):

SELECT OBJECT_SCHEMA_NAME(ps.object_id) AS SchemaName, 
       OBJECT_NAME(ps.object_id) AS ObjectName, 
       SUM(ps.row_count) AS row_count
FROM sys.dm_db_partition_stats ps
JOIN sys.indexes i ON i.object_id = ps.object_id
                      AND i.index_id = ps.index_id
WHERE i.type_desc IN ('CLUSTERED','HEAP')
AND OBJECT_SCHEMA_NAME(ps.object_id) <> 'sys'
GROUP BY ps.object_id
ORDER BY OBJECT_NAME(ps.object_id), OBJECT_SCHEMA_NAME(ps.object_id)

Ref.

Remember that the stored count information was not always 100% accurate in SQL Server 2000. For a new table created on 2005 the counts will be accurate. But for a table that existed in 2000 and now resides on 2005 through a restore or update, you need to run (only once after the move to 2005) either sp_spaceused @updateusage = N'true' or DBCC UPDATEUSAGE with the COUNT_ROWS option.

The queries should be stored as stored procedures or views, depending on complexity.

For your situation I would look into indexed views.

They let you both store a query AND the result set for things like aggregation that otherwise cannot be indexed.

As a bonus, the query optimizer "knows" it has this data as well, so if you check for a count or something else stored in the view index in another query (even one not referencing the view directly) it can still use that stored data.