开发者

Create clustered index and/or partitioning on non-unique column?

开发者 https://www.devze.com 2023-04-01 18:10 出处:网络
I have a table containing log entries for a single week for about a thousand web servers. Each server writes about 60,000 entries per day to the table, so there are 420,000 entries per week for each s

I have a table containing log entries for a single week for about a thousand web servers. Each server writes about 60,000 entries per day to the table, so there are 420,000 entries per week for each server. The table is truncated weekly. Each log entry contains the servername, which is a varchar (this cannot be changed).

The main operation is to select * from table where servername = 'particular', so as to retrieve the 420,000 records for a server, and a C# program then analyzes the data from that server once selected.

Should I create a clustered index on the servername column to speed up开发者_JS百科 the read operation? (It currently takes over half an hour to execute the above SQL statement.)

Would partitioning help? The computer has only two physical drives.

The query is run for each server once per week. After the query is run for all servers, the table is truncated.


The "standard" ideal clustered key is something like an INT IDENTITY that keeps increasing and is narrow.

However, if your primary use for this table is the listed query, then I think a clustered index on servername makes sense. You will see a big increase in speed if the table is wide, since you will eliminate an expensive key/bookmark lookup that runs on a SELECT * from a nonclustered index (unless you include all the fields in the table).

EDIT:

KM pointed out this will slow down inserts, which is true. For this scenario you may want to consider a two-field key on servername, idfield where idfield is an INT Identity. This would still allow access based only on servername in your query but will insert new records at the end PER SERVER. You will still have fragmentation and reordering.


based on:

The query is run for each server once per week. After the query is run for all servers, the table is truncated.

and

for about a thousand web servers

I'd change the c# program to just run a single query one time:

select * from table Order By servername,CreateDate

and have it handle "breaking" on a server name changes.

One table scan is better than 1,000. I would not slow down the main application's INSERTS into a log table (with a clustered index) just so your once a week queries run faster.


Yes, it would be a good idea to create a clustered index on servername column as now database has to do table scan to find out which records satisfy the criteria of servername = 'particular'.

Also horizontally partition the table by date would help the cause further. So, at a time the database would need to worry about only a day's data for all servers.

Then make sure that you fire date-based queries:

SELECT * FROM table
WHERE date BETWEEN '20110801' AND '20110808'
      AND servername = 'particular'
0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号