I am trying to see if using a custom index for a specific type of data might reduce fragmentation in my database.
[Edit: we are using MS SQL Server 2008 R2]
I have an SQL database containing timestamped measurement data. Lots of data is inserted all the time, but once inserted it practically never needs to be updated. These timestamps are, however, not unique, as several devices (around 50 of them) measure the data at the same time.
This means that every 50 rows in 开发者_运维技巧the table contain equal timestamp values. This data is received more or less simultaneously, although I could take additional care to ensure that rows are written as sequentially as possible (if that would help), perhaps by keeping them in memory for some time and then writing only when I get the data from all the devices for a single timestamp.
We are using NHibernate with Guid.Comb to avoid index lookups we would have with plain bigint IDs. As opposed to plain GUIDs, this should reduce fragmentation, but for so many inserts, fragmentation nevertheless happens very soon.
Since my data is timestamped, and data is inserted almost sequentially (increasing timestamps), I am wondering if there is a more clever way to create a primary key with a unique clustered index for this table. Timestamp column is basically a bigint number (.NET DateTime ticks).
I have also noticed that a non-clustered index over that same timestamp column also gets pretty fragmented. So what index strategy would you recommend to reduce heap fragmentation in this case?
Maybe take a look at this answer, HiLo looks interesting.
Also, maybe your fragmentation is not result of the discrepancy between the ordering of the index values and the order in which they are added, but natural file growth effect (as explained here)?
A seperate column for a key doesn't make a lot of sense for this table since you won't be updating any of the data. I imagine you'll be doing a lot of queries though, probably based on that timestamp column.
You could try making the primary key a combination of the timestamp column and a device id column. You could try making that clustered. That should allow you to write nearly as fast as possible. If you query by device however, you may need another index on device id and timestamp (the reverse). I wouldn't make the reverse the clustered one though, as that will make the writes happen all over the table rather than on the trailing pages. And if most queries involve a date range and more than one device, clustering on timestamp first should give you the best performance.
精彩评论