In SQL Server, I'm trying to do a comparative analysis between two different table structures with regard to insert performance given different keys. Does it matter if I use a table variable to do this testing, or should I use a temporary table? Or do I need to go to the trouble of actually creating the tables and indexes?
Specifically, I'm currently using the following script:
DECLARE @uniqueidentifierTest TABLE
(
--yes, this is terrible, but I am looking for numbers on how bad this is :)
tblIndex UNIQUEIDENTIFIER PRIMARY KEY CLUSTERED,
foo INT,
blah VARCHAR(100)
)
DECLARE @intTest TABLE
(
tblindex INT IDENTITY(1,1) PRIMARY KEY CLUSTERED,
foo INT,
blah VARCHAR(100)
)
DECLARE @iterations INT = 250000
DECLARE @ctrl INT = 1
DECLARE @guidKey UNIQUEIDENTIFIER
DECLARE @intKey INT
DECLARE @foo INT = 1234
DECLARE @blah VARCHAR(100) = 'asdfjifsdj fds89fsdio23r'
SET NOCOUNT ON
--test uniqueidentifier pk inserts
PRINT 'begin uniqueidentifier insert test at ' + CONVERT(VARCHAR(50), GETDATE(), 109)
WHILE @ctrl < @iterations
BEGIN
SET @guidKey = NEWID()
INSERT INTO @uniqueidentifierTest (tblIndex, foo, blah)
VALUES (@guidKey, @foo, @blah)
SET @ctrl = @ctrl + 1
END
PRINT 'end uniqueidentifier insert test at ' + CONVERT(VARCHAR(50), GETDATE(), 109)
SET @ctrl = 1
--test int pk inserts
PRINT 'begin int insert test at ' + CONVERT(VARCHAR(50), GETDATE(), 109)
WHILE @ctrl < @iterations
BEGIN
INSERT INTO @intTest (foo, blah)
VALUES (@foo, @blah)
SET @ctrl = @ctrl + 1
END
PRINT 'end int insert test at ' + CONVERT(VARCHAR(50), GETDATE(), 109)
SET NOCOUNT OFF
If you want to compare actual performance, you need to create the tables and indexes (and everything else involved). While a temp table will be a much better analog than a table variable, neither is a substitute for an actual permanent table structure if you're seeking performance metrics.
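As a rough sketch of what that could look like, the two table variables from the question can be recreated as permanent tables. The table and constraint names below (dbo.UniqueidentifierTest, dbo.IntTest, PK_...) are made up for illustration; the columns are copied from the question's script.

```sql
-- Hypothetical permanent equivalents of the question's table variables.
CREATE TABLE dbo.UniqueidentifierTest
(
    tblIndex UNIQUEIDENTIFIER NOT NULL
        CONSTRAINT PK_UniqueidentifierTest PRIMARY KEY CLUSTERED,
    foo  INT,
    blah VARCHAR(100)
);

CREATE TABLE dbo.IntTest
(
    tblIndex INT IDENTITY(1,1) NOT NULL
        CONSTRAINT PK_IntTest PRIMARY KEY CLUSTERED,
    foo  INT,
    blah VARCHAR(100)
);

-- Run the same insert loops here, targeting dbo.UniqueidentifierTest
-- and dbo.IntTest instead of the table variables.

-- Drop the test tables once the numbers have been collected.
DROP TABLE dbo.UniqueidentifierTest;
DROP TABLE dbo.IntTest;
```

Ideally the tables would live in a database whose file layout and recovery model resemble production, since logging behaviour also affects insert timings.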
All of that being said, however, you should avoid using uniqueidentifier as a primary key, or, at the very least, use newsequentialid() rather than newid(). A clustered index stores the rows in key order. If an inserted value falls out of sequence, SQL Server has to place it in the middle of the index, which can force a page split to make room for it.
First of all, never ever cluster on a uniqueidentifier populated by newid(): the random values cause page splits and thus fragmentation. If you have to use a GUID, then do it like this:
create table #test (id uniqueidentifier primary key default newsequentialid())
newsequentialid() generates generally increasing values, so new rows are added at the end of the index and those page splits are avoided.
Still, an int is better as the PK: all your nonclustered indexes and foreign keys will be smaller, so you need less I/O to get the same number of rows back.
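One way to see the difference for yourself is to load two temp tables, one keyed by newid() and one by newsequentialid(), and then compare the fragmentation of their clustered indexes. This is only a sketch: the table names and the 50,000-row count are arbitrary choices, and the figures you get will vary by machine and version.

```sql
-- Two hypothetical temp tables that differ only in how the GUID key is generated.
CREATE TABLE #randomGuid
(
    id  UNIQUEIDENTIFIER NOT NULL DEFAULT NEWID() PRIMARY KEY CLUSTERED,
    foo INT
);
CREATE TABLE #sequentialGuid
(
    id  UNIQUEIDENTIFIER NOT NULL DEFAULT NEWSEQUENTIALID() PRIMARY KEY CLUSTERED,
    foo INT
);

SET NOCOUNT ON;
DECLARE @i INT = 1;
WHILE @i <= 50000          -- arbitrary row count for the sketch
BEGIN
    INSERT INTO #randomGuid (foo) VALUES (@i);
    INSERT INTO #sequentialGuid (foo) VALUES (@i);
    SET @i = @i + 1;
END;
SET NOCOUNT OFF;

-- index_id 1 is the clustered index; LIMITED mode reports the leaf level only.
SELECT 'NEWID()' AS key_source, avg_fragmentation_in_percent, page_count
FROM sys.dm_db_index_physical_stats(
         DB_ID('tempdb'), OBJECT_ID('tempdb..#randomGuid'), 1, NULL, 'LIMITED')
UNION ALL
SELECT 'NEWSEQUENTIALID()', avg_fragmentation_in_percent, page_count
FROM sys.dm_db_index_physical_stats(
         DB_ID('tempdb'), OBJECT_ID('tempdb..#sequentialGuid'), 1, NULL, 'LIMITED');

DROP TABLE #randomGuid;
DROP TABLE #sequentialGuid;
```

Comparing page_count alongside the fragmentation percentage also shows how much extra space the half-empty pages left by splits take up.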
I dunno why but I'd like to cite Remus Rusanu [1]:
First of all, you need to run the query repeatedly under each [censored] and average the result, discarding the one with the maximum time. This will eliminate the buffer warm up impact: you want all runs to be on a warm cache, not have one query warm the cache and pay the penalty in comparison.
Next, you need to make sure you measure under a realistic concurrency scenario. If you will have updates/inserts/deletes occur under real life, then you must add them to your test, since they will impact tremendously the reads under various isolation levels. The last thing you want is to conclude 'serializable reads are fastest, let's use them everywhere' and then watch the system melt down in production because everything is serialized.
1) Running the query on a cold cache is not accurate. Your production queries will not run on a cold cache, you'll be optimizing an unrealistic scenario and you don't measure the query, you are really measuring the disk read throughput. You need to measure the performance on a warm cache as well, and keep track of both (cold run time, warm run times).
How relevant is the cache for a large query (millions of rows) that under normal circumstances runs only once for particular data? Still very relevant. Even if the data is so large that it never fits in memory and each run has to re-read every page of the table, there is still the caching of non-leaf pages (i.e. hot pages in the table, root or near root), cache of narrower non-clustered indexes, cache of table metadata. Don't think of your table as an ISAM file.
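The advice above about repeating the run, discarding the slowest one, and averaging the rest might be sketched like this. The run count of 5 and the stand-in workload (a count over sys.all_objects) are assumptions for illustration only; substitute the statement you actually want to measure.

```sql
-- Collects one elapsed time per run.
DECLARE @timings TABLE (run INT PRIMARY KEY, elapsed_ms INT);
DECLARE @run INT = 1;
DECLARE @runs INT = 5;      -- arbitrary number of repetitions
DECLARE @start DATETIME2;
DECLARE @dummy INT;

WHILE @run <= @runs
BEGIN
    SET @start = SYSDATETIME();

    -- Stand-in workload: replace with the query or insert loop under test.
    SELECT @dummy = COUNT(*) FROM sys.all_objects;

    INSERT INTO @timings (run, elapsed_ms)
    VALUES (@run, DATEDIFF(MILLISECOND, @start, SYSDATETIME()));

    SET @run = @run + 1;
END;

-- Average every run except the single slowest one
-- (usually the first, which pays for warming the cache).
SELECT AVG(CAST(elapsed_ms AS FLOAT)) AS avg_ms_without_slowest
FROM @timings
WHERE run <> (SELECT TOP (1) run FROM @timings ORDER BY elapsed_ms DESC, run);
```

Keeping the per-run rows in @timings rather than only the average makes it easy to record the cold and warm timings separately, as the quote suggests.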
[1] Why better isolation level means better performance in SQL Server