bulk-select into a varbinary_问答_开发者_运维开发者技术经验分享

开发者 https://www.devze.com 2023-01-13 11:22 出处：网络

I have an application which needs to create a snapshot of record identifiers which can be persisted in the database server.I currently have a table that looks like this:

相关专题：sql-server

I have an application which needs to create a snapshot of record identifiers which can be persisted in the database server. I currently have a table that looks like this:

CREATE TABLE Workset (
  Workset_id int,
  Sequence int,
  Record_id int
)

which I insert into using something like:

INSERT INTO WorkSet (Workset_id, Sequence, Record_id)
SELECT TOP 100 
  @Workset_id,
  ROW_NUMBER() OVER (
    ORDER BY -- this may be a complex ordering clause
  ),
  Record_id    
FROM SalesRecords
-- WHERE some conditions are met

Later on, I can query for the record identifier of a specific entry in the working set by the identifier and the row number

SELECT SalesRecords.* FROM SalesRecords
JOIN WorkSet ON WorkSet.Record_Id = SalesRecords.Record_id
WHERE Workset_id = @Workset_id AND Sequence = @Sequence

The problem with this is that as the snapshot gets large the amount of data that I have to write into the workset data grows quickly. It wouldn't be uncommon to have a workset in the millions of records and if each of those items r开发者_如何学JAVAequires 12 bytes of row storage it adds up quickly.

It seems that a more compact way to represent the data would be to store simply a Workset_id and a varbinary column containing all of the record identifiers in sequence order. This would avoid the overhead of repeating the workset identifier for every row in the same workset, and avoide the need to store the sequence numbers (because they are implied by the position in the varbinary).

Is there any reasonable way that I can transform my INSERT query into something that generates a varbinary of the record identifiers in order?

Something like this might work:

-- create a generic numbers table for demonstration purposes and populate it
IF OBJECT_ID('tempdb..#numbers') IS NOT NULL DROP TABLE #numbers;
CREATE TABLE #numbers (number INT PRIMARY KEY);
WITH Ten AS (SELECT * FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) a (number))
INSERT #numbers (number) SELECT a.number + b.number*10 + c.number*100 + d.number*1000 + e.number*10000
FROM Ten a, Ten b, Ten c, Ten d, Ten e;

-- concatenate the ints in a particular order
DECLARE @varbinary VARBINARY(MAX);
SET @varbinary = 0x;
WITH Ids AS (SELECT number AS Id FROM #numbers)
SELECT @varbinary += convert(BINARY(4),Id)
FROM (
  SELECT Id, Seq = ROW_number() OVER (ORDER BY NEWID())
  FROM Ids WHERE Id BETWEEN 1000 AND 1099
  ) a
ORDER BY Seq;

--split them back out by position
;WITH Positions AS (SELECT number AS Position FROM #numbers)
SELECT Position, Id = CONVERT(INT,substring(@varbinary,Position*4+1,4))
FROM Positions WHERE Position*4 < DATALENGTH(@varbinary);

You might want to dump the record ids into a temp table first, with a clustered primary key on Seq, before concatenating, for extra safety. The ordered concatenation behavior seems fairly reliable, but it is not documented behavior AFAIK, and appears to depends on an ordered spool or clustered index in the execution plan.

A static cursor w/ order by clause would also work. It will materialize the results in tempdb, and then iterate over them. Not too slow, considering.

Make sure to use the .WRITE clause of the UPDATE statement for incremental "inserts", or you will bring the server down.

Alternatively:

Create a separate table for each WorkSet. This reduces your row size by 4, and it's still a regular table.