Weird question: I am inserting 10,000 records or so into a table, and the primary key is not an identity field. So when inserting all 10,000, if some are duplicates, is there a way in SQL Server to skip to the next record and make sure the non-duplicates go in? I really don't care that the duplicates don't get inserted.
Use the "Ignore duplicate key" option.
The simplest way to do this is to delete the Primary Key in SQL Server Management Studio.
Then create a new Index, of type "Index", set Is Unique to "Yes", and set "Ignore Duplicate Keys" to "Yes". Then insert your records. It will insert them all except the duplicates. When you are done, you can delete this index, and recreate your Primary Key.
If you want a TSQL method, see the IGNORE_DUP_KEY option in the CREATE INDEX call:
CREATE INDEX (Transact-SQL)
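A rough T-SQL sketch of that same swap (the table, column, and index names here are placeholders for your own schema):
-- Temporarily replace the primary key with a unique index that discards duplicates
ALTER TABLE dbo.MyRealTable DROP CONSTRAINT PK_MyRealTable

CREATE UNIQUE INDEX IX_MyRealTable_Dedup ON dbo.MyRealTable (PKField)
    WITH (IGNORE_DUP_KEY = ON)

-- ... run the 10,000-row insert here; duplicate keys are skipped with a warning ...

-- Restore the original primary key
DROP INDEX IX_MyRealTable_Dedup ON dbo.MyRealTable
ALTER TABLE dbo.MyRealTable ADD CONSTRAINT PK_MyRealTable PRIMARY KEY CLUSTERED (PKField)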
EDIT:
Another way would be to use a LEFT JOIN between the records you are going to insert and the existing table, together with a GROUP BY clause, inserting only the records that don't already exist in the target. The GROUP BY eliminates duplicates within the new records.
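A sketch of that pattern, assuming the new rows sit in a staging table (all names are placeholders):
INSERT MyRealTable (PKField, Field1)
SELECT x.PKField, MIN(x.Field1)          -- GROUP BY collapses duplicates within the new batch; MIN picks one arbitrary value per key
FROM MyStagingTable x
LEFT JOIN MyRealTable r ON r.PKField = x.PKField
WHERE r.PKField IS NULL                  -- keep only keys not already in the target table
GROUP BY x.PKField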
You can define your primary key to ignore duplicates:
CREATE TABLE [dbo].[t2](
[n] [int] NOT NULL,
PRIMARY KEY CLUSTERED
(
[n] ASC
)WITH (IGNORE_DUP_KEY = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
Using this option may hurt performance:
If your data has a small percentage of duplicates, then IGNORE_DUP_KEY may speed up your inserts. For larger amounts of duplicates, IGNORE_DUP_KEY may slow them down significantly. I set up two tables, stripping down all the irrelevant details, as follows:
CREATE TABLE t1(n INT NOT NULL PRIMARY KEY)
GO
CREATE TABLE [dbo].[t2](
[n] [int] NOT NULL,
PRIMARY KEY CLUSTERED
(
[n] ASC
)WITH (IGNORE_DUP_KEY = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
If the incoming data had no duplicates, the performance of both inserts was consistently the same:
INSERT t1(n)
SELECT n FROM dbo.Numbers
INSERT t2(n)
SELECT n FROM dbo.Numbers
(Note that dbo.Numbers has 1 million rows.) Of course, I always truncated both tables between my tests.
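(For reference, a Numbers table like this can be generated with something along these lines; this is just one common approach, not necessarily the exact setup used for these tests.)
-- Build a helper table of 1 million sequential integers
CREATE TABLE dbo.Numbers(n INT NOT NULL PRIMARY KEY)
GO
INSERT dbo.Numbers(n)
SELECT TOP (1000000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM sys.all_objects a CROSS JOIN sys.all_objects b
GO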
If the incoming data had 1% of duplicates, the insert with IGNORE_DUP_KEY consistently performed approximately 5% faster:
INSERT t1(n)
SELECT DISTINCT n FROM(
SELECT n FROM dbo.Numbers
UNION ALL
SELECT n FROM dbo.Numbers WHERE n <10000
) AS t
INSERT t2(n)
SELECT n FROM dbo.Numbers
UNION ALL
SELECT n FROM dbo.Numbers WHERE n <10000
On the other hand, if the incoming data was 100% duplicates, the insert with IGNORE_DUP_KEY was consistently at least 300% slower, both for a large set of 2 million rows:
INSERT t1(n)
SELECT DISTINCT n FROM(
SELECT n FROM dbo.Numbers
UNION ALL
SELECT n FROM dbo.Numbers
) AS t
INSERT t2(n)
SELECT n FROM dbo.Numbers
UNION ALL
SELECT n FROM dbo.Numbers
As well as for a smaller set of 200K rows:
INSERT t1(n)
SELECT DISTINCT n FROM(
SELECT n FROM dbo.Numbers WHERE n<100000
UNION ALL
SELECT n FROM dbo.Numbers WHERE n<100000
) AS t
INSERT t2(n)
SELECT n FROM dbo.Numbers WHERE n<100000
UNION ALL
SELECT n FROM dbo.Numbers WHERE n<100000
Overall, I decided not to use IGNORE_DUP_KEY in my particular case. I decided that small savings for a small amount of duplicates do not justify the risk of a huge performance drop for larger amounts of duplicate data.
This example query skips rows whose PK1 already exists in the destination table:
INSERT INTO Dest (PK1, Field2)
SELECT s.PK1, s.F2
FROM Source s
WHERE
(
SELECT TOP 1 d.PK1
FROM Dest d
WHERE d.PK1 = s.PK1
) IS NULL
A couple of ways spring to mind, as I don't know what mechanism you're using to do the inserting.
1) Bulk load all the records into a new empty table, then run an INSERT into the real table from that staging table where the record doesn't already exist in the main table. e.g.
INSERT MyRealTable (PKField, Field1)
SELECT x.PKField, x.Field1
FROM MyStagingTable x
LEFT JOIN MyRealTable r ON x.PKField = r.PKField
WHERE r.PKField IS NULL
2) Wrap each insert in a TRY...CATCH block to swallow the PK constraint error should one happen (if you're using SQL 2005 or later), e.g.:
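A minimal sketch of option 2 (the variable and table names are placeholders):
BEGIN TRY
    INSERT MyRealTable (PKField, Field1)
    VALUES (@PKValue, @Field1Value)
END TRY
BEGIN CATCH
    -- Error 2627 = violation of a PRIMARY KEY constraint; swallow it and move to the next record
    IF ERROR_NUMBER() <> 2627
    BEGIN
        DECLARE @msg NVARCHAR(2048)
        SET @msg = ERROR_MESSAGE()
        RAISERROR(@msg, 16, 1)
    END
END CATCH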