SQL Server - Find duplicates in table

Developer https://www.devze.com 2023-04-08 18:40 Source: web

Related topics: sql sql-server

We have a large table with more than 100 million rows. Can someone help us find duplicate data within the table and possibly move it to an ARCHIVE table?

Table Name: CustomerData

Number of fields: 10

The latest record should stay (it is the one whose END_DATE is NULL).

Regards

You just need to move the rows where END_DATE isn't NULL?

In a single transaction:

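BEGIN TRANSACTION;

-- Copy the closed-out rows to the archive table
INSERT INTO archive (column1, column2, ... column10)
SELECT column1, column2, ..., column10
FROM CustomerData
WHERE END_DATE IS NOT NULL;

-- Then remove the same rows from the live table
DELETE FROM CustomerData
WHERE END_DATE IS NOT NULL;

COMMIT TRANSACTION;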
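
With 100+ million rows, moving everything in one transaction may hold locks and grow the log for a long time. One possible alternative, offered only as a sketch, is to move the rows in batches with DELETE ... OUTPUT INTO, which deletes and archives in a single statement. The temp tables, their three columns, and the batch size below are assumptions for illustration; the real table has 10 columns that the question does not list. Note that SQL Server restricts the OUTPUT INTO target (for example, it cannot have enabled triggers or take part in a foreign key).

```sql
-- Hypothetical stand-ins for CustomerData and the archive table.
IF OBJECT_ID('tempdb..#CustomerData') IS NOT NULL DROP TABLE #CustomerData;
IF OBJECT_ID('tempdb..#archive') IS NOT NULL DROP TABLE #archive;
CREATE TABLE #CustomerData (cust_id INT, cust_name VARCHAR(100), END_DATE DATETIME NULL);
CREATE TABLE #archive      (cust_id INT, cust_name VARCHAR(100), END_DATE DATETIME NULL);

INSERT INTO #CustomerData VALUES
    (1, 'A', '20200101'),
    (1, 'A', NULL),
    (2, 'B', NULL);

DECLARE @batch INT = 50000;  -- assumed batch size; tune for your environment
DECLARE @moved INT = 1;

WHILE @moved > 0
BEGIN
    -- A single statement removes the rows and writes them to the archive,
    -- so each batch either moves completely or not at all.
    DELETE TOP (@batch) FROM #CustomerData
    OUTPUT deleted.cust_id, deleted.cust_name, deleted.END_DATE
    INTO #archive (cust_id, cust_name, END_DATE)
    WHERE END_DATE IS NOT NULL;

    SET @moved = @@ROWCOUNT;  -- 0 once no closed-out rows remain
END;
```

The loop stops when a batch moves no rows. Smaller batches keep each transaction short, at the cost of more iterations.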

Assuming the CustomerData table structure is: CustomerData(cust_id, Cust_name, Address_ID, start_time, End_Date, ..., other 7 columns);

And assuming that two customers with the same Address_ID count as duplicates.
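
Before moving anything, it can help to see which Address_ID values are actually duplicated and how often. The query below is a minimal sketch under the same assumption (Address_ID is the duplicate key); the temp table and sample rows are hypothetical.

```sql
-- Hypothetical stand-in for CustomerData with only the columns needed here.
IF OBJECT_ID('tempdb..#CustomerData') IS NOT NULL DROP TABLE #CustomerData;
CREATE TABLE #CustomerData (cust_id INT, Address_ID INT, End_Date DATETIME NULL);

INSERT INTO #CustomerData VALUES
    (1, 10, '20200101'),
    (2, 10, NULL),
    (3, 20, NULL);

-- List each Address_ID that appears more than once, with its row count.
SELECT Address_ID, COUNT(*) AS copies
FROM #CustomerData
GROUP BY Address_ID
HAVING COUNT(*) > 1
ORDER BY copies DESC;
```

With the sample rows, only Address_ID 10 is returned, with 2 copies.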

To insert into the archive table:

INSERT INTO archive (column1, column2, ... column10)
SELECT cust_id, start_Date, ..., End_Date
FROM CustomerData
WHERE End_Date IS NOT NULL
AND Address_ID IN (
        SELECT Address_ID FROM
            (
            -- Address_ID values that occur more than once
            SELECT Address_ID, COUNT(Address_ID) AS cnt
            FROM CustomerData
            GROUP BY Address_ID
            HAVING COUNT(Address_ID) > 1
            ) AS dup
        );

To delete from the CustomerData table:

DELETE FROM CustomerData
WHERE End_Date IS NOT NULL
    AND
    Address_ID IN (
            SELECT Address_ID FROM
            (
            -- Address_ID values that occur more than once
            SELECT Address_ID, COUNT(Address_ID) AS cnt
            FROM CustomerData
            GROUP BY Address_ID
            HAVING COUNT(Address_ID) > 1
            ) AS dup
        );

The inner subquery extracts duplicates based on rows sharing the same Address_ID, similar to grouping by the DeptID column in the employees table provided with Oracle Database.
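
A different technique from the subquery above, shown only as a sketch, is to number the rows within each Address_ID using ROW_NUMBER() and treat everything after the first row as a duplicate. The ordering puts the row with a NULL End_Date first, so it is the one that stays. The temp table and sample rows are hypothetical, and Address_ID as the duplicate key is the same assumption the answer makes.

```sql
-- Hypothetical stand-in for CustomerData with only the columns needed here.
IF OBJECT_ID('tempdb..#CustomerData') IS NOT NULL DROP TABLE #CustomerData;
CREATE TABLE #CustomerData (cust_id INT, Address_ID INT, End_Date DATETIME NULL);

INSERT INTO #CustomerData VALUES
    (1, 10, '20200101'),
    (2, 10, NULL),
    (3, 20, NULL);

WITH ranked AS (
    SELECT cust_id, Address_ID, End_Date,
           ROW_NUMBER() OVER (
               PARTITION BY Address_ID
               -- NULL End_Date (the current row) sorts first, then newest End_Date
               ORDER BY CASE WHEN End_Date IS NULL THEN 0 ELSE 1 END,
                        End_Date DESC
           ) AS rn
    FROM #CustomerData
)
-- Rows with rn > 1 are the candidates to archive.
SELECT cust_id, Address_ID, End_Date
FROM ranked
WHERE rn > 1;
```

With the sample rows this returns only cust_id 1. Unlike the End_Date IS NOT NULL filter, this keeps exactly one row per Address_ID even when a group has no NULL End_Date or more than one.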