We have a big table with 100+ million rows. Can someone help me find the duplicate data within the table and maybe move it to an ARCHIVE table?
Table Name: CustomerData
Number of fields: 10
The latest record should stay (it is identified by END_DATE being NULL in that record).
Regards
You just need to move the rows where END_DATE isn't NULL?
In a single transaction:
INSERT INTO archive (column1, column2, ..., column10)
SELECT column1, column2, ..., column10
FROM CustomerData
WHERE END_DATE IS NOT NULL;

DELETE FROM CustomerData
WHERE END_DATE IS NOT NULL;

COMMIT;
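To check the "copy, then delete, in one transaction" idea on a toy dataset, here is a minimal sketch. It assumes an in-memory SQLite database as a stand-in for the real (Oracle) one, a trimmed-down, hypothetical 4-column CustomerData/archive schema instead of the real 10 columns, and a hypothetical helper name `move_closed_rows`. Python's `with conn:` block commits only if both statements succeed and rolls back otherwise.

```python
import sqlite3

# Hypothetical, trimmed-down schema (the real CustomerData has 10 columns).
SCHEMA = """
CREATE TABLE CustomerData (cust_id INTEGER, Address_ID INTEGER,
                           start_time TEXT, End_Date TEXT);
CREATE TABLE archive      (cust_id INTEGER, Address_ID INTEGER,
                           start_time TEXT, End_Date TEXT);
"""

def move_closed_rows(conn: sqlite3.Connection) -> int:
    """Copy every row with a non-NULL End_Date into archive, then delete it
    from CustomerData. Both statements run inside one transaction."""
    with conn:  # commits on success, rolls back if either statement raises
        conn.execute("""
            INSERT INTO archive (cust_id, Address_ID, start_time, End_Date)
            SELECT cust_id, Address_ID, start_time, End_Date
            FROM CustomerData
            WHERE End_Date IS NOT NULL""")
        cur = conn.execute("DELETE FROM CustomerData WHERE End_Date IS NOT NULL")
    return cur.rowcount  # number of rows removed from CustomerData

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO CustomerData VALUES (?, ?, ?, ?)", [
    (1, 10, "2020-01-01", "2021-01-01"),  # old version -> archived
    (1, 10, "2021-01-01", None),          # latest version (END_DATE NULL) -> stays
    (2, 20, "2022-01-01", None),          # only version -> stays
])
conn.commit()

moved = move_closed_rows(conn)
print(moved)  # 1
```

Note that this moves every closed row, whether or not it has a "duplicate"; that matches the assumption in the question that only the NULL-END_DATE row should remain.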
Assuming the CustomerData table structure is CustomerData(cust_id, cust_name, Address_ID, start_time, End_Date, ..., the remaining 5 columns);
and assuming that duplicates are customers sharing the same Address_ID.
To insert into the archive table:
INSERT INTO archive (column1, column2, ..., column10)
SELECT cust_id, cust_name, Address_ID, start_time, End_Date, ...
FROM CustomerData
WHERE End_Date IS NOT NULL
  AND Address_ID IN (
      SELECT Address_ID
      FROM CustomerData
      GROUP BY Address_ID
      HAVING COUNT(Address_ID) > 1
  );
To delete from the CustomerData table:
DELETE FROM CustomerData
WHERE End_Date IS NOT NULL
  AND Address_ID IN (
      SELECT Address_ID
      FROM CustomerData
      GROUP BY Address_ID
      HAVING COUNT(Address_ID) > 1
  );
The inner subquery extracts the duplicates based on a shared Address_ID, similar to grouping on the DeptID column of the employees table provided with the Oracle database.
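A minimal sketch of the duplicate-based variant, assuming an in-memory SQLite database as a stand-in for Oracle, a trimmed-down, hypothetical 4-column schema, and a hypothetical helper `archive_duplicates`. The `GROUP BY ... HAVING COUNT(...) > 1` query is the inner subquery; the INSERT and DELETE reuse it and run in one transaction so a row is never deleted without being archived.

```python
import sqlite3

# Hypothetical, trimmed-down schema; "duplicate" = rows sharing an Address_ID.
SCHEMA = """
CREATE TABLE CustomerData (cust_id INTEGER, Address_ID INTEGER,
                           start_time TEXT, End_Date TEXT);
CREATE TABLE archive      (cust_id INTEGER, Address_ID INTEGER,
                           start_time TEXT, End_Date TEXT);
"""

# The inner subquery: Address_IDs that occur more than once.
DUPLICATE_IDS = """
    SELECT Address_ID
    FROM CustomerData
    GROUP BY Address_ID
    HAVING COUNT(Address_ID) > 1"""

def archive_duplicates(conn: sqlite3.Connection):
    """Archive, then delete, closed rows (End_Date NOT NULL) whose Address_ID
    is duplicated. Returns the duplicated Address_IDs, sorted, for reporting."""
    dup_ids = sorted(row[0] for row in conn.execute(DUPLICATE_IDS))
    with conn:  # INSERT and DELETE commit or roll back together
        conn.execute(f"""
            INSERT INTO archive (cust_id, Address_ID, start_time, End_Date)
            SELECT cust_id, Address_ID, start_time, End_Date
            FROM CustomerData
            WHERE End_Date IS NOT NULL
              AND Address_ID IN ({DUPLICATE_IDS})""")
        conn.execute(f"""
            DELETE FROM CustomerData
            WHERE End_Date IS NOT NULL
              AND Address_ID IN ({DUPLICATE_IDS})""")
    return dup_ids

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO CustomerData VALUES (?, ?, ?, ?)", [
    (1, 10, "2020-01-01", "2021-01-01"),  # address 10 duplicated, closed -> archived
    (2, 10, "2021-01-01", None),          # address 10 duplicated, open -> stays
    (3, 20, "2019-01-01", "2020-01-01"),  # address 20 unique, closed -> stays
])
conn.commit()

result = archive_duplicates(conn)
print(result)  # [10]
```

Unlike the first answer, customer 3 stays even though its End_Date is set, because its Address_ID is not duplicated.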