I hav开发者_运维技巧e few thousands of records with few 100 fields in a MySQL Table.
Some records are duplicates and are marked as such. Now while I can simply delete the dupes, I want to retain any other possible valuable non-null data which is not present in the original version of the record. Hope I made sense.
For instance :
a b c d e f key dupe
--------------------
1 d c f k l 1 x
2 g h j 1
3 i h u u 2
4 u r t 2 x
From the above sample table, the desired output is :
a b c d e f key dupe
--------------------
2 g c h k j 1
3 i r h u u 2
If you look at it closely, the duplicate is determined by using the key (it is the same for 2 records, so the one that has an 'x' for dupe field is the one to be deleted by retaining some of the fields from the dupe (like c, e values for key 1).
Please let me know if you need more info about this puzzling problem.
Thanks a tonne!
p.s : If it is not possible using MySQL, a PERL/Python script sample would be awesome! Thanks!
If I understood you properly, you need 1) to merge 2 records 2) delete marked records (it is straightforward).
To accomplish the first task you can do use something like
UPDATE table1 t1
INNER JOIN table1 t2 ON (t1.key = t2.key AND t2.dupe='x')
SET t1.b= IFNULL(t1.b,t2.b), t1.c=IFNULL(t1.c,t2.c), ...etc
WHERE t1.dupe IS NULL
P.S. This query assumes that any record has 0 or 1 duplicate; if you have more, the query needs to be modified.
Using code you can merge the lines and then delete the dupe.
if you want to stay inside sql code you need to run a stored procedue on every column of the table that update the source row only if it's null and the destination row is not null.
iterating on all the dupes pairs using some kind of cursor.
This is a general Idea, I hope someone here can help you with more specific code...
The answer very depend on what you actually want to do. Do you want inplace edit duplicate records, or you want create temporary table with desired result. If you want inplace edit records, i'm think you better use python to perform several sequental queryes one like a1ex07 say and then simple 'delete from ... where dupe = 'x' '. If you can use temporary table, you can use one (not very) simple mysql 'INSERT INTO ... SELECT ', merging dupe records with IFNULL or CAOLESCE and actual igonoring records with dupe = 'x'
import string, os, sys
import csv
import MySQLdb
import pickle
from EncryptedFile import EncryptedFile
enc = EncryptedFile( os.getenv("HOME") + '/.py-encrypted-file')
user = enc.getValue("user")
pw = enc.getValue("pw")
db = MySQLdb.connect(host="127.0.0.1", user=user, passwd=pw,db=user)
cursor = db.cursor()
cursor2 = db.cursor()
cursor.execute("select * from delThisTable")
rows = cursor.fetchall()
data = dict()
for row in rows:
key, primary = row[0], row[1]
if key not in data:
data[key] = list(row[:-1])
else:
for i in range(len(row)-1):
if data[key][i] is None or (not primary and row[i] is not None):
data[key][i] = row[i]
Answer by Messa in this thread...
精彩评论