I've got a large dataset, some of which is is duplicate records, which are identifiable by dupes in two fields.
To find these records, the following query works:
SELECT * FROM supplierstuffs
GROUP BY "Supplier Code", "Cost ex Tax"
HAVING count("Des开发者_开发百科cription") > 1
Basically what I want to do is cat together all the values of "Description" to form a single row, then replace all of the duplicated rows with the single row.
This is my half broken query so far, it's kludgy and horrid. My primary goal is to get this working - but if I learn some new tricks in sql along the way that's not at all a bad thing.
UPDATE supplierstuffs SET "Description" =
(SELECT array_to_string(array_accum("Description"), ', ') FROM supplierstuffs
GROUP BY "Supplier Code", "Cost ex Tax"
HAVING count("Description") > 1)
WHERE .....
This is far as I've gotten. What should I be reading to get a bit further? I've read a couple of books and a lot of webpages on the topic. However in this case I think my problem is not limited lack of SQL (ok, it's not my only problem) but more approaching the problem the wrong way.
EDIT 1:
'Name'; 'Supplier Code'; 'Desciption';
"7CPS PODIUM S/SLV CRICKET POLO";"7CPS";"04 -14, S - 3XL"
"7CP PODIUM CRICKET PANT ";"7CP";"08 -14, S - 2XL"
"7CPT PODIUM 3/4 SLV CRICKET POLO";"7CPT";"04 -14, S - 3XL"
"7CPL PODIUM L/SLV CRICKET POLO";"7CPL";"04 -14, S - 3XL"
"T444MS Cool dry breathable sporty T-shirts";"T444MS";"XS - 2XL, XS - 2XL"
"T232RG Raglan Sleeve Tee";"T232RG";"XS - 3XL, 8-16"
^^ is what I want to create from vv
"T232RG Raglan Sleeve Tee";"T232RG";"XS - 3XL"
"T232RG Raglan Sleeve Tee";"T232RG";"XS - 3XL"
"T232RG Raglan Sleeve Tee";"T232RG";"S - 3XL"
"T232RG Raglan Sleeve Tee";"T232RG";"XS - 3XL"
"T232RG Raglan Sleeve Tee";"T232RG";"XS - 3XL"
"T232RG Raglan Sleeve Tee";"T232RG";"XS - 3XL"
"T232RG Raglan Sleeve Tee";"T232RG";"XS - 3XL"
"T232RG Raglan Sleeve Tee";"T232RG";"XS - 3XL"
"T232RG Raglan Sleeve Tee";"T232RG";"8-16"
"T232RG Raglan Sleeve Tee";"T232RG";"XS - 3XL"
"T232RG Raglan Sleeve Tee";"T232RG";"XS - 3XL"
"T232RG Raglan Sleeve Tee";"T232RG";"XS - 3XL"
"T232RG Raglan Sleeve Tee";"T232RG";"XS - 3XL"
"T232RG Raglan Sleeve Tee";"T232RG";"XS - 3XL"
"T232RG Raglan Sleeve Tee";"T232RG";"XS - 3XL"
"T232RG Raglan Sleeve Tee";"T232RG";"XS - 3XL"
"T444MS Cool dry breathable sporty T-shirts";"T444MS";"XS - 2XL"
"T444MS Cool dry breathable sporty T-shirts";"T444MS";"XS - 2XL"
"T444MS Cool dry breathable sporty T-shirts";"T444MS";"XS - 2XL"
"T444MS Cool dry breathable sporty T-shirts";"T444MS";"XS - 2XL"
"7CP PODIUM CRICKET PANT ";"7CP";"08 -14"
"7CP PODIUM CRICKET PANT ";"7CP";"S - 2XL"
"7CPL PODIUM L/SLV CRICKET POLO";"7CPL";"04 -14"
"7CPL PODIUM L/SLV CRICKET POLO";"7CPL";"S - 3XL"
"7CPS PODIUM S/SLV CRICKET POLO";"7CPS";"04 -14"
"7CPS PODIUM S/SLV CRICKET POLO";"7CPS";"S - 3XL"
"7CPT PODIUM 3/4 SLV CRICKET POLO";"7CPT";"04 -14"
"7CPT PODIUM 3/4 SLV CRICKET POLO";"7CPT";"S - 3XL"
^^ noting that the lines not having more than one description line need to remain untouched.
I've so far created the new records in a new table with:
INSERT INTO tmptable
SELECT "Name" , "Supplier Code", array_to_string(array_accum("Description"), ', ')
FROM supplierstuffs
GROUP BY "Name", "Supplier Code", "Description"
HAVING count("Description") > 1
So now all that remains is to delete the records that were caught by the cat command. It seems I can't DELETE FROM
with a having clause? I'm thinking that DELETE FROM table WHERE oid IN (SELECT OID's using having clause)
Will work?
EDIT 2:
SELECT array_accum(oid)
FROM supplierstuffs
GROUP BY "Name", "Supplier Code", "Colour", "Cost ex Tax"
HAVING count("Description") > 1
returns a few arrays of 2 oids, all of which need to be delorted. I feel I'm very close, and yet so far. Thanks in advance
The following approach will work
- Identify only the duplicate rows and store them in a new table.
- Delete duplicate rows from parent table
- Concatenate the description column in the table containing only duplicate rows. Concatenate using a group by clause.
- Insert all rows from the result of step 3 into the original table.
So what you currently have is something like this ...
DESCRIPTION SUPPLIER_CODE COST_EX_TAX
Widget X23 42.00
Brass gadget X23 42.00
Flange X42 23.00
Flange, steel X42 23.00
... and what you want is ...
DESCRIPTION SUPPLIER_CODE COST_EX_TAX
Brass gadget, Widget X23 42.00
Flange, Flange, steel X42 23.00
This still doesn't seem like the right approach. That concatenated DESCRIPTION seems wrong to me. However you know your data and your customer's requirements better than I do.
精彩评论