Removing duplicates from mysql using php_问答_开发者

Removing duplicates from mysql using php

开发者 https://www.devze.com 2023-03-15 05:02 出处：网络

This conserns two tables: \"wp_extra_data\" and \"wp_posts\". wp_extra_data has the fields \"id\", \"po开发者_StackOverflow社区st_id\" and \"post_title_md5\". Now I need to figure out how to retrieve

相关专题：php sql

This conserns two tables: "wp_extra_data" and "wp_posts". wp_extra_data has the fields "id", "po开发者_StackOverflow社区st_id" and "post_title_md5". Now I need to figure out how to retrieve duplicates from "post_title_md5" and then *delete from wp_posts where id = wp_extra_data->post_id except for the oldest post_id (aka the one with the lowest ID*

Example: wp_extra_data may contain:

ID post_id post_title_md5
23 45      322a85c7502dfb2fce5bbd6794c00f81
56 92      322a85c7502dfb2fce5bbd6794c00f81
75 12      322a85c7502dfb2fce5bbd6794c00f81

In which case I'd want the script to "remove from wp_posts where id = 92 and 12". So in short; how do I efficiently retrieve duplicates from wp_extra_data and then remove all of them except the one with the lowest key(ID). May be worth to note that I do not know what "post_title_md5" will be up front.

mysql_query("Delete from wp_posts where ID not in
(select post_id from wp_extra_data where id in(select min(ID)
from wp_extra_data
group by post_title_md5))")

Retrieving the duplicate hashes is easy:

SELECT post_title_md5
FROM wp_extra_data
GROUP BY post_title_md5
HAVING COUNT(post_title_md5) > 1

Then you can retrieve the post_id's using the above as part of a subquery:

SELECT post_id
FROM wp_extra_data
WHERE (post_title_md5 IN (... previous query here ...));

However, this'll retrieve all the IDs with the duplicated hash. So you'd have to do some filter on this PHP-side and delete only the left-over IDs after you've eliminated the "first" one.