Given a full table, how do I cha开发者_开发知识库nge the collation from utf8_bin to utf8_unicode_ce? The normal "alter" query does not work, because of "duplicate entry errors". For example there are two entries
David Hussa
and
David Hußa
I know they are the same. Is there an elegant way to tell MySQL to "merge" the entrys? I should mention, that the id of the entries are used in other tables as reference so this has to be respected too by MySQL. Or do I have to do this the long and annoying way: Means merging every duplicate manually and then change the collation?
The table looks like this:
delimiter $$
CREATE TABLE `authors` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(100) COLLATE utf8_bin NOT NULL,
`count` int(11) NOT NULL DEFAULT '1',
PRIMARY KEY (`id`),
UNIQUE KEY `name_UNIQUE` (`name`),
FULLTEXT KEY `name_FULLTEXT` (`name`)
) ENGINE=MyISAM AUTO_INCREMENT=930710 DEFAULT CHARSET=utf8 COLLATE=utf8_bin COMMENT='Stores all authors from dblp.xml.'$$
You can delete the duplicate entries:
DELETE a2
FROM authors a1
JOIN authors a2
ON a2.name COLLATE UTF8_GENERAL_CI = a1.name COLLATE UTF8_GENERAL_CI
AND a2.id < a1.id
Note that this may take long time if your table is large.
It would be better to do this:
Drop the
UNIQUE
constraintChange the collation
Create a plain, non-unique index on
name
Run the query (without
COLLATE
clause):DELETE a2 FROM authors a1 JOIN authors a2 ON a2.name = a1.name AND a2.id < a1.id
Drop the index
- Recreate the
UNIQUE
constraint.
To update the referencing tables, run this queries before deleting the entries:
UPDATE child c
JOIN (
(
SELECT name COLLATE utf8_unicode_ci AS name_ci, MAX(id) AS mid
FROM authors
GROUP BY
name_ci
) pa
JOIN authors a
ON a.name COLLATE utf8_unicode_ci = name_ci
)
ON c.author = a.id
SET author = mid;
on all referencing tables.
精彩评论