
Is there a MySQL utf8 collation that will not conflate accented characters?

Developer https://www.devze.com 2023-03-19 12:33 Source: Internet

I have a utf8 table and am storing data like:

+-------+--------+
| name  | gender |
+-------+--------+
| ESMÉ  | F      | 
| ESME  | F      | 
+-------+--------+

However, when I try to add a unique key on (name, gender), these two rows violate the constraint. I have been able to achieve my goal using the utf8_bin collation, but then I lose case-insensitivity.

So I guess what I'm really looking for is a way to store utf8 data, but not have accented and unaccented characters be seen as equivalent in my unique key.
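The difference between the three behaviours can be sketched in Python by modelling each collation as a comparison key: utf8_bin compares the exact characters, utf8_general_ci folds both case and accents, and the collation the question is after folds case only. This is an illustration under stated assumptions, not how MySQL implements collations: accent folding here uses Unicode NFD decomposition, which only approximates MySQL's weight tables, and all function names are hypothetical.

```python
import unicodedata


def key_bin(s: str) -> str:
    """Like utf8_bin: compare the exact characters."""
    return s


def key_general_ci(s: str) -> str:
    """Rough approximation of utf8_general_ci: ignore case AND accents."""
    decomposed = unicodedata.normalize("NFD", s)  # 'É' -> 'E' + combining acute
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.casefold()


def key_wanted(s: str) -> str:
    """Hypothetical collation the question asks for: ignore case, keep accents."""
    return unicodedata.normalize("NFC", s).casefold()


def violates_unique(rows, key):
    """True if two (name, gender) rows collide under the given comparison key."""
    seen = set()
    for name, gender in rows:
        k = (key(name), key(gender))
        if k in seen:
            return True
        seen.add(k)
    return False


rows = [("ESMÉ", "F"), ("ESME", "F")]
print(violates_unique(rows, key_bin))         # False
print(violates_unique(rows, key_general_ci))  # True: the error from the question
print(violates_unique(rows, key_wanted))      # False
print(violates_unique([("ESME", "F"), ("esme", "F")], key_wanted))  # True
print(violates_unique([("ESME", "F"), ("esme", "F")], key_bin))     # False
```

The last two lines show the trade-off described above: utf8_bin keeps the accents apart but also lets "ESME" and "esme" coexist.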


I don't think what you want is directly possible. The collation determines how your data is sorted and compared. For your unique key you want the two values to be treated as distinct, but in your normal comparisons you want them treated as the same.

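There is a trick around this when writing your queries, though: keep the column in utf8_bin and, when comparing, convert it to utf8 without specifying a collation. The converted value gets the character set's default collation (utf8_general_ci), which is case- and accent-insensitive, so the comparison ignores case and accents while the unique key still distinguishes them.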

The following example shows you what I mean:

mysql> show create table test_col;
+----------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table    | Create Table                                                                                                                                                        |
+----------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| test_col | CREATE TABLE `test_col` (
  `name` varchar(200) COLLATE utf8_bin DEFAULT NULL,
  UNIQUE KEY `ixuniq` (`name`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_bin |
+----------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

mysql> insert into test_col (name) values ('Y'),(unhex('c39d'));
Query OK, 2 rows affected (0.00 sec)
Records: 2  Duplicates: 0  Warnings: 0

mysql> select * from test_col;
+------+
| name |
+------+
| Y    |
| Ý    |
+------+
2 rows in set (0.00 sec)

mysql> select * from test_col where name='Y';
+------+
| name |
+------+
| Y    |
+------+
1 row in set (0.00 sec)

mysql> select * from test_col where convert(name using 'utf8')='y';
+------+
| name |
+------+
| Y    |
| Ý    |
+------+
2 rows in set (0.00 sec)
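A rough Python model of the same trick, assuming the unique key compares exact strings (as utf8_bin does) while the converted comparison folds case and accents (approximating utf8_general_ci with NFD decomposition). The class and method names are hypothetical; they only mirror the session above and are not a MySQL API.

```python
import unicodedata


def fold_general_ci(s: str) -> str:
    """Approximation of utf8_general_ci: drop accents, ignore case."""
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).casefold()


class BinUniqueTable:
    """Hypothetical one-column table whose UNIQUE key compares exact strings (like utf8_bin)."""

    def __init__(self):
        self.rows = []

    def insert(self, name: str) -> None:
        # The unique check uses exact comparison, so 'Y' and 'Ý' are both accepted.
        if name in self.rows:
            raise ValueError(f"duplicate entry {name!r}")
        self.rows.append(name)

    def select_where_bin(self, value: str):
        """Models: WHERE name = value (compared under utf8_bin)."""
        return [r for r in self.rows if r == value]

    def select_where_converted(self, value: str):
        """Models: WHERE convert(name using 'utf8') = value (default collation)."""
        return [r for r in self.rows if fold_general_ci(r) == fold_general_ci(value)]


t = BinUniqueTable()
t.insert("Y")
t.insert("\u00dd")  # 'Ý', the character whose UTF-8 bytes are unhex('c39d')
print(t.select_where_bin("Y"))        # ['Y']
print(t.select_where_converted("y"))  # ['Y', 'Ý']
```

The duplicate check in insert() is what keeps 'Y' and 'Ý' as separate rows; only the query side is relaxed.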


I have exactly the same problem. The collation exists for latin1 (latin1_general_ci), but there is no utf8 equivalent. You have three options:

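  • Use latin1 (with latin1_general_ci)
  • Create a custom collation, as described at http://dev.mysql.com/doc/refman/5.0/en/ldml-collation-example.html, or find someone who has created one (and let me know ;-)
  • Use uppercase when comparing values (or when inserting them), so that a case-sensitive collation such as utf8_bin behaves case-insensitively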
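A minimal Python sketch of the third option, assuming the (name, gender) key is compared exactly, as with utf8_bin: values are upper-cased on the way in and before lookups, so case no longer matters while accented and unaccented letters stay distinct. In MySQL this would mean wrapping values in UPPER() on insert and in WHERE clauses; the class here is hypothetical, and Python's str.upper() may differ from MySQL's UPPER() for some characters (for example 'ß').

```python
class UpperBinTable:
    """Hypothetical table with a UNIQUE (name, gender) key compared exactly, like utf8_bin."""

    def __init__(self):
        self.keys = set()

    @staticmethod
    def normalize(s: str) -> str:
        # Upper-case only; accents are untouched, so 'É' and 'E' remain different.
        # Note: str.upper() is an approximation of MySQL's UPPER().
        return s.upper()

    def insert(self, name: str, gender: str) -> bool:
        """Insert a row; return False if it would violate the unique key."""
        key = (self.normalize(name), self.normalize(gender))
        if key in self.keys:
            return False
        self.keys.add(key)
        return True

    def contains(self, name: str, gender: str) -> bool:
        """Case-insensitive, accent-sensitive lookup."""
        return (self.normalize(name), self.normalize(gender)) in self.keys


t = UpperBinTable()
print(t.insert("ESMÉ", "F"))    # True
print(t.insert("ESME", "F"))    # True: the accent still distinguishes the rows
print(t.insert("esme", "f"))    # False: same as 'ESME' after upper-casing
print(t.contains("esmé", "f"))  # True
```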