I am trying to achieve accent and case-insensitive sorting in MySQL. Following the instructions in the manual, this is sup开发者_如何学Cposed to work with the utf8 character set and utf8_general_ci collation.
When I follow the example in the manual (http://dev.mysql.com/doc/refman/5.1/en/charset-collation-implementations.html) under "Collations for Unicode multi-byte character sets" I do not get the same results:
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 679877
Server version: 5.1.41-log MySQL Community Server (GPL) by Remi
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> SET NAMES 'utf8' COLLATE 'utf8_general_ci';
Query OK, 0 rows affected (0.00 sec)
mysql> SELECT 'a' = 'A', 'a' = 'À', 'a' = 'á';
+-----------+-----------+-----------+
| 'a' = 'A' | 'a' = 'À' | 'a' = 'á' |
+-----------+-----------+-----------+
| 1 | 0 | 0 |
+-----------+-----------+-----------+
1 row in set (0.00 sec)
mysql>
In the example in the manual, those are all 1.
It also fails to treat accented characters equally when I try to set the collation directly in a query. In this example, the table is using latin1 and I'm converting to utf8.
mysql> select * from test;
+----------+
| k |
+----------+
| Cárdenas |
| Cardozo |
| Corbin |
| Cabrero |
+----------+
mysql> select k from test order by convert(k using utf8) collate utf8_general_ci
;
+----------+
| k |
+----------+
| Cabrero |
| Cardozo |
| Corbin |
| Cárdenas |
+----------+
4 rows in set (0.00 sec)
It should be ignoring the accent over the 'a' in the last entry and sorting it second. Any ideas what I'm doing wrong?
It works on my default MySQL installation. Since you haven't provided a SHOW FULL COLUMNS from test (@fsb comment) it's still possible that the collation in your table structure is incorrect.
Is the the collation of column 'k' set to something other than utf8_general_ci?
Check whether SELECT k from Names ORDER BY k ASC is giving the right answer
Has the MySQL installation Index.xml file been modified to change the meaning of utf8_general_ci?
Relevant part of my installation for comparison:
<charset name="utf8">
<family>Unicode</family>
<description>UTF-8 Unicode</description>
<alias>utf-8</alias>
<collation name="utf8_general_ci" id="33">
<flag>primary</flag>
<flag>compiled</flag>
</collation>
<collation name="utf8_bin" id="83">
<flag>binary</flag>
<flag>compiled</flag>
</collation>
</charset>
- Has the compiled code been modified to change the meaning of utf8_general_ci?
Conceivably someone else has tweaked one of these things for some nefarious purpose...
I might be missing something here...but can't you just make a function (say removeAccents
) that takes a string and returns the non-accent equivalent string, and then sort by removeAccents(field)
. I believe you can create an index for that as well, which should help with the performance.
精彩评论