These two querys 开发者_如何学编程gives me the exact same result:
select * from topics where name='Harligt';
select * from topics where name='Härligt';
How is this possible? Seems like mysql translates åäö to aao when it searches. Is there some way to turn this off?
I use utf-8 encoding everywhere as far as i know. The same problem occurs both from terminal and from php.
Yes, this is standard behaviour in the non-language-specific unicode collations.
9.1.13.1. Unicode Character Sets
To further illustrate, the following equalities hold in both utf8_general_ci and utf8_unicode_ci (for the effect this has in comparisons or when doing searches, see Section 9.1.7.7, “Examples of the Effect of Collation”):
Ä = A Ö = O Ü = U
See also Examples of the effect of collation
You need to either
use a collation that doesn't have this "feature" (namely
utf8_bin
, but that has other consequences)use a different collation for the query only. This should work:
select * from topics where name='Harligt' COLLATE utf8_bin;
it becomes more difficult if you want to do a case insensitive LIKE
but not have the Ä = A
umlaut conversion. I know no mySQL collation that is case insensitive and does not do this kind of implicit umlaut conversion. If anybody knows one, I'd be interested to hear about it.
Related:
- Looking for case insensitive MySQL collation where “a” != “ä”
- MYSQL case sensitive search for utf8_bin field
Since you are in Sweden I'd recommend using the Swedish collation. Here's an example showing the difference it makes:
CREATE TABLE topics (name varchar(100) not null) CHARACTER SET utf8;
INSERT topics (name) VALUES ('Härligt');
select * from topics where name='Harligt';
'Härligt'
select * from topics where name='Härligt';
'Härligt'
ALTER TABLE topics MODIFY name VARCHAR(100) CHARACTER SET utf8 COLLATE utf8_swedish_ci;
select * from topics where name='Harligt';
<no results>
select * from topics where name='Härligt';
'Härligt'
Note that in this example I only changed the one column to Swedish collation, but you should probably do it for your entire database, all tables, all varchar columns.
While collations are one way of solving this, the much more straightforward way seems to me to be the BINARY
keyword:
SELECT 'a' = 'ä', BINARY 'a' = 'ä'
will return 1|0
In your case:
SELECT * FROM topics WHERE BINARY name='Härligt';
See also https://www.w3schools.com/sql/func_mysql_binary.asp
you want to check your collation settings, collation is the property that sets which characters are identical.
these 2 pages should help you
http://dev.mysql.com/doc/refman/5.1/en/charset-general.html
http://dev.mysql.com/doc/refman/5.1/en/charset-mysql.html
Here you can see some collation charts. http://collation-charts.org/mysql60/. I'm no sure which is the used utf8_general_ci though.
Here is the chart for utf8_swedish_ci. It shows which characters it interprets as the same. http://collation-charts.org/mysql60/mysql604.utf8_swedish_ci.html
精彩评论