开发者

mysql doesn't take accents into account validating uniqueness of index key

开发者 https://www.devze.com 2022-12-15 18:48 出处:网络
I have a table with just a single field named \"tag\", which is a varchar(250) the field is defined with collation \'latin1_spanish_ci\', because I\'m planning to save words in spanish...

I have a table with just a single field named "tag", which is a varchar(250)

the field is defined with collation 'latin1_spanish_ci', because I'm planning to save words in spanish...

the problem is that the following query

select * from test where tag = 'unó'

gives me exactly the same result as

select * from test where tag = 'uno'

that is, for mysql, 'uno' = 'unó'... and 'unò', and 'unö'... etc...

this field is supossed to have a unique key...

I tried with others collations, all the "bin" collations seem to work, and also latin1_general_ci too...

I'd just like to know if this would be the appropiate collation to choose for every field in the database, or if I might face any trouble choosing it... take into account that I'm planing to store spanish data in that db...

saludos

sas

ps: anyway, it seems really odd that in a spanish collation accents co开发者_高级运维uld be considered meaningless...

--

edit: I did a couple of tests, I entered data with á Á é É ñ Ñ, etc, and it seems like mysql can really handle them ok...


You could use latin1_general_ci as your default database/table/column collation and specify latin1_spanish_ci on a per-select basis where needed:

select * from test order by tag collate latin1_spanish_ci;
select * from test where tag = 'uno' collate latin1_spanish_ci;

Because 'o', 'ó' etc. are considered equal in latin1_spanish_ci, the latter statement returns multiple results if the table contains e.g. 'uno', 'unó' and 'unò'.


The collation is not used to define in which language is the data in the column. It is used to define how different letters or multiple letters/codepoints must be defined as the same for comparisons (ordering and equality).

The ci suffix of latin1_spanish_ci means case insensitive.

If you want to be able to distinguish any accentuated letter and any case, the collation named after a language name are not appropriate. Use instead the collations with the _bin suffix.

You can also attach a specific collation to a value to force the use of a specific collation over the collation of the column:

SELECT * FROM test WHERE tag = ('uno' COLLATE latin1_bin);


You can use a UTF-8 collation to differentiate the value.


this is a quick fix to this issue.

'unó' should be saved as 'unó' on your database.

Then your search should look like

SELECT * from database where field = 'unó'

Using the UNICODE HTML ENTITY is the key https://www.compart.com/en/unicode/U+00F3 ... Cheers !

0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号