my SQL query
SELECT keyword
FROM table
WHERE MATCH (keyword)
AGAINST ('eco*' IN BOOLEAN MODE);
matches cells with these words:economy
,ecology
,echoscopy
(why?),echo
(why?) etc.
other SQL query
SELECT keyword
FROM table
WHERE MATCH (keyword)
AGAINST ('eci*' IN BOOLEAN MODE);
matches cell with the word:echidna
.
Yet both queries don't match word ectoplasm
开发者_如何学Go.
Why does echo
, echoscopy
match 'eco*'
and echidna
matches 'eci*'
?
I see key element in this problem being letter combination "ch".
Why does it work this way and how can I avoid this kind of matching?
The problem (feature?) was in the collation. "c" and "ch" were treated equal because of utf8_lithuanian_ci collation.
Edit:
Changing collation to utf8_unicode_ci fixes only certain issues.
The real solution is to use utf8_bin, which matches binary values of each character, meaning it's:
- case sensitive
- diacritics sensitive
The reason that it matches is that MATCH ... AGAINST is using regular expressions, and the * means, that the preceding char ("o") can be there from 0 to 9999999999999999999^ times. What you meant to match is
eco.*
Will match "eco" and "ecology" but not "echo".
eco.+
Will match "ecology" and "eco system" but not "eco" nor "echo".
Maybe you can try this
SELECT keyword FROM table WHERE keyword LIKE 'eco%';
精彩评论