Even "ch" appears as two letters, in czech its considered as one letter and its order in alphabet is after H let开发者_运维问答ter (so correct order is a,b,c,d,e,f,g,h,ch,i,j (I skipped some national characters). But when I do substr (colname, 1, 1) on column containing words begining with ch Im getting only "C"
this sql: SELECT SUBSTRING(title, 1, 1) AS title_truncated FROM node node WHERE node.type in ('termin') GROUP BY title_truncated ORDER BY title_truncated ASC"
returns: A, B, C, D, E, F, G, H, I, J (so no ch).
btw database is using utf8_czech_ci
Ch
is not a character in itself in Unicode, it is a digraph.
As such, it seems impossible for a database collation to properly map the difference. What @Ladislav says in the comment, and the user in this mySQL internals discussion, seems to support this.
You will probably have to work around this manually, e.g. in your example, using a IF
clause that tests for the presence of "Ch", and returns two characters if that is the case.
Reference: utf8_czech_ci
collation table (mySQL 6)
Even though ch
is considered a single sorting "letter" in Czech, it isn't considered a single "character" in any other way. It is stored and printed as two characters whenever it is encountered.
The collation setting in MySQL affects how strings are sorted, trying to sort individual characters is not very meaningful in many languages. E.g. č
comes after (IIRC) c
but ně
and ne
are equivalent so word ordering depends on the following letters.
I don't understand the underlying problem that you are trying to solve but I think the easiet approach might be to avoid using substring
and to sort by title
and only output the first "letter" if it changes when you are processing the results.
As a workaround, you could modify your definition of title_truncated
like this:
CASE SUBSTRING(title, 1, 2)
WHEN 'ch' THEN SUBSTRING(title, 1, 2)
ELSE SUBSTRING(title, 1, 1)
END AS title_truncated
精彩评论