
How is the Czech "ch" letter stored in MySQL, and how do I get it using SUBSTR?

Developer https://www.devze.com 2023-02-16 21:23 Source: Internet

Even though "ch" appears as two letters, in Czech it is considered one letter, and its place in the alphabet is after the letter H (so the correct order is a, b, c, d, e, f, g, h, ch, i, j; I skipped some national characters). But when I do substr(colname, 1, 1) on a column containing words beginning with ch, I'm getting only "C".

This SQL: SELECT SUBSTRING(title, 1, 1) AS title_truncated FROM node node WHERE node.type IN ('termin') GROUP BY title_truncated ORDER BY title_truncated ASC

returns: A, B, C, D, E, F, G, H, I, J (so no Ch).

By the way, the database is using utf8_czech_ci.


Ch is not a single character in Unicode; it is a digraph.

As such, it seems impossible for a database collation to properly map the difference. What @Ladislav says in the comment, and the user in this MySQL internals discussion, seems to support this.

You will probably have to work around this manually, e.g. in your example, by using an IF clause that tests for the presence of "Ch" and returns two characters if that is the case.

Reference: utf8_czech_ci collation table (mySQL 6)


Even though ch is considered a single sorting "letter" in Czech, it isn't considered a single "character" in any other way. It is stored and printed as two characters whenever it is encountered.

The collation setting in MySQL affects how strings are sorted; trying to sort individual characters is not very meaningful in many languages. E.g. č comes after (IIRC) c, but some other accented letters are treated as equivalent to their unaccented forms, so word ordering depends on the following letters.
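To make the "one sorting letter, two stored characters" point concrete, here is a rough Python sketch of a sort key that keeps "ch" together and ranks it right after "h". It is not how MySQL implements utf8_czech_ci; the names CZECH_ORDER, split_letters and czech_key are made up for illustration, and the alphabet is deliberately simplified (accented vowels and other national letters are left out).

```python
# Simplified Czech alphabet order: "ch" is one letter placed right after "h".
# Assumption: accented vowels and other national letters are omitted here.
CZECH_ORDER = ["a", "b", "c", "č", "d", "e", "f", "g", "h", "ch", "i", "j",
               "k", "l", "m", "n", "o", "p", "q", "r", "ř", "s", "š", "t",
               "u", "v", "w", "x", "y", "z", "ž"]
RANK = {letter: i for i, letter in enumerate(CZECH_ORDER)}


def split_letters(word):
    """Split a word into sorting letters, keeping the digraph "ch" together."""
    word = word.lower()
    letters = []
    i = 0
    while i < len(word):
        if word[i:i + 2] == "ch":
            letters.append("ch")
            i += 2
        else:
            letters.append(word[i])
            i += 1
    return letters


def czech_key(word):
    """Sort key: the rank of each letter; unknown characters sort last."""
    return [RANK.get(letter, len(CZECH_ORDER)) for letter in split_letters(word)]


words = ["chata", "cesta", "hrad", "ihned", "den"]
print(sorted(words, key=czech_key))
# → ['cesta', 'den', 'hrad', 'chata', 'ihned']
```

Note that "chata" is still a five-character string; only the key treats its first two characters as one unit.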

I don't understand the underlying problem that you are trying to solve, but I think the easiest approach might be to avoid using substring, sort by title, and only output the first "letter" when it changes while you are processing the results.
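A minimal sketch of that idea in Python, assuming the titles have already come back from the database sorted with the Czech collation (e.g. ORDER BY title). The helper names first_letter and with_letter_headings and the sample titles are hypothetical, not part of any library.

```python
def first_letter(title):
    """Return the leading Czech "letter": two characters for "ch", else one."""
    head = title[:2]
    return head if head.lower() == "ch" else title[:1]


def with_letter_headings(sorted_titles):
    """Yield (heading, title) pairs; heading is None unless the letter changed."""
    previous = None
    for title in sorted_titles:
        letter = first_letter(title).capitalize()
        heading = letter if letter != previous else None
        previous = letter
        yield heading, title


# Assumption: this list is already in the order the database returned it.
titles = ["Cesta", "Hrad", "Chata", "Chleba", "Ihned"]
for heading, title in with_letter_headings(titles):
    if heading is not None:
        print(heading)
    print("  " + title)
```

With the sample list this prints the headings C, H, Ch and I, with "Chata" and "Chleba" both listed under Ch.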


As a workaround, you could modify your definition of title_truncated like this:

CASE SUBSTRING(title, 1, 2)
  WHEN 'ch' THEN SUBSTRING(title, 1, 2)
  ELSE SUBSTRING(title, 1, 1)
END AS title_truncated