I'd like to also store strings in the database in a more queryable, slug-like format: forced to lowercase, with accented letters replaced by their Latin counterparts (ä -> a, ö -> o, ç -> c, etc.) and other special characters replaced with, e.g., dashes. Is there a standard for this kind of format? What would be the preferred way to achieve it in Java?
The database can do this for you through collations. A collation specifies which characters in a given character set are considered equivalent to each other when compared.
Have a look at this chart for a visual example of a collation:
http://www.collation-charts.org/mysql60/mysql604.utf8_general_ci.european.html
Here's a good description of how collations work from the MySQL manual:
http://dev.mysql.com/doc/refman/5.0/en/charset-syntax.html
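Note that a collation changes how values are compared (in `WHERE`, `ORDER BY`, unique indexes), not what is stored. If you want the same kind of accent- and case-insensitive comparison inside the Java application, `java.text.Collator` offers something similar. The sketch below is a rough application-side analogue, not how MySQL implements `utf8_general_ci`; the class and the helper name `sameIgnoringAccentsAndCase` are hypothetical. The equivalence rules are locale-dependent (for example, a Swedish collator may treat ä as a distinct letter rather than an accented a).

```java
import java.text.Collator;
import java.util.Locale;

public final class CollationDemo {
    // Hypothetical helper: true if a and b differ only in accents or case
    // under English collation rules.
    public static boolean sameIgnoringAccentsAndCase(String a, String b) {
        Collator collator = Collator.getInstance(Locale.ENGLISH);
        // PRIMARY strength compares base letters only: accent differences
        // (secondary) and case differences (tertiary) are ignored.
        collator.setStrength(Collator.PRIMARY);
        // Also treat decomposed input (e.g. "e" + U+0301) like precomposed letters.
        collator.setDecomposition(Collator.CANONICAL_DECOMPOSITION);
        return collator.equals(a, b);
    }

    public static void main(String[] args) {
        System.out.println(sameIgnoringAccentsAndCase("Qu\u00e9", "que")); // true
        System.out.println(sameIgnoringAccentsAndCase("cat", "cot"));      // false
    }
}
```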
This is the solution I've found works best so far:
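```java
import java.text.Normalizer;
import java.util.Locale;

public final class Slugs {
    public static String toSlug(String src) {
        return Normalizer.normalize(src.trim().toLowerCase(Locale.ENGLISH), Normalizer.Form.NFD) // NFD splits accented letters into base letter + combining mark
                .replaceAll("\\p{InCombiningDiacriticalMarks}+", "") // drop the combining marks
                .replaceAll("[^\\p{ASCII}]+", "-")                   // remaining non-ASCII runs -> dash
                .replaceAll("[^a-z0-9]+", "-")                       // other non-alphanumeric runs -> dash
                .replaceAll("(^-|-$)+", "");                         // strip leading/trailing dashes
    }
}
```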
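This converts ¿Qué? to que, Cool!!!!1 to cool-1, and åæø to a. (æ and ø have no Unicode decomposition, so they are treated as non-ASCII and dropped rather than transliterated.)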
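If losing letters such as æ and ø is a problem, one option is to transliterate them explicitly before normalizing. The sketch below assumes Java 9+ (for `Map.of`); the class name `ExtendedSlugs` and the replacement table are hypothetical, and the chosen transliterations (æ -> ae, ø -> o, ß -> ss, ...) are assumptions you would adjust for the languages you actually store. The separate `[^\\p{ASCII}]+` step is folded into `[^a-z0-9]+`, which already covers it.

```java
import java.text.Normalizer;
import java.util.Locale;
import java.util.Map;

public final class ExtendedSlugs {
    // Hypothetical table for letters that have no Unicode decomposition, so NFD
    // plus stripping combining marks cannot reduce them to ASCII. The chosen
    // replacements are an assumption; adjust them for your data.
    // Keys are single characters and values are plain ASCII, so the unspecified
    // iteration order of Map.of does not affect the result.
    private static final Map<String, String> EXTRA = Map.of(
            "\u00e6", "ae", // ae ligature
            "\u00f8", "o",  // o with stroke
            "\u0153", "oe", // oe ligature
            "\u00df", "ss", // sharp s
            "\u0142", "l",  // l with stroke
            "\u0111", "d",  // d with stroke
            "\u00fe", "th"  // thorn
    );

    public static String toSlug(String src) {
        String s = src.trim().toLowerCase(Locale.ENGLISH);
        for (Map.Entry<String, String> e : EXTRA.entrySet()) {
            s = s.replace(e.getKey(), e.getValue()); // transliterate before normalizing
        }
        return Normalizer.normalize(s, Normalizer.Form.NFD)
                .replaceAll("\\p{InCombiningDiacriticalMarks}+", "") // drop combining accents
                .replaceAll("[^a-z0-9]+", "-")                       // any other run -> one dash
                .replaceAll("(^-|-$)+", "");                         // trim leading/trailing dashes
    }

    public static void main(String[] args) {
        System.out.println(toSlug("\u00e5\u00e6\u00f8")); // aaeo
        System.out.println(toSlug("Stra\u00dfe 5"));       // strasse-5
    }
}
```

The non-ASCII literals are written as `\u` escapes so the source compiles the same regardless of the platform's default file encoding.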