Removing accent marks (diacritics) from Latin characters for comparison [duplicate]

Developer https://www.devze.com 2023-01-06 22:10 Source: web

This question already has answers here: Remove diacritical marks (ń ǹ ň ñ ṅ ņ ṇ ṋ ṉ ̈ ɲ ƞ ᶇ ɳ ȵ) from Unicode chars (12 answers) Closed 8 years ago.

I need to compare the names of European places that are written in the Latin alphabet with accent marks (diacritics) on some characters. Many Central and Eastern European names contain accented characters such as ž and ü, but some people write them with the plain Latin characters instead, such as z and u.

I need a way to have my system recognize, for example, mšk žilina as being the same as msk zilina, and likewise for all the other accented characters in use. Is there a simple way to do this?

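You can make use of java.text.Normalizer and a little regex to get rid of the diacritical marks. Normalizing to NFD splits each accented character into its base letter followed by combining marks, and the regex then removes those marks.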

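import java.text.Normalizer;
import java.text.Normalizer.Form;

public static String removeDiacriticalMarks(String string) {
    // NFD decomposes e.g. "š" into "s" + U+030C; the regex strips the combining marks.
    return Normalizer.normalize(string, Form.NFD)
        .replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
}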

Usage example:

String text = "mšk žilina";
String normalized = removeDiacriticalMarks(text);
System.out.println(normalized); // msk zilina
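
Since the question is about comparing names rather than just printing them, here is a minimal sketch of how the method above might be wrapped for comparison. The class name PlaceNames and the helpers comparisonKey and sameName are hypothetical, not part of the original answer. It also assumes you want case-insensitive matching, and it maps a few letters by hand (ł, đ, ø) because they have no canonical decomposition and therefore pass through NFD unchanged. That list is illustrative, not complete.

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical helper class, for illustration only.
final class PlaceNames {

    // Same technique as the answer: decompose to NFD, then strip combining marks.
    static String removeDiacriticalMarks(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD)
                .replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
    }

    // Builds a key for comparison: no diacritics, lower case, no outer whitespace.
    static String comparisonKey(String name) {
        String stripped = removeDiacriticalMarks(name)
                // Assumption: these letters should match their plain counterparts.
                // They do not decompose under NFD, so they are mapped explicitly.
                .replace('\u0142', 'l').replace('\u0141', 'L')  // ł, Ł
                .replace('\u0111', 'd').replace('\u0110', 'D')  // đ, Đ
                .replace('\u00f8', 'o').replace('\u00d8', 'O'); // ø, Ø
        return stripped.toLowerCase(Locale.ROOT).trim();
    }

    // Two names are treated as the same place if their keys are equal.
    static boolean sameName(String a, String b) {
        return comparisonKey(a).equals(comparisonKey(b));
    }

    public static void main(String[] args) {
        System.out.println(sameName("mšk žilina", "MSK Zilina")); // true
        System.out.println(sameName("Łódź", "Lodz"));             // true
        System.out.println(sameName("Žilina", "Trnava"));         // false
    }
}
```

If you compare many names, it may be worth storing the key alongside each name once instead of recomputing it on every comparison.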
