
Downgrade non-ASCII symbols to the closest 7-bit ASCII equivalent (preferably Java)

Developer (https://www.devze.com), 2023-01-09 09:22, source: web
Is there any simple, lightweight way to convert at least some non-ASCII symbols to their ASCII analogs? For example, this string

abc-åäö.txt

should be changed to

abc-aao.txt

A bit of background: zip tools do not reliably support UTF-8 file names, hence the need to downgrade. AFAICR, Google's "download attachments as a single zip file" feature replaces every non-ASCII symbol with the '_' character.
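For comparison, the crude '_' fallback described above can be sketched in Java like this. It is a minimal sketch assuming each non-ASCII code point should become exactly one '_'; the class and method names are hypothetical, and this is not how Google actually implements its feature.

```java
// Hypothetical helper: replaces every code point outside 7-bit ASCII with '_'.
final class UnderscoreFallback {

    static String toAscii(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            // Iterating by code point means a surrogate pair (e.g. an emoji)
            // becomes a single '_' rather than two.
            sb.append(cp < 0x80 ? (char) cp : '_');
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "abc-" + a-ring, a-umlaut, o-umlaut + ".txt"
        System.out.println(toAscii("abc-\u00e5\u00e4\u00f6.txt")); // abc-___.txt
    }
}
```

This is safe for any zip tool but loses all information about the original letters, which is exactly what the question is trying to avoid.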

PS: the code may as well be in some other language; if it's more or less understandable, I'll port it to Java. PPS: this is my first question, so please don't downvote me into the ground, okay?


Have a look at java.text.Normalizer. It can decompose accented characters into a base letter plus combining marks, which helps you transform equivalent characters: http://en.wikipedia.org/wiki/Unicode_equivalence
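A minimal sketch of the Normalizer approach, assuming Java 6 or later: decompose to NFD, drop the combining marks (regex category `\p{M}`), then replace anything still non-ASCII with '_'. The last step is needed because letters such as ß, ø or ł have no canonical decomposition. The class and method names are hypothetical.

```java
import java.text.Normalizer;

// Hypothetical helper class; a sketch, not a complete transliterator.
final class StripAccents {

    static String toAscii(String s) {
        // NFD splits precomposed letters, e.g. U+00E5 (a-ring) -> 'a' + U+030A.
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        // Drop all combining marks (Unicode general category M).
        String stripped = decomposed.replaceAll("\\p{M}+", "");
        // Letters without a decomposition (e.g. U+00DF sharp s, U+0142 l-stroke)
        // are still non-ASCII at this point; replace them with '_'.
        return stripped.replaceAll("[^\\x00-\\x7F]", "_");
    }

    public static void main(String[] args) {
        // "abc-" + a-ring, a-umlaut, o-umlaut + ".txt"
        System.out.println(toAscii("abc-\u00e5\u00e4\u00f6.txt")); // abc-aao.txt
    }
}
```

For the example in the question this yields `abc-aao.txt`; for the Polish city name Łódź it yields `_odz`, since Ł does not decompose.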


Maybe this would do?


Looks like the problem is solved here:

[solution][howto] Convert special characters to normal chars (é to e) http://www.ramonfincken.com/permalink/topic192.html


If you would consider using Python, there is a pretty good Python package called unidecode that produces ASCII transliterations of Unicode text.


Okay, found something more or less working in this question: PHP: Replace umlauts with closest 7-bit ASCII aequivalent in an UTF-8 string
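The linked question is about PHP. A rough Java analogue of a hand-written lookup table might look like the sketch below. The table is a hypothetical, deliberately small example that follows this question's mapping (ä to a, ö to o); German conventions would instead map ä to "ae", so adjust it for your alphabet. `Map.ofEntries` requires Java 9 or later; unmapped non-ASCII code points fall back to '_'.

```java
import java.util.Map;

// Hypothetical table-driven transliterator; the table is a small example only.
final class TableTransliterator {

    // Keys are Unicode code points; values are ASCII replacements.
    private static final Map<Integer, String> TABLE = Map.ofEntries(
            Map.entry(0x00E5, "a"),  // a with ring above
            Map.entry(0x00E4, "a"),  // a with diaeresis
            Map.entry(0x00F6, "o"),  // o with diaeresis
            Map.entry(0x00C5, "A"),
            Map.entry(0x00C4, "A"),
            Map.entry(0x00D6, "O"),
            Map.entry(0x00DF, "ss"), // sharp s
            Map.entry(0x00E6, "ae"), // ae ligature
            Map.entry(0x00C6, "AE"),
            Map.entry(0x00F8, "o"),  // o with stroke
            Map.entry(0x00D8, "O"),
            Map.entry(0x0142, "l"),  // l with stroke
            Map.entry(0x0141, "L"));

    static String toAscii(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (cp < 0x80) {
                sb.append((char) cp);                   // plain ASCII: keep as-is
            } else {
                sb.append(TABLE.getOrDefault(cp, "_")); // unmapped: '_'
            }
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toAscii("abc-\u00e5\u00e4\u00f6.txt")); // abc-aao.txt
        System.out.println(toAscii("Stra\u00dfe"));               // Strasse
    }
}
```

A table handles the letters Normalizer cannot (ß, æ, ø, ł) but only covers what you list; the two approaches can be combined by consulting the table first and normalizing whatever remains.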
