开发者

Levenshtein distance on non-English strings

开发者 https://www.devze.com 2022-12-20 21:54 Source: Internet

Will the Levenshtein distance algorithm work well for non-English language strings too?

Update: Would this work automatically in a language like Java when comparing Asian characters?


Only if the language is letter-based, for example Russian or German. For logographic scripts (Chinese, for example) or syllabic scripts (like Lao), no.


Yes, but you have to treat each non-English character as one character, not as several units (which is what happens if you compare the raw bytes of a UTF-8 encoding, for example). In Python 2, for instance, you would use the unicode class to represent the string and its characters; in Python 3, str already behaves this way.
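To make the "1 character" point concrete, here is a minimal sketch assuming Python 3, where str is a sequence of Unicode code points. The sample word is only an illustration and does not come from the question.

```python
# In Python 3, str holds Unicode code points; bytes holds the encoded form.
text = "日本語"                     # three characters
encoded = text.encode("utf-8")     # the same text as raw UTF-8 bytes

code_point_count = len(text)       # counts characters
byte_count = len(encoded)          # counts bytes, three per character here

print(code_point_count, byte_count)  # 3 9
```

An edit distance computed over the encoded bytes would therefore work on nine units instead of three.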


Levenshtein doesn't care about languages, it just tells you how many characters need to be changed (added, removed, exchanged) to get from one string to the other.

So: yes, but you'll have to check your charset; some foreign "single" characters may otherwise be treated as two (or more) characters.
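As a rough illustration of both points, the sketch below implements the textbook dynamic-programming form of the distance over any sequence, then runs it once on code points and once on UTF-8 bytes. The function name levenshtein and the sample words are assumptions made for this example, not part of any library.

```python
def levenshtein(a, b):
    """Edit distance between two sequences (str, bytes, list, ...)."""
    # previous[j] is the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]  # turning a[:i] into an empty sequence takes i deletions
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


sunday, monday = "日曜日", "月曜日"   # differ in the first character only

char_distance = levenshtein(sunday, monday)
byte_distance = levenshtein(sunday.encode("utf-8"), monday.encode("utf-8"))

print(levenshtein("kitten", "sitting"))  # 3
print(char_distance)                     # 1: one character substituted
print(byte_distance)                     # 2: that character differs in two bytes
```

The algorithm is the same in both calls; only the unit of comparison changes. The same care applies to the Java case in the question: String.length() and charAt() work on UTF-16 code units, so characters outside the Basic Multilingual Plane occupy two units unless the comparison is done on code points.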
