Fast Levenshtein distance in R?

Developer https://www.devze.com 2023-01-05 23:49 Source: web
Is there a package that contains a Levenshtein distance function implemented in C or Fortran? I have many strings to compare, and stringMatch from MiscPsycho is too slow for this.


stringdist from the stringdist package does this too, and under certain conditions it is even faster than levenshteinDist (1).
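
For illustration, a minimal sketch of how this could look, assuming the stringdist package is installed from CRAN. The sample strings are made up; method = "lv" selects the plain Levenshtein distance.

```r
# install.packages("stringdist")  # one-time setup, if needed
library(stringdist)

# Made-up sample data for illustration
a <- c("kitten", "flaw", "saturday")
b <- c("sitting", "lawn", "sunday")

# Element-wise Levenshtein distance between a[i] and b[i]
d <- stringdist(a, b, method = "lv")

# All-pairs distances: rows correspond to a, columns to b
m <- stringdistmatrix(a, b, method = "lv")
```

stringdistmatrix is probably the more useful call when every string has to be compared with every other one, since the looping happens in compiled code rather than in R.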


levenshteinDist (from the RecordLinkage package) calls compiled C code. Give it a try.
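
A short sketch of the call, assuming RecordLinkage is installed from CRAN. The input strings are invented for the example; both arguments are character vectors and are compared element by element.

```r
# install.packages("RecordLinkage")  # one-time setup, if needed
library(RecordLinkage)

# Single pair of strings
levenshteinDist("kitten", "sitting")

# Vectorised over both arguments: compares str1[i] with str2[i]
d <- levenshteinDist(c("kitten", "flaw"), c("sitting", "lawn"))
```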


You could try stringDist from Biostrings as well.
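
A rough sketch, assuming Biostrings has been installed from Bioconductor (it is not on CRAN). The strings are made-up examples. stringDist computes all pairwise distances within one vector and returns a dist object.

```r
# BiocManager::install("Biostrings")  # one-time setup, if needed
library(Biostrings)

# Made-up sample data for illustration
x <- c("kitten", "sitting", "mitten")

# Pairwise Levenshtein distances, returned as a "dist" object
d <- stringDist(x, method = "levenshtein")

# Convert to a full symmetric matrix for easier lookup
dm <- as.matrix(d)
```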


You could also use levenshtein_distance() from the textTinyR package. I got 'calloc' memory errors with all other packages when it came to larger character vectors of around 30k characters. Only textTinyR worked for me!
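
A minimal sketch, assuming textTinyR is installed and that levenshtein_distance() takes two single strings (s1, s2), which is my reading of its documentation. The mapply wrapper is my own addition for comparing several pairs and is not part of the package.

```r
# install.packages("textTinyR")  # one-time setup, if needed
library(textTinyR)

# One pair of strings at a time
d1 <- levenshtein_distance("kitten", "sitting")

# Hypothetical helper: apply the function pair by pair over two vectors
a <- c("kitten", "flaw")
b <- c("sitting", "lawn")
d <- mapply(levenshtein_distance, a, b, USE.NAMES = FALSE)
```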
