Is there a package that contains a Levenshtein distance function implemented in C or Fortran? I have many strings to compare, and stringMatch from MiscPsycho is too slow for this.
stringdist in the stringdist package does this too, and is even faster than levenshteinDist under certain conditions (1).
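
A minimal sketch of the stringdist approach, assuming the package is installed from CRAN. The example words are made up for illustration. Passing method = "lv" asks for plain Levenshtein distance, since the package default ("osa") also counts adjacent transpositions.

```r
library(stringdist)

# Illustrative data, not taken from the question
words <- c("kitten", "sitting", "mitten")

# Distance from one string to each element of a vector
d_pair <- stringdist("kitten", words, method = "lv")

# Full pairwise distance matrix for many strings at once
d_all <- stringdistmatrix(words, words, method = "lv")
```

For a large set of strings, the matrix form avoids an explicit R-level loop over pairs.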
levenshteinDist (from the RecordLinkage package) calls compiled C code. Give it a try.
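
As a rough illustration, assuming RecordLinkage is installed, the call could look like the following. The two vectors are hypothetical example data of equal length, compared element by element.

```r
library(RecordLinkage)

# Hypothetical example data: two equal-length character vectors
a <- c("kitten", "flaw", "book")
b <- c("sitting", "lawn", "back")

# One distance per pair (a[1] vs b[1], a[2] vs b[2], ...)
d <- levenshteinDist(a, b)
```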
You could try stringDist from Biostrings as well.
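
A short sketch of this option, assuming Biostrings has been installed from Bioconductor (it is not on CRAN). The input words are illustrative only. stringDist returns a "dist" object, which can be converted to an ordinary matrix.

```r
# Biostrings is a Bioconductor package, e.g. BiocManager::install("Biostrings")
library(Biostrings)

# Illustrative data, not taken from the question
words <- c("kitten", "sitting", "mitten")

# All pairwise Levenshtein distances, returned as a "dist" object
d <- stringDist(words, method = "levenshtein")

# Convert to a full symmetric matrix for easier indexing
m <- as.matrix(d)
```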
You could also use levenshtein_distance() from the textTinyR package. I got 'calloc' memory errors with all the other packages once the character vectors grew to around 30k characters. Only textTinyR worked for me!
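
A possible usage sketch, assuming textTinyR is installed and that levenshtein_distance() takes two single strings per call. The query and words are hypothetical, and the vapply() loop is just one way to apply the function across a vector.

```r
library(textTinyR)

# Hypothetical example data
query <- "kitten"
words <- c("kitten", "sitting", "mitten")

# Compare the query against each word, one pair per call
d <- vapply(words, function(w) levenshtein_distance(query, w), numeric(1))
```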