I have:
- A correct numerical ID, such as a phone number or Social Security number
- Another number, taken from a data-entry form
The 2nd number is similar to the 1st but not equal to it. Both numbers are valid.
I want to calculate how probable it is that the 2nd number is actually a typing error of the 1st number.
Such errors may include:
- Off by a few digits
- Transposed digits
- Misinterpreted digits (1-7, 4-9, 3-8, 2-5)
Does anyone know of an existing algorithm or code for this?
Edit:
I'm not looking for a general string-similarity algorithm. I'm looking for an algorithm optimized for human number-entry typing errors, or for some research about this topic.
There are several algorithms for measuring string similarity.
You could implement some variant of the Levenshtein distance or Damerau-Levenshtein distance that rates the types of errors differently.
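As a rough sketch of that idea, the following uses the optimal-string-alignment variant of Damerau-Levenshtein with separate costs per error type. The function name, the cost values (0.5 for a transposition or a confusable pair, 1.0 otherwise), and the choice of variant are all assumptions made for illustration; only the confusable pairs come from the question. Real weights would have to be tuned against observed typing errors.

```python
# Digit pairs that are easily misread, taken from the question's list.
CONFUSABLE = {frozenset(p) for p in ("17", "49", "38", "25")}


def weighted_distance(a, b, sub_cost=1.0, confusable_cost=0.5,
                      transpose_cost=0.5, indel_cost=1.0):
    """Edit distance between two digit strings with per-error-type costs.

    The default costs are illustrative guesses, not measured values.
    """
    n, m = len(a), len(b)
    # d[i][j] = cheapest way to turn a[:i] into b[:j]
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * indel_cost
    for j in range(1, m + 1):
        d[0][j] = j * indel_cost

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            x, y = a[i - 1], b[j - 1]
            if x == y:
                cost = 0.0
            elif frozenset((x, y)) in CONFUSABLE:
                cost = confusable_cost  # e.g. 1 read as 7
            else:
                cost = sub_cost
            best = min(
                d[i - 1][j] + indel_cost,   # digit dropped
                d[i][j - 1] + indel_cost,   # extra digit typed
                d[i - 1][j - 1] + cost,     # match or substitution
            )
            # Two adjacent digits swapped.
            if i > 1 and j > 1 and x != y and x == b[j - 2] and a[i - 2] == y:
                best = min(best, d[i - 2][j - 2] + transpose_cost)
            d[i][j] = best
    return d[n][m]
```

A lower distance means the 2nd number is more plausibly a typo of the 1st. Turning the distance into an actual probability would need error frequencies from real data, which this sketch does not supply.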
Treat the numbers as sequences of digits and calculate the similarity ratio between them:
2.0 * M / T
where T is the total number of digits in both numbers combined and M is the number of matching digits.
As a rule of thumb, a similarity ratio of 0.6 or above means the 2 numbers are similar.
Note that the ratio is 1 if the numbers are identical, and 0 if they have no digits in common.
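One way to compute this ratio is Python's standard difflib module, whose SequenceMatcher.ratio() is defined as 2.0*M / T. This assumes that "matches" means digits in matching blocks as difflib finds them, which is one possible reading of the description above. The helper names are made up for this sketch.

```python
from difflib import SequenceMatcher


def similarity_ratio(a, b):
    """Return 2.0*M / T for two digit strings, using difflib's matching blocks."""
    return SequenceMatcher(None, a, b).ratio()


def looks_similar(a, b, threshold=0.6):
    """Apply the 0.6 rule of thumb; the threshold is adjustable."""
    return similarity_ratio(a, b) >= threshold


if __name__ == "__main__":
    # Transposed last two digits: 6 of 7 digits match, so 2*6/14.
    print(similarity_ratio("5551234", "5551243"))
    print(looks_similar("5551234", "5551243"))
```

Note that this ratio treats every mismatch the same way, so it does not distinguish a transposition from a misread digit.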