I am testing Solr for my application, to find the percentage match between strings.
I configured Solr and, for now, defined a schema for first_name matching only. I used the text_general field type in the schema (Solr 3.3).
In my document/CSV I have the word "rushik", and in the Solr query I search for "rushk" (I intentionally removed the "i").
With the Levenshtein algorithm the distance between these two strings is 1, so the percentage match should be (1 - distance/maxLen(string1, string2)), which is (1 - 1/6) = 0.83. That means the strings are an 83% match.
But in Solr it only matches up to rushk~0.79 in the query. When I use ~0.80, ~0.81, etc., it does not match the document.
I am not sure whether my calculation of the Levenshtein string match is incorrect, or how exactly I can determine where the problem is.
Any help here is highly appreciated.
Thanks, Rushik.
The similarity calculation for a fuzzy query is:
distance = 1 - ((double)dist / (double)Math.min(textlen, targetlen));
return (distance > FUZZY_THRESHOLD);
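In your case it would be 1 - 1/5 = 0.8, because the divisor is the length of the shorter string ("rushk", 5 characters), not the longer one. The comparison is a strict greater-than, so a threshold of 0.80 does not match while 0.79 does. The behavior you see is therefore valid.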
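To compare the two formulas side by side, here is a minimal, self-contained sketch. It is not Lucene's actual code: the class name FuzzySimilarity and its methods are hypothetical, and the only part taken from the snippet above is the min-length divisor and the strict greater-than check against the threshold.

```java
// Hypothetical helper for illustration only; not part of Lucene or Solr.
final class FuzzySimilarity {

    // Classic dynamic-programming Levenshtein edit distance.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // distance to the empty prefix of b
            for (int j = 1; j <= b.length(); j++) {
                int cost = (a.charAt(i - 1) == b.charAt(j - 1)) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Similarity as in the snippet above: divide by the SHORTER length.
    static double similarityByMin(String text, String target) {
        int dist = levenshtein(text, target);
        return 1.0 - (double) dist / (double) Math.min(text.length(), target.length());
    }

    // Similarity as computed in the question: divide by the LONGER length.
    static double similarityByMax(String text, String target) {
        int dist = levenshtein(text, target);
        return 1.0 - (double) dist / (double) Math.max(text.length(), target.length());
    }

    // Strict comparison, mirroring "distance > FUZZY_THRESHOLD".
    static boolean matches(String text, String target, double threshold) {
        return similarityByMin(text, target) > threshold;
    }

    public static void main(String[] args) {
        String doc = "rushik";
        String query = "rushk";
        System.out.println("edit distance      = " + levenshtein(doc, query));
        System.out.println("similarity (min)   = " + similarityByMin(doc, query));
        System.out.println("similarity (max)   = " + similarityByMax(doc, query));
        System.out.println("matches at ~0.79   = " + matches(doc, query, 0.79));
        System.out.println("matches at ~0.81   = " + matches(doc, query, 0.81));
    }
}
```

With "rushik" and "rushk" the min-length variant gives 1 - 1/5 and the max-length variant gives 1 - 1/6, which accounts for the gap between the expected 0.83 and the observed cutoff just below 0.80.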