Solr Lucene fuzzy match returning wrong results

开发者 https://www.devze.com 2023-04-05 20:52 Source: web
I am testing Solr for my application, to find the percentage match between strings.

I configured Solr and, for now, defined a schema for first_name matching only, using the text_general field type (Solr 3.3).

My document (CSV) contains the word "rushik", and in the Solr query I search for "rushk", with the "i" intentionally removed.

With the Levenshtein algorithm the distance between these two strings is 1, so the percentage match should be (1 - distance/maxLen(string1, string2)), which is (1 - 1/6) = 0.83. That means the two strings are an 83% match.

But in Solr it only matches as long as I give at most rushk~0.79 in the query. With ~0.80, ~0.81, etc. it does not match the document.

I am not sure whether my calculation of the Levenshtein string match is incorrect, or how I can determine where the problem is.

Any help here is highly appreciated.

Thanks, Rushik.


The similarity calculation for a fuzzy query is:

distance = 1 - ((double)dist / (double)Math.min(textlen, targetlen));
return (distance > FUZZY_THRESHOLD);

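The divisor is the length of the shorter string, not the longer one. In your case that is 5 ("rushk"), so the similarity is 1 - 1/5 = 0.8. The comparison is strictly greater than the threshold, so ~0.79 matches and ~0.80 does not. The behaviour you see is therefore consistent with this formula.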
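
To see the two formulas side by side, here is a minimal Java sketch. It is not Lucene's code: the class name FuzzySimilarity and its method names are made up for illustration, and the Levenshtein routine is a plain textbook implementation. It only assumes the min-length divisor and the strict "greater than" comparison shown in the snippet above.

```java
// Illustrative only: names here are hypothetical, not part of Lucene or Solr.
class FuzzySimilarity {

    // Textbook Levenshtein distance: insert, delete and substitute each cost 1.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // distance to the empty prefix of b
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev; // reuse the two rows instead of a full matrix
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // The formula from the question: divide by the longer length.
    // Assumes at least one string is non-empty.
    static double similarityByMaxLength(String a, String b) {
        return 1.0 - (double) levenshtein(a, b) / Math.max(a.length(), b.length());
    }

    // The formula from the answer: divide by the shorter length.
    // Assumes both strings are non-empty.
    static double similarityByMinLength(String a, String b) {
        return 1.0 - (double) levenshtein(a, b) / Math.min(a.length(), b.length());
    }

    // Strict comparison, as in "return (distance > FUZZY_THRESHOLD)".
    static boolean matches(String a, String b, double threshold) {
        return similarityByMinLength(a, b) > threshold;
    }

    public static void main(String[] args) {
        String indexed = "rushik";
        String query = "rushk";
        System.out.println("distance   = " + levenshtein(indexed, query));
        System.out.println("max-length = " + similarityByMaxLength(indexed, query));
        System.out.println("min-length = " + similarityByMinLength(indexed, query));
        System.out.println("~0.79 hit  = " + matches(indexed, query, 0.79));
        System.out.println("~0.80 hit  = " + matches(indexed, query, 0.80));
    }
}
```

For "rushik" and "rushk" the max-length variant gives roughly 0.83 and the min-length variant gives 0.8, which passes a 0.79 threshold but not 0.80 because the comparison is strict.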
