Lucene's fuzzy matching is implemented with a basic edit-distance algorithm. Are there other implementations of fuzzy matching for Lucene that use other similarity metrics? They should also identify homophones. Also, please compare the various fuzzy matching approaches for Lucene.
I don't think Lucene offers any other string matching algorithms; you can, however, add one yourself. Here is a good library that contains most well-known string comparison algorithms.
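To illustrate what a homophone-aware metric looks like next to edit distance, here is a minimal sketch of American Soundex in plain Python. It is not Lucene code and not taken from the library mentioned above; the function name `soundex` is just an illustrative choice. The idea is that names which sound alike map to the same four-character key, so they can be matched by exact key lookup.

```python
def soundex(word: str) -> str:
    """Return the 4-character American Soundex key for a word (sketch)."""
    # Map consonants to their Soundex digit; vowels, H, W and Y have no code.
    codes = {}
    for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                           ("L", "4"), ("MN", "5"), ("R", "6")):
        for ch in letters:
            codes[ch] = digit

    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""

    result = letters[0]                 # the first letter is kept as-is
    prev = codes.get(letters[0], "")    # code of the previous coded position
    for ch in letters[1:]:
        if ch in "HW":
            continue                    # H and W do not separate equal codes
        code = codes.get(ch, "")
        if code and code != prev:
            result += code              # skip repeats of the same code
        prev = code                     # a vowel resets prev to ""
    return (result + "000")[:4]         # pad or truncate to 4 characters


if __name__ == "__main__":
    for name in ("Smith", "Smyth", "Robert", "Rupert"):
        print(name, soundex(name))
```

"Smith" and "Smyth" produce the same key, as do "Robert" and "Rupert", even though a pure edit-distance comparison only sees them as one or two edits apart. Indexing such a key in its own field is one way to get phonetic matching alongside the usual fuzzy query.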
Something I've been doing is pretty simple and works in most scenarios. In my case, I have 6.7 million event names in a dirty table that holds slightly altered or drilled-down versions of event names, and the table I'm fuzzy matching against has all the clean event names.
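``select distinct a.Column, b.Column
from tableA a
inner join tableB b
  -- x = start position, y = length; pick values that fit your data.
  -- '%' is only a wildcard under LIKE; with = it is a literal on both sides, so it is omitted.
  on SUBSTRING(b.Column, x, y) = SUBSTRING(a.Column, x, y)
order by a.Column asc;``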
My problem was that a plain fuzzy match with no substring returned only about 11 results, because the naming conventions in the two tables differ so much. This solution matches all of the drill-down-style events with their broader counterparts in the clean table.
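The same substring-key idea can be sketched outside SQL. The Python below is only an illustration under assumptions: the function name `substring_join` and the sample event names are made up, and `start` is 0-based here, whereas SQL `SUBSTRING` positions are 1-based.

```python
from collections import defaultdict


def substring_join(dirty, clean, start, length):
    """Pair names whose characters [start, start + length) are identical."""
    # Index the clean names by their substring key for fast lookup.
    by_key = defaultdict(list)
    for name in clean:
        by_key[name[start:start + length]].append(name)

    # Look up each dirty name's key; a set mimics SELECT DISTINCT.
    pairs = set()
    for name in dirty:
        for match in by_key.get(name[start:start + length], []):
            pairs.add((name, match))
    return sorted(pairs)  # ordered by the dirty name, like ORDER BY a.Column


if __name__ == "__main__":
    dirty = ["Spring Gala 2019 - VIP", "Spring Gala", "Autumn Fair (late)"]
    clean = ["Spring Gala", "Autumn Fair"]
    for pair in substring_join(dirty, clean, 0, 6):
        print(pair)
```

A short key matches more rows but also produces more false pairs, so the start and length need tuning against the actual data.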