
Weighting the tokens generated from Lucene

开发者 (Developer) https://www.devze.com 2023-03-25 20:52 Source: web

I need a suitable weighting algorithm to return the most relevant tokens for a query. I have generated the tokens using Lucene 3.0, and I'm thinking of using the TF-IDF concept. Can someone suggest a better algorithm or a modified TF-IDF?


Lucene already implements a TF-IDF variant for weighting. See: http://lucene.apache.org/java/2_9_0/api/core/org/apache/lucene/search/Similarity.html
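To make the TF-IDF idea concrete, here is a minimal plain-Java sketch for ranking tokens by weight. It does not call Lucene at all. The `tf` and `idf` formulas follow what I understand to be the documented defaults of Lucene's `DefaultSimilarity` (square root of the term count, and `1 + ln(numDocs / (docFreq + 1))`), while the class name `TfIdfSketch`, the method `rankTokens`, and the sample counts are made up for illustration. Lucene's real score also includes norms, boosts, and coordination factors that are left out here.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: a hypothetical helper, not part of Lucene.
final class TfIdfSketch {

    // Term-frequency factor: square root of the raw count in the document.
    static double tf(int freq) {
        return Math.sqrt(freq);
    }

    // Inverse document frequency: rarer terms get a larger factor.
    static double idf(int docFreq, int numDocs) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }

    // Combined weight of one token (no norms, boosts, or coord).
    static double weight(int freq, int docFreq, int numDocs) {
        return tf(freq) * idf(docFreq, numDocs);
    }

    // Returns the tokens sorted from highest to lowest weight.
    static List<String> rankTokens(Map<String, Integer> termFreqs,
                                   Map<String, Integer> docFreqs,
                                   int numDocs) {
        List<String> tokens = new ArrayList<>(termFreqs.keySet());
        tokens.sort(Comparator.comparingDouble((String t) ->
                -weight(termFreqs.get(t), docFreqs.getOrDefault(t, 0), numDocs)));
        return tokens;
    }

    public static void main(String[] args) {
        // Assumed sample data: counts in one document and across 100 documents.
        Map<String, Integer> termFreqs = new HashMap<>();
        termFreqs.put("the", 9);
        termFreqs.put("lucene", 4);

        Map<String, Integer> docFreqs = new HashMap<>();
        docFreqs.put("the", 99);
        docFreqs.put("lucene", 1);

        for (String token : rankTokens(termFreqs, docFreqs, 100)) {
            System.out.println(token + " "
                    + weight(termFreqs.get(token), docFreqs.get(token), 100));
        }
    }
}
```

With these sample counts, the rare token "lucene" ranks above the frequent token "the" even though "the" occurs more often in the document.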

However, the weighting is no longer state of the art and lacks some performance on term bursts. As far as I know, there are attempts to introduce pluggable algorithms in Solr 4.0. For some versions, patches are available for BM25 or some of the newer algorithms.
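For comparison, BM25 can be sketched in plain Java as well. This is one common formulation, not the code from any of the patches mentioned above: the class name `Bm25Sketch` is hypothetical, `K1 = 1.2` and `B = 0.75` are commonly used defaults that I am assuming here, and the `idf` uses the non-negative `ln(1 + (N - n + 0.5) / (n + 0.5))` variant. The relevant property is that the term-frequency part saturates, so a term repeated many times in one document adds less and less to the score.

```java
// Illustrative only: a hypothetical helper, not taken from Lucene or Solr.
final class Bm25Sketch {

    // Assumed parameters: K1 controls saturation, B controls length normalization.
    static final double K1 = 1.2;
    static final double B = 0.75;

    // Non-negative idf variant; rarer terms get a larger value.
    static double idf(int docFreq, int numDocs) {
        return Math.log(1.0 + (numDocs - docFreq + 0.5) / (docFreq + 0.5));
    }

    // BM25 contribution of a single term to a single document's score.
    static double score(int freq, int docFreq, int numDocs,
                        double docLen, double avgDocLen) {
        double lengthNorm = 1.0 - B + B * (docLen / avgDocLen);
        double tfPart = (freq * (K1 + 1.0)) / (freq + K1 * lengthNorm);
        return idf(docFreq, numDocs) * tfPart;
    }

    public static void main(String[] args) {
        // Assumed sample data: a term found in 1 of 100 documents,
        // in a document of average length.
        System.out.println("freq=1:  " + score(1, 1, 100, 10.0, 10.0));
        System.out.println("freq=10: " + score(10, 1, 100, 10.0, 10.0));
    }
}
```

In this formulation the term-frequency part can never exceed `K1 + 1`, whereas the square-root factor used in classic TF-IDF keeps growing with the count.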
