I need a suitable weighting algorithm to return the most relevant tokens for a query. I have generated the tokens using Lucene 3.0. I'm thinking of using the tf-idf concept. Can someone suggest a better algorithm or a modified tf-idf?
Lucene already implements a TF-IDF variant for weighting. See: http://lucene.apache.org/java/2_9_0/api/core/org/apache/lucene/search/Similarity.html
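To make that TF-IDF variant concrete, here is a minimal standalone Java sketch. It re-implements the per-term formulas of Lucene 2.9/3.0's `DefaultSimilarity` (tf = sqrt(freq), idf = 1 + ln(numDocs / (docFreq + 1)), lengthNorm = 1 / sqrt(numTerms)) and uses them to rank the tokens of one document. It does not call Lucene. The class and method names (`TfIdfSketch`, `rankTokens`) are hypothetical. Query boosts, coord, queryNorm and Lucene's one-byte norm encoding are left out. idf is squared because it appears twice in the scoring formula documented on the `Similarity` page.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch mirroring DefaultSimilarity's formulas; not the Lucene API. */
public class TfIdfSketch {

    // DefaultSimilarity-style tf: square root of the raw term frequency
    static double tf(int freq) {
        return Math.sqrt(freq);
    }

    // DefaultSimilarity-style idf: 1 + ln(numDocs / (docFreq + 1))
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log(numDocs / (double) (docFreq + 1));
    }

    // DefaultSimilarity-style length normalization: 1 / sqrt(number of terms in the field)
    static double lengthNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    /** Ranks the distinct tokens of doc by tf * idf^2 * lengthNorm, highest first. */
    static List<Map.Entry<String, Double>> rankTokens(List<String> doc, List<List<String>> corpus) {
        Map<String, Integer> freqs = new HashMap<>();
        for (String t : doc) {
            freqs.merge(t, 1, Integer::sum);
        }

        Map<String, Double> scores = new HashMap<>();
        for (Map.Entry<String, Integer> e : freqs.entrySet()) {
            // document frequency: number of corpus documents containing the token
            int df = 0;
            for (List<String> d : corpus) {
                if (d.contains(e.getKey())) df++;
            }
            double idf = idf(df, corpus.size());
            scores.put(e.getKey(), tf(e.getValue()) * idf * idf * lengthNorm(doc.size()));
        }

        List<Map.Entry<String, Double>> ranked = new ArrayList<>(scores.entrySet());
        ranked.sort(Map.Entry.<String, Double>comparingByValue().reversed());
        return ranked;
    }

    public static void main(String[] args) {
        List<List<String>> corpus = List.of(
                List.of("lucene", "search", "index"),
                List.of("search", "engine", "ranking"),
                List.of("search", "query", "lucene", "lucene"));
        for (Map.Entry<String, Double> e : rankTokens(corpus.get(2), corpus)) {
            System.out.printf("%s %.4f%n", e.getKey(), e.getValue());
        }
    }
}
```

With this toy corpus, "query" (in only one document) ranks first, "lucene" (twice in the document, in two of three documents) second, and "search" (in every document) last. Within a single document lengthNorm is constant, so it affects scores but not the token order.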
However, this weighting is no longer state of the art and performs poorly on term bursts (a term repeated many times in one document). As far as I know, there are efforts to introduce pluggable scoring algorithms in Solr 4.0. For some versions, patches are available for BM25 and some of the newer algorithms.
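For comparison, here is a minimal Okapi BM25 sketch in plain Java. It is not a Lucene or Solr API and not the code of any of the patches mentioned above. It assumes the commonly used defaults k1 = 1.2 and b = 0.75 and the non-negative idf variant ln(1 + (N - df + 0.5) / (df + 0.5)). The class name `Bm25Sketch` is hypothetical. The relevant difference for term bursts is that BM25's term-frequency component saturates: each extra occurrence of a term adds less, and the contribution per term never exceeds idf * (k1 + 1).

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

/** Hypothetical BM25 sketch (assumed k1 = 1.2, b = 0.75); not a Lucene/Solr API. */
public class Bm25Sketch {
    static final double K1 = 1.2;  // assumed term-frequency saturation parameter
    static final double B = 0.75;  // assumed document-length normalization strength

    private final int numDocs;
    private final double avgDocLength;
    private final Map<String, Integer> docFreq = new HashMap<>();

    Bm25Sketch(List<List<String>> corpus) {
        this.numDocs = corpus.size();
        long totalLength = 0;
        for (List<String> doc : corpus) {
            totalLength += doc.size();
            // count each term at most once per document
            for (String t : new HashSet<>(doc)) {
                docFreq.merge(t, 1, Integer::sum);
            }
        }
        this.avgDocLength = (double) totalLength / numDocs;
    }

    // non-negative BM25 idf variant: ln(1 + (N - df + 0.5) / (df + 0.5))
    double idf(String term) {
        int df = docFreq.getOrDefault(term, 0);
        return Math.log(1.0 + (numDocs - df + 0.5) / (df + 0.5));
    }

    /** Sums the BM25 contribution of each query term present in doc. */
    double score(List<String> query, List<String> doc) {
        Map<String, Integer> tf = new HashMap<>();
        for (String t : doc) {
            tf.merge(t, 1, Integer::sum);
        }
        // length normalization: longer-than-average documents are penalized
        double norm = K1 * (1 - B + B * doc.size() / avgDocLength);
        double score = 0.0;
        for (String q : query) {
            int f = tf.getOrDefault(q, 0);
            if (f == 0) continue;
            // saturating tf: approaches K1 + 1 as f grows, so bursts are damped
            score += idf(q) * f * (K1 + 1) / (f + norm);
        }
        return score;
    }

    public static void main(String[] args) {
        List<List<String>> corpus = List.of(
                List.of("lucene", "search", "index"),
                List.of("search", "engine", "ranking"),
                List.of("search", "query", "lucene", "lucene"));
        Bm25Sketch bm25 = new Bm25Sketch(corpus);
        List<String> query = List.of("lucene", "query");
        for (int i = 0; i < corpus.size(); i++) {
            System.out.printf("doc %d: %.4f%n", i, bm25.score(query, corpus.get(i)));
        }
    }
}
```

For the query "lucene query", the third document scores highest, the first scores lower (it only matches "lucene"), and the second scores 0. Raising k1 lets repeated terms count for more before saturating; b = 0 disables length normalization.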