Lucene: add an extra boost to the first occurrence of a term

Developer https://www.devze.com 2023-02-06 23:18 Source: Internet
I'm working on a system that will use Apache Lucene to analyze and rank web page content from different sources.

The problem I'm facing is that when one source has more than one page with a high score, the results always list that group of pages from the same source first.

Is it possible to use a Lucene option to refine the results further, so that only the first occurrence for each source is listed at the top and the remaining pages from that source are pushed down to the end of the result list? That way the user would at least see results from different sources first, instead of a full list of content from the same source on the first few pages.


The latest (unreleased) version of Solr (which is built on top of Lucene) has a feature called field collapsing (also known as result collapsing), which groups results together based on the value of a field. It looks like this is what you're looking for:

http://wiki.apache.org/solr/FieldCollapsing

If you don't want to use Solr, then you'll have to implement this yourself by iterating through the result set and reordering it based on your criteria. You'll probably need to utilize FieldCache for your "source" field to make this perform well enough.
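
As a rough illustration of the do-it-yourself route, here is a minimal sketch of the reordering step in plain Java. It does not use any Lucene API: `SourceDiversifier`, `Hit`, and `firstPerSourceFirst` are hypothetical names made up for this example, and it assumes you have already collected the hits in descending score order together with each document's "source" value (which is where something like FieldCache would come in).

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Lucene: reorders ranked hits so that the
// best hit from each source comes first and repeats are pushed to the end.
final class SourceDiversifier {

    // Minimal stand-in for a search hit. In real code the source value would
    // be looked up from the index for each document id.
    static final class Hit {
        final int docId;
        final String source;
        final float score;

        Hit(int docId, String source, float score) {
            this.docId = docId;
            this.source = source;
            this.score = score;
        }
    }

    // Expects hits already sorted by descending score. Both output groups
    // keep the original relative order, so each stays ranked by score.
    static List<Hit> firstPerSourceFirst(List<Hit> rankedHits) {
        Set<String> seenSources = new HashSet<>();
        List<Hit> firstOccurrences = new ArrayList<>();
        List<Hit> repeats = new ArrayList<>();
        for (Hit hit : rankedHits) {
            // Set.add returns true only the first time a source is seen.
            if (seenSources.add(hit.source)) {
                firstOccurrences.add(hit);
            } else {
                repeats.add(hit);
            }
        }
        List<Hit> reordered = new ArrayList<>(firstOccurrences);
        reordered.addAll(repeats);
        return reordered;
    }
}
```

This is a single pass over the result list, so the cost is dominated by looking up the source value for each hit. Note that it only reorders the hits you pass in; if you page through results, you would need to fetch enough hits up front for the reordering to be meaningful.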
