
Lucene: getting all documents in the collection as results

Developer https://www.devze.com 2023-02-13 22:42 Source: Internet

When I perform a query in Lucene (topDocs = searcher.search(booleanQuery, 220000);), I get 170 hits. That is correct, but I would like the results to include the full list of documents, even those with very low scores.

Is there a way to force Lucene to return every document in my collection, not just the relevant ones?

Or does it mean that all the other documents have a score of 0?

Thanks


Since Lucene 3.x, you can use TotalHitCountCollector to get the total number of hits for a query. You can then retrieve every matching document by requesting exactly that many hits. Be careful with the zero-hit case: the number of hits requested must be at least 1, which is why the code uses Math.max(1, ...).

    TotalHitCountCollector collector = new TotalHitCountCollector();
    searcher.search(booleanQuery, collector);
    topDocs = searcher.search(booleanQuery, Math.max(1, collector.getTotalHits()));


Please specify q=*:* as the search term. In Lucene's standard query syntax, *:* matches every document in the index.


This question is old now, but I think what the OP was looking for is the MatchAllDocsQuery class, which matches every document in the index.


You can add a field to every document, such as test:1, and then search for [your_query] OR test:1. Every document then matches, and documents that also match your query get a higher score.
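The effect of [your_query] OR test:1 can be sketched with a toy, dependency-free Java model. This is not the Lucene API: documents are reduced to sets of field:value terms, a clause matches a document that contains its term, and an OR of clauses scores each hit by summing the weights of the clauses it matched, which roughly mirrors how a Lucene BooleanQuery combines matching SHOULD clauses. The class name, the Hit type, and the clause weights are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model, not Lucene code: a document is a set of "field:value" terms,
// and a clause matches a document if the document contains that term.
final class OrMatchAllDemo {

    // One search result: a document id and its summed clause score.
    static final class Hit {
        final int doc;
        final double score;

        Hit(int doc, double score) {
            this.doc = doc;
            this.score = score;
        }
    }

    // Simulates OR-ing term clauses: a document is a hit if it matches at
    // least one clause, and its score is the sum of the weights of the
    // clauses it matched. The weights are hypothetical stand-in scores.
    static List<Hit> searchOr(List<Set<String>> docs, Map<String, Double> clauses) {
        List<Hit> hits = new ArrayList<>();
        for (int doc = 0; doc < docs.size(); doc++) {
            boolean matched = false;
            double score = 0.0;
            for (Map.Entry<String, Double> clause : clauses.entrySet()) {
                if (docs.get(doc).contains(clause.getKey())) {
                    matched = true;
                    score += clause.getValue();
                }
            }
            // Documents that match no clause are not hits at all.
            if (matched) {
                hits.add(new Hit(doc, score));
            }
        }
        // Highest score first; List.sort is stable, so ties keep doc-id order.
        hits.sort((a, b) -> Double.compare(b.score, a.score));
        return hits;
    }
}
```

With a user clause body:lucene weighted 1.0 and test:1 weighted 0.5, every document carrying test:1 becomes a hit, and the ones that also contain body:lucene sort ahead of the rest.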


It should work if you search for * and enable leading wildcards in wildcard queries (QueryParser.setAllowLeadingWildcard(true)). I just tested this in Luke on a 501-document index, and the query returned all 501 documents.


Lucene does not filter results by score. If a query returns 170 hits, then only 170 documents matched the query. The remaining documents did not match and can be thought of as having a score of 0.
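To make this concrete: the number passed to search (220000 in the question) is only an upper bound on how many matching documents come back. A toy, dependency-free Java model of top-N collection shows that raising the bound never adds documents that did not match. It is not the Lucene API; the class name and the use of raw term frequency as the score are hypothetical simplifications.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Toy model, not Lucene code: a document is a list of tokens.
final class TopNDemo {

    // Returns the ids of at most n matching documents, best score first.
    // The hypothetical score is the term frequency (occurrences of the term).
    static List<Integer> topN(List<List<String>> docs, String term, int n) {
        int[] freq = new int[docs.size()];
        List<Integer> matches = new ArrayList<>();
        for (int doc = 0; doc < docs.size(); doc++) {
            freq[doc] = Collections.frequency(docs.get(doc), term);
            // Only documents containing the term are collected at all;
            // the others are skipped, not given a low score.
            if (freq[doc] > 0) {
                matches.add(doc);
            }
        }
        // Best score first; the stable sort keeps ties in doc-id order.
        matches.sort((a, b) -> Integer.compare(freq[b], freq[a]));
        // n is only an upper bound: asking for more hits than there are
        // matches cannot add non-matching documents.
        return new ArrayList<>(matches.subList(0, Math.min(n, matches.size())));
    }
}
```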


I have the same question and couldn't find a satisfactory answer anywhere. I had read that you could just use IndexSearcher.search(query, Integer.MAX_VALUE); however, this seemed very slow, so I presume memory is being allocated for the result set somewhere. I don't know why Lucene doesn't already provide a way to get the entire result set, but here's my solution:

    TotalHitCollector collector = new TotalHitCollector();
    indexSearcher.search(query, collector);
    // Every matching document has been collected; no scores were computed.
    for (int i = 0; i < collector.getTotalHits(); i++) {
        Document doc = indexSearcher.doc(collector.getDoc(i));
        // ... process doc ...
    }

...and the TotalHitCollector class:

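    // Written against the Lucene 8.x/9.x API (ScoreMode was added in 8.0).
    import java.util.ArrayList;
    import java.util.List;
    import org.apache.lucene.index.LeafReaderContext;
    import org.apache.lucene.search.ScoreMode;
    import org.apache.lucene.search.SimpleCollector;

    // Collects the global doc id of every matching document, without scoring.
    public static class TotalHitCollector extends SimpleCollector {

        private int base; // docBase of the segment currently being collected
        private final List<Integer> docs = new ArrayList<>();

        public int getTotalHits() {
            return docs.size();
        }

        public int getDoc(int i) {
            return docs.get(i);
        }

        @Override
        public void collect(int doc) {
            // doc is segment-local; adding the segment's docBase gives the global id.
            docs.add(doc + base);
        }

        @Override
        protected void doSetNextReader(LeafReaderContext context) {
            // Called once per segment, before that segment's documents are collected.
            this.base = context.docBase;
        }

        @Override
        public ScoreMode scoreMode() {
            // Scores are never read, so Lucene can skip computing them.
            return ScoreMode.COMPLETE_NO_SCORES;
        }
    }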
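The docBase bookkeeping in TotalHitCollector matters because Lucene calls collect with segment-local document ids, and LeafReaderContext.docBase is the offset that turns them into the index-wide ids that IndexSearcher.doc expects. A toy, dependency-free model of that arithmetic follows; the class and method names are hypothetical, and each segment is reduced to its size and its list of matching local ids.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model, not Lucene code, of per-segment collection with docBase offsets.
final class DocBaseDemo {

    // segmentMatches.get(s) holds the segment-local ids of the matching
    // documents in segment s, and segmentSizes[s] is that segment's document
    // count. Assumes both describe the same number of segments.
    // Returns the matches as index-wide (global) doc ids.
    static List<Integer> collectAll(List<List<Integer>> segmentMatches, int[] segmentSizes) {
        List<Integer> global = new ArrayList<>();
        int docBase = 0; // the value doSetNextReader would see for segment s
        for (int s = 0; s < segmentSizes.length; s++) {
            for (int localDoc : segmentMatches.get(s)) {
                // What collect(int doc) does: local id plus the segment's docBase.
                global.add(localDoc + docBase);
            }
            // The next segment's ids start right after this segment's last id.
            docBase += segmentSizes[s];
        }
        return global;
    }
}
```

With segments of 3, 2, and 4 documents, docBase takes the values 0, 3, and 5, so the local matches [0, 2], [1], and [0, 3] become the global ids 0, 2, 4, 5, and 8.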
