
Is there any way to extract all the tokens from Solr?

Developer https://www.devze.com 2023-03-20 09:36 Source: web

How can one extract all the tokens from Solr, not from a single document but from all the documents indexed in Solr?

Related topics: lucene solr


Thanks!

You can do something like this, reading the Solr index directly with the Lucene 4.x API:

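import java.io.File;
import org.apache.lucene.index.*;
import org.apache.lucene.store.*;
import org.apache.lucene.util.BytesRef;

// Path to the Solr core's index directory (hypothetical; adjust to your setup)
Directory dir = FSDirectory.open(new File("/path/to/solr/data/index"));
IndexReader reader = DirectoryReader.open(dir);
Fields fields = MultiFields.getFields(reader); // null if the index has no postings
if (fields != null) {
    for (String field : fields) { // Fields iterates over the field names
        Terms terms = fields.terms(field);
        if (terms == null) continue;
        TermsEnum termsEnum = terms.iterator(null); // Lucene 4.x signature; later versions use iterator()
        BytesRef term;
        while ((term = termsEnum.next()) != null) {
            String text = term.utf8ToString(); // do something with each term
        }
    }
}
reader.close();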

This iterates over every field in the index and over every term in each field. See the TermsEnum documentation for the other methods it provides, such as term(), docFreq() and totalTermFreq().
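To see what the nested "for each field, for each term" loop produces without needing a real index, here is a toy, in-memory model in plain Java. It is not Lucene's API: the class TermDump and its methods buildIndex and dump are hypothetical, and the "index" is simply a map from field name to a sorted term dictionary (term to document frequency), roughly the information that Fields and TermsEnum expose. It assumes Java 9+ for List.of.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

/**
 * Toy model (NOT the Lucene API): field name -> sorted map of term -> docFreq.
 * dump() walks it the same way the Fields/TermsEnum loop walks a real index.
 */
public class TermDump {

    // Builds the toy index from, per field, a list of already-tokenized documents.
    static Map<String, TreeMap<String, Integer>> buildIndex(Map<String, List<List<String>>> docsByField) {
        Map<String, TreeMap<String, Integer>> index = new TreeMap<>();
        for (Map.Entry<String, List<List<String>>> e : docsByField.entrySet()) {
            TreeMap<String, Integer> dict = new TreeMap<>();
            for (List<String> docTokens : e.getValue()) {
                // Count each distinct term once per document, like docFreq.
                for (String term : new TreeSet<>(docTokens)) {
                    dict.merge(term, 1, Integer::sum);
                }
            }
            index.put(e.getKey(), dict);
        }
        return index;
    }

    // Outer loop over fields, inner loop over that field's terms in sorted order.
    static List<String> dump(Map<String, TreeMap<String, Integer>> index) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, TreeMap<String, Integer>> field : index.entrySet()) {
            for (Map.Entry<String, Integer> term : field.getValue().entrySet()) {
                out.add(field.getKey() + ":" + term.getKey() + " (docFreq=" + term.getValue() + ")");
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, List<List<String>>> docs = new TreeMap<>();
        docs.put("title", List.of(List.of("solr", "tokens"), List.of("lucene", "solr")));
        docs.put("body", List.of(List.of("extract", "all", "tokens")));
        dump(buildIndex(docs)).forEach(System.out::println);
    }
}
```

Running main prints the terms of "body" (all, extract, tokens) followed by those of "title" (lucene, solr with docFreq=2, tokens). As in a real index, the same term text ("tokens") shows up once per field, so if you want one global token list you would still need to merge the per-field results yourself.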