How can I read a Lucene document field tokens after they are analyzed?

Developer https://www.devze.com 2023-02-20 04:38 Source: web
If I create a document and add a field that is both stored and analyzed, how can I then read this field back as a list of tokens? I have the following:

    Document doc = new Document();
    doc.add(new Field("url", fileName, Store.YES, Index.NOT_ANALYZED));
    doc.add(new Field("text", fileContent, Store.YES, Index.ANALYZED));
    // add the document to the index
    writer.addDocument(doc);

So fileContent is a String containing a lot of text. It is analyzed, meaning it is tokenized when it is added to the index. However, how can I get these tokens? I can retrieve the document from the index after it is stored, and I can read the "text" field from the document, but this is returned as a string. I would like to get the tokens if possible. My 'writer' is an IndexWriter instance and it uses a StandardAnalyzer. Any pointers would be very much welcome.

Thank you very much


Check out document.getField("name").tokenStreamValue().

EDIT: Actually this question gives you the full solution using the above TokenStream.
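
One caveat: as far as I know, tokenStreamValue() is documented to return the TokenStream a field was constructed with, or null otherwise, so for a field built from a String (as in the question) it may well come back null. A common workaround is to run the stored text through the same analyzer again. The following is only a minimal sketch of that idea, assuming a Lucene version that has CharTermAttribute (3.1 or later); the class FieldTokens and its tokenize method are hypothetical names made up for illustration, not part of Lucene.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Hypothetical helper for illustration only; not a Lucene class.
final class FieldTokens {

    // Runs the text through the given analyzer and collects each emitted term.
    // The caller supplies the analyzer, ideally the same one the IndexWriter used.
    static List<String> tokenize(Analyzer analyzer, String fieldName, String text)
            throws IOException {
        List<String> tokens = new ArrayList<String>();
        TokenStream stream = analyzer.tokenStream(fieldName, new StringReader(text));
        try {
            // The attribute instance is reused; its content changes on every incrementToken().
            CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
            stream.reset();
            while (stream.incrementToken()) {
                tokens.add(term.toString());
            }
            stream.end();
        } finally {
            stream.close();
        }
        return tokens;
    }
}
```

With the document from the question, a call might look like FieldTokens.tokenize(analyzer, "text", doc.get("text")), where analyzer is the StandardAnalyzer instance given to the IndexWriter. This re-analyzes the stored string rather than reading tokens out of the index, so it only reproduces the indexed tokens if the same analyzer configuration is used.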
