Can't find any results from a valid index with either PhraseQuery or WildcardQuery?

Developer (开发者) https://www.devze.com 2023-03-05 02:48 Source: Internet

For some reason I can't find any results from my valid index of 3552 items.

Please see the code below, followed by the console output of the program when I run it. 3552 is the number of indexed documents. /c:/test/stuff.txt is the correct indexed path that is retrieved from document 5 as a test. And all the text at the bottom is the full text (in XML type output) of the test file. What am I missing that my simple query does not produce results?

Maybe my WildcardQuery syntax is bad? I thought this would be inefficient (due to the wildcard at the beginning and end), but that it would at least return this document from the index...

import java.io.File;
import java.io.IOException;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Fieldable;
import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.WildcardQuery;
import org.apache.lucene.store.FSDirectory;


public class Searcher
{

    /**
    * @param args
    * @throws IOException 
    * @throws CorruptIndexException 
    */
    public static void main(String[] args) throws CorruptIndexException, IOException
    {

        System.out.println("Begin searching test...");

        IndexSearcher searcher = new IndexSearcher(FSDirectory.open(new File(args[0])));

        // termContainsWildcard is shown to be true here when debugging
        // numberOfTerms is 0
        WildcardQuery query = new WildcardQuery(new Term("contents", "*stuff*"));

        System.out.println("Query field is: " + query.getTerm().field());
        System.out.println("Query field contents is: " + query.getTerm().text());

        TopDocs results = searcher.search(query, 5000);

        // no results returned :(
        System.out.println("Total results from index " + args[0] + ": " + results.totalHits);

        for (ScoreDoc sd : results.scoreDocs)
        {
            System.out.println("Document matched. Number: " + sd.doc);
        }

        System.out.println();

        System.out.println("Begin reading test...");

        // now read from the index to see if I am crazy
        IndexReader reader = IndexReader.open(FSDirectory.open(new File(args[0])));

        // correctly shows the number of documents in the local index
        System.out.println("Number of indexed documents: " + reader.numDocs());

        // pick out a random, small document and check its fields
        Document d = reader.document(5);

        for (Fieldable f : d.getFields())
        {
            System.out.println("Field name is: " + f.name());
            System.out.println(new String(f.getBinaryValue()));
        }
    }
}  

CONSOLE OUTPUT WHEN RUN

Begin searching test...

Query field is: contents

Query field contents is: *stuff*

Total results from index C:\INDEX2: 0

Begin reading test...

Number of indexed documents: 3552

Field name is: path

/c:/test/stuff.txt

Field name is: contents

<?xml version="1.0" encoding="UTF-8"?>

<html xmlns="http://www.w3.org/1999/xhtml">

<head>

<meta name="Content-Length" content="8"/>

<meta name="Content-Encoding" content="UTF-8"/>

<meta name="Content-Type" content="text/plain"/>

<meta name="resourceName" content="stuff.txt"/>

<title/>

</head>

<body>

<p>stuff &#13;

</p>

</body>

</html>


You might try using Luke to run this query and test some different ones. You can also use Luke to browse the indexed terms, which might give you a clue as to what's going on. The code you used to index the documents might also give some hints: for example, are your fields actually indexed? You are getting a binary value out of contents, which may mean that the field was only stored and never tokenized, and thus never indexed. A field with no indexed terms cannot match any query.
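
To make the stored-versus-indexed distinction concrete, here is a minimal plain-Java sketch. It is not Lucene code and does not reproduce Lucene's implementation. It only models the idea that a wildcard query is matched against a field's indexed terms, so a field that was merely stored has nothing to match. The class and method names (TermIndexSketch, addStoredOnly, addIndexed, matchingTerms, toRegex) are hypothetical, and the tokenization (lowercasing and splitting on non-word characters) is an assumption made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.regex.Pattern;

// Toy model only: NOT Lucene. All names here are hypothetical.
class TermIndexSketch {
    // field name -> indexed terms (a stand-in for an index's term dictionary)
    private final Map<String, SortedSet<String>> terms = new HashMap<>();
    // field name -> stored raw value (retrievable, but not searchable)
    private final Map<String, byte[]> stored = new HashMap<>();

    // Stored-only field: the value can be read back but produces no terms.
    void addStoredOnly(String field, byte[] value) {
        stored.put(field, value);
    }

    // Indexed field: the text is split into lowercase word tokens (terms).
    void addIndexed(String field, String text) {
        SortedSet<String> fieldTerms = terms.computeIfAbsent(field, k -> new TreeSet<>());
        for (String token : text.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                fieldTerms.add(token);
            }
        }
    }

    // Return the indexed terms of a field that match a wildcard pattern.
    List<String> matchingTerms(String field, String wildcard) {
        List<String> hits = new ArrayList<>();
        SortedSet<String> fieldTerms = terms.get(field);
        if (fieldTerms == null) {
            return hits; // no terms were ever indexed for this field
        }
        Pattern pattern = toRegex(wildcard);
        for (String term : fieldTerms) {
            if (pattern.matcher(term).matches()) {
                hits.add(term);
            }
        }
        return hits;
    }

    // Translate '*' (any run of characters) and '?' (one character) to a regex.
    static Pattern toRegex(String wildcard) {
        StringBuilder regex = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else if (c == '?') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString());
    }

    public static void main(String[] args) {
        String body = "<p>stuff &#13;</p>";

        // Same content, stored as raw bytes only: nothing to search.
        TermIndexSketch storedOnly = new TermIndexSketch();
        storedOnly.addStoredOnly("contents", body.getBytes(StandardCharsets.UTF_8));
        System.out.println(storedOnly.matchingTerms("contents", "*stuff*")); // prints []

        // Same content, tokenized into terms: the wildcard finds "stuff".
        TermIndexSketch indexed = new TermIndexSketch();
        indexed.addIndexed("contents", body);
        System.out.println(indexed.matchingTerms("contents", "*stuff*")); // prints [stuff]
    }
}
```

In this model the stored-only case behaves like the question's symptom: the document text can be read back, yet the query sees zero terms. Whether that is what happened in the real index depends on how the contents field was added at indexing time, which the question does not show.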


By default, Lucene's QueryParser rejects wildcard queries that begin with a wildcard character (a leading * or ?). See the Lucene FAQ for more info. If you build your queries through QueryParser and want to allow leading wildcards, call this on your parser instance (named queryParser here):

queryParser.setAllowLeadingWildcard(true);