For some reason I can't find any results from my valid index of 3552 items.
Please see the code below, followed by the console output when I run it. 3552 is the number of indexed documents, /c:/test/stuff.txt is the correct indexed path retrieved from document 5 as a test, and the text at the bottom is the full content (in XML-style output) of that test file. What am I missing that makes my simple query return no results?
Maybe my WildcardQuery syntax is bad? I thought this would be inefficient (due to the wildcard at the beginning and end), but that it would at least return this document from the index...
import java.io.File;
import java.io.IOException;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Fieldable;
import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.WildcardQuery;
import org.apache.lucene.store.FSDirectory;

public class Searcher
{
    /**
     * @param args
     * @throws IOException
     * @throws CorruptIndexException
     */
    public static void main(String[] args) throws CorruptIndexException, IOException
    {
        System.out.println("Begin searching test...");
        IndexSearcher searcher = new IndexSearcher(FSDirectory.open(new File(args[0])));
        // termContainsWildcard is shown to be true here when debugging
        // numberOfTerms is 0
        WildcardQuery query = new WildcardQuery(new Term("contents", "*stuff*"));
        System.out.println("Query field is: " + query.getTerm().field());
        System.out.println("Query field contents is: " + query.getTerm().text());
        TopDocs results = searcher.search(query, 5000);
        // no results returned :(
        System.out.println("Total results from index " + args[0] + ": " + results.totalHits);
        for (ScoreDoc sd : results.scoreDocs)
        {
            System.out.println("Document matched. Number: " + sd.doc);
        }
        System.out.println();
        System.out.println("Begin reading test...");
        // now read from the index to see if I am crazy
        IndexReader reader = IndexReader.open(FSDirectory.open(new File(args[0])));
        // correctly shows the number of documents in the local index
        System.out.println("Number of indexed documents: " + reader.numDocs());
        // pick out a random, small document and check its fields
        Document d = reader.document(5);
        for (Fieldable f : d.getFields())
        {
            System.out.println("Field name is: " + f.name());
            System.out.println(new String(f.getBinaryValue()));
        }
    }
}
CONSOLE OUTPUT WHEN RUN
Begin searching test...
Query field is: contents
Query field contents is: *stuff*
Total results from index C:\INDEX2: 0
Begin reading test...
Number of indexed documents: 3552
Field name is: path
/c:/test/stuff.txt
Field name is: contents
<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="Content-Length" content="8"/>
<meta name="Content-Encoding" content="UTF-8"/>
<meta name="Content-Type" content="text/plain"/>
<meta name="resourceName" content="stuff.txt"/>
<title/>
</head>
<body>
<p>stuff
</p>
</body>
</html>
You might try using Luke to run your queries and test some different ones. You can also use Luke to browse the indexed terms, which might give you a clue as to what's going on. The code you used to index the documents might also give some hints: for example, are your fields actually indexed? You are getting a binary value out of contents, which may mean that the field was never tokenized and thus never indexed.
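To illustrate why a stored-but-not-indexed field can be read back and still never match a query, here is a minimal toy sketch. It is not Lucene's API or implementation: ToyIndex, add, storedValue and wildcard are hypothetical names made up for this example, and the tokenizer is deliberately crude. The only point is that a wildcard is matched against the terms in the index, not against stored values.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.regex.Pattern;

/** Toy model only (hypothetical names): NOT Lucene's API or implementation. */
public class ToyIndex {
    // field name -> (term -> ids of the documents that contain that term)
    private final Map<String, TreeMap<String, Set<Integer>>> terms = new HashMap<>();
    // doc id -> (field name -> stored value); retrievable, but never searched
    private final List<Map<String, String>> stored = new ArrayList<>();

    /** Adds a one-field document and returns its id. */
    public int add(String field, String value, boolean indexed) {
        int docId = stored.size();
        Map<String, String> doc = new HashMap<>();
        doc.put(field, value);
        stored.add(doc);
        if (indexed) {
            // Crude tokenizer: lowercase, then split on anything that is not a-z or 0-9.
            for (String token : value.toLowerCase().split("[^a-z0-9]+")) {
                if (!token.isEmpty()) {
                    terms.computeIfAbsent(field, f -> new TreeMap<>())
                         .computeIfAbsent(token, t -> new TreeSet<>())
                         .add(docId);
                }
            }
        }
        return docId;
    }

    /** Returns the stored value, whether or not the field was indexed. */
    public String storedValue(int docId, String field) {
        return stored.get(docId).get(field);
    }

    /** Matches the pattern against indexed terms only: '*' is any run of characters, '?' is one character. */
    public Set<Integer> wildcard(String field, String pattern) {
        StringBuilder regex = new StringBuilder();
        for (char c : pattern.toCharArray()) {
            if (c == '*') regex.append(".*");
            else if (c == '?') regex.append('.');
            else regex.append(Pattern.quote(String.valueOf(c)));
        }
        Pattern compiled = Pattern.compile(regex.toString());
        Set<Integer> hits = new TreeSet<>();
        TreeMap<String, Set<Integer>> fieldTerms = terms.get(field);
        if (fieldTerms == null) {
            return hits; // the field has no terms, so nothing can match
        }
        for (Map.Entry<String, Set<Integer>> entry : fieldTerms.entrySet()) {
            if (compiled.matcher(entry.getKey()).matches()) {
                hits.addAll(entry.getValue());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        int storedOnly = index.add("contents", "<p>stuff</p>", false);
        System.out.println(index.storedValue(storedOnly, "contents"));    // readable: <p>stuff</p>
        System.out.println(index.wildcard("contents", "*stuff*").size()); // 0, no terms to match
        index.add("contents", "<p>stuff</p>", true);
        System.out.println(index.wildcard("contents", "*stuff*"));        // [1]
    }
}
```

If your real index behaves like the first document here (the value comes back from reader.document(5) but Luke shows no terms for contents), the place to look is the indexing code rather than the query.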
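By default, leading wildcards (wildcard queries that start with * or ?) are disabled in Lucene's QueryParser. See the Lucene FAQ for more info. As far as I know, this restriction lives in the parser, so a WildcardQuery constructed directly, as in your code, should not be affected by it. If you do parse your queries and want to allow leading wildcards, call the following on your QueryParser instance (it is an instance method, not a static one):
QueryParser.setAllowLeadingWildcard(true)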