I am using Lucene full-text search in my application.
However, if I search for 'Turbo_Boost', for example, it returns 0 results.
Searches for other text work fine.
Any ideas?
Assuming you are using the StandardTokenizer, it will split on the underscore character. You can get around this by providing your own Tokenizer that keeps the underscore in the Token it returns (either through a combination of Filter instances or TokenFilter instances).
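As a rough illustration of the idea, here is a plain-Java sketch that does not use Lucene's API: the hypothetical `tokenize` helper treats a configurable character class as "word" characters and lower-cases each token. With letters only, "Turbo_Boost" splits the way described above; adding `_` to the class keeps it as one token. In real Lucene this logic would live in a custom Tokenizer or TokenFilter, whose exact API depends on your Lucene version.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy model only (not Lucene classes): a token is a maximal run of "word" characters,
// lower-cased afterwards, roughly like a tokenizer followed by a lower-case filter.
class UnderscoreTokenizerSketch {
    // Letters only: '_' acts as a separator.
    static final Pattern LETTERS_ONLY = Pattern.compile("\\p{L}+");
    // Letters, digits and '_': the underscore stays inside the token.
    static final Pattern KEEP_UNDERSCORE = Pattern.compile("[\\p{L}\\p{N}_]+");

    static List<String> tokenize(String text, Pattern word) {
        List<String> tokens = new ArrayList<>();
        Matcher m = word.matcher(text);
        while (m.find()) {
            tokens.add(m.group().toLowerCase(Locale.ROOT));
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Turbo_Boost", LETTERS_ONLY));    // [turbo, boost]
        System.out.println(tokenize("Turbo_Boost", KEEP_UNDERSCORE)); // [turbo_boost]
    }
}
```

Whatever rule you pick, the same one has to be applied to the query text, or the single indexed term `turbo_boost` will never be compared against the right query terms.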
A general rule of thumb with Lucene is to tokenize your search queries using the same Tokenizer/Analyzer you used to index the data.
See http://wiki.apache.org/lucene-java/LuceneFAQ#Why_is_it_important_to_use_the_same_analyzer_type_during_indexing_and_search.3F
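A toy model of why this rule matters, in plain Java rather than Lucene (the class, method, and analyzer names are hypothetical stand-ins): the "index" maps each analyzed term to the documents containing it, and a query matches only if the terms it produces after analysis are present. Indexing with a letter-splitting, lower-casing analyzer and then analyzing the query with a different one is enough to lose the hit.

```java
import java.util.*;
import java.util.function.Function;

// Toy inverted index (not Lucene): term -> ids of documents containing that term.
class SameAnalyzerSketch {
    // Roughly SimpleAnalyzer-like: runs of letters, lower-cased.
    static final Function<String, List<String>> LETTERS_LOWER = text -> {
        List<String> terms = new ArrayList<>();
        for (String t : text.split("[^\\p{L}]+")) {
            if (!t.isEmpty()) terms.add(t.toLowerCase(Locale.ROOT));
        }
        return terms;
    };
    // Roughly WhitespaceAnalyzer-like: split on whitespace only, case kept.
    static final Function<String, List<String>> WHITESPACE = text -> {
        List<String> terms = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) terms.add(t);
        }
        return terms;
    };

    static Map<String, Set<Integer>> index(List<String> docs, Function<String, List<String>> analyzer) {
        Map<String, Set<Integer>> postings = new HashMap<>();
        for (int id = 0; id < docs.size(); id++) {
            for (String term : analyzer.apply(docs.get(id))) {
                postings.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
            }
        }
        return postings;
    }

    // AND semantics: a document matches only if it contains every analyzed query term.
    static Set<Integer> search(Map<String, Set<Integer>> postings, String query,
                               Function<String, List<String>> analyzer) {
        Set<Integer> result = null;
        for (String term : analyzer.apply(query)) {
            Set<Integer> hits = postings.getOrDefault(term, Collections.<Integer>emptySet());
            if (result == null) result = new TreeSet<>(hits); else result.retainAll(hits);
        }
        return result == null ? Collections.<Integer>emptySet() : result;
    }

    public static void main(String[] args) {
        Map<String, Set<Integer>> idx = index(List.of("Intel Turbo_Boost technology"), LETTERS_LOWER);
        System.out.println(search(idx, "Turbo_Boost", LETTERS_LOWER)); // [0]  same analyzer
        System.out.println(search(idx, "Turbo_Boost", WHITESPACE));    // []   different analyzer
    }
}
```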
I can only think of a few reasons why your query would fail:
First, and probably the least likely given that searches for other text work: you didn't set the document's field to be analyzed. In that case the field isn't tokenized, so you can only search against the exact value of the whole field. Again, this one is probably not your issue.
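To make the "not analyzed" case concrete, here is a toy comparison in plain Java (not Lucene; the names are hypothetical): a field indexed without analysis contributes its whole value as a single term, so only a query for that exact value can match, while an analyzed field contributes one term per word.

```java
import java.util.*;

class NotAnalyzedFieldSketch {
    // Not analyzed: the entire field value is one term, case and punctuation included.
    static Set<String> notAnalyzedTerms(String value) {
        return Set.of(value);
    }

    // Analyzed (toy letter tokenizer + lower-casing): one term per run of letters.
    static Set<String> analyzedTerms(String value) {
        Set<String> terms = new TreeSet<>();
        for (String t : value.split("[^\\p{L}]+")) {
            if (!t.isEmpty()) terms.add(t.toLowerCase(Locale.ROOT));
        }
        return terms;
    }

    public static void main(String[] args) {
        String field = "Intel Turbo_Boost technology";
        Set<String> raw = notAnalyzedTerms(field);
        System.out.println(raw.contains("Turbo_Boost")); // false: not the whole value
        System.out.println(raw.contains(field));         // true: exact whole value
        System.out.println(analyzedTerms(field));        // [boost, intel, technology, turbo]
    }
}
```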
Second (related to the third), and fairly likely, it depends on how you're executing the search. Suppose you are not using the QueryParser (which, if constructed with the right analyzer, analyzes your text the same way you indexed it) but are instead using a TermQuery like:
var tq = new TermQuery(new Term("Field", "Turbo_Boost"));
That can cause your search to fail. A TermQuery is not analyzed, so its text is compared as-is against the indexed terms, and the Analyzer you used to index the document may have split "Turbo_Boost" or changed its case when it was indexed. The string comparison at search time then fails.
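The mismatch can be reproduced without Lucene. In this hypothetical plain-Java sketch, the indexed terms are what a lower-casing analyzer that splits on `_` would store for "Turbo_Boost". A TermQuery-style lookup compares the raw string against those terms, while a QueryParser-style lookup runs the query text through the same analyzer first. (In Lucene itself, the analogous choice is a `TermQuery` built from a `Term` versus a `QueryParser` constructed with the indexing analyzer; note that QueryParser's default operator is OR, whereas this sketch uses AND for simplicity.)

```java
import java.util.*;

class TermQuerySketch {
    // Terms a lower-casing analyzer that splits on non-letters would index for "Turbo_Boost".
    static final Set<String> INDEXED = analyze("Turbo_Boost");

    static Set<String> analyze(String text) {
        Set<String> terms = new LinkedHashSet<>();
        for (String t : text.split("[^\\p{L}]+")) {
            if (!t.isEmpty()) terms.add(t.toLowerCase(Locale.ROOT));
        }
        return terms;
    }

    // TermQuery-like: the raw string must equal an indexed term exactly.
    static boolean exactTermMatches(String raw) {
        return INDEXED.contains(raw);
    }

    // QueryParser-like: analyze first, then require every resulting term (AND, for simplicity).
    static boolean analyzedMatches(String text) {
        Set<String> q = analyze(text);
        return !q.isEmpty() && INDEXED.containsAll(q);
    }

    public static void main(String[] args) {
        System.out.println(INDEXED);                         // [turbo, boost]
        System.out.println(exactTermMatches("Turbo_Boost")); // false
        System.out.println(analyzedMatches("Turbo_Boost"));  // true
    }
}
```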
Third, and even more likely, the problem is the Analyzer class you're using to index your items versus the one you're using to search. Using the same analyzer is important because different analyzers can use different Tokenizers (and filters) to split the text into searchable terms.
Here are some examples, using your own Turbo_Boost query, of how each analyzer splits the text into terms:
KeywordAnalyzer, WhitespaceAnalyzer -> Field:Turbo_Boost
SimpleAnalyzer, StopAnalyzer -> Field:turbo Field:boost
StandardAnalyzer -> Field:turbo Field:boost
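The first two rows of this list can be approximated in a few lines of plain Java. These are toy stand-ins written for illustration, not Lucene's classes: "keyword" treats the whole input as one term, "whitespace" splits only on whitespace, and "simple" keeps runs of letters and lower-cases them (StopAnalyzer additionally drops stop words, which does not affect this input). StandardAnalyzer's grammar is more involved and is not modeled here.

```java
import java.util.*;

class AnalyzerTableSketch {
    static List<String> keyword(String text) {
        return List.of(text); // whole input is a single term
    }

    static List<String> whitespace(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.split("\\s+")) {
            if (!t.isEmpty()) terms.add(t); // case and '_' preserved
        }
        return terms;
    }

    static List<String> simple(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.split("[^\\p{L}]+")) {
            if (!t.isEmpty()) terms.add(t.toLowerCase(Locale.ROOT)); // split on non-letters, lower-case
        }
        return terms;
    }

    // Formats terms like the list above, e.g. "Field:turbo Field:boost".
    static String show(String field, List<String> terms) {
        StringJoiner out = new StringJoiner(" ");
        for (String t : terms) out.add(field + ":" + t);
        return out.toString();
    }

    public static void main(String[] args) {
        String q = "Turbo_Boost";
        System.out.println("keyword    -> " + show("Field", keyword(q)));    // Field:Turbo_Boost
        System.out.println("whitespace -> " + show("Field", whitespace(q))); // Field:Turbo_Boost
        System.out.println("simple     -> " + show("Field", simple(q)));     // Field:turbo Field:boost
    }
}
```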
You'll notice that some of the Analyzers split the term on the underscore character, while KeywordAnalyzer and WhitespaceAnalyzer keep it intact. It is extremely important that you use the same analyzer when you search, because otherwise the query's terms may not match the indexed terms. A mismatch can also cause inconsistent behavior, where some queries find results and others don't, depending on the query used.
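As a side note, if you are using the StandardAnalyzer, it's also important that you construct it with the same Version for both the IndexWriter and the QueryParser, because parsing differs depending on which version of Lucene you expect it to emulate. (For example, starting with Lucene 3.1, StandardTokenizer follows the Unicode word-break rules of UAX #29, under which an underscore between letters does not split a word, so "Turbo_Boost" stays a single term; the split shown above comes from the older grammar.)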
My guess is that your issue is one of the reasons above.