Lucene wildcard queries

Developer https://www.devze.com 2022-12-22 14:17 Source: web
I have a question about Lucene.

I have a form from which I read some text, and I want to run a full-text search for it across several fields. Suppose the input text is "textToLook".

I have a Lucene Analyzer with several filters. One of them is LowerCaseFilter, so when I create the index, words are lowercased.

Imagine I want to search in two fields, field1 and field2, so the Lucene query would be something like this (note that 'textToLook' is now 'texttolook'):

field1: texttolook* field2:texttolook*

In my class I have something like this to create the query. It works when there is no wildcard.

String text = "textToLook";
String[] fields = {"field1", "field2"};
// The analyzer is the same one used for indexing
Analyzer analyzer = fullTextEntityManager.getSearchFactory().getAnalyzer("customAnalyzer");
MultiFieldQueryParser parser = new MultiFieldQueryParser(fields, analyzer);
org.apache.lucene.search.Query queryTextoLibre = parser.parse(text);

With this code the query would be:

field1: texttolook field2:texttolook

But if I set text to "textToLook*", I get:

field1: textToLook* field2:textToLook*

which won't match anything, because the indexed terms are lowercase.

I have read this on the Lucene website:

"Wildcard, Prefix, and Fuzzy queries are not passed through the Analyzer, which is the component that performs operations such as stemming and lowercasing"

My problem cannot be solved just by making the search case-insensitive, because my analyzer has other filters which, for example, remove some word suffixes.

I think I can solve the problem by finding out what the text looks like after going through my analyzer's filters, then appending the "*", and then building the Query with MultiFieldQueryParser. So in this example I would start with "textToLook", and after passing it through the filters I would get "texttolook". I could then turn that into "texttolook*".

But, is there any way to get the value of my text variable after going through all my analyzer's filters? How can I get all the filters of my analyzer? Is this possible?

Thanks


Can you use QueryParser.setLowercaseExpandedTerms(true)?

http://wiki.apache.org/lucene-java/LuceneFAQ#Are_Wildcard.2C_Prefix.2C_and_Fuzzy_queries_case_sensitive.3F

** EDIT **

Okay, I understand your issue now. You actually want the wildcarded term to be stemmed before it's run through the wildcard query.

You can subclass QueryParser and override

protected Query getWildcardQuery(String field, String termStr) throws ParseException

to run termStr through the analyzer before the WildcardQuery is constructed.
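
The idea itself can be sketched without Lucene. In the plain-Java sketch below, normalize is a hypothetical stand-in for the analyzer's filter chain (just lowercasing plus a made-up "ing" suffix stripper); in a real getWildcardQuery override that step would be done by the analyzer. The class and method names are assumptions for illustration only, not Lucene API.

```java
import java.util.Locale;

// Toy illustration only: NOT Lucene code. All names here are hypothetical.
final class WildcardTermSketch {

    // Stand-in for the analyzer's filters: lowercase, then strip a trailing "ing".
    static String normalize(String term) {
        String lower = term.toLowerCase(Locale.ROOT);
        if (lower.length() > 5 && lower.endsWith("ing")) {
            return lower.substring(0, lower.length() - 3);
        }
        return lower;
    }

    // Splits off a trailing '*', normalizes the rest, then re-attaches the '*'.
    static String toWildcardTerm(String userInput) {
        boolean isPrefixQuery = userInput.endsWith("*");
        String bare = isPrefixQuery
                ? userInput.substring(0, userInput.length() - 1)
                : userInput;
        String normalized = normalize(bare);
        return isPrefixQuery ? normalized + "*" : normalized;
    }

    // Builds a "field:term field:term" query string for the given fields.
    static String buildQuery(String userInput, String... fields) {
        String term = toWildcardTerm(userInput);
        StringBuilder sb = new StringBuilder();
        for (String field : fields) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(field).append(':').append(term);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(buildQuery("textToLook*", "field1", "field2"));
    }
}
```

With these toy filters, buildQuery("textToLook*", "field1", "field2") gives field1:texttolook* field2:texttolook*, which is the query the question was after.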

This might not be what the user expects, though. There's a reason why they've decided not to run wildcarded terms through the analyzer, per the FAQ:

The reason for skipping the Analyzer is that if you were searching for "dogs*" you would not want "dogs" first stemmed to "dog", since that would then match "dog*", which is not the intended query.
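
To see that trade-off concretely, here is a small plain-Java sketch (again, not Lucene code). stem is a hypothetical one-rule stemmer that drops a trailing "s", the "index" is just a sorted set of stemmed terms, and prefix matching is a scan over that set. All names are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Toy illustration only: NOT Lucene code. All names here are hypothetical.
final class StemmedPrefixSketch {

    // Hypothetical stemmer: drops one trailing "s" from words longer than 3 chars.
    static String stem(String term) {
        if (term.length() > 3 && term.endsWith("s")) {
            return term.substring(0, term.length() - 1);
        }
        return term;
    }

    // "Indexes" the words by storing their stemmed forms in sorted order.
    static SortedSet<String> index(String... words) {
        SortedSet<String> terms = new TreeSet<>();
        for (String word : words) {
            terms.add(stem(word));
        }
        return terms;
    }

    // Returns every indexed term that starts with the given prefix.
    static List<String> matchPrefix(SortedSet<String> terms, String prefix) {
        List<String> hits = new ArrayList<>();
        // In sorted order, all terms sharing a prefix are contiguous from the prefix on.
        for (String term : terms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break;
            }
            hits.add(term);
        }
        return hits;
    }

    public static void main(String[] args) {
        SortedSet<String> terms = index("dog", "dogs", "dogma", "dogsled");
        System.out.println("dogs* -> " + matchPrefix(terms, "dogs"));
        System.out.println("dog*  -> " + matchPrefix(terms, stem("dogs")));
    }
}
```

With terms indexed from dog, dogs, dogma and dogsled, the raw prefix "dogs" matches only dogsled (and misses documents that contained "dogs", which is now stored as "dog"), while the stemmed prefix "dog" also pulls in dogma. Which behaviour is preferable depends on what your users expect from a wildcard.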
