开发者

Complex search query in lucene (querying fields which are indexed as numeric, analyzed or not-analyzed using a simple analyzer)

开发者 https://www.devze.com 2023-01-14 20:11 出处:网络
Hi I am building a search application using lucene. Some of my queries are complex. For example, My documents contain the fields location and population where location is a not-analyzed field and popu

Hi I am building a search application using lucene. Some of my queries are complex. For example, My documents contain the fields location and population where location is a not-analyzed field and population is a numeric field. Now I need to return all the documents that have location as "san-francisco" and population between 10000 and 20000. If I combine these two fields and build a query like this:

location:san-francisco AND population:[10000 TO 20000], i am not getting the correct result. Any suggestions on why this could be happening and what I can do.

Also while building complex queries some of the fields that I am including are analyzed while others are not analyzed. 开发者_如何学编程For instance the location field is not analyzed and contains terms like chicago, san-francisco and so on. While the summary field is analyzed and it generally contains a descriptive paragraph.

Consider this query:

location:san-francisco AND summary:"great restaurants"

Now if I use a StandardAnalyzer while searching I do not get the correct results when the location field contains a term like san-francisco or los-angeles (i.e it cannot handle the hyphen in between) but if I use a keyword analyzer for the query I do not get correct results either because it cannot search for the phrase "great restaurants" in the summary field.


First, I would recommend tackling this one problem at a time. From my reading of your post, it sounds like you have multiple issues:

  1. You're unsure why a particular query is not returning any results.

  2. You're unsure why some fields are not being analyzed.

  3. You're having problems with the built-in analyzers dealing with hyphens.

That's how your post reads. If that's correct, I would suggest you post each question separately. You'll get better answers if the question is precise. It's overwhelming trying to answer your question in the current format.

Now, let me take a stab in the dark at some of your problems:

For your first problem, if you're getting into really complex queries in Lucene, ask yourself whether it makes sense to be doing these queries here, rather than in a proper database. For a more generic answer, I'd try isolating the problem by removing parts of the query until you get results back. Once you find out what part of the query is causing no results, we can debug that further.

For the second problem, check the document you're adding to Lucene. Lucene provides options to store data but not index it. Make sure you've got the right option specified when adding fields to the document.

For the third problem, if the built-in analyzers don't work out for you, breaking on hyphens, just build your own analyzer. I ran into a similar issue with the '@' symbol, and to solve the problem, I wrote a custom analyzer that dealt with it properly. You could do the same for hyphens.


You should use PerFieldAnalyzerWrapper. As the name suggests, you can use different analyzers for different field. In this case, you can use KeywordAnalyzer for city name and StandardAnalyzer for text.

0

精彩评论

暂无评论...
验证码 换一张
取 消