Lucene - How to index a value with special characters

Developer https://www.devze.com 2022-12-28 10:14 Source: web
I have a value I am trying to index that looks like this:

Test (Test)

Using a StandardAnalyzer, I attempted to add it to my document using:

Field.Store.YES, Field.Index.TOKENIZED

When I do a search with the value of 'Test (Test)', the QueryParser generates the following query:

+Name:test +Name:test

This operates as I expect because I am not escaping special characters.

However, if I do QueryParser.Escape('Test (Test)') while indexing my value, it creates the terms:

[test] and [test]

Then when I do a search like this:

 QueryParser.Escape('Test (Test)')

I get the same two terms (as I expect). The problem is if I have two documents indexed with the names:

Test
Test (Test)

It matches on both. If I specify a search value of 'Test (Test)', I want to get only the second document. I am curious why escaping the special characters does not preserve them in the created terms. Is there an alternate Analyzer I should look at? I looked at WhitespaceAnalyzer and KeywordAnalyzer. WhitespaceAnalyzer is case-sensitive, and KeywordAnalyzer stores the value as a single term:

[Test (Test)]

Which means that if I do a search for just 'Test' I will not be able to return both documents.

Any ideas on how to implement this? It doesn't seem like it should be that difficult.


If you search for 'Test (Test)' and you want to retrieve only documents that contain that exact expression, you must enclose the search expression in double quotes ("...") so that Lucene knows that you want to do a phrase search.
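
To illustrate the difference between a term search and a phrase search, here is a minimal, self-contained Java sketch. It does not use Lucene. `analyze` is a hypothetical toy tokenizer that only mimics the lowercasing and punctuation stripping described in the question, and `matchesAllTerms` / `matchesPhrase` are illustrative helpers, not Lucene APIs.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Toy model of term matching vs. phrase matching (assumption: not Lucene code).
class PhraseMatchDemo {

    // Hypothetical analyzer: split on non-alphanumeric characters, then lowercase.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String piece : text.split("[^\\p{L}\\p{N}]+")) {
            if (!piece.isEmpty()) {
                terms.add(piece.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }

    // Term-style match: every query term must occur somewhere in the document,
    // similar in spirit to +Name:test +Name:test.
    static boolean matchesAllTerms(String doc, String query) {
        return analyze(doc).containsAll(analyze(query));
    }

    // Phrase-style match: the query terms must occur consecutively and in order.
    static boolean matchesPhrase(String doc, String query) {
        List<String> docTerms = analyze(doc);
        List<String> queryTerms = analyze(query);
        return !queryTerms.isEmpty()
                && Collections.indexOfSubList(docTerms, queryTerms) >= 0;
    }

    public static void main(String[] args) {
        String query = "Test (Test)";
        System.out.println(matchesAllTerms("Test", query));        // true
        System.out.println(matchesAllTerms("Test (Test)", query)); // true
        System.out.println(matchesPhrase("Test", query));          // false
        System.out.println(matchesPhrase("Test (Test)", query));   // true
    }
}
```

In this toy model the term search matches both documents, while the phrase search matches only 'Test (Test)'. A plain search for 'Test' still matches both, which is the behavior the question asks for.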

See the Lucene documentation for details:
http://lucene.apache.org/java/3_0_1/queryparsersyntax.html#Terms
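
As for why escaping does not preserve the parentheses: escaping only changes how the query syntax is parsed. The analyzer still runs on the text afterwards and discards the punctuation, including the added backslashes. The following is a rough Java sketch of that effect, assuming an analyzer that splits on non-alphanumeric characters and lowercases (a simplification, not StandardAnalyzer's actual implementation). `escape` is a hypothetical stand-in for QueryParser.Escape, with a special-character list chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Toy model showing that escaping does not survive analysis (assumption: not Lucene code).
class ToyAnalyzerDemo {

    // Hypothetical escape helper: put a backslash before query-syntax characters.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if ("\\+-!():^[]\"{}~*?|&".indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Hypothetical analyzer: split on non-alphanumeric characters, then lowercase.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String piece : text.split("[^\\p{L}\\p{N}]+")) {
            if (!piece.isEmpty()) {
                terms.add(piece.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(escape("Test (Test)"));          // Test \(Test\)
        System.out.println(analyze("Test (Test)"));         // [test, test]
        System.out.println(analyze(escape("Test (Test)"))); // [test, test]
    }
}
```

Escaped or not, the value ends up as the same two terms, so the distinction between 'Test' and 'Test (Test)' has to come from the query type (a phrase) rather than from escaping.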
