Lucene (.NET) Document structure and performance suggestions

开发者 https://www.devze.com 2022-12-30 15:22 Source: web

I am indexing about 100M documents, each consisting of a few string identifiers and a hundred or so numeric terms. I won't be doing range queries, so I haven't dug too deep into NumericField, but I don't think it's the right choice here.

My problem is that query performance degrades quickly when I start adding OR criteria to my query. All my queries are on specific numeric terms. A document looks like StringField:[someString] plus N instances of DataField:[someNumber]. I then query it with something like DataField:((+1 +(2 3)) (+75 +(3 5 52)) (+99 +88 +(102 155 199))).
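To make the shape of that query concrete, here is a minimal Java sketch that evaluates it against one document's set of DataField values using plain collections. It assumes standard Lucene query-parser semantics (a `+` clause is required, and a group of unprefixed clauses matches if any one of them does). The names `QueryShape` and `Group` are hypothetical and exist only for this illustration. It is not how Lucene executes the query internally.

```java
import java.util.List;
import java.util.Set;

final class QueryShape {
    /** One top-level clause: every term in allOf is required, plus at least one term from anyOf. */
    static final class Group {
        final Set<Integer> allOf;
        final Set<Integer> anyOf;

        Group(Set<Integer> allOf, Set<Integer> anyOf) {
            this.allOf = allOf;
            this.anyOf = anyOf;
        }

        boolean matches(Set<Integer> docValues) {
            if (!docValues.containsAll(allOf)) {
                return false;               // a required (+) term is missing
            }
            for (int v : anyOf) {
                if (docValues.contains(v)) {
                    return true;            // one optional term is enough
                }
            }
            return false;
        }
    }

    // DataField:((+1 +(2 3)) (+75 +(3 5 52)) (+99 +88 +(102 155 199)))
    static final List<Group> QUERY = List.of(
            new Group(Set.of(1), Set.of(2, 3)),
            new Group(Set.of(75), Set.of(3, 5, 52)),
            new Group(Set.of(99, 88), Set.of(102, 155, 199)));

    /** The three top-level groups are optional clauses, so the document matches if any group does. */
    static boolean matches(Set<Integer> docValues) {
        for (Group g : QUERY) {
            if (g.matches(docValues)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(matches(Set.of(1, 3, 40)));   // has 1 and 3
        System.out.println(matches(Set.of(99, 102)));    // lacks the required 88
    }
}
```

Read this way, the query is an OR over three groups, each of which is an AND of required terms with one any-of set.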

Currently these queries take about 7 to 16 seconds to run on my laptop. I would like to make sure that's really the best they can do. I am open to suggestions on field structure and query structure :-).

Thanks

Josh

PS: I have already read over all the other Lucene performance discussions on here, on the Lucene wiki, and at Lucid Imagination... I'm a bit further down the rabbit hole than that...

Since you mention that you are running queries on specific numbers rather than range queries, I won't suggest that you look at the really fast numeric range queries in Lucene 3.0.

Going by your description, I suspect scoring is causing the problem. With so many nested boolean queries, scoring keeps getting more complex, and because scores are floating-point numbers, the arithmetic is slower. If you don't care about scores, writing a custom Collector is a good idea. The javadoc I have linked includes an example you can use as a starting point for your own.
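As a rough illustration of the non-scoring collector idea, here is a self-contained Java sketch. It does not depend on Lucene: `MatchCollector` is a hypothetical stand-in that mirrors the general shape of Lucene 3.x's `org.apache.lucene.search.Collector` (`setScorer`, `collect`, `setNextReader`, `acceptsDocsOutOfOrder`). The real class takes a `Scorer` and an `IndexReader` where this sketch uses placeholders, and Lucene.NET uses .NET-style member names, so check the signatures against your version. `CollectorDemo` only simulates a searcher feeding two index segments to the collector.

```java
import java.util.BitSet;

/** Hypothetical stand-in for the shape of a Lucene 3.x Collector; NOT the real Lucene class. */
abstract class MatchCollector {
    abstract void setScorer(Object scorer);     // Lucene passes a Scorer here
    abstract void setNextReader(int docBase);   // Lucene also passes the segment's IndexReader
    abstract void collect(int doc);             // doc ID is relative to the current segment
    abstract boolean acceptsDocsOutOfOrder();
}

/** Records which documents matched and never asks for a score. */
final class BitSetCollector extends MatchCollector {
    private final BitSet bits;
    private int docBase;

    BitSetCollector(int maxDoc) {
        this.bits = new BitSet(maxDoc);
    }

    @Override void setScorer(Object scorer) {
        // Deliberately ignored: the scorer is never consulted, so no score is requested.
    }

    @Override void setNextReader(int docBase) {
        this.docBase = docBase;                 // remember where this segment starts
    }

    @Override void collect(int doc) {
        bits.set(docBase + doc);                // convert to an index-wide doc ID
    }

    @Override boolean acceptsDocsOutOfOrder() {
        return true;                            // a bit set does not care about ordering
    }

    BitSet hits() {
        return bits;
    }
}

final class CollectorDemo {
    /** Simulates a search over two segments; the matching doc IDs are made up for the demo. */
    static BitSet run() {
        BitSetCollector collector = new BitSetCollector(10);
        collector.setScorer(null);
        collector.setNextReader(0);             // first segment: global docs 0..5
        collector.collect(1);
        collector.collect(4);
        collector.setNextReader(6);             // second segment starts at global doc 6
        collector.collect(0);
        collector.collect(3);
        return collector.hits();
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

With the real API, a collector like this would be passed to the searcher's `search(query, collector)` call in place of asking for top-scoring hits. Whether that removes most of the 7 to 16 seconds depends on how much of the time actually goes to scoring, so it is worth measuring before and after.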