I've been using Lucene to great effect to provide a solution where my users can query a lot of records (100 million+) very quickly. Users have a large form with a lot of different fields they can choose from. They also have an "advanced search" option where they can construct their own queries which support nested logic with AND, OR and NOT operators.
I use MSSQL as my main data store and index the data in Lucene. A Lucene query returns a list of IDs, which I then look up directly in the MSSQL database. This avoids the complicated (and slow) query plans that would result from running the equivalent query directly against the database. With a bit of planning and design, Lucene has proven highly capable of running very fast queries even when they are fairly complex, e.g. ((A AND B) OR (B AND C AND D)) OR (A[X TO Y] AND K) OR (Q,W,E,R,T,Y,U,I,O). You get the picture.
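The ID-first pattern above (resolve the boolean logic in the index, then fetch only the matching rows) can be modelled in a few lines. This is a minimal sketch in plain Python, assuming a toy in-memory inverted index (term → set of record IDs) rather than Lucene's actual API; `evaluate`, `postings` and `rows` are hypothetical names, and Lucene itself works over compressed posting lists through its own query classes, not Python sets. Range clauses such as A[X TO Y] are left out to keep it short.

```python
from typing import Dict, Set, Tuple, Union

# Hypothetical query representation: a leaf is a term name; an inner node is
# a tuple ("AND" | "OR" | "NOT", child, ...). NOT takes exactly one child.
Query = Union[str, Tuple]

def evaluate(query: Query, postings: Dict[str, Set[int]], all_ids: Set[int]) -> Set[int]:
    """Resolve a nested boolean query to matching record IDs via set algebra."""
    if isinstance(query, str):
        # Copy so callers can never mutate the index's own posting set.
        return set(postings.get(query, ()))
    op, *children = query
    results = [evaluate(child, postings, all_ids) for child in children]
    if op == "AND":
        return set.intersection(*results)
    if op == "OR":
        return set().union(*results)
    if op == "NOT":
        return all_ids - results[0]
    raise ValueError(f"unknown operator: {op!r}")

# Toy inverted index: term -> IDs of records containing it (hypothetical data).
postings = {"A": {1, 2, 3}, "B": {2, 3, 4}, "C": {3, 4, 5}, "D": {3, 5}}
all_ids = {1, 2, 3, 4, 5}

# (A AND B) OR (B AND C AND D)
query = ("OR", ("AND", "A", "B"), ("AND", "B", "C", "D"))
ids = evaluate(query, postings, all_ids)

# Stand-in for the SQL table; in practice this step would be a single
# "SELECT ... WHERE Id IN (...)" against MSSQL using the IDs found above.
rows = {i: f"record-{i}" for i in all_ids}
results = [rows[i] for i in sorted(ids)]
```

The database never sees the boolean structure; it only receives a flat list of primary keys, which is why the query plan stays simple no matter how deeply the user nests the conditions.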
The problem I have run into is a relational one. When a record has related attributes K, each of which has its own attributes J, and a user searches with multiple conditions on J against a single K, and more than one of those conditions is numerical, the need for a relational store suddenly becomes apparent: there isn't really an effective way to tokenize the relationship between one numerical attribute and another.
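To make the problem concrete, here is a small plain-Python model of the two semantics. It assumes each K is flattened into the parent record so that every J field becomes a bag of values from all K children; the field names `price` and `qty` and the helpers `flattened_match` / `per_child_match` are hypothetical illustrations, not Lucene calls. With flattening, each numeric condition can be satisfied by a *different* K, so a record matches even though no single K satisfies both conditions.

```python
from typing import Callable, Dict, List

Child = Dict[str, float]   # one related K with its numeric J attributes
Condition = Callable[[float], bool]

# Hypothetical data: record 1 has two K children, record 2 has one.
records: Dict[int, List[Child]] = {
    1: [{"price": 5, "qty": 10}, {"price": 50, "qty": 500}],
    2: [{"price": 8, "qty": 200}],
}

# User intent: "a single K with price < 10 AND qty > 100".
conditions: Dict[str, Condition] = {
    "price": lambda v: v < 10,
    "qty": lambda v: v > 100,
}

def flattened_match(children: List[Child], conds: Dict[str, Condition]) -> bool:
    # Flattened index: each field is checked against the values of ALL
    # children, so each condition may be met by a different child.
    return all(any(cond(c[field]) for c in children) for field, cond in conds.items())

def per_child_match(children: List[Child], conds: Dict[str, Condition]) -> bool:
    # Relational semantics: at least one child must meet EVERY condition.
    return any(all(cond(c[field]) for field, cond in conds.items()) for c in children)

flat = sorted(r for r, ch in records.items() if flattened_match(ch, conditions))
exact = sorted(r for r, ch in records.items() if per_child_match(ch, conditions))
```

Record 1 is a false positive under flattening (its price < 10 comes from one K and its qty > 100 from another), while the per-child check correctly returns only record 2.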
Obviously there are some great solutions out there for storing huge amounts of data while still being fast to query at a basic level. What I want to know is whether you can recommend any of them that are also capable of very fast lookups when the query has the kind of complexity described above.
As best I can tell, there's no really good unified solution for this. My solution is:
- MongoDB for big data storage and fast key-based lookups
- Lucene for super fast, complex queries
In my index I store document IDs that I then retrieve from the database as needed.
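A sketch of the "retrieve as needed" step, assuming the index returns IDs in ranked order and the store offers a batched lookup by ID (with pymongo this would typically be something like `collection.find({"_id": {"$in": ids}})`, which does not guarantee results in the order of the list). `fetch_page`, `fake_fetch_batch` and the in-memory `store` are hypothetical stand-ins so the example runs without a database.

```python
from typing import Callable, Dict, Iterable, List, Sequence

Doc = Dict[str, object]

def fetch_page(ranked_ids: Sequence[int], page: int, page_size: int,
               fetch_batch: Callable[[List[int]], Iterable[Doc]]) -> List[Doc]:
    """Load one page of documents, keeping the index's ranking order."""
    page_ids = list(ranked_ids[page * page_size:(page + 1) * page_size])
    if not page_ids:
        return []
    # One round trip per page; the store may return documents in any order.
    by_id = {doc["_id"]: doc for doc in fetch_batch(page_ids)}
    # Restore index order and skip IDs that no longer exist in the store.
    return [by_id[i] for i in page_ids if i in by_id]

# In-memory stand-in for the document store (hypothetical data).
store = {i: {"_id": i, "name": f"record-{i}"} for i in (3, 7, 9, 12)}

def fake_fetch_batch(ids: List[int]) -> List[Doc]:
    # Deliberately returns matches in reverse ID order to mimic an unordered lookup.
    return [store[i] for i in sorted(ids, reverse=True) if i in store]

ranked = [9, 3, 42, 12, 7]  # IDs as returned by the index, best match first
first_page = fetch_page(ranked, page=0, page_size=3, fetch_batch=fake_fetch_batch)
```

Fetching per page keeps each database call small, and reordering by the index's ID list means relevance ranking survives the round trip; ID 42 is silently dropped here to model an index entry whose document has since been deleted.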