So I'm looking to run Sphinx over a NoSQL system such as MongoDB, HBase, Cassandra, etc.
Right now, we're comparing the NoSQL systems out there. Basically, we need to run full-text searches over 50+ million rows of product data thousands of times a second, so we're trying to find the most efficient NoSQL system.
Here is our question, though. If we use any NoSQL system with Sphinx, when we perform the actual searches, will the search have any interaction with the NoSQL system itself, or will Sphinx be doing the work as it has the data indexed? If it's only Sphinx, then wouldn't the performance of the NoSQL system be only secondary?
Thanks!
Using the string attributes available in recent Sphinx versions, you can cut the database out of the search path completely, which will be much more efficient.
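To illustrate, here is a rough sketch of how records from a NoSQL store might be fed to the Sphinx indexer as an xmlpipe2 stream, with the values you want back stored as string attributes. The field and attribute names (title, description, store_key, display_title) and the record layout are made up for the example, and I'm assuming your Sphinx version accepts type="string" in the xmlpipe2 schema, so check that against the docs for your release. Sphinx document IDs have to be non-zero integers, so the sketch numbers the documents itself and keeps the database key in a string attribute.

```python
from xml.sax.saxutils import escape, quoteattr


def to_xmlpipe2(records):
    """Render an iterable of dict records as one xmlpipe2 document stream.

    Each record is assumed (hypothetically) to have the keys
    'key', 'title' and 'description'.
    """
    lines = [
        '<?xml version="1.0" encoding="utf-8"?>',
        "<sphinx:docset>",
        "<sphinx:schema>",
        # Full-text fields: these are what queries are matched against.
        '<sphinx:field name="title"/>',
        '<sphinx:field name="description"/>',
        # String attributes: stored in the index and returned with each match.
        '<sphinx:attr name="store_key" type="string"/>',
        '<sphinx:attr name="display_title" type="string"/>',
        "</sphinx:schema>",
    ]
    # Sphinx needs integer document IDs, so number the records from 1.
    for doc_id, rec in enumerate(records, start=1):
        lines.append("<sphinx:document id=%s>" % quoteattr(str(doc_id)))
        lines.append("<title>%s</title>" % escape(rec["title"]))
        lines.append("<description>%s</description>" % escape(rec["description"]))
        lines.append("<store_key>%s</store_key>" % escape(rec["key"]))
        lines.append("<display_title>%s</display_title>" % escape(rec["title"]))
        lines.append("</sphinx:document>")
    lines.append("</sphinx:docset>")
    return "\n".join(lines)


if __name__ == "__main__":
    sample = [
        {"key": "prod-001", "title": "Red & blue mug", "description": "Ceramic mug"},
        {"key": "prod-002", "title": "Steel bottle", "description": "Insulated bottle"},
    ]
    print(to_xmlpipe2(sample))
```

A script like this would be wired in as the command of an xmlpipe2 source in sphinx.conf. In a real setup the records would come from a cursor over your NoSQL store rather than a hard-coded list.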
As far as I understand, you can do it. I'm only familiar with MongoDB and HBase, so I can only speak to those two databases. You need to do some work on the indexing side: build the data/attributes into the Sphinx index, and include the primary key that uniquely identifies each record in the database (for MongoDB it's the ObjectId in _id; for HBase it's the row key). Then, after the full-text search, you can fetch the full data/attributes from the database by primary key.
Besides, there is another full-text search engine that works well with NoSQL databases: Solr. You could try it if its performance meets your requirements.
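The search-then-fetch flow can be sketched roughly like this. Nothing here is the real Sphinx or MongoDB/HBase API: ToySearchIndex is a made-up in-memory stand-in for the Sphinx index, a plain dict stands in for the database, and search_products is a hypothetical helper. The point is only to show that the database is touched once per matching key, after the full-text match is already done.

```python
class ToySearchIndex:
    """In-memory stand-in for a Sphinx index (hypothetical, for illustration).

    Maps each word to the integer document IDs containing it and keeps the
    database primary key of every document as a stored attribute.
    """

    def __init__(self):
        self._postings = {}   # word -> set of document IDs
        self._store_key = {}  # document ID -> primary key in the database

    def add(self, doc_id, text, store_key):
        self._store_key[doc_id] = store_key
        for word in text.lower().split():
            self._postings.setdefault(word, set()).add(doc_id)

    def search(self, query):
        """Return the primary keys of documents containing every query word."""
        words = query.lower().split()
        if not words:
            return []
        matches = set.intersection(*[self._postings.get(w, set()) for w in words])
        return [self._store_key[doc_id] for doc_id in sorted(matches)]


def search_products(index, store, query):
    """Step 1: full-text search in the index. Step 2: fetch rows by primary key."""
    keys = index.search(query)
    return [store[key] for key in keys if key in store]


if __name__ == "__main__":
    # The dict plays the role of MongoDB (keyed by _id) or HBase (keyed by row key).
    store = {
        "prod-001": {"title": "Red ceramic mug", "price": 7.5},
        "prod-002": {"title": "Red steel bottle", "price": 12.0},
    }
    index = ToySearchIndex()
    for doc_id, (key, row) in enumerate(store.items(), start=1):
        index.add(doc_id, row["title"], key)

    print(search_products(index, store, "red mug"))
```

With this split, the matching cost sits entirely in the search index, and the database only has to serve lookups by key.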