HBase vs Hypertable vs Lucene

Developer https://www.devze.com 2023-02-10 00:06 Source: Internet
I am using a search system built on Lucene. By default it is not distributed, so I am thinking of moving to something like HBase or Hadoop.

Do solutions like HBase or Hypertable have a built-in search capability, or will I need to implement Lucene on top of them?


Lucene is very different from BigTable clones like HBase or Hypertable. If you are simply looking for a distributed Lucene, then you should look at projects such as Elasticsearch or Katta.

Solr/Lucene can also operate over a cluster, but the partitioning is not automatic. You have to create shards and replicas manually to match the distribution of the data you are looking for. If your underlying data is stored in something like HBase, this is much easier to set up, modify, and update.

Fundamentally, HBase and Lucene solve different problems. Lucene is an index that allows keyword and other types of searches to return quickly. HBase is a data repository that can serve individual rows in real time; however, HBase does not have an online query capability. For best results, you have to combine them. One example in this area is Lily (http://outerthought.org/site/products/lily.html).
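
To illustrate the "combine them" idea, here is a minimal sketch in plain Python. It uses neither Lucene nor HBase: `InvertedIndex` and `row_store` are hypothetical stand-ins for a search index and a row store, and the sample rows are made up. The point is only the division of labour: the index answers "which row keys match?", and the store serves the full rows by key.

```python
from collections import defaultdict


class InvertedIndex:
    """Toy stand-in for a search index: maps each term to a set of row keys."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, row_key, text):
        # Index every whitespace-separated, lowercased term of the text.
        for term in text.lower().split():
            self._postings[term].add(row_key)

    def search(self, query):
        """Return the row keys whose text contains every query term."""
        terms = query.lower().split()
        if not terms:
            return set()
        result = set(self._postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self._postings.get(term, set())
        return result


# Toy stand-in for a row store: row key -> full record (sample data is invented).
row_store = {
    "row1": {"title": "HBase basics", "body": "distributed row storage"},
    "row2": {"title": "Lucene intro", "body": "full text search index"},
    "row3": {"title": "Search at scale", "body": "distributed search index"},
}

index = InvertedIndex()
for key, row in row_store.items():
    index.add(key, row["title"] + " " + row["body"])


def search_rows(query):
    """Ask the index for matching keys, then fetch each row by key."""
    return [row_store[key] for key in sorted(index.search(query))]


titles = [row["title"] for row in search_rows("distributed search")]
```

In a real deployment the index and the store would be separate systems kept in sync as rows change, which is the hard part that a project like Lily takes on.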


You may also want to look at Lucandra, which is Lucene with a Cassandra backend:

https://github.com/tjake/Lucandra


Another technology to look at is Katta, or Distributed Lucene, which can operate over HDFS.


Lucene provides two main features: structured search and full-text search. HBase provides neither out of the box. Structured search can be built on HBase relatively easily; I think that is what Lily does. Rebuilding full-text search would be more difficult. To scale Lucene, you can still try partitioning your index on an attribute that splits your data into separate areas (you won't be able to search across areas). Then you can have one cluster per area.
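
The partition-by-attribute idea can be sketched roughly as follows. This is plain Python rather than Lucene: `PartitionedIndex`, the `area` values, and the sample documents are all hypothetical. Each area gets its own independent index, standing in for one cluster per area, and a query is always routed to exactly one of them.

```python
from collections import defaultdict


class PartitionedIndex:
    """Toy index split by an 'area' attribute; each area is searched on its own."""

    def __init__(self):
        # area -> term -> set of document ids
        self._partitions = defaultdict(lambda: defaultdict(set))

    def add(self, doc_id, area, text):
        # A document is indexed only in the partition for its own area.
        for term in text.lower().split():
            self._partitions[area][term].add(doc_id)

    def search(self, area, term):
        """Search a single area; other areas are never consulted."""
        partition = self._partitions.get(area)
        if partition is None:
            return set()
        return set(partition.get(term.lower(), set()))


idx = PartitionedIndex()
idx.add("d1", "europe", "Lucene index tuning")
idx.add("d2", "europe", "HBase row design")
idx.add("d3", "asia", "Lucene sharding notes")

europe_hits = idx.search("europe", "lucene")
asia_hits = idx.search("asia", "lucene")
```

The trade-off mentioned above shows up directly: a query for "lucene" in one area never sees the matching document in the other, so this only works when queries naturally stay inside a single area.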
