
Storing objects and relationships between them in HBase

Developer (https://www.devze.com), 2023-02-02 18:35. Source: the web

I'm starting a personal project that involves storing a large database of objects and the relationships between objects. I chose Hadoop and HBase because it will need to be multi-node and much of the data is sparse.

Coming from an RDBMS world, I have spent a lot of time reading about HBase's column-oriented structure, but with the current documentation I'm having trouble figuring out how to store objects and the relationships between them.

The objects themselves can have an unlimited number of relationships with other objects and an unlimited number of arbitrary attributes. Relationships can also have attributes. My goal is to have two "Person" objects linked by a "Married" relationship, where the Married relationship has a "Date" attribute. In the future I would like to be able to write a MapReduce job that quickly finds all persons married between x and y.


As I see it, there are two steps to work out.

  1. Storing the relationship
  2. Searching for data.

Storing the relationship

  • Option A: Store the relation along with the data itself. In your case, the Person table will hold each person's own marriage relationships. For this, every marriage of a person needs an id that is unique within that person's row only. E.g. take persons A, B and C. A was married to B from 1/1/2000 to 1/1/2002, and A has been married to C from 1/1/2003 until today. From A's perspective the cell entries would look like: marriage:1:to = B, marriage:1:start = 1/1/2000, marriage:1:end = 1/1/2002, marriage:2:to = C, marriage:2:start = 1/1/2003. This design is suitable if updates are infrequent.
  • Option B: Store the relation in its own space (table). This is suitable if relationships change frequently.
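To make the two layouts concrete, here is a minimal sketch in plain Python (no HBase client involved). The helper names `option_a_cells` and `option_b_row_key`, the `married|...` key format, and the use of ISO dates instead of 1/1/2000-style dates are my own assumptions for illustration. ISO dates are used because HBase sorts row keys lexicographically, so a date-first key keeps a date range contiguous.

```python
from datetime import date


def option_a_cells(marriages):
    """Build the marriage:<n>:<field> cells kept in one person's row (Option A)."""
    cells = {}
    for n, m in enumerate(marriages, start=1):
        cells[f"marriage:{n}:to"] = m["to"]
        cells[f"marriage:{n}:start"] = m["start"].isoformat()
        if m.get("end") is not None:  # an ongoing marriage has no end cell
            cells[f"marriage:{n}:end"] = m["end"].isoformat()
    return cells


def option_b_row_key(start, person_a, person_b):
    """Hypothetical row key for a separate relationship table (Option B).

    The ISO date comes first so rows sort by date; the two person ids are
    sorted so the same couple always yields the same key.
    """
    first, second = sorted([person_a, person_b])
    return f"married|{start.isoformat()}|{first}|{second}"


# The example from the answer: A married B, then C.
a_marriages = [
    {"to": "B", "start": date(2000, 1, 1), "end": date(2002, 1, 1)},
    {"to": "C", "start": date(2003, 1, 1), "end": None},
]
cells_for_a = option_a_cells(a_marriages)
key_ab = option_b_row_key(date(2000, 1, 1), "A", "B")
```

With Option A, changing a marriage means rewriting cells in both persons' rows. With Option B, each relationship is a single row that can be updated on its own, which is why it copes better with frequent changes.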

Searching for data

If the search result can wait for a MapReduce job to finish, then that's fine. If you need faster results, I would use (and actually am using) a separate tool for all sorts of searching, e.g. Elasticsearch, Apache Solr or Apache Lucene. Range queries are easy in search tools such as Solr, and the result will come back faster than from a MapReduce job. Another reason to choose a search tool is that you can get results in whatever sort order you need.
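As a rough illustration of "married between x and y" as a search query, the sketch below builds the parameters for a Solr range query using Solr's `field:[low TO high]` syntax. It assumes each marriage has been indexed as a document with a date field. The field name `marriage_start` and the helper `marriage_range_params` are hypothetical, and nothing is sent over the network.

```python
from datetime import date
from urllib.parse import urlencode


def marriage_range_params(field, x, y):
    """Build Solr query parameters for marriages that started between x and y."""
    lo = f"{x.isoformat()}T00:00:00Z"  # Solr date fields expect ISO 8601 in UTC
    hi = f"{y.isoformat()}T23:59:59Z"
    return {
        "q": f"{field}:[{lo} TO {hi}]",  # inclusive range on both ends
        "sort": f"{field} asc",          # sort order is chosen by the caller
    }


params = marriage_range_params("marriage_start", date(2000, 1, 1), date(2002, 12, 31))
query_string = urlencode(params)  # ready to append to a core's /select URL
```

The same range could be answered by a MapReduce job scanning the table, but the index answers it without a full scan and returns the hits already sorted.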
