Data Visualization & HBase

Greetings,

I have been looking through the questions on this site and I haven't found any related questions.

I have currently built a Flex/PHP/MySQL app where I take an extract from my Hadoop cluster and dump it into a MySQL table. There are several problems with this as my data set continues to grow.

I am looking for a much more robust open-source solution, and therefore have started to examine HBase and how to leverage PHP or Java to extract my data to a visualization app.

Have any of you built any visualization platforms on top of Hadoop or HBase?

Thank you!


I am not entirely sure whether you are referring to fetching information from HBase or not. I am assuming that you want to build an aggregation application that performs data-mining-like operations such as 'sum', 'count', and 'avg' on data stored in HBase to generate graphs/visualizations.

In that case the specific answer would depend on the nature of the data you are trying to analyze. One such application is http://opentsdb.net from StumbleUpon.

It's pretty easy to write data summarizers on HBase, as this can be achieved through MapReduce: http://hbase.apache.org/docs/r0.89.20100726/apidocs/org/apache/hadoop/hbase/mapred/package-summary.html
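
As a rough illustration (not from the original answer), here is what such a summarizer could look like with the newer org.apache.hadoop.hbase.mapreduce API; the table name "metrics", family "d", and qualifier "value" are made-up placeholders:

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class MetricSumJob {

  // Emits ("total", value) for every row; assumes the cell holds an 8-byte long.
  static class SumMapper extends TableMapper<Text, LongWritable> {
    private static final Text KEY = new Text("total");

    @Override
    protected void map(ImmutableBytesWritable row, Result result, Context context)
        throws IOException, InterruptedException {
      byte[] cell = result.getValue(Bytes.toBytes("d"), Bytes.toBytes("value"));
      if (cell != null) {
        context.write(KEY, new LongWritable(Bytes.toLong(cell)));
      }
    }
  }

  // Adds up all the values for a key.
  static class SumReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
    @Override
    protected void reduce(Text key, Iterable<LongWritable> values, Context context)
        throws IOException, InterruptedException {
      long sum = 0;
      for (LongWritable v : values) {
        sum += v.get();
      }
      context.write(key, new LongWritable(sum));
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = Job.getInstance(conf, "metric-sum");
    job.setJarByClass(MetricSumJob.class);

    Scan scan = new Scan();
    scan.addColumn(Bytes.toBytes("d"), Bytes.toBytes("value"));
    scan.setCaching(500);        // larger scan batches for MR jobs
    scan.setCacheBlocks(false);  // don't pollute the region server block cache

    TableMapReduceUtil.initTableMapperJob(
        "metrics", scan, SumMapper.class, Text.class, LongWritable.class, job);
    job.setCombinerClass(SumReducer.class); // summing is safe to pre-aggregate map-side
    job.setReducerClass(SumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(LongWritable.class);
    job.setOutputFormatClass(TextOutputFormat.class);
    TextOutputFormat.setOutputPath(job, new Path(args[0]));

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```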

In our organization we use Solr to perform queries and aggregation functions for financial reports and then store the results in a CMS for rendering, which allows us to customize rendering for the same dataset (a small SolrJ sketch of the aggregation part follows the links below). If you are interested in storing your data in a CMS on HBase+Solr, the following will be interesting:

  • http://www.lilyproject.org/lily/index.html
  • http://kenai.com/projects/smart-cms/pages/Home
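
For the Solr side, a minimal SolrJ sketch of the "query and aggregate, then hand off for rendering" idea might look like this; the core name "reports" and the fields "region"/"amount" are invented for illustration and are not from our actual setup:

```java
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.FacetField;
import org.apache.solr.client.solrj.response.QueryResponse;

public class ReportFacets {
  public static void main(String[] args) throws Exception {
    try (HttpSolrClient solr =
             new HttpSolrClient.Builder("http://localhost:8983/solr/reports").build()) {

      SolrQuery query = new SolrQuery("*:*");
      query.setRows(0);                 // we only want the aggregates, not the documents
      query.setFacet(true);
      query.addFacetField("region");    // count documents per region
      query.set("stats", "true");       // field statistics (sum/mean/min/max)
      query.set("stats.field", "amount");

      QueryResponse response = solr.query(query);

      FacetField regions = response.getFacetField("region");
      for (FacetField.Count c : regions.getValues()) {
        System.out.printf("%s -> %d documents%n", c.getName(), c.getCount());
      }
      System.out.println("amount stats: " + response.getFieldStatsInfo().get("amount"));
    }
  }
}
```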

And if you are looking to access your data simply as persistent storage and are interested in an ORM, you may find the following relevant; otherwise please ignore it. The following is copied from Java ORM for Hbase, another answer of mine.

The strength of HBase, as I see it, is in keeping dynamic columns inside static column families. From my experience developing applications with HBase, I find that determining cell qualifiers and values is not as straightforward as it is in SQL.

For example, a book has many authors. Depending on your access patterns, author edits, and app-layer cache implementation, you might choose to save the whole author in the book table (that is, the author resides in 2 tables, the author table and the book table) or just the author id. Furthermore, the collection of authors can be saved into one cell as XML/JSON, or into individual cells for individual authors.
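
A minimal sketch of the two layouts described above (the table name "book", families "d" and "a", and the author data are all made up for illustration):

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class BookRowLayouts {
  public static void main(String[] args) throws Exception {
    try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
         Table book = conn.getTable(TableName.valueOf("book"))) {

      byte[] rowKey = Bytes.toBytes("book-1842");

      // Layout 1: the whole author collection serialized into a single cell (JSON shown as a plain string).
      Put blob = new Put(rowKey);
      blob.addColumn(Bytes.toBytes("d"), Bytes.toBytes("authors"),
          Bytes.toBytes("[{\"id\":\"a1\",\"name\":\"Ada\"},{\"id\":\"a2\",\"name\":\"Bob\"}]"));
      book.put(blob);

      // Layout 2: one dynamic qualifier per author (qualifier = author id, value = name or just the id).
      Put perAuthor = new Put(rowKey);
      perAuthor.addColumn(Bytes.toBytes("a"), Bytes.toBytes("a1"), Bytes.toBytes("Ada"));
      perAuthor.addColumn(Bytes.toBytes("a"), Bytes.toBytes("a2"), Bytes.toBytes("Bob"));
      book.put(perAuthor);
    }
  }
}
```

Which layout is better depends on access patterns: the single-cell blob makes reading the whole author list one cheap read, while per-author qualifiers let you add or remove one author without rewriting the entire collection.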

With this understanding I concluded that writing a full-blown ORM such as Hibernate would not only be very difficult but might not actually be conclusive. So I took a different approach, much more like what iBatis is to Hibernate.

  • My mini-framework: http://github.com/smart-it/smart-dao [smart-hbase]
  • Usage: https://github.com/smart-it/smart-cms [content-spi-impl module has the usages]
  • Usage: https://github.com/smart-it/jetty-session-hbase [hbase-impl module has the usages]

Let me try to explain how it works. For this I will use source code from here and here.

  1. The first and foremost task is to implement the ObjectRowConverter interface, in this case SessionDataObjectConverter. The abstract class encapsulates basic best practices as discussed and learned from the HBase community. The extension basically gives you 100% control over how to convert your object to an HBase row and vice versa (a hand-rolled sketch of this idea follows the list). The only restriction from the API is that your domain objects must implement the PersistentDTO interface, which is used internally to create Put and Delete, and to convert byte[] to an id object and vice versa.
  2. The next task is to wire the dependencies as done in HBaseImplModule. Please let me know if you are interested and I will go through the dependency injection.
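
To make step 1 concrete without reproducing the smart-hbase interfaces, here is a hand-rolled sketch of the row-to-object / object-to-row conversion idea using only the plain HBase client API; the Book class, table layout, and method names are mine, not the library's:

```java
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class BookConverter {

  // A made-up domain object for the sketch.
  public static class Book {
    public String id;
    public String title;
  }

  private static final byte[] FAMILY = Bytes.toBytes("d");
  private static final byte[] TITLE = Bytes.toBytes("title");

  // Domain object -> HBase row (the "object to Put" direction).
  public Put toPut(Book book) {
    Put put = new Put(Bytes.toBytes(book.id));
    put.addColumn(FAMILY, TITLE, Bytes.toBytes(book.title));
    return put;
  }

  // HBase row -> domain object (the "Result to object" direction).
  public Book fromResult(Result result) {
    Book book = new Book();
    book.id = Bytes.toString(result.getRow());
    book.title = Bytes.toString(result.getValue(FAMILY, TITLE));
    return book;
  }
}
```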

And that's it. How they are used is shown here. It basically uses CommonReadDao and CommonWriteDao to read and write data to and from HBase. The common read DAO implements multithreaded row-to-object conversion on queries, multithreaded get by ids, get by id, and has its own Hibernate-Criteria-like API to query HBase via Scan (no aggregation functions are available). The common write DAO implements common write-related code with some added facilities, such as optimistic/pessimistic locking, cell override/merge, and checking entity (non-)existence on save, update, delete, etc.
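
For a feel of the Scan-based querying the read DAO wraps (again using the plain HBase client API rather than the CommonReadDao criteria API, with made-up table and column names):

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class BookQuery {
  public static void main(String[] args) throws Exception {
    BookConverter converter = new BookConverter(); // the hand-rolled converter sketched earlier
    List<BookConverter.Book> books = new ArrayList<>();

    try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
         Table table = conn.getTable(TableName.valueOf("book"))) {

      // Scan a key range instead of the whole table; any aggregation has to happen client-side.
      Scan scan = new Scan();
      scan.withStartRow(Bytes.toBytes("book-1000"));
      scan.withStopRow(Bytes.toBytes("book-2000"));
      scan.addFamily(Bytes.toBytes("d"));

      try (ResultScanner scanner = table.getScanner(scan)) {
        for (Result result : scanner) {
          books.add(converter.fromResult(result)); // row -> object conversion per Result
        }
      }
    }
    System.out.println("loaded " + books.size() + " books");
  }
}
```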

This ORM has been developed for our internal purposes, and I have been up to my neck in work, so I have not yet been able to write documentation. But if you are interested, let me know and I will make time to document it as a priority.


Check out Metatron Discovery: https://github.com/metatron-app/metatron-discovery. They use Druid and Hive for their OLAP engine and data store. It's open source, so you can check their code. It might be helpful.
