We want to design a data store, backed by a relational database, that keeps logs of request messages (HTTP/S, XMPP, etc.). The logs are generated by a solution based on the Apache Synapse ESB. Since we store the logs and read them only for maintenance, the read/write ratio will be low: writes will be heavy because the system receives many messages to log. We thought of using Cassandra for its distributed nature and clustering capabilities. However, with Cassandra schemas, search queries with filters are difficult, always requiring secondary indexes.
To keep it short, my question is: should we try one of the MySQL clustering solutions, or use Cassandra with a schema designed for search queries with filters?
If you want to run real-time analytics over semi-structured or unstructured data, you can go with a Cassandra + Hadoop cluster. The Cassandra wiki itself suggests the DataStax Brisk edition for this kind of architecture, so it is worth giving it a try.
On the other hand, if you want to run real-time queries over raw logs for a small set of data, for example:
select useragent from raw_log_table where id='xxx'
Then you should do a lot of research on your row key and column key design, because that determines the complexity of the query. It is also worth looking at the case studies of Cassandra users here: http://www.datastax.com/cassandrausers1
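To illustrate why the key design matters, here is a minimal sketch in plain Python (no Cassandra driver involved) of one possible layout. Everything in it is an assumption for illustration: the names `row_key`, `column_key` and `LogStore`, the choice of one row per protocol per UTC day, and the in-memory dictionary standing in for the cluster. The idea is that a filter such as "HTTP messages between 10:00 and 11:00" becomes a lookup of one row followed by a range scan over ordered column keys, instead of a query that needs a secondary index.

```python
from collections import defaultdict
from datetime import datetime, timezone


def row_key(protocol: str, ts: datetime) -> str:
    """Hypothetical row key: one row per protocol per UTC day."""
    return f"{protocol}:{ts.strftime('%Y%m%d')}"


def column_key(ts: datetime, message_id: str) -> tuple:
    """Hypothetical column key: orders entries by time inside a row."""
    return (ts, message_id)


class LogStore:
    """In-memory stand-in for a wide-row store, for illustration only."""

    def __init__(self):
        # row key -> {column key -> payload}
        self.rows = defaultdict(dict)

    def write(self, protocol, ts, message_id, payload):
        self.rows[row_key(protocol, ts)][column_key(ts, message_id)] = payload

    def read_range(self, protocol, start, end):
        """Return payloads with start <= ts < end, both within one day."""
        row = self.rows.get(row_key(protocol, start), {})
        keys = sorted(k for k in row if start <= k[0] < end)
        return [row[k] for k in keys]


store = LogStore()
t1 = datetime(2011, 5, 1, 10, 0, tzinfo=timezone.utc)
t2 = datetime(2011, 5, 1, 10, 30, tzinfo=timezone.utc)
t3 = datetime(2011, 5, 1, 12, 0, tzinfo=timezone.utc)
store.write("http", t2, "m2", {"useragent": "curl"})
store.write("http", t1, "m1", {"useragent": "firefox"})
store.write("http", t3, "m3", {"useragent": "wget"})
store.write("xmpp", t1, "m4", {"useragent": "pidgin"})

# Only the "http:20110501" row is touched; the xmpp row is never read.
hour = store.read_range("http", t1, datetime(2011, 5, 1, 11, 0, tzinfo=timezone.utc))
print([p["useragent"] for p in hour])
```

With this kind of layout the queries you plan to run decide the keys, not the other way round. If you need to filter by something else (for example by client), you would typically write the same log entry under a second key scheme as well.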
Regards, Tamil