
What to use for real-time log aggregation and querying?

Developer https://www.devze.com 2023-02-26 07:09 Source: Internet
I'm looking for a tool, database, or other solution that can aggregate logs in real time and also let me query them in real time.

The basic requirement is to deliver results as quickly as possible, keeping in mind that there may be many events to query (possibly billions). The logs will have many 'columns', and each query will set conditions on some of those columns, so the final result will be either some kind of aggregation or a small subset of rows.

Right now I am looking at HDFS+HBase, which seems like a good solution. Are there any alternatives? Can you recommend anything?


You can check out Flume: https://github.com/cloudera/flume/wiki


You can have a look at Calamaris. In the commercial world there's Splunk.


If you are trying to parse or collect logs in real time and act on them, then my recommendation is the following:

# tail --follow=name --retry /var/log/logfile.log | sendxmpp -i -u username -p password -j somejabberserver.com sendloglineto@somejabberserver.com

That sends each line of the log, as it appears, as an XMPP message to the Jabber user sendloglineto@somejabberserver.com. That Jabber user would be connected via a client you write yourself (I prefer Perl and Net::Jabber). You can program the client to do whatever you want with each XMPP message (e.g. store it in a database). If you store it in CouchDB, you can use the _changes API to track updates to a particular database served by CouchDB.
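To make the "store it in a database" step a bit more concrete, here is a minimal sketch of what the receiving side could do with each message body. It is written in Python rather than Perl, it leaves out the XMPP connection entirely, and it uses an in-memory SQLite database purely as a stand-in for whatever store you pick. The table name `log_lines` and the functions `open_store`, `handle_message` and `count_matching` are made up for illustration.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open the store and create the (hypothetical) log_lines table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS log_lines ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "sender TEXT NOT NULL, "
        "body TEXT NOT NULL)"
    )
    return conn


def handle_message(conn, sender, body):
    """Store one message body (one log line).

    Assumption: your XMPP client calls this from its message callback,
    passing the sender JID and the message text.
    """
    body = body.rstrip("\n")
    if not body:
        return False  # nothing to store for an empty line
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO log_lines (sender, body) VALUES (?, ?)",
            (sender, body),
        )
    return True


def count_matching(conn, needle):
    """Count stored lines whose text contains the given substring."""
    row = conn.execute(
        "SELECT COUNT(*) FROM log_lines WHERE instr(body, ?) > 0",
        (needle,),
    ).fetchone()
    return row[0]
```

In a real client you would call `handle_message` once per incoming message. A plain substring scan like `count_matching` will not scale to billions of rows; it is only there to show the stored lines can be queried as soon as they arrive.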


Even though this is an old question, I am posting an answer with the technology stack that is available now:

  1. Data ingestion: Apache Flume, Spark Streaming, Spring XD, or Kafka

  2. Data storage and processing: HBase (raw data in a staging table and aggregated data in final tables, depending on the requirements; row keys can be designed around the ranges you search on) + SparkOnHBase

  3. Real-time search: HBase with Solr indexes

  4. Reporting (optional): Tableau or Banana (open source)

  5. Overall: Lambda architecture
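As a rough illustration of designing row keys around search ranges (item 2), here is a small Python sketch. HBase keeps rows sorted by the raw bytes of the row key, so one common approach is a composite key of source plus zero-padded timestamp, which turns "all events from one source in a time window" into a single contiguous range scan. The key layout, the `|` separator, and the names `make_rowkey`, `scan_bounds` and `simulate_scan` are assumptions for this example, and the scan is simulated on a sorted Python list rather than run against HBase.

```python
import bisect

TS_WIDTH = 13  # enough digits for epoch milliseconds, zero-padded


def make_rowkey(source, ts_ms):
    """Build a composite key: <source>|<zero-padded epoch millis>."""
    if "|" in source:
        raise ValueError("source must not contain the separator '|'")
    if not 0 <= ts_ms < 10 ** TS_WIDTH:
        raise ValueError("timestamp out of range for the padded width")
    return f"{source}|{ts_ms:0{TS_WIDTH}d}".encode("utf-8")


def scan_bounds(source, start_ms, end_ms):
    """Return (start_key, stop_key) for a scan over [start_ms, end_ms)."""
    return make_rowkey(source, start_ms), make_rowkey(source, end_ms)


def simulate_scan(sorted_keys, start_key, stop_key):
    """Mimic a range scan (start inclusive, stop exclusive) on sorted keys."""
    lo = bisect.bisect_left(sorted_keys, start_key)
    hi = bisect.bisect_left(sorted_keys, stop_key)
    return sorted_keys[lo:hi]
```

Zero-padding matters because the comparison is byte-wise: without it, `"999"` would sort after `"1000"`. Putting the source first keeps each source's events together; if most queries are across all sources by time, a different layout (for example a bucket prefix plus timestamp) may fit better.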


Try Apache Kafka. It should be helpful for your case.
