Superfeedr is an on-demand feed-parsing service. We want to provide analytics to our users, and we're investigating the best strategy for doing so.
In a nutshell, we want to track the number of operations in our system (events, such as a new entry in a given feed) as well as aggregated data (such as the number of subscribers per feed).
Of course, the aggregated data can be computed from the events: the number of subscribers to a feed is the sum of subscriptions minus the sum of unsubscriptions. Yet, since we want to study that over time (number of subscribers on a daily basis), the evented approach may be sub-optimal, since we would re-compute the same thing over and over.
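To make the re-computation concern concrete, here is a minimal Python sketch of the alternative: keep one running count per feed and store a snapshot per day, so each event is processed only once. The event tuple layout, the `subscribe`/`unsubscribe` labels, the feed names and the dates are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from datetime import date


def daily_subscriber_counts(events):
    """Fold subscription events into end-of-day subscriber counts.

    `events` is an iterable of (day, feed, kind) tuples sorted by day,
    where kind is "subscribe" or "unsubscribe" (hypothetical labels).
    Returns {feed: {day: subscriber count at the end of that day}}.
    """
    running = defaultdict(int)     # current subscriber count per feed
    snapshots = defaultdict(dict)  # per-feed daily snapshots
    for day, feed, kind in events:
        running[feed] += 1 if kind == "subscribe" else -1
        # Overwritten on every event of the same day, so the last
        # write for a day is the end-of-day value.
        snapshots[feed][day] = running[feed]
    return {feed: dict(days) for feed, days in snapshots.items()}


# Hypothetical sample data.
events = [
    (date(2009, 11, 1), "feed-a", "subscribe"),
    (date(2009, 11, 1), "feed-a", "subscribe"),
    (date(2009, 11, 2), "feed-a", "unsubscribe"),
    (date(2009, 11, 2), "feed-b", "subscribe"),
]
counts = daily_subscriber_counts(events)
```

Days without any event for a feed get no snapshot here; a reader of the snapshots would carry the previous day's value forward.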
How would you build such a component in your app? What information flow, what data stores, and what graphing solution would you use?
I know this is quite an open question, but I am sure we're not the first ones with such a need!
[UPDATE] Infrastructure: we have a set of workers that are XMPP clients and interact with one another. They are built on EventMachine, which means that they do not block on I/O. Desired target: we must be able to collect massive amounts of data. We are currently at about 200-300 msg/sec and we aim for 10x-100x that.
It's tough to say without more information about your infrastructure and desired scaling targets. You may find this slide deck about How Twitter Uses Hadoop instructive. It was presented by Kevin Weil at the recent NoSQL East conference.
Borrowing ideas from what Twitter is doing, you could consider an architecture split into collection, analysis and render phases.
Collection Phase: Scribe. Super low latency. Very scalable. Lots of binding choices. Developed at Facebook.
Processing Node Log Event -> Scribe -> HDFS
Analysis Phase: Pig. A SQL-like query language that will allow you to do exploratory ad-hoc queries as well.
HDFS -> Pig -> MySQL
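As a rough illustration of what the analysis step produces, here is a small Python sketch that rolls raw log lines up into per-day, per-feed event counts, shaped as rows that could be loaded into a SQL table. It is plain Python standing in for a Pig job, not Pig itself, and the tab-separated `timestamp, feed, event` log format is an assumption made for this example.

```python
from collections import Counter


def rollup(log_lines):
    """Count events per (day, feed, event type).

    Each line is assumed to look like
    "2009-11-01T10:00:00\tfeed-a\tnew_entry" (hypothetical format).
    Returns sorted (day, feed, event, count) rows.
    """
    counts = Counter()
    for line in log_lines:
        timestamp, feed, event = line.rstrip("\n").split("\t")
        day = timestamp[:10]  # "YYYY-MM-DD" prefix of an ISO 8601 timestamp
        counts[(day, feed, event)] += 1
    return sorted(
        (day, feed, event, n) for (day, feed, event), n in counts.items()
    )


# Hypothetical sample log lines.
log_lines = [
    "2009-11-01T10:00:00\tfeed-a\tnew_entry\n",
    "2009-11-01T11:30:00\tfeed-a\tnew_entry\n",
    "2009-11-01T12:00:00\tfeed-b\tsubscribe\n",
    "2009-11-02T09:15:00\tfeed-a\tnew_entry\n",
]
rows = rollup(log_lines)
```

Because the rollup is keyed by day, a daily batch job only needs to read that day's logs, and the resulting table stays small no matter how many raw events were collected.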
Render Phase: Implemented in your current web framework
MySQL -> JSON -> Memcached -> Flash Charting
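The render chain above could look roughly like the following Python sketch: query the pre-aggregated table, serialize to JSON, and cache the result so repeated chart requests skip the database. To keep it self-contained, an in-memory SQLite database stands in for MySQL and a plain dict stands in for Memcached; the `daily_subscribers` table, its columns and the cache key format are hypothetical.

```python
import json
import sqlite3


def chart_json(conn, cache, feed):
    """Return the JSON payload for one feed's chart, using a cache-aside lookup."""
    key = "chart:" + feed
    cached = cache.get(key)
    if cached is not None:
        return cached  # cache hit: no database query
    rows = conn.execute(
        "SELECT day, subscribers FROM daily_subscribers "
        "WHERE feed = ? ORDER BY day",
        (feed,),
    ).fetchall()
    payload = json.dumps([{"day": d, "subscribers": s} for d, s in rows])
    cache[key] = payload  # a real cache entry would also carry an expiry time
    return payload


# SQLite stands in for MySQL, a dict for Memcached (both are assumptions).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE daily_subscribers (feed TEXT, day TEXT, subscribers INTEGER)"
)
conn.executemany(
    "INSERT INTO daily_subscribers VALUES (?, ?, ?)",
    [("feed-a", "2009-11-01", 2), ("feed-a", "2009-11-02", 1)],
)
cache = {}
first = chart_json(conn, cache, "feed-a")   # miss: queries the table
second = chart_json(conn, cache, "feed-a")  # hit: served from the cache
```

The charting component then only has to fetch and plot the JSON, whatever library ends up rendering it.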
There have been some posts here on SO regarding the choice of Flash charting components for the web. I personally have had good success with AmCharts.
- What are the best solutions for flash charts and graphs?
- What are some good toolsets for graphing/charting in a web application?