I'm working on a micro-forum of sorts, whereby a quick (close to tweet-size) topic message is posted by a special user, which subscribers can respond to with like-sized messages of their own. Straightforward, no 'digging' or voting of any sort, just a chronological flow of responses for each topic message. But with high traffic expected.
We would like to flag topic messages according to the response buzz they attract, using a scale of 0 to 10.
Been googling for trend algorithms and open source community application examples for a while, and so far have gleaned two interesting references, which I don't fully grok yet:
Understanding algorithms for measuring trends, a discussion on comparing wikipedia pageviews using the Baseline Trend Algorithm, here on SO.
The Britney Spears Problem, an in-depth article on how to rank search terms, while processing large streams of data.
From the first I understand the need to check the slope in activity, and to balance the weight between two items that differ greatly in scale of activity. But how do I compare many items, growing in number quickly across time? And then, how do I break the items within "buzz grades" from 0 to 10?
The second reference is fascinating, but over my head at this point. From a first pass I've understood the need to keep memory usage stable while keeping counters and storing references to items if necessary. But I haven't figured out a fitting algorithm for my specific use case from it yet.
It's worth noting that I come from a non-computer-science and definitely non-statistics background. Please bear with me :) Any help and code samples (especially in Ruby) would be greatly appreciated.
Intuition says that a solution to this problem doesn't need a lot of statistics; ranking the topics by some simple measures may already give you an interesting selection of "trending topics."
One way is to order the topics by the number of comments generated in the last hour/day/week... and to select the top ones.
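As a rough sketch of this first approach in plain Ruby: the `Comment` struct, the `top_topics` method, and its `window` and `limit` parameters are all made up for illustration, and the comments are assumed to be available in memory with a topic id and a timestamp.

```ruby
# Hypothetical in-memory record: which topic a comment belongs to and when it was posted.
Comment = Struct.new(:topic_id, :created_at)

# Returns the ids of the `limit` topics that received the most comments
# during the last `window` seconds before `now`.
def top_topics(comments, now:, window: 3600, limit: 10)
  cutoff = now - window
  counts = Hash.new(0)

  # Count only the comments that fall inside the time window.
  comments.each do |comment|
    counts[comment.topic_id] += 1 if comment.created_at >= cutoff
  end

  # Sort by descending count; ties are broken by topic id so the order is stable.
  counts.sort_by { |topic_id, count| [-count, topic_id] }
        .first(limit)
        .map(&:first)
end
```

Changing `window` to 86_400 or 604_800 would give the per-day and per-week variants. With high traffic you would probably want to do the counting in the database rather than in memory.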
Another way is to count the number of comments for each of the topics and divide this by the "age" of the topic. New topics that immediately generate comments will be considered trending, while older topics with many comments will be less trending as they grow older.
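A minimal sketch of the comments-per-age idea follows. The `Topic` struct, the method names, and the one-minute minimum age are assumptions made for the example. The mapping onto a 0 to 10 grade (scaling every rate against the current highest rate) is just one possible choice, not something the approach itself prescribes.

```ruby
# Hypothetical record: a topic with its publish time and total number of comments.
Topic = Struct.new(:id, :created_at, :comment_count)

# Assumed floor on the age so that brand-new topics don't divide by (almost) zero.
MIN_AGE_SECONDS = 60.0

# Comments per hour since the topic was published.
def buzz_rate(topic, now)
  age_hours = [now - topic.created_at, MIN_AGE_SECONDS].max / 3600.0
  topic.comment_count / age_hours
end

# Maps each topic id to a grade from 0 to 10, relative to the busiest topic.
def buzz_grades(topics, now)
  rates = topics.to_h { |topic| [topic.id, buzz_rate(topic, now)] }
  max_rate = rates.values.max.to_f
  return rates.transform_values { 0 } if max_rate.zero?

  rates.transform_values { |rate| (10 * rate / max_rate).round }
end
```

Because the grades are relative to the current maximum, a single very busy topic pushes all the others towards 0. Taking the logarithm of the rate before scaling might spread the grades more evenly if that turns out to be a problem.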
These implementations can easily be created in Ruby/Rails and can even be done in an SQL query, provided that the tables contain publish dates and numbers of comments.