I searched Stack Overflow and googled around a lot, but couldn't find answers to my problem (I guess I'm searching with the wrong keywords / terms).
We are building a recommendation engine. For now we log all user activity to custom log files (we use Ruby / Rails), and at the end of the day (EOD) we need to scan that file and group the entries by user. We also have other user data coming in from other sources (Facebook activity, Twitter timeline, etc.), so by EOD we want all the data for a particular user saved in one place, and then run our analyzer code on that user's data to generate the recommendations.
The problem is that we are generating a lot of data. For the time being we store all of it in a MySQL table, but we are not sure how long we can keep doing that as our user base grows (we are still testing internally with about 10 users who generate a lot of activity). Plus, as eager developers, we would like to try out something new that fits our needs.
Any pointers in this direction would be very helpful.
Check out Amazon Elastic MapReduce. It was built for exactly this kind of workload.
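To make the idea concrete, here is a minimal sketch in plain Ruby of the map / group shape that the EOD "arrange by user" step would take. It runs locally and does not call any Amazon API. The log format (tab-separated timestamp, user id, action) and the names `LogEvent`, `map_line` and `group_by_user` are assumptions made up for illustration, since the question doesn't show the real log layout.

```ruby
# Hypothetical log format (an assumption, not from the question):
#   "<timestamp>\t<user_id>\t<action>" on each line.
LogEvent = Struct.new(:timestamp, :user_id, :action)

# Map step: turn one raw log line into a [user_id, event] pair.
# Returns nil for lines that don't have all three fields.
def map_line(line)
  timestamp, user_id, action = line.chomp.split("\t", 3)
  return nil if [timestamp, user_id, action].any? { |f| f.nil? || f.empty? }

  [user_id, LogEvent.new(timestamp, user_id, action)]
end

# Shuffle/reduce step: collect every event under its user_id.
def group_by_user(lines)
  lines.each_with_object(Hash.new { |h, k| h[k] = [] }) do |line, grouped|
    pair = map_line(line)
    next if pair.nil? # skip malformed lines

    user_id, event = pair
    grouped[user_id] << event
  end
end

# Made-up sample data, for illustration only.
sample = [
  "2011-05-01T10:00:00\tu1\tviewed_item_42",
  "2011-05-01T10:01:00\tu2\tliked_item_7",
  "2011-05-01T10:02:00\tu1\tbought_item_42",
  "garbage line"
]

grouped = group_by_user(sample)
grouped.each do |user_id, events|
  puts "#{user_id}: #{events.map(&:action).join(', ')}"
end
```

In a real job the same two steps would be split into separate mapper and reducer scripts so the work can be spread over many machines, and the per-user output would then be fed to your analyzer code.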