Simple Suggestion / Recommendation Algorithm

Developer https://www.devze.com 2022-12-13 18:17 Source: web

I am looking for a simple suggestion algorithm to implement in my web app. Much like Netflix, Amazon, etc., but simpler. I don't need teams of PhDs working to get a better suggestion metric.

So say I have:

User1 likes Object1.
User2 likes Object1 and Object2.

I want to suggest to User1 that they might also like Object2.

I can obviously come up with something naive. I'm looking for something vetted and easily implemented.

There are many simple and not-so-simple examples of suggestion algorithms in the excellent book Programming Collective Intelligence.

The Pearson correlation coefficient (the Wikipedia article is a little dry) can give pretty good results. Here's an implementation in Python and another in T-SQL, along with an interesting explanation of the algorithm.
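To make the Pearson idea concrete, here is a minimal sketch in Python. It assumes users give numeric ratings rather than plain likes (the question only mentions likes, so that is an assumption), and the names `pearson`, `alice`, `bob`, and `carol` are made up for illustration. Users whose scores correlate strongly with yours are the ones whose other favourites are worth suggesting.

```python
from math import sqrt


def pearson(ratings_a, ratings_b):
    """Pearson correlation over the items that both users have rated."""
    shared = [item for item in ratings_a if item in ratings_b]
    n = len(shared)
    if n < 2:
        return 0.0  # not enough overlap to measure a correlation

    xs = [ratings_a[item] for item in shared]
    ys = [ratings_b[item] for item in shared]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a user who rates everything the same carries no signal
    return cov / sqrt(var_x * var_y)


# Hypothetical ratings on a 0-5 scale.
alice = {"A": 5, "B": 3, "C": 1}
bob = {"A": 4, "B": 2, "C": 0, "D": 5}
carol = {"A": 1, "B": 3, "C": 5}

print(pearson(alice, bob))    # same taste, shifted by one point
print(pearson(alice, carol))  # opposite taste
```

Because Pearson subtracts each user's mean, a harsh rater and a generous rater with the same preferences still come out as similar.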

Try a Slope One algorithm; it's one of the most used for this kind of problem.

Here's a sample implementation in T-SQL.
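For readers who prefer Python to T-SQL, the following is a rough sketch of weighted Slope One, not a port of the linked implementation. It assumes numeric ratings stored as a dict of dicts; the function names and the sample data are invented for illustration.

```python
from collections import defaultdict


def build_deviations(ratings):
    """For each item pair (i, j), average r_i - r_j over users who rated both."""
    diffs = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(lambda: defaultdict(int))
    for user_ratings in ratings.values():
        for i, r_i in user_ratings.items():
            for j, r_j in user_ratings.items():
                if i != j:
                    diffs[i][j] += r_i - r_j
                    counts[i][j] += 1
    devs = {i: {j: diffs[i][j] / counts[i][j] for j in diffs[i]} for i in diffs}
    return devs, counts


def predict(user_ratings, target, devs, counts):
    """Predict a rating for `target`, weighting each deviation by its support."""
    numerator = 0.0
    denominator = 0
    for j, r_j in user_ratings.items():
        if j != target and j in devs.get(target, {}):
            c = counts[target][j]
            numerator += (devs[target][j] + r_j) * c
            denominator += c
    return numerator / denominator if denominator else None


# Hypothetical ratings.
ratings = {
    "John": {"A": 5, "B": 3, "C": 2},
    "Mark": {"A": 3, "B": 4},
    "Lucy": {"B": 2, "C": 5},
}
devs, counts = build_deviations(ratings)
print(predict(ratings["Lucy"], "A", devs, counts))  # about 4.33
```

The deviations can be precomputed and updated incrementally as new ratings arrive, which is a large part of why the approach is easy to run in a web app.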

I would go with k-nearest neighbors. The Wikipedia entry explains it well and has links to reference implementations.
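As a sketch of how nearest neighbors could apply to the like/no-like data in the question: find the k users whose liked sets overlap most with yours, then suggest what they like that you don't. Jaccard similarity and the names `jaccard` and `suggest` are choices made here for illustration; the answer does not specify a similarity measure.

```python
from collections import Counter


def jaccard(a, b):
    """Overlap between two sets of liked objects, from 0.0 to 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def suggest(likes, user, k=3):
    """Suggest objects liked by the k most similar users, best first."""
    mine = likes[user]
    neighbours = sorted(
        ((jaccard(mine, other), name) for name, other in likes.items() if name != user),
        reverse=True,
    )[:k]

    scores = Counter()
    for similarity, name in neighbours:
        if similarity <= 0:
            continue  # users with nothing in common contribute nothing
        for item in likes[name] - mine:
            scores[item] += similarity
    return [item for item, _ in scores.most_common()]


# The example from the question, plus one unrelated user.
likes = {
    "User1": {"Object1"},
    "User2": {"Object1", "Object2"},
    "User3": {"Object3"},
}
print(suggest(likes, "User1"))  # ['Object2']
```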

You may want to look at association rule learning and the Apriori algorithm. The basic idea behind it is that you create rules like "if a user likes Object1, then the user likes Object2" and check how well they describe (your) reality. In your concrete example, Object1 has a support of 2 (two users like Object1) and the rule has a confidence of 50% (it holds in 1 of those 2 cases). I've just implemented a basic proof of concept myself (actually my first steps on Hadoop), and it's not too difficult to do.
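The support and confidence figures for the question's example can be checked with a few lines of Python. This is only the counting step, not the Apriori candidate generation, and `rule_stats` is a hypothetical helper. It follows the answer's usage, where support counts the users who like the "if" object; many texts instead define a rule's support as the number (or fraction) of users who like both objects.

```python
def rule_stats(likes, antecedent, consequent):
    """Return (support, confidence) for the rule 'likes antecedent -> likes consequent'."""
    with_antecedent = [items for items in likes.values() if antecedent in items]
    with_both = [items for items in with_antecedent if consequent in items]
    support = len(with_antecedent)  # users who like the antecedent
    confidence = len(with_both) / support if support else 0.0
    return support, confidence


# The data from the question.
likes = {
    "User1": {"Object1"},
    "User2": {"Object1", "Object2"},
}
print(rule_stats(likes, "Object1", "Object2"))  # (2, 0.5)
```

In practice you would keep only rules above some minimum support and confidence, then suggest the consequent to users who match the antecedent.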

Alternatively, you may want to look at Apache Mahout - Taste. I've never used it myself, though.

k-nearest neighbor algorithm

I created a suggested articles algorithm that used keywords (as opposed to "product purchases") to determine correlation. It takes a keyword, and runs through all other articles where that keyword occurs and produces results based on which articles have the most matching keywords.
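A guess at what such a keyword-matching approach might look like, in Python. The original code isn't shown, so the data layout (a dict mapping article IDs to keyword sets) and the name `related_articles` are assumptions.

```python
def related_articles(keywords_by_article, article_id, limit=5):
    """Rank other articles by how many keywords they share with the given one."""
    target = keywords_by_article[article_id]
    scored = []
    for other_id, keywords in keywords_by_article.items():
        if other_id == article_id:
            continue
        overlap = len(target & keywords)
        if overlap:
            scored.append((overlap, other_id))
    # Most shared keywords first; ties broken by article ID for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other_id for _, other_id in scored[:limit]]


# Hypothetical articles and keywords.
articles = {
    "a1": {"python", "sql", "ranking"},
    "a2": {"python", "ranking", "cache"},
    "a3": {"sql"},
    "a4": {"cooking"},
}
print(related_articles(articles, "a1"))  # ['a2', 'a3']
```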

Besides the obvious need for caching such information, is there something wrong with him using a similar method?