Basic Pseudocode for using SVD with Movielens/Netflix type data set

Source: https://www.devze.com, 2023-02-16 09:52 (origin: web)
I'm struggling to figure out how exactly to begin using SVD with a MovieLens/Netflix type data set for rating predictions. I'd very much appreciate any simple samples in Python/Java, or basic pseudocode of the process involved. There are a number of papers/posts that summarise the overall concept, but I'm not sure how to begin implementing it, even using a number of the suggested libraries.

As far as I understand, I need to convert my initial data set as follows:

Initial data set:

    user    movie   rating
    1       43      3
    1       57      2
    2       219     4

Need to pivot to be:

    movie   user 1  user 2
    43      3       0
    57      2       0
    219     0       4
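
The pivot described above can be sketched in plain Python. This is only an illustration of the reshaping step, with movies as rows and users as columns; the variable names are made up for the example, and the zeros stand for "no rating", not for an actual rating of 0.

```python
# Rating triples from the example above: (user, movie, rating).
ratings = [(1, 43, 3), (1, 57, 2), (2, 219, 4)]

# Sorted unique IDs give each movie a row and each user a column.
users = sorted({u for u, _, _ in ratings})
movies = sorted({m for _, m, _ in ratings})
col = {u: j for j, u in enumerate(users)}
row = {m: i for i, m in enumerate(movies)}

# Start with all zeros (0 = "no rating"), then fill in the known cells.
matrix = [[0] * len(users) for _ in movies]
for u, m, r in ratings:
    matrix[row[m]][col[u]] = r

for m, values in zip(movies, matrix):
    print(m, values)
```

For a real MovieLens/Netflix sized data set a dense list of lists like this would be mostly zeros, so a sparse representation (or just the list of triples) is usually the more practical choice.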

At this point, do I simply need to inject this Matrix into an SVD algorithm as provided by available libraries, and then (somehow) extract results, or is there more work required on my part?

Some information I've read:

http://www.netflixprize.com/community/viewtopic.php?id=1043

http://sifter.org/~simon/journal/20061211.html

http://www.slideshare.net/NYCPredictiveAnalytics/building-a-recommendation-engine-an-example-of-a-product-recommendation-engine

http://www.slideshare.net/bmabey/svd-and-the-netflix-dataset-presentation

... and a number of other papers.

Some libraries:

LingPipe (Java)

Jama (Java)

Pyrsvd (Python)

Any tips at all would be appreciated, especially on a basic data set. Thanks very much, Oli


See SVDRecommender in Apache Mahout. Your question about input format entirely depends on what library or code you're using. There's not one standard. At some level, yes, the code will construct some kind of matrix internally. For Mahout, the input for all recommenders, when supplied as a file, is a CSV file with rows like userID,itemID,rating.
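
To make the userID,itemID,rating layout concrete, here is a minimal Python sketch that reads rows in that shape into triples. It is not Mahout's own parser; `parse_ratings` is a hypothetical helper, and the sample rows are the ones from the question.

```python
import csv
import io

def parse_ratings(lines):
    """Parse 'userID,itemID,rating' rows into (int, int, float) triples."""
    triples = []
    for record in csv.reader(lines):
        if not record:  # skip blank lines
            continue
        user_id, item_id, rating = record[:3]
        triples.append((int(user_id), int(item_id), float(rating)))
    return triples

# In practice `lines` would be an open file; a string buffer keeps this self-contained.
sample = io.StringIO("1,43,3\n1,57,2\n2,219,4\n")
triples = parse_ratings(sample)
print(triples)
```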


Data set: http://www.grouplens.org/node/73

SVD: if you don't understand how to compute an SVD yourself, why not just do it in SAGE? Wolfram Alpha or http://www.bluebit.gr/matrix-calculator/ will decompose the matrix for you, or you can look it up on Wikipedia.
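
One caveat with decomposing the pivoted matrix directly: a plain SVD treats the zeros as real ratings. The sifter.org post linked in the question instead fits user and movie factor vectors by gradient descent on the known ratings only. Below is a rough pure-Python sketch in that spirit, not a true SVD and not a reproduction of that post's code; the function names, factor count, learning rate, regularisation and epoch count are all illustrative assumptions.

```python
import random

def train_factors(ratings, n_factors=2, lr=0.05, reg=0.02, epochs=1000, seed=0):
    """Fit user/movie factor vectors by SGD on the known ratings only."""
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in ratings})
    movies = sorted({m for _, m, _ in ratings})
    # Small random starting values for every factor.
    P = {u: [rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for u in users}
    Q = {m: [rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for m in movies}
    for _ in range(epochs):
        for u, m, r in ratings:
            pu, qm = P[u], Q[m]
            # Prediction error for this one known rating.
            err = r - sum(a * b for a, b in zip(pu, qm))
            for k in range(n_factors):
                pu_k, qm_k = pu[k], qm[k]
                # Gradient step with L2 regularisation on both vectors.
                pu[k] += lr * (err * qm_k - reg * pu_k)
                qm[k] += lr * (err * pu_k - reg * qm_k)
    return P, Q

def predict(P, Q, user, movie):
    """Predicted rating = dot product of the user and movie factor vectors."""
    return sum(a * b for a, b in zip(P[user], Q[movie]))

ratings = [(1, 43, 3), (1, 57, 2), (2, 219, 4)]
P, Q = train_factors(ratings)
print(round(predict(P, Q, 1, 43), 2))   # close to the known rating of 3
print(round(predict(P, Q, 2, 43), 2))   # a guess for an unseen (user, movie) pair
```

With only three ratings the prediction for an unseen pair is essentially meaningless; the point is just the shape of the loop. On a real data set you would hold out some ratings to check the error and tune the parameters.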
