I'm struggling to figure out how exactly to begin using SVD with 开发者_JAVA百科a MovieLens/Netflix type data set for rating predictions. I'd very much appreciate any simple samples in python/java, or basic pseudocode of the process involved. There are a number of papers/posts that summarise the overall concept but I'm not sure how to begin implementing it, even using a number of the suggested libraries.
As far as I understand, I need to convert my initial data set as follows:
Initial data set:
user movie rating
1 43 3
1 57 2
2 219 4
Need to pivot to be:
user 1 2
movie 43 3 0
57 2 0
219 0 4
At this point, do I simply need to inject this Matrix into an SVD algorithm as provided by available libraries, and then (somehow) extract results, or is there more work required on my part?
Some information I've read:
http://www.netflixprize.com/community/viewtopic.php?id=1043
http://sifter.org/~simon/journal/20061211.html http://www.slideshare.net/NYCPredictiveAnalytics/building-a-recommendation-engine-an-example-of-a-product-recommendation-engine http://www.slideshare.net/bmabey/svd-and-the-netflix-dataset-presentation .. and a number of other papersSome libraries:
LingPipe(java) Jama(java) Pyrsvd(python)Any tips at all would be appreciated, especially on a basic data set. Thanks very much, Oli
See SVDRecommender in Apache Mahout. Your question about input format entirely depends on what library or code you're using. There's not one standard. At some level, yes, the code will construct some kind of matrix internally. For Mahout, the input for all recommenders, when supplied as a file, is a CSV file with rows like userID,itemID,rating
.
Data set: http://www.grouplens.org/node/73
SVD: why not just do it in SAGE
if you don't understand how to do SVD? Wolfram alpha or http://www.bluebit.gr/matrix-calculator/ will decompose the matrix for you, or it's on Wikipedia.
精彩评论