mahout collaborative-filtering input binary dataset

Developer https://www.devze.com 2023-04-10 06:11 Source: Internet

I am new to Mahout.

I have already used Mahout's item-based algorithm with a log-likelihood similarity measure. I read in past threads that log-likelihood similarity is the better choice when the recommender handles binary values (like or dislike). I also read that Mahout uses three values (like, dislike, nonexistent), so I am a little confused about the format of the input dataset file.

Does the input file have to look like this?

 userId, itemID

where the preference by default is 1?

I would like to know if there is a way to put the dislike info in the dataset.

I would expect the input dataset file to look something like this, for example:

userid, itemid, binaryPreference

1, 15, 1.0

2, 35, 0

1, 25, 1.0 ......

Please help! Thanks in advance!

I am not sure where you read that, but it's wrong. There is no three-state "boolean" preference in Mahout. You either have ratings in your data, or you don't, in which case you have boolean preferences, which either exist or do not exist. There is no third state.

As strange as it may seem, I'd encourage you to try treating "like" and "dislike" as the same, to start. Both say that the user interacted with the item, and that alone might work well.
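
A minimal sketch of that first experiment, in plain Java with no Mahout dependency: each line is reduced to a (user, item) association and any third column is dropped, so a "like" and a "dislike" both count as "this user has an association with this item". The names BooleanPrefs, parse and toCsv are hypothetical, chosen for illustration only. As far as I know, Mahout's FileDataModel accepts comma-delimited userID,itemID lines, with the preference value being optional, so the two-column output of this sketch is the kind of file you would feed it.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class BooleanPrefs {

    // Reduce "userId,itemId[,value]" lines to user -> set of item IDs.
    // The optional third column is ignored on purpose (assumption:
    // like and dislike are treated as the same kind of association).
    static Map<Long, Set<Long>> parse(List<String> lines) {
        Map<Long, Set<Long>> prefs = new LinkedHashMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            String[] fields = trimmed.split("\\s*,\\s*");
            long userId = Long.parseLong(fields[0]);
            long itemId = Long.parseLong(fields[1]);
            prefs.computeIfAbsent(userId, k -> new TreeSet<>()).add(itemId);
        }
        return prefs;
    }

    // Write the associations back out in two-column "userId,itemId" form.
    static String toCsv(Map<Long, Set<Long>> prefs) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Long, Set<Long>> e : prefs.entrySet()) {
            for (long itemId : e.getValue()) {
                sb.append(e.getKey()).append(',').append(itemId).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The three sample rows from the question.
        List<String> raw = Arrays.asList("1, 15, 1.0", "2, 35, 0", "1, 25, 1.0");
        System.out.print(toCsv(parse(raw)));
    }
}
```

Run on the sample rows from the question, this prints the lines 1,15 then 1,25 then 2,35: the 0 on the second row is gone, and user 2 simply has an association with item 35.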

You can later try incorporating artificial ratings on a scale such as -1 to 1 to represent like, dislike and shades in between. You could then try other similarity metrics, like Euclidean distance, to see how they perform.
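
A rough sketch of that second idea, again in plain Java rather than against the Mahout API. Two things here are assumptions of mine: the mapping of like to 1.0 and dislike to -1.0, and the 1 / (1 + distance) conversion from distance to similarity, which is a common convention and not necessarily the formula Mahout's own Euclidean distance similarity uses. SignedRatings and its methods are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

class SignedRatings {

    // Assumed mapping: like -> 1.0, dislike -> -1.0.
    static double toRating(boolean liked) {
        return liked ? 1.0 : -1.0;
    }

    // Euclidean-distance-based similarity between two items, computed over
    // the users who rated both. Each map is userId -> rating for one item.
    // Returns 0.0 when no user rated both items.
    static double similarity(Map<Long, Double> itemA, Map<Long, Double> itemB) {
        double sumOfSquares = 0.0;
        int common = 0;
        for (Map.Entry<Long, Double> e : itemA.entrySet()) {
            Double other = itemB.get(e.getKey());
            if (other == null) {
                continue; // this user did not rate item B
            }
            double diff = e.getValue() - other;
            sumOfSquares += diff * diff;
            common++;
        }
        if (common == 0) {
            return 0.0;
        }
        // Assumed convention: distance 0 -> similarity 1, larger distance -> closer to 0.
        return 1.0 / (1.0 + Math.sqrt(sumOfSquares));
    }

    public static void main(String[] args) {
        Map<Long, Double> item15 = new HashMap<>();
        item15.put(1L, toRating(true));
        item15.put(2L, toRating(true));

        Map<Long, Double> item25 = new HashMap<>();
        item25.put(1L, toRating(true));
        item25.put(2L, toRating(false)); // user 2 disagrees here

        System.out.println(similarity(item15, item15)); // identical ratings
        System.out.println(similarity(item15, item25)); // one disagreement
    }
}
```

With signed ratings, a like/dislike disagreement contributes a difference of 2 and pulls the similarity down, which is exactly the information the boolean model discards.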

A third possibility is to build two recommenders: one built on the "like" associations and the other on a data model of "dislike" associations. You could use the output of the "like" recommender, and filter or modify its results using the results of the "dislike" recommender. This would require some coding, but isn't hard.
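
The combining step might look like the sketch below. It assumes both recommenders have already run: the "like" recommender has produced a ranked list of item IDs, and the "dislike" recommender a set of item IDs it predicts the user would dislike. The class LikeDislikeFilter and its filter method are hypothetical; in a Mahout setup the two inputs would be extracted from the results of each Recommender's recommend call.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class LikeDislikeFilter {

    // Keep the order of the "like" ranking, drop every item the "dislike"
    // recommender also predicts, and return at most howMany items.
    static List<Long> filter(List<Long> likeRanked, Set<Long> dislikePredicted, int howMany) {
        List<Long> result = new ArrayList<>();
        for (Long itemId : likeRanked) {
            if (result.size() >= howMany) {
                break;
            }
            if (!dislikePredicted.contains(itemId)) {
                result.add(itemId);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up outputs of the two recommenders for one user.
        List<Long> likeRanked = Arrays.asList(35L, 15L, 42L, 25L);
        Set<Long> dislikePredicted = new HashSet<>(Arrays.asList(15L, 99L));

        System.out.println(filter(likeRanked, dislikePredicted, 2)); // prints [35, 42]
    }
}
```

Dropping items outright is the bluntest way to combine the two. The "modify" variant would instead lower an item's "like" score by some weight when the "dislike" recommender also returns it, and then re-sort.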

user@mahout.apache.org would be a good place to follow up on this.