I'm using Mahout with the Pearson Correlation algorithm to compare and find similar users based on their preferences for several items. The problem I'm running into is that Mahout and/or Pearson is ignoring users that select the same preference for every item. Does anyone know if there is a way to configure Mahout to NOT ignore people that select the same preference value for every item?
It is not a question of configuration. The Pearson correlation is undefined in this case, so no similarity can be computed between such a user and anyone else using this metric.
Essentially, Pearson is the ratio of the two preference series' covariance to the product of their standard deviations. But when one or both series are constant (every preference value is the same), that series' standard deviation is 0, as is the covariance, so the correlation is 0/0.
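To make the 0/0 case concrete, here is a minimal sketch in plain Java. It is not Mahout's implementation; the class name `PearsonSketch`, the method `pearson`, and the sample ratings are all made up for illustration. It computes the textbook formula directly and shows what happens when one user gives every item the same rating.

```java
// Illustrative only: a hand-rolled Pearson correlation, not Mahout code.
class PearsonSketch {

    // Returns covariance(x, y) / (stddev(x) * stddev(y)).
    // The 1/n factors cancel, so sums of centered products are enough.
    static double pearson(double[] x, double[] y) {
        int n = x.length;
        double meanX = 0.0, meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        double sumXY = 0.0, sumXX = 0.0, sumYY = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            sumXY += dx * dy;   // numerator: covariance term
            sumXX += dx * dx;   // variance term for x
            sumYY += dy * dy;   // variance term for y
        }
        // If either series is constant, both numerator and denominator
        // are 0.0, and 0.0 / 0.0 evaluates to NaN in Java.
        return sumXY / Math.sqrt(sumXX * sumYY);
    }

    public static void main(String[] args) {
        double[] constantUser = {3, 3, 3};  // same rating for every item
        double[] variedUser = {1, 2, 5};
        System.out.println(pearson(constantUser, variedUser)); // prints NaN

        double[] a = {1, 2, 3};
        double[] b = {2, 4, 6};
        System.out.println(pearson(a, b)); // prints 1.0
    }
}
```

The first call has nothing to divide by, which is why such a user cannot be ranked against anyone under this metric. If those users must be included, a different similarity measure would be needed rather than a configuration change.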
(This and a few other Pearson gotchas are covered in Chapter 4 of Mahout in Action; I'm the author of that part of the book and of the code.)