recently I came to study clustering in data-mining an开发者_运维技巧d I've studied sequential clustering and hierarchical clustering and k-means.
I also read about a statement that distinguishes k-means from the other two clustering technique,saying k-means is not very good at dealing with nominal attributes,but the text didn't explain this point.So far,the only difference that I can see is that for K-means,we will know in advance we will need exactly K clusters while we don't know how many clusters we need for other two clustering methods.
So could anybody give me some idea here on why such statement exists,i.e.,k-means has this problem when dealing with examples of nominal attributes and is there a way to overcome this?
Thanks in advance.
The k-means algorithm calculates cluster centroids by taking the mean values of all the points in the cluster. If a parameter is nominal then you can't take an mean value.
Sometimes nominal values can be put into a kind of order and then mapped to real values. For example, days of the week could be mapped onto the range [1.0 - 7.0], but then again sometimes that isn't possible, for example an attribute with values [Windows, Linux, OSX].
精彩评论