Can fuzzy c-means be applied to non-numerical data sets, i.e. categorical or mixed numerical and categorical data? If yes (I hope so :( ):
- how do we calculate the cluster centers?
If no, what is the alternative? How can these data be fuzzy-clustered?
I need an answer, please help.
NOTE: I've used the Jaccard coefficient to calculate the distance between 2 points, but I still haven't found a way to calculate the cluster centers; see the attachments.
You'll have to transform your data into a numeric form. There are various ways of doing that, two of them being:
- use vectors of feature counts (common in, e.g., text categorization)
- use a one-hot representation, where a categorical feature that can take on n distinct values is represented as a string of n bits, with only the i-th bit set if the feature has the i-th value in its allowed range.
Both are very common transformations that many machine learning programs perform under the hood. You might also want to experiment with a metric other than the Euclidean one: especially with a one-hot representation, and depending on the data, the L1 norm (Manhattan/city block distance) may be more appropriate.
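To make the one-hot representation concrete, here is a minimal sketch in plain Python. The function names (`one_hot_encode`, `manhattan`, `euclidean`) and the toy records are made up for illustration; a real project would more likely use an encoder from an existing library.

```python
import math


def one_hot_encode(rows):
    """Encode rows of categorical values as 0/1 vectors.

    Each column gets one bit per distinct value seen in that column.
    """
    n_cols = len(rows[0])
    # Sort the distinct values per column so bit positions are deterministic.
    categories = [sorted({row[j] for row in rows}) for j in range(n_cols)]
    encoded = []
    for row in rows:
        vec = []
        for j, value in enumerate(row):
            vec.extend(1 if value == c else 0 for c in categories[j])
        encoded.append(vec)
    return categories, encoded


def manhattan(a, b):
    """L1 (city block) distance."""
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean(a, b):
    """L2 distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical toy records: (colour, size).
rows = [("red", "small"), ("blue", "large"), ("red", "large")]
categories, encoded = one_hot_encode(rows)

for row, vec in zip(rows, encoded):
    print(row, vec)

# Records differing in both features vs. in one feature.
print(manhattan(encoded[0], encoded[1]), manhattan(encoded[0], encoded[2]))
```

On one-hot vectors the L1 distance is simply twice the number of features on which two records disagree, which is one reason it can be easier to interpret than the Euclidean distance here.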
Apart from that, just apply the given formulas to your transformed dataset.
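As a rough illustration of that last step, the sketch below runs fuzzy c-means on already one-hot encoded vectors. It assumes the standard textbook update rules (centers as means weighted by membership^m, memberships from distance ratios) and the Euclidean distance; these may differ from the formulas in the attachments. The function name `fuzzy_c_means`, its parameters, and the toy data are hypothetical. Note that the weighted-mean center update is tied to the Euclidean distance, so switching to another metric would also mean revisiting that step.

```python
import math
import random


def fuzzy_c_means(points, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Return (centers, memberships) for numeric vectors `points`."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Random initial memberships, each row normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    exponent = 2.0 / (m - 1.0)
    for _ in range(n_iter):
        # Center update: mean of all points, weighted by u_ij ** m.
        centers = []
        for i in range(n_clusters):
            weights = [u[j][i] ** m for j in range(n)]
            total = sum(weights)
            centers.append([
                sum(w * p[d] for w, p in zip(weights, points)) / total
                for d in range(dim)
            ])

        # Membership update from the ratios of distances to the centers.
        new_u = []
        for p in points:
            dists = [math.dist(p, c) for c in centers]
            if any(d == 0.0 for d in dists):
                # Point sits exactly on a center: give it full membership there.
                hits = [1.0 if d == 0.0 else 0.0 for d in dists]
                new_u.append([h / sum(hits) for h in hits])
            else:
                new_u.append([
                    1.0 / sum((di / dk) ** exponent for dk in dists)
                    for di in dists
                ])

        change = max(
            abs(a - b) for ra, rb in zip(new_u, u) for a, b in zip(ra, rb)
        )
        u = new_u
        if change < tol:
            break
    return centers, u


# Hypothetical one-hot data: two "red/small" and two "blue/large" records.
data = [[0, 1, 0, 1], [0, 1, 0, 1], [1, 0, 1, 0], [1, 0, 1, 0]]
centers, memberships = fuzzy_c_means(data, n_clusters=2)
for vec, row in zip(data, memberships):
    print(vec, [round(v, 3) for v in row])
```

The resulting centers are real-valued vectors rather than valid one-hot codes; each component can be read as the membership-weighted share of records in that cluster having the corresponding category value.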