I wonder what kind of seed selection methods I can apply to the K-means algorithm. A Google search wasn't that helpful. Any suggestions?
The seeds depend on the domain. For example, if your data items are words, your seeds should be the most frequent words. Otherwise, you could cluster a small random sample of the data and use the resulting centroids as seeds for the full run.
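A rough sketch of the sample-then-cluster idea, assuming points are tuples of numbers and using plain Lloyd iterations on the sample. The names `lloyd` and `seeds_from_sample`, the default `sample_size`, and the fixed iteration count are all made up for illustration, not taken from any library:

```python
import random

def dist2(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def lloyd(points, centers, iters=20):
    """Plain k-means (Lloyd) iterations starting from the given centers."""
    centers = list(centers)
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: dist2(p, centers[i]))
            groups[nearest].append(p)
        # Keep the old center if a cluster ends up empty.
        centers = [mean(g) if g else c for g, c in zip(groups, centers)]
    return centers

def seeds_from_sample(points, k, sample_size=100, rng=None):
    """Cluster a small random sample and return its centroids as seeds."""
    rng = rng or random.Random(0)
    sample = rng.sample(points, min(sample_size, len(points)))
    start = rng.sample(sample, k)  # assumes k <= len(sample)
    return lloyd(sample, start)
```

You would then pass the returned seeds as the starting centers for k-means on the full data set, e.g. `lloyd(all_points, seeds_from_sample(all_points, k))`.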
Here is an example of a more sophisticated algorithm:
Single Pass Seed Selection Algorithm for k-Means. K. Karteeka Pavan, Allam Appa Rao, A.V. Dattatreya Rao and G.R. Sridhar. Journal of Computer Science 6 (1): 60-66, 2010.
Google for "supervised" k-means clustering and k-means++. Also, specify your performance needs: what is your k, and how many input points do you have?
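For reference, here is a minimal sketch of k-means++ style seeding, assuming that is what "k++ means" refers to. The first seed is picked uniformly at random, and each later seed is picked with probability proportional to its squared distance from the nearest seed chosen so far. The function name `kmeans_pp_seeds` is hypothetical, and points are assumed to be tuples of numbers:

```python
import random

def dist2(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_pp_seeds(points, k, rng=None):
    """Pick k seeds from points using D^2-weighted sampling."""
    rng = rng or random.Random(0)
    seeds = [rng.choice(points)]
    # Squared distance from each point to its nearest seed so far.
    d2 = [dist2(p, seeds[0]) for p in points]
    while len(seeds) < k:
        if sum(d2) == 0:
            # Every point coincides with a seed; fall back to a uniform pick.
            nxt = rng.choice(points)
        else:
            nxt = rng.choices(points, weights=d2, k=1)[0]
        seeds.append(nxt)
        d2 = [min(old, dist2(p, nxt)) for old, p in zip(d2, points)]
    return seeds
```

Points far from every existing seed are more likely to be chosen next, which tends to spread the seeds across the data instead of bunching them in one dense region.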
In general, a few thousand points can easily be clustered with a naive k-means implementation, so I would try that first.
Also, if you're not sure what k should be, try MCL clustering first to get a good estimate.