I'm currently developing a machine learning toolkit for GPU clusters, and I'm testing a logistic regression classifier on multiple GPUs.
I'm using a master-worker approach, where a master CPU thread creates several POSIX threads and the matrices are divided among the GPUs.
The problem I have is how to store large matrices that can't fit on a single machine. Are there any libraries or approaches for sharing data among nodes?
I'm not sure how big your matrices are, but you should check out CUDA 4.0, which was released a couple of weeks ago. One of its main features is unified virtual addressing with peer-to-peer memory access across multiple CUDA devices/GPUs.
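A minimal host-side sketch of what that looks like, assuming a single machine with two CUDA 4.0-capable GPUs (peer access also requires 64-bit and Fermi-class or newer devices); the buffer size here is arbitrary:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int nDevices = 0;
    cudaGetDeviceCount(&nDevices);
    if (nDevices < 2) { printf("need at least two GPUs\n"); return 0; }

    // Check that each device can access the other's memory.
    int can01 = 0, can10 = 0;
    cudaDeviceCanAccessPeer(&can01, 0, 1);
    cudaDeviceCanAccessPeer(&can10, 1, 0);
    if (!can01 || !can10) { printf("peer access unsupported\n"); return 0; }

    // Enable peer access in both directions (new in CUDA 4.0).
    cudaSetDevice(0);
    cudaDeviceEnablePeerAccess(1, 0);
    cudaSetDevice(1);
    cudaDeviceEnablePeerAccess(0, 0);

    // With unified virtual addressing enabled, a kernel running on
    // device 0 can dereference a pointer allocated on device 1, and
    // cudaMemcpyPeer copies between devices without staging via host.
    float *d0 = nullptr, *d1 = nullptr;
    size_t bytes = 1024 * sizeof(float);  // placeholder size
    cudaSetDevice(0); cudaMalloc(&d0, bytes);
    cudaSetDevice(1); cudaMalloc(&d1, bytes);
    cudaMemcpyPeer(d1, 1, d0, 0, bytes);

    cudaFree(d1);
    cudaSetDevice(0); cudaFree(d0);
    return 0;
}
```

Note this spans GPUs within one machine; sharing matrices across cluster nodes would still need something like MPI on top.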