I have an academic project on the block Lanczos algorithm (Montgomery's version). I am having trouble designing the implementation: can anyone suggest what approach I should take for multiplying the sparse matrices that arise in this algorithm? They can be large, around 1M x 1M. I have a CUDA-enabled GT 330M GPU.
Have you looked at CUSPARSE (included with the CUDA Toolkit) and/or CUSP (open source)?
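To make the operation concrete, here is a minimal CPU reference sketch of the product block Lanczos repeats most often: a sparse matrix in CSR (compressed sparse row) form times a narrow block of dense vectors. The names `CsrMatrix` and `csr_times_block` are hypothetical and come from neither library, and the `double` values are an assumption; a variant over GF(2) would store packed bit words and replace the multiply-add with XOR. The outer loop over rows is the part a GPU kernel would typically parallelize.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CSR container (not a CUSPARSE or CUSP type).
struct CsrMatrix {
    std::size_t n_rows = 0;
    std::size_t n_cols = 0;
    std::vector<std::size_t> row_ptr;  // size n_rows + 1; row i spans [row_ptr[i], row_ptr[i+1])
    std::vector<std::size_t> col_idx;  // column index of each stored entry
    std::vector<double> values;        // value of each stored entry (assumed real-valued)
};

// Computes Y = A * X, where X is n_cols x block and Y is n_rows x block,
// both stored row-major. Only the stored nonzeros of A are touched.
std::vector<double> csr_times_block(const CsrMatrix& a,
                                    const std::vector<double>& x,
                                    std::size_t block) {
    assert(a.row_ptr.size() == a.n_rows + 1);
    assert(x.size() == a.n_cols * block);

    std::vector<double> y(a.n_rows * block, 0.0);
    // Rows are independent, so on a GPU each row could go to its own thread.
    for (std::size_t i = 0; i < a.n_rows; ++i) {
        for (std::size_t k = a.row_ptr[i]; k < a.row_ptr[i + 1]; ++k) {
            const std::size_t j = a.col_idx[k];
            const double v = a.values[k];
            // Accumulate v * (row j of X) into row i of Y.
            for (std::size_t c = 0; c < block; ++c) {
                y[i * block + c] += v * x[j * block + c];
            }
        }
    }
    return y;
}
```

CSR is among the storage formats that both CUSPARSE and CUSP support, so a reference like this can be used to check results from either library on a small test matrix before moving to the full-size problem.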