
How can I sort a large sparse matrix and then export the result in matlab?

Developer https://www.devze.com 2023-01-17 21:34 Source: web

I have to process a large sparse matrix of size 6004 x 17842 (docs x terms). I used find() to get its rows, columns and values, and saved the result in ASCII form. But the terms are not sorted within each document. Could anyone suggest a way to sort the matrix and export the sorted result?


It sounds like you have a question about the order in which find returns the non-zero entries of a sparse matrix. For example, consider the following MATLAB commands:

  m = 6004;
  n = 17842;
  A = sprand(m,n,0.000001);
  [i, j, x] = find(A);

Because MATLAB stores sparse matrices in compressed sparse column format, the non-zero entries returned by find are sorted by column. That is, the i, j, and x vectors first contain all the non-zero entries in the first column, then all the non-zero entries in the second column, and so on; within each column the row indices are in ascending order. Since your matrix is a doc x term matrix (rows are documents, columns are terms), this means you see all the documents that contain the first term, then all the documents that contain the second term, and so on. It sounds like you would rather have the entries grouped by row (document), with the terms sorted within each document. This is easy to do: just perform find on the transpose, whose columns are the documents:

  [term, doc, val] = find(A');
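To see the two orderings side by side, here is a small sketch on a toy matrix. It assumes the layout described in the question (rows are documents, columns are terms); the 3 x 4 matrix and the variable names byTerm and byDoc are made up for illustration.

```matlab
% Toy doc x term matrix: 3 documents (rows), 4 terms (columns).
rows = [1 1 2 3 3];      % document indices
cols = [4 2 1 3 1];      % term indices
vals = [5 7 2 9 4];      % e.g. term counts
A = sparse(rows, cols, vals, 3, 4);

% find(A) walks A column by column, so entries come out grouped by term.
[d1, t1, v1] = find(A);
byTerm = [d1 t1 v1];

% find(A') walks the columns of A', i.e. the rows of A, so entries come out
% grouped by document, with term indices ascending inside each document.
[t2, d2, v2] = find(A');
byDoc = [d2 t2 v2];

disp(byTerm);
disp(byDoc);
```

Note that the first output of find(A') is the row index of A', which is the column (term) index of A, hence the swapped order of the output names.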

To export the sorted entries to a text file you can do something like:

  dlmwrite('doc-term.txt',[doc term val]);
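If you need explicit control over the number format (for instance, integer indices and more digits for the values), a plain fopen/fprintf loop-free write is one alternative. This is only a sketch: the sample vectors stand in for the output of find, and the output path and the '%.10g' format are assumptions to adapt.

```matlab
% Stand-in data; in practice doc, term and val come from find.
doc  = [1; 1; 2];
term = [2; 4; 1];
val  = [0.5; 1.25; 3];

outFile = fullfile(tempdir, 'doc-term.txt');   % hypothetical output path
fid = fopen(outFile, 'w');
if fid == -1
    error('Could not open %s for writing.', outFile);
end
% fprintf consumes its data column by column, so transpose the N x 3 array
% to emit one "doc term value" triplet per line.
fprintf(fid, '%d %d %.10g\n', [doc term val]');
fclose(fid);
```

Each line of the resulting file holds one non-zero entry, in the same order as the input vectors.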


Is there a reason the built-in sort won't work?
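For instance, one way to use the built-in sorting is to call sortrows on the triplets returned by find. This is a sketch on a made-up 3 x 4 matrix, again assuming rows are documents and columns are terms:

```matlab
% Toy doc x term matrix (3 documents, 4 terms).
A = sparse([1 1 2 3 3], [4 2 1 3 1], [5 7 2 9 4], 3, 4);

% find returns the entries column by column, i.e. grouped by term.
[doc, term, val] = find(A);

% Sort the triplets by document first, then by term within each document.
sorted = sortrows([doc term val], [1 2]);
disp(sorted);
```

This yields the same order as find on the transpose, at the cost of an explicit sort over all non-zero entries.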
