开发者

java need storing values in a multi-dimension array. what is the best method to reserve memory space?

开发者 https://www.devze.com 2022-12-13 16:06 出处:网络
at first sorry for my poor grammar. I want to build a simple hierarchical clustering algorithm in Java, so i need t开发者_如何学运维o build a similarity matrix, whose ij entry gives the similarity be

at first sorry for my poor grammar.

I want to build a simple hierarchical clustering algorithm in Java, so i need t开发者_如何学运维o build a similarity matrix, whose ij entry gives the similarity between i and j clusters.

The first thought is using int[][] for storing this matrix (each cluster has an ID of type integer).

I think that having for example initially 5000 clusters will lead to program memory crash, so any ideas for storing in another way this matrix? Maybe in another data structure?

Thank you


2000 x 2000 isn't a great deal of memory these days so you could just do

int[][] = new int[2000][2000];

If some entries don't have a similarity entry, then perhaps you could exploit the sparseness and save some memory, but unless you have space constraints I don't think it's worth the effort.


25 million ints take roughly 100Mb of memory.

adding -Xmx256m switch when executing java should be enough if yoou're going the int[][] route.

if you don't need the presition of int, go with short to cut memory to 50M.

if most values are 0, you should definitely google for a sparse matrix implementation.

edit: if similarity(i,j) always equals similarity(j,i) you could use this to shave off half as well.


Will it really crash? The matrix will hold 5000x5000 = 25 million int values. I still think, the JVM can handle this. You may need another hashtable to map from cluster to array index value, but that's not too much aswell. Just increase memory, a 32bit JVM can use 2GB RAM, that's plenty enough.

If you really need to calculate the similarity for all clusters, then each cell in the matrix will have a value and I think, there's no better data structure for the result.


Is it going to have a lot zeros? If so, you need store all zeros. There are standards for storing sparse matrices such as Compressed Row Storage format

http://www.cs.utk.edu/~dongarra/etemplates/node373.html

0

精彩评论

暂无评论...
验证码 换一张
取 消