I have a program where I generate a huge matrix, and once it is calculated I have to reuse it later. For that reason, I want to cache it on the local hard disk so that I can read it back at later times. At the moment I simply write the data to a file and read it back later.
But is there anything special that I should take into consideration for such tasks in Java? For example, do I need to serialize it, or maybe do something special? Is there anything I should take care of when storing important application usage data? Should it be plain ASCII/XML or what?
The data is not sensitive; however, the integrity of the data is important.
If your data is really huge, I'd recommend some binary form - this will make it smaller and faster to read and especially to parse (XML or JSON are many times slower than reading/writing binary data). Serialization also brings a lot of overhead, so you might want to check DataInputStream and DataOutputStream. If you know you will be writing only numbers of a specific type, or you know what sequence the data will be in, these are certainly the fastest ones.
Do not forget to wrap the file streams in buffered streams - they can make your operations an order of magnitude faster still.
Something like this (8192 is an example buffer size - you can tailor it to your needs):
final File file = null; // get file somehow
// matrix is assumed here to be a double[][]
final DataOutputStream dos = new DataOutputStream(
        new BufferedOutputStream(new FileOutputStream(file), 8192));
try {
    for (int x = 0; x < matrix.length; x++) { // loop through your matrix (might be different if matrix is sparse)
        for (int y = 0; y < matrix[x].length; y++) {
            if (matrix[x][y] != 0.0) {
                dos.writeInt(x);
                dos.writeInt(y);
                dos.writeDouble(matrix[x][y]);
            }
        }
    }
    // mark end (might be done differently); written only after a complete pass,
    // so a failed write does not leave a file that looks finished
    dos.writeInt(-1);
} finally {
    dos.close();
}
and input:
final File file = null; // get file somehow
final DataInputStream dis = new DataInputStream(
        new BufferedInputStream(new FileInputStream(file), 8192));
try {
    int x;
    while ((x = dis.readInt()) != -1) {
        int y = dis.readInt();
        double value = dis.readDouble();
        // store x,y, value in matrix
    }
} finally {
    dis.close();
}
As correctly pointed out by Ryan Amos, if the matrix is not sparse, it could be faster to just write the values (but all of them), preceded by the two dimensions:
Out:
dos.writeInt(xSize); // writeInt, not write: write(int) stores only a single byte
dos.writeInt(ySize);
for (int x = 0; x < xSize; x++) {
    for (int y = 0; y < ySize; y++) {
        double value = matrix[x][y];
        dos.writeDouble(value);
    }
}
In:
int xSize = dis.readInt();
int ySize = dis.readInt();
double[][] matrix = new double[xSize][ySize];
for (int x = 0; x < xSize; x++) {
    for (int y = 0; y < ySize; y++) {
        double value = dis.readDouble();
        matrix[x][y] = value;
    }
}
(Mind that I have not compiled it, so you might need to correct some stuff - it is off the top of my head.)
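For reference, here is a minimal self-contained sketch of the dense variant as a round trip. The class name DenseMatrixIO and its write/read methods are made up for illustration, and it assumes the matrix is a rectangular double[][] with at least one row; treat it as a starting point rather than a finished implementation.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.Arrays;

// Hypothetical helper class; the name and methods are illustrative only.
class DenseMatrixIO {

    // Writes the row count, the column count, then every value row by row.
    // Assumes a rectangular matrix with at least one row.
    static void write(File file, double[][] matrix) throws IOException {
        try (DataOutputStream dos = new DataOutputStream(
                new BufferedOutputStream(new FileOutputStream(file), 8192))) {
            dos.writeInt(matrix.length);
            dos.writeInt(matrix[0].length);
            for (double[] row : matrix) {
                for (double value : row) {
                    dos.writeDouble(value);
                }
            }
        }
    }

    // Reads the two dimensions back, then fills the matrix in the same order.
    static double[][] read(File file) throws IOException {
        try (DataInputStream dis = new DataInputStream(
                new BufferedInputStream(new FileInputStream(file), 8192))) {
            int xSize = dis.readInt();
            int ySize = dis.readInt();
            double[][] matrix = new double[xSize][ySize];
            for (int x = 0; x < xSize; x++) {
                for (int y = 0; y < ySize; y++) {
                    matrix[x][y] = dis.readDouble();
                }
            }
            return matrix;
        }
    }

    public static void main(String[] args) throws IOException {
        double[][] original = {{1.5, 0.0, -2.25}, {0.0, 4.0, 8.0}};
        File file = File.createTempFile("matrix", ".bin");
        file.deleteOnExit();
        write(file, original);
        double[][] restored = read(file);
        // Compare the restored matrix with the original, element by element.
        System.out.println(Arrays.deepEquals(original, restored));
    }
}
```

The try-with-resources blocks close the streams even if a write or read fails, which replaces the explicit try/finally used above.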
Without buffers, every small read or write goes straight to the underlying file stream, which will make it slow.
One more comment - with such a huge dataset, you should consider using a SparseMatrix and writing/reading only the elements which are non-zero (unless you really have that many significant elements).
As I wrote in the comment above - if you really want to write/read every single element in a matrix of that size, then you are already talking about hours of writing rather than seconds.
You have a few options for storing your data. You can try simply stating in a header what the width is and writing everything out as a list with a separator (e.g. '\n', '\t', ' ', etc.). Otherwise, you can use an ObjectOutputStream to store your data. Be wary: this will likely be less efficient than your solution. However, it will be easier to use.
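As a rough sketch of the ObjectOutputStream route, assuming the matrix is held as a double[][] (arrays of primitives are serializable out of the box). The class name ObjectStreamMatrixIO is made up for illustration.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.Arrays;

// Hypothetical helper class; the name and methods are illustrative only.
class ObjectStreamMatrixIO {

    // Serializes the whole array in one call.
    static void write(File file, double[][] matrix) throws IOException {
        try (ObjectOutputStream oos = new ObjectOutputStream(
                new BufferedOutputStream(new FileOutputStream(file)))) {
            oos.writeObject(matrix);
        }
    }

    // Deserializes the array; the cast assumes the file was produced by write().
    static double[][] read(File file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(
                new BufferedInputStream(new FileInputStream(file)))) {
            return (double[][]) ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        double[][] original = {{1.0, 2.0}, {3.0, 4.0}};
        File file = File.createTempFile("matrix", ".ser");
        file.deleteOnExit();
        write(file, original);
        System.out.println(Arrays.deepEquals(original, read(file)));
    }
}
```

This is the least code to write, at the cost of a Java-specific file format.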
Other than that, you're free to do as you choose. I usually use a FileWriter and just write all of my data in plaintext. If you're after super-efficiency, FileOutputStream is what you need.
If your entries are numbers then you could just save each row of your matrix as a line in your file separated by some delimiter. You don't need special serialization then. :)
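A minimal sketch of that one-row-per-line idea, assuming a double[][] with no empty rows and a tab as the delimiter. The class name TextMatrixIO is made up for illustration.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class; the name and methods are illustrative only.
class TextMatrixIO {

    // Writes each row as one line, with values separated by tabs.
    static void write(File file, double[][] matrix) throws IOException {
        try (BufferedWriter out = new BufferedWriter(new FileWriter(file))) {
            for (double[] row : matrix) {
                StringBuilder line = new StringBuilder();
                for (int y = 0; y < row.length; y++) {
                    if (y > 0) {
                        line.append('\t');
                    }
                    // Same text as Double.toString, which parseDouble reads back exactly.
                    line.append(row[y]);
                }
                out.write(line.toString());
                out.newLine();
            }
        }
    }

    // Reads the file line by line; rows are assumed to be non-empty.
    static double[][] read(File file) throws IOException {
        List<double[]> rows = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(new FileReader(file))) {
            String line;
            while ((line = in.readLine()) != null) {
                String[] parts = line.split("\t");
                double[] row = new double[parts.length];
                for (int y = 0; y < parts.length; y++) {
                    row[y] = Double.parseDouble(parts[y]);
                }
                rows.add(row);
            }
        }
        return rows.toArray(new double[0][]);
    }

    public static void main(String[] args) throws IOException {
        double[][] original = {{1.5, -2.0E-7}, {0.1, 3.0}};
        File file = File.createTempFile("matrix", ".txt");
        file.deleteOnExit();
        write(file, original);
        System.out.println(Arrays.deepEquals(original, read(file)));
    }
}
```

The file stays human-readable, but it will be larger and slower to parse than the binary form for big matrices.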
It all depends on how you'll output it later, or if you'll also be storing it in a database or somewhere else as well. If you're never outputting it or storing it anywhere else, then a text file would work.
If there's no need to persist the data (i.e. keep it after the Java program has terminated), it would be faster to keep it in memory in a Java variable. There are a lot of types that should meet your requirements (HashMap, ArrayList...). If you need to keep the data for use in subsequent program executions, you can store it in a file using the standard file read/write methods. Plain ASCII would be faster to read/write than XML. Regarding the integrity of the files, it is OS related, because in the end that would be a file on your local filesystem.