I've tried the code below on both Windows (64bit) and Linux (32bit).
I was sure that without BufferedOutputStream the code was bound to throw OutOfMemoryError, yet it didn't.
Why is that? Who is doing the {caching / buffering / streaming} to disk there?
Can you please describe, if relevant to the answer, the full flow (Java API -> system call)?
Does this code use NIO?
/Me confused.
import java.io.DataOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;

public class WriteHugeFileToDisk {
    private static int BYTE = 1;
    private static int KILOBYTE = BYTE * 1024;
    private static int MEGABYTE = KILOBYTE * 1024;
    private static int GIGABYTE = MEGABYTE * 1024;
    private static long TERABYTE = GIGABYTE * 1024L;

    public static void main(String[] args) throws IOException {
        FileOutputStream fileOutputStream = new FileOutputStream(args[0]);
        DataOutputStream dataOutputStream = new DataOutputStream(fileOutputStream);
        // Fill a 1MB buffer with a repeating byte pattern.
        byte[] buffer = new byte[MEGABYTE];
        for (int i = 0; i < buffer.length; i++) {
            buffer[i] = (byte) i;
        }
        // Write the same buffer 4000 times (about 4GB in total).
        for (long l = 0; l < 4000; l++) {
            dataOutputStream.write(buffer);
        }
    }
}
I ran this code with Java 6, using the following invocations:
Windows:
java WriteHugeFileToDisk %TEMP%\hi.txt
Linux:
java WriteHugeFileToDisk /mnt/hi.info
Please note: the code creates a 4GB file just for the test.
Why would it throw an OutOfMemoryError? It's just writing to disk. I wouldn't be surprised if FileOutputStream and DataOutputStream had some buffering (I haven't checked) but they're certainly not required to buffer everything you write.
This code isn't using NIO directly, although I wouldn't be surprised if some of the internal stuff did. As for what system calls are involved and when, that will be implementation-specific, but the important thing is to realise that neither DataOutputStream nor FileOutputStream is meant to buffer everything. You write some data to them, and some of that data may get written to disk. If you flush or close the stream, that should make all the data you've written so far get to the disk. If you don't flush or close the stream, I'd expect only a reasonably small amount (again, implementation-specific) to be cached, if any.
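As a rough check of the point above, the sketch below wraps an in-memory ByteArrayOutputStream (a stand-in for the file, used here only so the received bytes can be counted) in a DataOutputStream and looks at it before any flush or close. The class and method names are made up for illustration, and UncheckedIOException assumes Java 8 or later rather than the Java 6 mentioned in the question.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class PassThroughDemo {
    // Returns how many bytes the underlying stream has received right after
    // write(), before flush() or close() has been called.
    static int bytesSeenBeforeFlush(int chunkSize) {
        try {
            ByteArrayOutputStream sink = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(sink);
            out.write(new byte[chunkSize]);
            int seen = sink.size(); // inspected before any flush
            out.close();
            return seen;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(bytesSeenBeforeFlush(1024));
    }
}
```

If the whole chunk shows up in the sink immediately, DataOutputStream is only forwarding the data, not holding on to it. What FileOutputStream and the operating system do below that point is not covered by this sketch.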
Note that BufferedOutputStream does introduce caching, but only as much as you ask for (or a default). Again, it wouldn't buffer everything unless you asked for as much buffer as you write in terms of data.
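To illustrate that a BufferedOutputStream holds at most its buffer size, here is a minimal sketch that writes many single bytes through one and then checks how much has not yet reached the underlying stream. The in-memory sink, the class name and the method name are assumptions for the demo; UncheckedIOException assumes Java 8 or later.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class BoundedBufferDemo {
    // Writes `total` bytes one at a time through a BufferedOutputStream of the
    // given size and returns how many bytes are still held back (not yet
    // passed to the underlying stream) before any explicit flush().
    static int pendingBeforeFlush(int total, int bufferSize) {
        try {
            ByteArrayOutputStream sink = new ByteArrayOutputStream();
            BufferedOutputStream out = new BufferedOutputStream(sink, bufferSize);
            for (int i = 0; i < total; i++) {
                out.write(i); // writes the low 8 bits of i
            }
            int pending = total - sink.size();
            out.close(); // close() flushes whatever is left
            return pending;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int pending = pendingBeforeFlush(100000, 8192);
        System.out.println("bytes still buffered: " + pending);
    }
}
```

Whatever the total written, the pending amount should stay within the requested buffer size, since the stream drains its buffer to the sink each time it fills up.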
These two statements open a file handle and consume almost no memory.
FileOutputStream fileOutputStream = new FileOutputStream(args[0]);
DataOutputStream dataOutputStream = new DataOutputStream(fileOutputStream);
Allocate a 1MB byte array in memory and fill it with data.
byte[] buffer = new byte[MEGABYTE];
for (int i = 0; i < buffer.length; i++) {
    buffer[i] = (byte) i;
}
Write this 1MB of data to the output file 4000 times.
for (long l = 0; l < 4000; l++) {
    dataOutputStream.write(buffer);
}
Conclusion: about 1MB of memory is consumed and roughly 4GB of data is written to the file. So unless you have very little memory, this cannot throw an OutOfMemoryError.
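A scaled-down version of the same pattern can be sketched as follows: one 1MB array is allocated once and written repeatedly, so the file grows while the only large allocation stays the same size. The temporary file, the class name and the method name are assumptions for the demo, and try-with-resources needs Java 7 or later (UncheckedIOException needs Java 8).

```java
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class ReuseBufferDemo {
    static final int MEGABYTE = 1024 * 1024;

    // Writes the same 1MB array `times` times to a temporary file and
    // returns the resulting file length in bytes.
    static long writeRepeatedly(int times) {
        byte[] buffer = new byte[MEGABYTE]; // the only large allocation
        File file = null;
        try {
            file = File.createTempFile("reuse-buffer", ".bin");
            try (DataOutputStream out =
                     new DataOutputStream(new FileOutputStream(file))) {
                for (int i = 0; i < times; i++) {
                    out.write(buffer); // same array every time
                }
            }
            return file.length();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            if (file != null) {
                file.delete(); // clean up the temporary file
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(writeRepeatedly(8) + " bytes written");
    }
}
```

The file length scales with the number of iterations, while the heap only ever holds the single 1MB array.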
A buffered stream is a stream wrapper that (quite obviously) buffers data in memory before passing it to the underlying stream. This gives you better performance when used in conjunction with a file stream because there's a lot of overhead involved in reading from or writing to a hard drive. Buffering allows you to significantly reduce the number of reads/writes by collapsing many small, inefficient reads or writes into a single, efficient, bigger one. However, it is not critical to the correct behavior of your application. It just helps you make fewer accesses to the physical devices.
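The collapsing of small writes can be sketched by counting how many write calls reach the underlying stream with and without a buffer. CountingSink is a hypothetical helper written for this demo (it discards the data and only counts calls); the class and method names are likewise made up, and UncheckedIOException assumes Java 8 or later.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

public class WriteCountDemo {
    // Hypothetical helper: discards all data, counts write calls it receives.
    static class CountingSink extends OutputStream {
        int calls;

        @Override
        public void write(int b) {
            calls++;
        }

        @Override
        public void write(byte[] b, int off, int len) {
            calls++;
        }
    }

    // One call to the sink per byte written.
    static int callsUnbuffered(int total) {
        CountingSink sink = new CountingSink();
        for (int i = 0; i < total; i++) {
            sink.write(i);
        }
        return sink.calls;
    }

    // The buffer forwards data to the sink only when it fills up or is flushed.
    static int callsBuffered(int total, int bufferSize) {
        try {
            CountingSink sink = new CountingSink();
            BufferedOutputStream out = new BufferedOutputStream(sink, bufferSize);
            for (int i = 0; i < total; i++) {
                out.write(i);
            }
            out.flush(); // push out whatever is still buffered
            return sink.calls;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("unbuffered calls: " + callsUnbuffered(10000));
        System.out.println("buffered calls:   " + callsBuffered(10000, 1000));
    }
}
```

With a real file underneath, each call that reaches the sink would correspond to a trip into the operating system, which is where the saving comes from. The data written is the same either way.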
Java doesn't have more direct access to your computer's devices than other languages. Between your program and the bits on your hard disk, there still are several layers that are entitled to buffer or cache whatever Java desperately tries to get from or to the disk. As far as I know, the OS can (and usually will) cache or buffer stuff, and some hardware will do it too.
Buffering, in the Java meaning of the operation, has nothing to do with the success or failure of reads or writes to devices, or for that matter, to any stream.
Who is doing the {caching / buffering / streaming} to disk there?
Nobody inside the JVM. Each write call hands the data straight to the operating system, so the program's memory usage does not grow as more data is written.