I have to write an algorithm for external sort in Java, using only JVM RAM (basically, I cannot map files). So the first part that I want to do is read data from a file in chunks.
I found this tutorial.
The problem is that the tutorial is about reading bytes, and I have to read ints. I am not sure how IntBuffer is implemented, but I think it's a wrapper around a byte buffer. Given that, am I right that the fastest thing I can do is use the "FileChannel with direct ByteBuffer and byte array" method from the tutorial (code below) and then create a separate array of ints that I "manually" decode from the bytes using bit operations?
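```java
FileInputStream f = new FileInputStream( name );
FileChannel ch = f.getChannel( );
ByteBuffer bb = ByteBuffer.allocateDirect( BIGSIZE );  // off-heap buffer the channel fills
byte[] barray = new byte[SIZE];                        // on-heap array the loop actually reads
long checkSum = 0L;
int nRead, nGet;
while ( (nRead=ch.read( bb )) != -1 )
{
    if ( nRead == 0 )
        continue;
    bb.position( 0 );       // same effect as bb.flip() here:
    bb.limit( nRead );      // expose only the bytes read by this call
    while( bb.hasRemaining( ) )
    {
        // copy the buffer into the heap array in SIZE-byte pieces
        nGet = Math.min( bb.remaining( ), SIZE );
        bb.get( barray, 0, nGet );
        for ( int i=0; i<nGet; i++ )
            checkSum += barray[i];
    }
    bb.clear( );            // reset position and limit for the next read
}
```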
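On the IntBuffer question: ByteBuffer.asIntBuffer() returns a view that reads the buffer's bytes as ints, using the byte buffer's order (big-endian unless changed with ByteBuffer.order()), so the shift-based decoding does not have to be written by hand. Below is a minimal sketch of both options, assuming the file stores big-endian 4-byte ints. The class and method names are hypothetical, and no claim is made here about which variant is faster. readChunk accepts any ReadableByteChannel, which the FileChannel from the question implements.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.IntBuffer;
import java.nio.channels.ReadableByteChannel;

// Hypothetical helper class, not from the tutorial.
final class IntChunkReader {

    // Manual decoding: combine four big-endian bytes into one int with shifts.
    static int decodeBigEndian(byte[] b, int off) {
        return ((b[off] & 0xFF) << 24)
             | ((b[off + 1] & 0xFF) << 16)
             | ((b[off + 2] & 0xFF) << 8)
             |  (b[off + 3] & 0xFF);
    }

    // Reads up to dst.length ints into dst through an IntBuffer view of buf.
    // Returns the number of ints read, or -1 once the channel is exhausted.
    static int readChunk(ReadableByteChannel ch, ByteBuffer buf, int[] dst) throws IOException {
        buf.clear();
        // Only request whole ints that fit in both the buffer and dst.
        buf.limit(Math.min(buf.capacity(), dst.length * 4) & ~3);
        while (buf.hasRemaining()) {
            if (ch.read(buf) == -1) {
                break; // end of file; a single read() may also return fewer bytes than asked
            }
        }
        buf.flip();
        int count = buf.remaining() / 4; // a trailing partial int at end of file is dropped
        if (count == 0) {
            return -1;
        }
        IntBuffer ints = buf.asIntBuffer(); // view over the same bytes, same byte order
        ints.get(dst, 0, count);
        return count;
    }
}
```

Each returned chunk can then be sorted in place with Arrays.sort(dst, 0, count) before being written out as a run.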
Also, a small additional question: I want to read and sort in parallel (I/O wastes a lot of time). Should I use an entirely different approach, or is reading with this method in one thread and sorting in another a good approach? I really want to fight for every nanosecond of performance.
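The read-in-one-thread, sort-in-another idea can be sketched as a bounded producer/consumer pipeline: the calling thread reads chunks and hands them to a sorter thread through an ArrayBlockingQueue. The ChunkSource interface, the class name, and the queue capacity of 2 are illustrative assumptions; the chunk source could be any of the reading methods discussed here. The bound keeps the reader from running far ahead of the sorter, which caps how many chunks sit in RAM at once. Whether the overlap pays off depends on how the read time compares with the sort time on a given machine.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical pipeline class for illustration.
final class PipelinedSorter {

    // Hypothetical source of int chunks; returns null when the input is exhausted.
    interface ChunkSource {
        int[] next() throws IOException;
    }

    private static final int[] END = new int[0]; // sentinel: no more chunks

    // Reads chunks on the calling thread while a second thread sorts them.
    // Returns the sorted chunks (the initial runs of an external sort) in input order.
    static List<int[]> readAndSort(ChunkSource source) throws IOException, InterruptedException {
        // Bounded queue: the reader can be at most two chunks ahead of the sorter.
        BlockingQueue<int[]> queue = new ArrayBlockingQueue<>(2);
        List<int[]> runs = new ArrayList<>();

        Thread sorter = new Thread(() -> {
            try {
                for (int[] chunk = queue.take(); chunk != END; chunk = queue.take()) {
                    Arrays.sort(chunk);
                    runs.add(chunk); // only the sorter touches 'runs' until join()
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        sorter.start();

        try {
            for (int[] chunk = source.next(); chunk != null; chunk = source.next()) {
                queue.put(chunk); // blocks while the queue is full
            }
        } finally {
            queue.put(END);  // always let the sorter finish, even if reading failed
            sorter.join();   // join() also makes the sorter's writes to 'runs' visible here
        }
        return runs;
    }
}
```

In a real external sort, the sorter would write each sorted chunk to a run file instead of keeping it in a list.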
I would use

```java
DataInputStream in = new DataInputStream(new BufferedInputStream(new FileInputStream(file)));
```

and then call readInt(). This will be as fast as anything you can do with FileChannels short of a memory-mapped file, and mapped files are only about 20% faster than normal I/O.
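A minimal sketch of reading fixed-size int chunks this way. The class name, method names, and the 64 KiB buffer size are illustrative assumptions, not part of the answer. DataInputStream.readInt() reads four bytes as a big-endian int and throws EOFException at end of input, which is used here to finish a partial last chunk.

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper class for illustration.
final class DataStreamChunks {

    // Wraps a raw stream as suggested above; the 64 KiB buffer size is an assumption.
    static DataInputStream wrap(InputStream raw) {
        return new DataInputStream(new BufferedInputStream(raw, 64 * 1024));
    }

    // Fills dst with up to dst.length big-endian ints and returns how many were read.
    // Returns 0 once the stream is exhausted; a trailing partial int (1 to 3 bytes) is dropped.
    static int readChunk(DataInputStream in, int[] dst) throws IOException {
        int n = 0;
        try {
            while (n < dst.length) {
                dst[n] = in.readInt();
                n++;
            }
        } catch (EOFException e) {
            // end of input reached inside this chunk; return the ints read so far
        }
        return n;
    }
}
```

A caller would typically loop `while ((n = DataStreamChunks.readChunk(in, chunk)) > 0)`, sort the filled prefix with `Arrays.sort(chunk, 0, n)`, and write it out as one run.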
Direct byte buffers won't help you here either. They are most useful when you don't need to look at or modify the data yourself and are just copying it between channels: the data stays on the native side instead of crossing the JNI/Java boundary twice. That doesn't apply here, since you need to read every int in Java.
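For contrast, here is a sketch of the case where a direct buffer fits: copying from one channel to another without Java code ever inspecting the bytes. It follows the usual NIO read/flip/write/compact loop; the class name and the 64 KiB buffer size are illustrative assumptions. For file-to-file copies, FileChannel.transferTo is another option.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;
import java.nio.channels.WritableByteChannel;

// Hypothetical helper class for illustration.
final class ChannelCopy {

    // Copies everything from src to dst through a direct buffer and returns the byte count.
    // The bytes are never read by Java code, which is when a direct buffer can help.
    static long copy(ReadableByteChannel src, WritableByteChannel dst) throws IOException {
        ByteBuffer buf = ByteBuffer.allocateDirect(64 * 1024); // size is an arbitrary choice
        long total = 0;
        while (src.read(buf) != -1) {
            buf.flip();              // switch from filling to draining
            total += dst.write(buf); // may write only part of the buffer
            buf.compact();           // keep any unwritten bytes for the next round
        }
        buf.flip();
        while (buf.hasRemaining()) { // drain whatever is left after end of input
            total += dst.write(buf);
        }
        return total;
    }
}
```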
If you want to fight for every nanosecond of performance, buy faster disks, e.g. an SSD, RAID N, or both. An SSD can transfer data up to 10x faster than a spinning disk. This will make far more difference than anything you can do in Java.