开发者

Java Gridgain application starts to fail after 1 day of stress testing

开发者 https://www.devze.com 2022-12-09 17:25 出处:网络
So I have a an application which is running on top of gridgain and does so quite successfully for about 12-24 hours of stress testing before it starts to act funny. After this period of time the appli

So I have a an application which is running on top of gridgain and does so quite successfully for about 12-24 hours of stress testing before it starts to act funny. After this period of time the application will suddenly start replying to all queries with the exception java.nio.channels.ClosedByInterruptException (full stack trace is at http://pastie.org/664717

The met开发者_JAVA技巧hod that is failing from is (edited to use @stephenc feedback)

public static com.vlc.edge.FileChannel createChannel(final File file) {
    FileChannel channel = null;
    try {
    channel = new FileInputStream(file).getChannel();
    channel.position(0);
    final com.vlc.edge.FileChannel fileChannel = new FileChannelImpl(channel);
    channel = null;
    return fileChannel;
    } catch (FileNotFoundException e) {
    throw new VlcRuntimeException("Failed to open file: " + file, e);
    } catch (IOException e) {
    throw new VlcRuntimeException(e);
    } finally {
    if (channel != null) {
        try {
        channel.close();
        } catch (IOException e){
        // noop
        LOGGER.error("There was a problem closing the file: " + file);
        }
    }
    }
}

and the calling function correctly closes the object

private void fillContactBuffer(final File signFile) {
    contactBuffer = ByteBuffer.allocate((int) signFile.length());
    final FileChannel channel = FileUtils.createChannel(signFile);
    try {
        channel.read(contactBuffer);
    } finally {
        channel.close();
    }
    contactBuffer.rewind();
}

The application basically serves as a distributed file parser so it does a lot of these types of operations (will typically open about 10 such channels per query per node). It seems that after a certain period it stops being able to open files and I'm at a loss to explain why this could be happening and would greatly appreciate any one who can tell me what could be causing this and how I could go about tracking it down and fixing it. If it is possibly related to file handle exhaustion, I'd love to hear any tips for finding out for sure... i.e. querying the JVM while it's running or using linux command line tools to find out more information about what handles are currently open.

update: I've been using command line tools to interrogate the output of lsof and haven't been able to see any evidence that file handles are being held open... each node in the grid has a very stable profile of openned files which I can see changing as the above code is executed... but it always returns to a stable number of open files.

Related to this question: Freeing java file handles


There are a couple of scenarios where file handles might not be being closed:

  1. There might be some other code that opens files.
  2. There might be some other bit of code that calls createChannel(...) and doesn't call fillContactBuffer(...)
  3. If channel.position(0) throws an exception, the channel won't be closed. The fix is to rearrange the code so that the following statements are inside the try block.

    channel.position(0);
    return new FileChannelImpl(channel);
    

EDIT: Looking at the stack trace, it seems that the two methods are in different code-bases. I'd point the finger of blame at the createChannel method. It is potentially leaky, even if it is not the source of your problems. It needs an in internal finally clause to make sure that the channel is closed in the event of an exception.

Something like this should do the trick. Note that you need to make sure that the finally block does not closes the channel on success!

public static com.vlc.edge.FileChannel createChannel(final File file) {
    final FileChannel channel = null;
    try {
        channel = new FileInputStream(file).getChannel();
        channel.position(0);
        FileChannel res = new FileChannelImpl(channel);
        channel = null;
        return res;
    } catch (FileNotFoundException e) {
        throw new VlcRuntimeException("Failed to open file: " + file, e);
    } catch (IOException e) {
        throw new VlcRuntimeException(e);
    } finally {
        if (channel != null) {
            try {
                channel.close();
            } catch (...) {
                ... 
            }
        }
    }
}

FOLLOWUP much later

Given that file handle leakage has been eliminated as a possible cause, my next theory would be that the server side is actually interrupting its own threads using Thread.interrupt(). Some low-level I/O calls respond to an interrupt by throwing an exception, and the root exception being thrown here looks like one such exception.

This doesn't explain why this is happening, but at a wild guess I'd say that it was the server-side framework trying to resolve an overload or deadlock problem.

0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号