开发者

Does asynchronous receive guarantee the detection of connection failure?

开发者 https://www.devze.com 2023-01-28 19:17 出处:网络
From what I know, a blocking receive on a TCP socket does not always detect a connection error (due either to a network failure or to a remote-endpoint failure) by returning a -1 value or raising an I

From what I know, a blocking receive on a TCP socket does not always detect a connection error (due either to a network failure or to a remote-endpoint failure) by returning a -1 value or raising an IO exception: sometimes it could just hang indefinitely.

One way to manage this problem is to set a timeout for the blocking receive. In case an upper bound for the reception time is known, this bound could be set as timeout and the connection could be considered lost simply when the timeout expires; when such an upper bound is not known a priori, for example in a pub-sub system where a connection stays open to receive publications, the timeout to be set would be somewhat arbitrary but its expiration could trigger a ping/pong request to verify that the connection (and the endpoint too) is still up.

I wonder whether the use of asynchronous receive also manages the problem of detecting a connection failure. In boost::asio I would call socket::asynch_read_some() registering an handler to be asynchronously called, while in java.nio I would configure the channel as non-blocking and register it to a selector wit开发者_开发技巧h an OP_READ interest flag. I imagine that a correct connection-failure detection would mean that, in the first case the handler would be called with a non-0 error_code, while in the second case the selector would select the faulty channel but a subsequent read() on the channel would either return -1 or throw an IOException.

Is this behaviour guaranteed with asynchronous receive, or could there be scenarios where after a connection failure, for example, in boost::asio the handler will never be called or in java.nio the selector will never select the channel?

Thank you very much.


I believe you're referring to the TCP half-open connection problem (the RFC 793 meaning of the term). Under this scenario, the receiving OS will never receive indication of the lost connection, so it will never notify the app. Whether the app is readding synchronously or asynchronously doesn't enter into it.

The problem occurs when the transmitting side of the connection somehow is no longer aware of the network connection. This can happen, for example, when

  • the transmitting OS abruptly terminates/restarts (power outage, OS failure/BSOD, etc.).

  • the transmitting side closes its side while there is a network disruption between the two sides and cleans up its side: e.g transmitting OS reboots cleanly during disruption, transmitting Windows OS is unplugged from the network

When this happens, the receiving side may be waiting for data or a FIN that will never come. Unless the receiving side sends a message, there's no way for it to realize the transmitting side is no longer aware of the receiving side.

Your solution (a timeout) is one way to address the issue, but it should include sending a message to the transmitting side. Again, it doesn't matter the read is synchronous or asynchronous, just that it doesn't read and wait indefinitely for data or a FIN. Another solution is using a TCP KEEPALIVE feature that is supported by some TCP stacks. But the hard part of any generalized solution is usually determining a proper timeout, since the timeout is highly dependent on characteristics of the specific application.


Because of how TCP works, you will typically have to send data in order to notice a hard connection failure, to find out that no ACK packet will ever be returned. Some protocols attempt to identify conditions like this by periodically using a keep-alive or ping packet: if one side does not receive such a packet in X time (and perhaps after trying and failing one itself), it can consider the connection dead.

To answer your question, blocking and non-blocking receive should perform identically except for the act of blocking itself, so both will suffer from this same issue. In order to make sure that you can detect a silent failure from the remote host, you'll have to use a form of keep-alive like I described.

0

精彩评论

暂无评论...
验证码 换一张
取 消