I have a client and server using boost::asio asynchronously. I want to add some timeouts to close the connection and potentially retry if something goes wrong.
My initial thought was that any time I call an async_ function I should also start a deadline_timer to expire after I expect the async operation to complete. Now I'm wondering if that is strictly necessary in every case.
For example:
- async_resolve presumably uses the system's resolver, which has timeouts built into it (e.g. RES_TIMEOUT in resolv.h, possibly overridden by configuration in /etc/resolv.conf). By adding my own timer, I may conflict with how the user wants his resolver to work.
- For async_connect, the connect(2) syscall has some sort of timeout built into it.
- etc.
So which (if any) async_ calls are guaranteed to call their handlers within a "reasonable" time frame? And if an operation can (or does) time out, would the handler be passed the basic_errors::timed_out error or something else?
So I did some testing. Based on my results, it's clear that they depend on the underlying OS implementation. For reference, I tested this with a stock Fedora kernel: 2.6.35.10-74.fc14.x86_64.
The bottom line is that async_resolve() looks to be the only case where you might be able to get away without setting a deadline_timer. It's practically required in every other case for reasonable behavior.
async_resolve()
A call to async_resolve() resulted in 4 queries 5 seconds apart. The handler was called 20 seconds after the request with the error boost::asio::error::host_not_found.
My resolver defaults to a timeout of 5 seconds with 2 attempts (resolv.h), so it appears to send twice the number of queries configured. The behavior is modifiable by setting options timeout and options attempts in /etc/resolv.conf. In every case the number of queries sent was double whatever attempts was set to, and the handler was called with the host_not_found error afterwards.
For the test, the single configured nameserver was black-hole routed.
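As a rough check of the numbers above, the wait before the handler fires can be sketched like this. The function name is hypothetical, and the factor of two is only what this test observed with a single unreachable nameserver, not documented resolver behavior.

```cpp
// Hypothetical helper: estimates how long async_resolve() takes to fail
// when the only configured nameserver never answers.
// Assumption: the resolver sends 2 * attempts queries, each waiting one
// full timeout interval, as observed in the test above.
int resolve_worst_case_seconds(int timeout_seconds, int attempts) {
    int queries_sent = 2 * attempts;  // empirical doubling, not a guarantee
    return queries_sent * timeout_seconds;
}
```

With the defaults from the test (timeout 5, attempts 2), this gives the 4 queries and 20 seconds seen above.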
async_connect()
Calling async_connect() with a black-hole-routed destination resulted in the handler being called with the error boost::asio::error::timed_out after ~189 seconds.
The stack sent the initial SYN and 5 retries. The first retry was sent after 3 seconds, with the retry timeout doubling each time (3+6+12+24+48+96=189). The number of retries can be changed:
% sysctl net.ipv4.tcp_syn_retries
net.ipv4.tcp_syn_retries = 5
The default of 5 is chosen to comply with RFC 1122 (4.2.3.5):
[The retransmission timers] for a SYN segment MUST be set large enough to provide retransmission of the segment for at least 3 minutes. The application can close the connection (i.e., give up on the open attempt) sooner, of course.
3 minutes = 180 seconds, though the RFC doesn't appear to specify an upper bound. There's nothing stopping an implementation from retrying forever.
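The backoff arithmetic can be sketched as a small function, which makes it easier to see how changing tcp_syn_retries moves the timeout. The function name and the 3-second initial timeout are assumptions taken from this test on this kernel; other kernels may start from a different value.

```cpp
// Hypothetical helper: total time before the connect attempt is abandoned,
// assuming the retransmission timeout doubles after every unanswered SYN.
// The initial SYN plus `syn_retries` retries gives syn_retries + 1 waits.
int connect_timeout_seconds(int syn_retries, int initial_rto_seconds = 3) {
    int total = 0;
    int rto = initial_rto_seconds;
    for (int i = 0; i <= syn_retries; ++i) {
        total += rto;  // wait for a reply to the SYN just sent
        rto *= 2;      // exponential backoff for the next attempt
    }
    return total;
}
```

For tcp_syn_retries = 5 this sums 3+6+12+24+48+96. Lowering it to 4 would, under the same assumptions, end the attempt after 93 seconds, which is below the 3 minutes the RFC asks for.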
async_write()
As long as the socket's send buffer wasn't full, this handler was always called right away.
My test established a TCP connection and set a timer to call async_write() a minute later. During the minute where the connection was established but prior to the async_write() call, I tried all sorts of mayhem:
- Setting a downstream router to black-hole subsequent traffic to the destination.
- Clearing the session in a downstream firewall so it would reply with spoofed RSTs from the destination.
- Unplugging my Ethernet
- Running /etc/init.d/network stop
No matter what I did, the next async_write() would immediately call its handler to report success.
In the case where the firewall spoofed the RST, the connection was closed immediately, but I had no way of knowing that until I attempted the next operation (which would immediately report boost::asio::error::connection_reset). In the other cases, the connection would remain open and not report errors to me until it eventually timed out 17-18 minutes later.
The worst case for async_write() is if the host is retransmitting and the send buffer is full. If the buffer is full, async_write() won't call its handler until the retransmissions time out. Linux defaults to 15 retransmissions:
% sysctl net.ipv4.tcp_retries2
net.ipv4.tcp_retries2 = 15
The time between retransmissions increases after each one (it is based on many factors, such as the estimated round-trip time of the specific connection) but is clamped at 2 minutes. So with the default 15 retransmissions and a worst-case 2-minute timeout, the upper bound is 30 minutes for the async_write() handler to be called. When it is called, the error is set to boost::asio::error::timed_out.
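The upper bound is simple multiplication, sketched here so it can be recomputed for a different tcp_retries2 setting. The function name is hypothetical, and the bound deliberately overestimates: it assumes every retransmission waits the full clamped interval, whereas the early ones are much shorter.

```cpp
// Hypothetical helper: loose upper bound on how long a blocked
// async_write() can wait before its handler reports timed_out.
// Assumption: each of the tcp_retries2 retransmissions waits the maximum
// (clamped) interval of 120 seconds.
int write_timeout_upper_bound_seconds(int tcp_retries2,
                                      int max_rto_seconds = 120) {
    return tcp_retries2 * max_rto_seconds;
}
```

With the default of 15 retransmissions this is 1800 seconds, i.e. the 30 minutes quoted above. The 17-18 minutes seen in the tests fits under this bound.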
async_read()
This should never call its handler as long as the connection is established and no data is received. I haven't had time to test it.
Those two calls MAY have timeouts that get propagated up to your handlers, but you might be surprised at how long it takes before either of them times out. (I have let a connection sit and try to connect on a single connect call for over 10 minutes with boost::asio before killing the process.) Also, the async_read and async_write calls do not have timeouts associated with them, so if you wish to have timeouts on your reads and writes, you will still need a deadline_timer.