When we start a server application, we always need to specify the port number it listens on. But how is this "listening mechanism" implemented under the hood?
My current understanding is this:
The operating system associates the port number with some buffer. The server application's responsibility is to monitor this buffer. If there's no data in this buffer, the server application's listen operation will just block the application.
When some data arrives from the wire, the operating system notices it and checks whether it is targeted at this port number. If so, it fills the corresponding buffer. The OS then notifies the blocked server application, which gets the data and continues to run.
My questions are:
If the above scenario is correct, how could the operating system know there's data arriving from the wire? It cannot be busy polling. Is it some kind of interrupt-based mechanism?
If there's too much data arriving and the buffer is not big enough, will there be data loss?
Is the "listen to a port" operation really a blocking operation?
Many thanks.
While the other answers seem to explain things correctly, let me give a more direct answer: your imagination is wrong.
There is no buffer that the application monitors. Instead, the application calls listen() at some point, and the OS remembers from then on that this application is interested in new connections to that port number. Only one application can indicate interest in a certain port at any time.
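The "only one application per port" rule can be checked with a small sketch. This is only an illustration under the default socket options; the function name second_bind_errno is made up for this example, and some systems allow sharing a port explicitly via options such as SO_REUSEPORT, which this sketch does not set.

```python
import errno
import socket


def second_bind_errno():
    """Bind two TCP sockets to the same port; return the errno of the second bind."""
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        first.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
        first.listen(1)                # the OS now records interest in that port
        port = first.getsockname()[1]
        try:
            second.bind(("127.0.0.1", port))
        except OSError as exc:
            return exc.errno           # the OS refused the second claim
        return None
    finally:
        first.close()
        second.close()


print(second_bind_errno() == errno.EADDRINUSE)
```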
The listen operation does not block. Instead, it returns right away. What may block is accept(). The system has a backlog of incoming connections (buffering the data that have been received), and returns one of the connections every time accept() is called. accept() doesn't transmit any data; the application must then do recv() calls on the accepted socket.
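A rough way to see the difference between listen() and accept() is the sketch below. It assumes a loopback connection and uses a short timeout so the demonstration cannot hang; the function name and the 0.2 second timeout are arbitrary choices for this example.

```python
import socket


def accept_waits_listen_does_not():
    """Return (accept_timed_out_without_client, data_received_after_connect)."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen(5)                 # returns immediately, no client required
    server.settimeout(0.2)           # bound the wait instead of blocking forever
    try:
        server.accept()
        timed_out = False
    except socket.timeout:
        timed_out = True             # accept() was waiting: the backlog was empty

    client = socket.create_connection(server.getsockname())
    conn, _addr = server.accept()    # a completed connection is now in the backlog
    client.sendall(b"hello")
    data = conn.recv(1024)           # data is read from the accepted socket

    for s in (client, conn, server):
        s.close()
    return timed_out, data


print(accept_waits_listen_does_not())
```

Note that accept() hands back a new socket for the connection, while the listening socket stays available for further accept() calls.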
As to your questions:
As others have said: hardware interrupts. The NIC takes the datagram completely off the wire, raises an interrupt, and is assigned an address in memory to copy it to.
For TCP, there will be no data loss, as there will always be sufficient memory during the communication. TCP has flow control, and the sender will stop sending before the receiver runs out of memory. For UDP and new TCP connections, there can be data loss; the sender will typically get an error indication (as the system reserves memory to accept just one more datagram).
See above: listen itself is not blocking; accept is.
- Your description is basically correct except for the blocking part. OSes normally use interrupts to handle I/O events like arriving network packets, so there is no need to block.
- Yes, if too many connection attempts happen at the same time, some will get bounced. The number of connections to queue is specified when you call listen or its equivalent.
- No, it is not. The OS raises an event on your control socket when a connection arrives. You may choose to block while waiting for this event, or you may use some nonblocking (select, poll/epoll) or asynchronous (overlapped I/O, completion ports) mechanism.
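As a sketch of the nonblocking route, Python's selectors module (which wraps select/poll/epoll, whichever the platform offers) can report when the listening socket has a connection ready. The function name and the timeout values below are assumptions made for the example.

```python
import selectors
import socket


def ready_events_before_and_after_connect(timeout=1.0):
    """Return how many ready events the selector reports before and after a client connects."""
    sel = selectors.DefaultSelector()
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen(5)
    server.setblocking(False)                    # accept() will never block now
    sel.register(server, selectors.EVENT_READ)

    before = sel.select(timeout=0)               # poll once: nothing is pending yet
    client = socket.create_connection(server.getsockname())
    after = sel.select(timeout=timeout)          # the OS signals the pending connection
    conn, _addr = server.accept()                # safe: the selector said it is ready

    sel.unregister(server)
    for s in (client, conn, server):
        s.close()
    sel.close()
    return len(before), len(after)


print(ready_events_before_and_after_connect())
```

In a real server the same selector would usually watch the accepted sockets as well, so one thread can wait on many connections at once.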
What happens when we say "listen to a port"?
The typical TCP server sequence of calls is
socket() -> bind() -> listen() -> accept() -> read()/write() -> close()
A socket created by the socket() function is assumed to be an active socket (one that will issue a connect()). The listen() function converts an unconnected socket into a passive socket, which means the kernel should start accepting incoming connection requests on it. The second argument to listen() specifies the total queue length for a given listening socket across 2 queues:
(1) complete connection queue - the 3-way handshake has completed for the connection
(2) incomplete connection queue - a SYN has been received from the client and the 3-way TCP handshake is waiting for completion
Finally, accept() is called by the TCP server to return the next completed connection from the front of the completed connection queue. If accept() is successful, it returns a new socket descriptor that refers to the TCP connection between client and server.
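The call sequence can be sketched end to end in Python, whose socket module mirrors the C calls (recv()/sendall() stand in for read()/write()). The helper names, the loopback address, and the uppercase "echo" behaviour are all invented for this illustration.

```python
import socket
import threading


def serve_one_client(server):
    """accept() -> read -> write -> close for a single connection."""
    conn, _addr = server.accept()          # accept(): take one completed connection
    with conn:                             # close() happens when the block exits
        data = conn.recv(1024)             # read()
        conn.sendall(data.upper())         # write()


def run_sequence():
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  # socket()
    server.bind(("127.0.0.1", 0))                               # bind(), port chosen by the OS
    server.listen(5)                                            # listen(): socket becomes passive
    worker = threading.Thread(target=serve_one_client, args=(server,))
    worker.start()

    # The client side: an active socket that issues connect().
    with socket.create_connection(server.getsockname()) as client:
        client.sendall(b"ping")
        reply = client.recv(1024)

    worker.join()
    server.close()
    return reply


print(run_sequence())
```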
Now to answer your questions: the networking stack in the operating system kernel reads each incoming IP packet and classifies it according to its TCP/IP header fields. The arrival of an IP packet on the wire is serviced as an interrupt by the Ethernet driver, and from there onwards the kernel-mode TCP/IP stack takes over.
With respect to data: if you mean the SYN packet, Posix.1g has an option to either ignore the new incoming SYN or send a RST to the client when the connection queue is full. Data that arrives after the 3-way handshake completes, but before the server calls accept, should be queued by the server TCP up to the size of the connected socket's receive buffer.
The listen() operation is a blocking call and returns after the connection state is set to passive to allow incoming TCP client connections.
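The point about data arriving before accept can be tried out with a short sketch: the client connects, sends, and even closes before the server ever calls accept(), and the bytes are still there afterwards. The function name and the payload are placeholders for this example, which assumes a loopback connection and a payload far smaller than the receive buffer.

```python
import socket


def read_data_sent_before_accept():
    """Client sends before the server calls accept(); return what the server later reads."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen(5)

    # The kernel completes the handshake on the listening socket's behalf,
    # so connect() succeeds even though accept() has not been called.
    client = socket.create_connection(server.getsockname())
    client.sendall(b"early data")
    client.close()

    conn, _addr = server.accept()          # only now does the application take the connection
    chunks = []
    while True:
        chunk = conn.recv(1024)
        if not chunk:                      # empty read: the client closed its side
            break
        chunks.append(chunk)

    conn.close()
    server.close()
    return b"".join(chunks)


print(read_data_sent_before_accept())
```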
Refer to Wikipedia for more details on the TCP protocol: handshake, sequencing, and acknowledgments for reliable transmission.
This book gives very good detail on TCP/IP Unix network programming and can provide more insight on this topic.
If the above scenario is correct, how could the operating system know there's data arriving from wire? It cannot be a busy polling. Is it some kind of interrupt-based mechanism?
The hardware tells the OS by raising an interrupt; a hardware interrupt causes an event handler to run.
If there's too much data arriving and the buffer is not big enough, will there be data loss?
Yep, but TCP uses a windowing mechanism. The OS tells the other end how much buffer space it has, and it can do this dynamically. So it may start with "I have 4k of buffers". After 2k has arrived, the other end can send 2k more, but we can acknowledge the first 2k. If the other end sends too much too quickly, our OS will discard it. It will also tell the other end to slow down, and re-acknowledge what it has already got. When buffers are free, we can tell the other end to continue. It will resend what we have not acknowledged. The OS does all this for us when using TCP, but not for UDP.
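The effect of the window can be observed from user space with a sketch like the one below: a nonblocking sender writes to a receiver that is not reading, and eventually the OS refuses more data instead of losing any. The function name, chunk size, and the 256 MiB safety cap are assumptions of this example; how many bytes fit before the stall depends on the system's buffer sizes.

```python
import socket


def send_until_stalled(chunk=b"x" * 4096, cap=256 * 1024 * 1024):
    """Return (bytes_sent, bytes_received, stalled) for a receiver that reads late."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen(1)
    sender = socket.create_connection(server.getsockname())
    receiver, _addr = server.accept()

    sender.setblocking(False)
    sent = 0
    stalled = False
    try:
        while sent < cap:                  # cap is only a guard against looping forever
            sent += sender.send(chunk)     # receiver is idle, so buffers fill up
    except BlockingIOError:
        stalled = True                     # buffers are full: the OS makes the sender wait

    sender.close()                         # queued data is still delivered before the close
    received = 0
    while True:
        data = receiver.recv(65536)
        if not data:
            break
        received += len(data)

    receiver.close()
    server.close()
    return sent, received, stalled


sent, received, stalled = send_until_stalled()
print(sent == received, stalled)
```

With a blocking sender the send() call would simply wait at the point where this sketch gets BlockingIOError, which is the flow control described above as seen by the application.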
Is the "listen to a port" operation really a blocking operation?
Yes and no. It will not return until it is done, but there is not much to do: listen does next to nothing, it just leaves a note with the OS: "If someone tries to connect to that port, it is me that will handle it". It is accept that waits for the connection, and accept that can block (as well as read/write/...).
The OS need not allocate any buffer this early. Listen writes some metadata into an OS table. When a connection comes in, it uses the next connection-handling buffer. Later, data comes in and uses a data buffer; data buffers need not be allocated per connection. Lots of pending data on one connection could reduce the buffers available to other connections. Your OS may have policies and mechanisms to make things fair.