HTTP request message boundaries

Developer https://www.devze.com 2023-02-15 01:28 Source: Internet
I'm writing a client to upload files via regular HTTP multipart/form-data to Megaupload. Now, the point is not Megaupload per se, but the behaviour of their web server.

Curl could upload without any problem, while my client couldn't, even when sending the exact same request (sniffed with Wireshark). My client got stuck waiting for the response and eventually timed out after 30 minutes.

After playing with raw sockets and strace for a while, I found that the only difference between the two is that curl sends the whole header block with a single call to sendto(2), and then the rest with further calls to sendto(2). My client, on the other hand, sends every header separately, each with its own write(2) call.

Now, sendto and write should be equivalent when no flags are passed to send, and none were. In fact I made it work with write, but only by sending the header block in a single call. Every other sequence of write calls left the request stuck waiting.
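The workaround described above (build the whole header block first, then hand it to the kernel in one call) can be sketched in Python roughly as follows. The function names `build_header_block` and `send_request` are hypothetical, not taken from the poster's client, and this is only an illustration of the pattern, not a claim about what Megaupload's server requires.

```python
import socket

def build_header_block(request_line, headers):
    """Serialize a request line plus (name, value) header pairs into one
    bytes object, ending with the blank line that closes the header block."""
    lines = [request_line] + [f"{name}: {value}" for name, value in headers]
    return ("\r\n".join(lines) + "\r\n\r\n").encode("iso-8859-1")

def send_request(sock, request_line, headers, body=b""):
    # One sendall() for the entire header block instead of one write per
    # header line. sendall() keeps calling send() until every byte is queued;
    # for a block of a few hundred bytes that is normally a single call.
    sock.sendall(build_header_block(request_line, headers))
    if body:
        sock.sendall(body)
```

Headers are passed as a list of pairs rather than a dict so that their order is preserved and repeated header names are allowed.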

So the question is: how is this even possible? TCP doesn't preserve message boundaries; it's a stream protocol.
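The stream property can be checked on a loopback connection. In this sketch (the helper name `receive_over_tcp` is made up for illustration), each piece is sent with its own `sendall()` call, yet the receiver gets one undifferentiated byte stream; how many `recv()` calls it takes to read it is up to the kernel and is not tied to the number of sends.

```python
import socket

def receive_over_tcp(pieces):
    """Send each piece with a separate sendall() over loopback TCP and
    return every byte the peer read, joined into one bytes object."""
    with socket.create_server(("127.0.0.1", 0)) as server:
        port = server.getsockname()[1]
        with socket.create_connection(("127.0.0.1", port)) as client:
            for piece in pieces:
                client.sendall(piece)          # one send call per piece
            client.shutdown(socket.SHUT_WR)    # tell the peer we are done
            conn, _ = server.accept()          # small data waits in the kernel buffer
            with conn:
                chunks = []
                while True:
                    data = conn.recv(4096)     # may return several pieces at once
                    if not data:               # b"" means the sender shut down
                        break
                    chunks.append(data)
    return b"".join(chunks)
```

Whether the pieces arrive coalesced or split, the joined result equals the concatenation of what was sent, which is all a TCP reader can rely on.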

The only explanation I can think of is that every write/send syscall causes a separate packet to be sent, and that the remote server is sniffing raw packets and lying about being Apache.

Ideas? Or am I being a moron, and this is normal behaviour for a compliant HTTP server? It's certainly the first web server I've seen behave that way.


The HTTP protocol contains mechanisms that let the client and server determine message boundaries. For uploaded data (POST, PUT), either the Content-Length request header or chunked encoding is required. Content-Length tells the server exactly how many bytes to read from the socket; once those bytes have been received, the server sends its response in the other direction. That is effectively the message boundary here. Chunked encoding also tells the server how many bytes to expect, just in several pieces.
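A minimal sketch of the server side described above, assuming a Content-Length-delimited body (no chunked encoding, no size limits, no error responses). The name `read_http_request` is illustrative only. The point it shows is that a server reading this way does not care how the client split its writes: it accumulates bytes until the blank line, then reads exactly Content-Length more.

```python
import socket

def read_http_request(sock):
    """Read one request from `sock`; return (request_line, headers, body)."""
    buf = b""
    # 1. Accumulate bytes until the blank line that ends the header block,
    #    however many recv() calls that takes.
    while b"\r\n\r\n" not in buf:
        data = sock.recv(4096)
        if not data:
            raise ConnectionError("connection closed inside header block")
        buf += data
    head, _, rest = buf.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    request_line, headers = lines[0], {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    # 2. Content-Length says exactly how many body bytes follow the headers.
    length = int(headers.get("content-length", "0"))
    body = rest
    while len(body) < length:
        data = sock.recv(length - len(body))
        if not data:
            raise ConnectionError("connection closed inside body")
        body += data
    # Bytes past `length` would belong to a pipelined next request;
    # this sketch simply drops them.
    return request_line, headers, body[:length]
```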

For the response, Content-Length (or chunked encoding) is optional. It also tells the client how many bytes to expect, and it is required for persistent connections to work. If the length can't be determined up front, the server simply closes the socket, and then the client knows it has the whole response :)
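The three response cases above (chunked, Content-Length, read until close) can be sketched as a small dispatcher. The function names are hypothetical; `headers` is assumed to be a dict with lower-cased names, and `stream` a binary file-like object positioned just after the header block (for a real socket, `sock.makefile("rb")`). This sketch deliberately ignores responses that never carry a body, such as replies to HEAD or 204/304 responses.

```python
import io

def read_chunked_body(stream):
    """Decode a body sent with Transfer-Encoding: chunked."""
    body = bytearray()
    while True:
        size_line = stream.readline()              # e.g. b"1a;name=value\r\n"
        if not size_line:
            raise EOFError("stream ended before the last chunk")
        # The size is hexadecimal; anything after ';' is a chunk extension.
        size = int(size_line.split(b";", 1)[0].strip(), 16)
        if size == 0:
            break                                  # last-chunk marker
        chunk = stream.read(size)
        if len(chunk) != size:
            raise EOFError("stream ended inside a chunk")
        body += chunk
        if stream.read(2) != b"\r\n":              # CRLF follows each chunk
            raise ValueError("missing CRLF after chunk data")
    # Skip any trailer fields up to the terminating empty line.
    while stream.readline() not in (b"\r\n", b"\n", b""):
        pass
    return bytes(body)

def read_response_body(headers, stream):
    """Choose the framing rule for a response body and read it."""
    coding = headers.get("transfer-encoding", "")
    if coding.lower().split(",")[-1].strip() == "chunked":
        return read_chunked_body(stream)
    if "content-length" in headers:
        length = int(headers["content-length"])
        body = stream.read(length)
        if len(body) != length:
            raise EOFError("connection closed before Content-Length bytes")
        return body
    # Neither header: the body ends when the server closes the connection.
    return stream.read()
```

Only the last case forces the server to close the connection, which is why persistent connections need one of the first two.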


The question points at the difference between HTTP and TCP. I think all of the HTTP request headers should go in one TCP message. Try to get access to a debug or error log of the web server.
