
Receiving HTTP headers

Developer https://www.devze.com 2023-03-18 23:37 Source: Internet

For educational purposes I'm writing an HTTP server in C++. When receiving a request, how do I know when the client has finished sending the headers? Is there any requirement that all headers be sent in one shot? What if a client sends G, then E five seconds later, then T...? Should I wait for a timeout and just close the connection if it takes too long? Should I start parsing as soon as I get the first bytes, so I know early whether the request is invalid?

I know there are plenty of libraries for this; I'm reinventing the wheel to better understand how the Web works at its different layers. I can't find how they deal with this exact question.


According to the HTTP/1.1 RFC (RFC 2616, section 4.1):

    generic-message = start-line
                      *(message-header CRLF)
                      CRLF
                      [ message-body ]
    start-line      = Request-Line | Status-Line

There is an extra CRLF after the last message header, i.e. an empty line. So once you encounter the sequence CRLF CRLF, the headers are complete and the body (if any) starts.
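In practice, "wait for CRLF CRLF" means buffering whatever each recv() returns and searching the accumulated bytes, because the terminator can be split across reads (exactly the G, E, T case from the question). A minimal C++17 sketch, assuming the bytes are kept in a std::string; `HeaderAccumulator` and `feed` are hypothetical names, not part of any library:

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper: collects bytes from successive recv() calls until the
// empty line (CRLF CRLF) that ends the header section has been received.
class HeaderAccumulator {
public:
    // Appends one received chunk. Returns the offset of the first body byte
    // once "\r\n\r\n" has been seen, or std::nullopt if more data is needed.
    std::optional<std::size_t> feed(std::string_view chunk) {
        if (header_end_) {          // headers already complete: just keep body bytes
            buffer_.append(chunk);
            return header_end_;
        }
        // Re-scan the last 3 old bytes too, so a terminator split across
        // chunks (e.g. "...\r\n\r" followed by "\n") is still found.
        std::size_t from = buffer_.size() >= 3 ? buffer_.size() - 3 : 0;
        buffer_.append(chunk);
        std::size_t pos = buffer_.find("\r\n\r\n", from);
        if (pos != std::string::npos) header_end_ = pos + 4;
        return header_end_;
    }

    const std::string& data() const { return buffer_; }

private:
    std::string buffer_;
    std::optional<std::size_t> header_end_;
};
```

Feeding "G", then "ET / HTTP/1.1\r\n...", and so on returns std::nullopt until the empty line arrives; the returned offset is where the body begins. A real server would also cap the buffer size so a client cannot keep sending header bytes indefinitely.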

Concerning timeouts: you could start parsing as characters arrive (waiting for each CRLF so you know a header line is complete), and if the request takes longer than 5 seconds or so, send back a 408 Request Timeout.


There are two parts to this answer.

Firstly, the issue of delay and timeout: you should indeed handle timeouts, as it's generally not possible to detect whether a TCP connection is broken. There is more on this topic in this question: TCP socket in Unix - notify server I am done sending
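One way to enforce such a timeout on a POSIX system is to give the whole header section a single deadline and `poll()` the descriptor with whatever time remains, so a client trickling one byte every few seconds still runs out of time. This is a sketch under those assumptions: `read_headers`, `ReadResult` and `kRequestTimeout` are hypothetical names, the 4096-byte read size is arbitrary, and it uses plain `read()` so it works on any file descriptor, not only sockets.

```cpp
#include <poll.h>
#include <unistd.h>
#include <cerrno>
#include <chrono>
#include <cstddef>
#include <string>

enum class ReadResult { HeadersComplete, Timeout, PeerClosed, Error };

// Hypothetical helper: reads from `fd` into `out` until "\r\n\r\n" appears or
// the overall `budget` is used up. One deadline covers the whole header section.
// (Re-searching `out` from the start each time is fine for a sketch.)
ReadResult read_headers(int fd, std::chrono::milliseconds budget, std::string& out) {
    using clock = std::chrono::steady_clock;
    const auto deadline = clock::now() + budget;
    char buf[4096];
    while (out.find("\r\n\r\n") == std::string::npos) {
        auto left = std::chrono::duration_cast<std::chrono::milliseconds>(deadline - clock::now());
        if (left.count() <= 0) return ReadResult::Timeout;
        pollfd pfd{fd, POLLIN, 0};
        int rc = ::poll(&pfd, 1, static_cast<int>(left.count()));
        if (rc == 0) return ReadResult::Timeout;          // nothing arrived in time
        if (rc < 0) {
            if (errno == EINTR) continue;                 // interrupted by a signal: retry
            return ReadResult::Error;
        }
        ssize_t n = ::read(fd, buf, sizeof buf);
        if (n == 0) return ReadResult::PeerClosed;        // peer closed before finishing
        if (n < 0) {
            if (errno == EINTR) continue;
            return ReadResult::Error;
        }
        out.append(buf, static_cast<std::size_t>(n));
    }
    return ReadResult::HeadersComplete;
}

// Response a server might send when read_headers() reports Timeout.
constexpr char kRequestTimeout[] =
    "HTTP/1.1 408 Request Timeout\r\nConnection: close\r\nContent-Length: 0\r\n\r\n";
```

On Timeout the server can write the 408 response and close the socket; on PeerClosed there is nobody left to answer.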

Secondly, the format of an HTTP request is defined (in RFC 2616, section 5) as follows:

    Request       = Request-Line              ; Section 5.1
                    *(( general-header        ; Section 4.5
                     | request-header         ; Section 5.3
                     | entity-header ) CRLF)  ; Section 7.1
                    CRLF
                    [ message-body ]          ; Section 4.3

Essentially, you get the request line (for example GET /index.html HTTP/1.1), followed by zero or more header lines (none of them empty). The list of headers then ends with an empty line. Every line ends with CRLF ("\r\n").
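Given a buffer that already ends with the empty line, splitting it into the request line and the header fields could be sketched as follows. The `Request` struct and `parse_head` are hypothetical; header names are lower-cased because field names are case-insensitive, repeated fields are kept in order rather than merged, and the method and version are not validated further.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

struct Request {
    std::string method, target, version;
    std::vector<std::pair<std::string, std::string>> headers;  // names lower-cased
};

// Strips spaces and tabs from both ends (optional whitespace around a field value).
static std::string_view trim(std::string_view s) {
    while (!s.empty() && (s.front() == ' ' || s.front() == '\t')) s.remove_prefix(1);
    while (!s.empty() && (s.back() == ' ' || s.back() == '\t')) s.remove_suffix(1);
    return s;
}

// Hypothetical parser for everything up to and including "\r\n\r\n".
// Returns std::nullopt on malformed input.
std::optional<Request> parse_head(std::string_view head) {
    Request req;
    std::size_t pos = 0;
    bool first = true;
    while (true) {
        std::size_t eol = head.find("\r\n", pos);
        if (eol == std::string_view::npos) return std::nullopt;  // no terminating empty line
        std::string_view line = head.substr(pos, eol - pos);
        pos = eol + 2;
        if (line.empty()) break;                                 // empty line ends the headers
        if (first) {                                             // request line
            std::size_t sp1 = line.find(' ');
            std::size_t sp2 = sp1 == std::string_view::npos ? sp1 : line.find(' ', sp1 + 1);
            if (sp2 == std::string_view::npos) return std::nullopt;
            req.method = std::string(line.substr(0, sp1));
            req.target = std::string(line.substr(sp1 + 1, sp2 - sp1 - 1));
            req.version = std::string(line.substr(sp2 + 1));
            first = false;
            continue;
        }
        std::size_t colon = line.find(':');                      // "Name: value"
        if (colon == std::string_view::npos || colon == 0) return std::nullopt;
        std::string name(line.substr(0, colon));
        std::transform(name.begin(), name.end(), name.begin(),
                       [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
        req.headers.emplace_back(std::move(name), std::string(trim(line.substr(colon + 1))));
    }
    if (first) return std::nullopt;                              // no request line at all
    return req;
}
```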

In addition to this, some requests also have a body (typically those using POST or PUT). If the request has a message body, its length will be given either by the Content-Length header or using delimiters via chunked transfer encoding.
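The choice between those two mechanisms can be sketched as a small function over already-parsed headers. It follows the precedence rule spelled out later in RFC 7230, section 3.3.3 (Transfer-Encoding overrides Content-Length, chunked must be the final coding, and a request with neither header has no body), but it is simplified: `body_framing`, `BodyKind` and `BodyFraming` are hypothetical names, and header names are assumed to be lower-cased and values trimmed already.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

enum class BodyKind { None, Fixed, Chunked, Invalid };

struct BodyFraming {
    BodyKind kind;
    std::uint64_t length;  // only meaningful when kind == BodyKind::Fixed
};

// Hypothetical helper: decides how a request body is delimited.
BodyFraming body_framing(const std::vector<std::pair<std::string, std::string>>& headers) {
    const std::string* te = nullptr;
    const std::string* cl = nullptr;
    for (const auto& [name, value] : headers) {
        if (name == "transfer-encoding") {
            te = &value;  // the final coding lives in the last Transfer-Encoding field
        } else if (name == "content-length") {
            if (cl && *cl != value) return {BodyKind::Invalid, 0};  // conflicting lengths
            cl = &value;
        }
    }
    if (te) {
        // Transfer-Encoding overrides Content-Length; take the last listed coding.
        std::string last = *te;
        std::size_t comma = last.rfind(',');
        if (comma != std::string::npos) last = last.substr(comma + 1);
        last.erase(0, last.find_first_not_of(" \t"));
        last.erase(last.find_last_not_of(" \t") + 1);
        for (char& c : last) c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
        return {last == "chunked" ? BodyKind::Chunked : BodyKind::Invalid, 0};
    }
    if (cl) {
        // 1 to 19 decimal digits always fit in 64 bits without overflow.
        if (cl->empty() || cl->size() > 19) return {BodyKind::Invalid, 0};
        std::uint64_t n = 0;
        for (char c : *cl) {
            if (c < '0' || c > '9') return {BodyKind::Invalid, 0};
            n = n * 10 + static_cast<std::uint64_t>(c - '0');
        }
        return {BodyKind::Fixed, n};
    }
    return {BodyKind::None, 0};  // neither header: the request has no body
}
```

An Invalid result would typically be answered with 400 Bad Request followed by closing the connection, since the server can no longer tell where the next request starts.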


The HTTP headers are separated from the body by \r\n\r\n, i.e. an empty line (two consecutive CRLFs). That's the only thing you can rely on to know the headers are finished.


I suggest you read the HTTP specification. Specifically, the headers are terminated by a double newline (CRLF CRLF).

