I am using Python sockets to receive web-style and SOAP requests. The code I have is

    import socket
    svrsocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    host = socket.gethostname()
    svrsocket.bind((host, 8091))
    svrsocket.listen(1)
    clientSocket, clientAddress = svrsocket.accept()
    message = clientSocket.recv(4096)

Some of the SOAP requests I receive, however, are huge: 650 KB, and this could grow to several MB. Instead of the single recv I tried

    message = b''
    while True:
        data = clientSocket.recv(4096)
        if len(data) == 0:
            break
        message = message + data

but I never receive a 0-byte chunk from Firefox or Safari, although the Python Socket Programming HOWTO says I should.

What can I do to get round this?

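Unfortunately you can't solve this on the TCP level. HTTP defines its own connection management (see RFC 2616, since superseded by RFC 7230 and later RFC 9112): with HTTP/1.1 persistent connections, the browser keeps the socket open after sending a request so it can reuse it, which is why recv() never returns an empty chunk. This means you need to parse the stream (at least the headers) to figure out where a message ends and when a connection could be closed.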
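
To see why the loop in the question never ends, here is a small sketch. It uses `socket.socketpair()` purely as a stand-in for the browser connection (an assumption so the example runs without a network): `recv()` returns `b''` only once the peer closes or shuts down its sending side. A keep-alive browser does neither after sending a request, so `recv()` just blocks; the timeout exists only so the demo terminates.

```python
import socket

def read_until_closed(sock, bufsize=4096):
    """The question's loop: collect data until recv() returns b'' (peer closed)."""
    chunks = []
    while True:
        data = sock.recv(bufsize)
        if not data:  # b'' means the peer will send nothing more
            break
        chunks.append(data)
    return b''.join(chunks)

# Case 1: the peer sends, then shuts down its sending side, so the loop ends.
browser, server = socket.socketpair()
browser.sendall(b'POST /soap HTTP/1.1\r\nContent-Length: 0\r\n\r\n')
browser.shutdown(socket.SHUT_WR)
print(read_until_closed(server))
browser.close()
server.close()

# Case 2: the peer sends but keeps the connection open (HTTP keep-alive).
browser, server = socket.socketpair()
browser.sendall(b'GET / HTTP/1.1\r\nHost: example.com\r\n\r\n')
server.settimeout(0.5)  # only so this demo does not hang forever
try:
    read_until_closed(server)
except socket.timeout:
    print('recv() blocked: the peer never closed the connection')
finally:
    browser.close()
    server.close()
```

This is exactly the situation in the question: the data all arrives, but nothing on the TCP level says "this message is finished".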

See related questions here - https://stackoverflow.com/search?q=http+connection

Hiya

Firstly, I want to reinforce what the previous answer said:

Unfortunately you can't solve this on the TCP level

That is true; you can't. However, you can implement an HTTP parser on top of your TCP socket, and that's what I want to explore here. Let's get started.

Problem and Desired Outcome

Right now we are struggling to find the end of the data stream. We expected our stream to end with a fixed terminator, but now we know that HTTP does not define any message suffix.

And yet, we move forward.

There is one question we can now ask: "Can we ever know the length of the message in advance?" The answer is YES! Sometimes...

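You see, HTTP/1.1 defines a header called Content-Length, and as you'd expect, it holds exactly what we want: the length of the body in bytes. But there is something else lurking in the shadows: Transfer-Encoding: chunked, where the body is sent as a series of chunks, each preceded by its own size, and there is no Content-Length at all. Unless you really want to learn about it, we'll stay away from it for now.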
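
Which case applies can be decided from the headers alone. The sketch below follows a simplified form of the HTTP/1.1 message-body-length rules (RFC 7230, section 3.3.3). The function name `body_framing` and its return values are made up for illustration, and it assumes header names have already been lowercased; it ignores special cases such as responses to HEAD and 1xx/204/304 responses, which never have a body.

```python
def body_framing(headers, is_request=True):
    """Decide how the body of an HTTP/1.1 message is delimited (simplified).

    `headers` is assumed to map lowercased header names to values.
    Returns ('chunked', None), ('length', n) or ('until-close', None).
    """
    te = headers.get('transfer-encoding')
    if te is not None:
        # If Transfer-Encoding is present, Content-Length must be ignored.
        codings = [c.strip().lower() for c in te.split(',')]
        if codings[-1] == 'chunked':
            return ('chunked', None)
        if is_request:
            raise ValueError('request body length cannot be determined')
        return ('until-close', None)
    if 'content-length' in headers:
        return ('length', int(headers['content-length']))
    if is_request:
        # A request with neither header has no body at all.
        return ('length', 0)
    # A response with neither header ends when the server closes the connection.
    return ('until-close', None)


print(body_framing({'content-length': '650000'}))   # ('length', 650000)
print(body_framing({'transfer-encoding': 'chunked'}))  # ('chunked', None)
print(body_framing({}))                               # ('length', 0)
print(body_framing({}, is_request=False))             # ('until-close', None)
```

For the SOAP requests in the question, the first case is the one that matters, and it is the one the rest of this answer handles.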

Solution

Here is a solution. You're not gonna know what some of these functions are at first, but if you stick with me, I'll explain. Alright... Take a deep breath.

Assuming conn is the connected socket you want to read an HTTP message from (in the question's server, that is clientSocket from svrsocket.accept()):

...

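    rawheaders = recvheaders(conn)
    status, headers = dict_header(io.StringIO(rawheaders))
    l_content = int(headers['Content-Length'])

    # okay. we've got content length by magic

    buffersize = 4096
    message = b''
    while True:
        if l_content <= 0: break

        # never ask for more than what is left of this message
        data = conn.recv(min(buffersize, l_content))
        if not data:
            raise ConnectionError('peer closed before the whole body arrived')
        message += data

        l_content -= len(data)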

...

As you can see, we enter the loop already knowing the Content-Length, stored in l_content.

While we iterate, we keep track of the remaining content by subtracting the length of each chunk returned by recv() from l_content.

Once we've read as many bytes as l_content said to expect, we are done:

    if l_content <= 0: break
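
Here is a self-contained variant of that loop written as a helper (the name `recv_exactly` is made up for this sketch, and `socket.socketpair()` only stands in for a real client connection). It makes two deliberate choices: it asks `recv()` for at most the remaining number of bytes, so it never swallows the start of a second request that a keep-alive client may send right after the first; and it treats an empty `recv()` as an error, because a peer that closes early would otherwise leave a countdown loop spinning forever on `b''`.

```python
import socket

def recv_exactly(sock, n, bufsize=4096):
    """Read exactly n bytes from sock, or raise if the peer closes first."""
    chunks = []
    remaining = n
    while remaining > 0:
        # Never ask for more than what is left of this message.
        data = sock.recv(min(bufsize, remaining))
        if not data:
            raise ConnectionError(f'peer closed with {remaining} bytes still expected')
        chunks.append(data)
        remaining -= len(data)
    return b''.join(chunks)


client, server = socket.socketpair()
client.sendall(b'<soap/>' + b'NEXT REQUEST')
print(recv_exactly(server, 7))    # b'<soap/>'
print(recv_exactly(server, 12))   # b'NEXT REQUEST' was left untouched for the next read
client.close()
server.close()
```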

Frustration

Note: for some of these next bits I'm going to keep the code short, because the full code can be a bit dense.

So now you're asking: what is recvheaders,
what is dict_header,
and HOW did we get headers['Content-Length']?!

For starters, recvheaders. The HTTP/1.1 spec doesn't define a message suffix, but it does define something useful: an end marker for the headers! Every header line ends with CRLF (\r\n), and the header section ends with an empty line, so we know we've received all the headers once we read CRLF CRLF (\r\n\r\n). So we can write a function like

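    def recvheaders(sock):
        # read one byte at a time so we never consume any of the body
        rawheaders = b''
        while not rawheaders.endswith(b'\r\n\r\n'):
            byte = sock.recv(1)
            if not byte:
                raise ConnectionError('connection closed before the headers ended')
            rawheaders += byte
        # ISO-8859-1 maps every byte to a character, so this never fails
        return rawheaders.decode('iso-8859-1')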
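
If recvheaders reads in blocks (say 4096 bytes) rather than byte by byte, the last block will usually contain the start of the body as well, and those bytes must not be thrown away. A sketch of a block-reading variant that returns them alongside the headers; the name `recvheaders_buffered` and its two-value return are assumptions of this sketch, not part of the code above.

```python
import socket

def recvheaders_buffered(sock, bufsize=4096):
    """Read until the blank line that ends the headers.

    Returns (header_text, leftover), where leftover holds any body bytes
    that arrived in the same recv() call as the end of the headers.
    """
    data = b''
    while b'\r\n\r\n' not in data:
        chunk = sock.recv(bufsize)
        if not chunk:
            raise ConnectionError('connection closed before the headers ended')
        data += chunk
    head, _, leftover = data.partition(b'\r\n\r\n')
    # ISO-8859-1 maps every byte to a character, so this decode never fails.
    return head.decode('iso-8859-1'), leftover


client, server = socket.socketpair()
client.sendall(b'POST /soap HTTP/1.1\r\nContent-Length: 7\r\n\r\n<soap/>')
head, leftover = recvheaders_buffered(server)
print(head.splitlines()[0])   # POST /soap HTTP/1.1
print(leftover)               # usually b'<soap/>', but part may arrive in a later recv()
client.close()
server.close()
```

The caller then counts `len(leftover)` against Content-Length before reading any more from the socket.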

Next, parsing the headers.

    import io

    def dict_header(ioheaders: io.StringIO):
        """
        Parse an HTTP message head into its first line (the status line of a
        response, or the request line of a request) and a dict of headers.
        """
        # here I expect ioheaders to be an io.StringIO
        # the status (or request) line is always the first line
        status = ioheaders.readline().strip()
        headers = {}
        for line in ioheaders:
            item = line.strip()
            if not item:
                break  # a blank line ends the headers
            # headers look like this:
            # 'Header-Name: Value'
            item = item.split(':', 1)
            if len(item) == 2:
                key, value = item
                headers[key.strip()] = value.strip()
        return status, headers

Here we read the status line, then iterate over every remaining line and build [key, value] pairs from Header: Value with

    item = line.strip()
    item = item.split(':', 1)
    # We do split(':', 1) to avoid cases like
    # 'Header: foo:bar' -> ['Header', ' foo', 'bar']
    # when we want -----> ['Header', ' foo:bar']

then we take that list and add it to the headers dict

    # unpacking
    # key = item[0], value = item[1]
    key, value = item
    headers[key.strip()] = value.strip()  # drop the space after the colon

BAM, we've created a map of headers

From there, headers['Content-Length'] falls right out. It's a string, so convert it with int() before doing arithmetic with it.
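
One caveat: HTTP header names are case-insensitive, so a client may legitimately send `content-length: 650000`, and `headers['Content-Length']` would then raise KeyError. Either lowercase the keys in dict_header and look up `'content-length'`, or let the standard library parse the header lines. The sketch below takes the second route with `email.parser.Parser`, a swapped-in technique that works because HTTP header lines use the same `Name: value` syntax as e-mail headers; the sample request text is made up.

```python
from email.parser import Parser

raw = (
    'POST /soap HTTP/1.1\r\n'
    'Host: example.com\r\n'
    'Content-Type: text/xml\r\n'
    'content-length: 650000\r\n'   # lowercase name, still valid HTTP
    '\r\n'
)

# The first line is the request (or status) line, not a header, so split it off.
first_line, _, header_block = raw.partition('\r\n')
msg = Parser().parsestr(header_block, headersonly=True)

print(first_line)                   # POST /soap HTTP/1.1
print(int(msg['Content-Length']))   # 650000, because the lookup ignores case
```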

So, this structure will work as long as you can guarantee that every message you receive carries a Content-Length header (and no Transfer-Encoding: chunked).

If you've made it this far, WOW, thanks for taking the time, and I hope this helped you out!

TL;DR: if you want to know the length of an HTTP message with sockets, write an HTTP parser.
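
Putting the pieces together for the question's situation (a server reading one SOAP request from `clientSocket`), here is a minimal sketch. It assumes every request carries Content-Length, that there is no Transfer-Encoding: chunked, and that the client does not pipeline a second request behind the first. `read_request` is a hypothetical name, and `socket.socketpair()` stands in for the accepted connection.

```python
import socket

def read_request(sock, bufsize=4096):
    """Read one HTTP request framed by Content-Length.

    Returns (request_line, headers, body) with lowercased header names.
    Assumes no 'Transfer-Encoding: chunked' and no pipelined requests.
    """
    # 1. Read until the blank line that ends the headers.
    data = b''
    while b'\r\n\r\n' not in data:
        chunk = sock.recv(bufsize)
        if not chunk:
            raise ConnectionError('connection closed before the headers ended')
        data += chunk
    head, _, body = data.partition(b'\r\n\r\n')

    # 2. Split off the request line and parse the 'Name: value' lines.
    lines = head.decode('iso-8859-1').split('\r\n')
    request_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, sep, value = line.partition(':')
        if sep:
            headers[name.strip().lower()] = value.strip()

    # 3. Keep reading until the body is Content-Length bytes long.
    length = int(headers.get('content-length', '0'))
    while len(body) < length:
        chunk = sock.recv(min(bufsize, length - len(body)))
        if not chunk:
            raise ConnectionError('connection closed in the middle of the body')
        body += chunk
    # Anything past `length` would belong to a pipelined next request;
    # this sketch simply discards it.
    return request_line, headers, body[:length]


# socketpair() stands in for the clientSocket returned by accept().
client, server = socket.socketpair()
payload = b'<Envelope>' + b'x' * 2000 + b'</Envelope>'
client.sendall(b'POST /soap HTTP/1.1\r\nHost: example.com\r\n'
               b'Content-Length: ' + str(len(payload)).encode() + b'\r\n\r\n'
               + payload)
# A small bufsize forces several recv() calls, as a multi-MB request would.
request_line, headers, body = read_request(server, bufsize=512)
print(request_line)      # POST /soap HTTP/1.1
print(body == payload)   # True
client.close()
server.close()
```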