How to receive HTTP messages using Socket

I'm using the Socket class for my web client. I can't use HttpWebRequest because it doesn't support SOCKS proxies, so I have to parse headers and handle chunked encoding myself. The hardest part for me is determining the length of the content, so I end up reading it byte by byte. First I use ReadByte() to find the end of the headers (the "\r\n\r\n" sequence), then I check whether the response has a Transfer-Encoding header. If it does, I have to read each chunk's size, and so on:

public void ParseHeaders(Stream stream)
{
    while (true)
    {
        var lineBuffer = new List<byte>();
        while (true)
        {
            int b = stream.ReadByte();
            if (b == -1) return;
            if (b == 10) break;
            if (b != 13) lineBuffer.Add((byte)b);
        }
        string line = Encoding.ASCII.GetString(lineBuffer.ToArray());
        if (line.Length == 0) break;
        int pos = line.IndexOf(": ");
        if (pos == -1) throw  new VkException("Incorrect header format");
        string key = line.Substring(0, pos);
        string value = line.Substring(pos + 2);
        Headers[key] = value;
    }
}

But this approach performs very poorly. Can you suggest a better solution? Maybe some open source examples or libraries that handle HTTP requests through sockets (not too big or complicated, though, as I'm a beginner). Best of all would be a link to an example that reads the message body and correctly handles the cases where the content is chunk-encoded, is gzip- or deflate-encoded, or the Content-Length header is omitted (the message ends when the connection is closed). Something like the source code of the HttpWebRequest class.

Update: my new function looks like this:

int bytesRead = 0;
byte[] buffer = new byte[0x8000];
do
{
    try
    {
        bytesRead = this.socket.Receive(buffer);
        if (bytesRead <= 0) break;
        else
        {
            this.m_responseData.Write(buffer, 0, bytesRead);
            if (this.m_inHeaders == null) this.GetHeaders();
        }
    }
    catch (Exception exception)
    {
        throw new Exception("Read response failed", exception);
    }
}
while ((this.m_inHeaders == null) || !this.isResponseBodyComplete());

Here GetHeaders() and isResponseBodyComplete() work on m_responseData (a MemoryStream), which holds the data received so far.

I suggest that you don't implement this yourself - the HTTP 1.1 protocol is sufficiently complex to make this a project of several man-months.

The question is: is there an HTTP protocol parser for .NET? This has been asked on SO before, and in the answers you'll see several suggestions, including source code for handling HTTP streams.

Converting Raw HTTP Request into HTTPWebRequest Object

EDIT: The Rotor code is reasonably complex, and difficult to read and navigate as web pages. Still, the implementation effort to add SOCKS support is much lower than implementing the entire HTTP protocol yourself. Within a few days at most you will have something working that you can depend upon, based on a tried and tested implementation.

The request and response are written to and read from a NetworkStream, m_Transport, in the Connection class. It is used in these methods:

internal int Read(byte[] buffer, int offset, int size) 
//and
private static void ReadCallback(IAsyncResult asyncResult)

both in http://www.123aspx.com/Rotor/RotorSrc.aspx?rot=42903

The socket is created in

private void StartConnectionCallback(object state, bool wasSignalled)

So you could modify this method to create a Socket to your socks server, and do the necessary handshake to obtain the external connection. The rest of the code can remain the same.

I gathered this info in about 30 minutes of looking through the pages on the web. It should go much faster if you load these files into an IDE. It may seem like a burden to have to read through this code (after all, reading code is far harder than writing it), but you are making just small changes to an already established, working system.

To be sure the changes work in all cases, it would be wise to also test what happens when the connection is broken, to ensure that the client reconnects using the same method and so re-establishes the SOCKS connection and resends the SOCKS request.

If the bottleneck is that ReadByte is too slow, I suggest you wrap your input stream in a BufferedStream. If the performance problem comes from many small reads being expensive, that will solve it for you.
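A minimal sketch of that buffering idea using System.IO.BufferedStream. The class name BufferedHeaderReader, the ReadLine helper and the 8192-byte buffer size are illustrative assumptions, not part of the original code:

```csharp
using System;
using System.IO;
using System.Text;

static class BufferedHeaderReader
{
    // Wraps any stream so that ReadByte() is served from an in-memory buffer
    // instead of hitting the underlying stream once per byte.
    public static Stream Wrap(Stream raw)
    {
        return new BufferedStream(raw, 8192); // buffer size is an arbitrary choice
    }

    // Hypothetical helper: reads one line terminated by LF (CR is dropped).
    // Returns null when the stream ends before any byte of a line is read.
    public static string ReadLine(Stream stream)
    {
        var sb = new StringBuilder();
        int b;
        while ((b = stream.ReadByte()) != -1)
        {
            if (b == '\n') return sb.ToString();
            if (b != '\r') sb.Append((char)b);
        }
        return sb.Length > 0 ? sb.ToString() : null;
    }
}
```

With a connected socket this would be used roughly as `var stream = BufferedHeaderReader.Wrap(new NetworkStream(socket));`, after which the existing byte-by-byte parsing loop can stay as it is.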

Also, you don't need this:

string line = Encoding.ASCII.GetString(lineBuffer.ToArray());

HTTP by design requires that the header is only made up of ASCII characters. You don't really want to -- or need to -- turn it into actual .NET strings (which are Unicode).

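If you want to find the end of the HTTP header, you can do this for good performance.

int k = 0;
while (k != 0x0d0a0d0a)
{
    int ch = stream.ReadByte();
    // ReadByte() returns -1 at end of stream; without this check the loop would never end
    if (ch == -1) throw new EndOfStreamException("Connection closed before end of headers");
    // Shift the last four bytes through k; older bytes fall off the top
    k = (k << 8) | ch;
}

When the sequence \r\n\r\n is encountered, k will equal 0x0d0a0d0a.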

Most HTTP messages carry a Content-Length header that tells you how many bytes there are in the body (it is absent when the body uses chunked transfer encoding or simply ends when the connection closes). When it is present, it is simply a matter of allocating that many bytes and reading until you have received them all.
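A minimal sketch of reading a body of known length, assuming the Content-Length value has already been parsed into an int. The names BodyReader and ReadExactly are hypothetical. The loop is needed because a single Read call may return fewer bytes than requested:

```csharp
using System;
using System.IO;

static class BodyReader
{
    // Reads exactly `length` bytes from the stream, or throws if the
    // connection closes first.
    public static byte[] ReadExactly(Stream stream, int length)
    {
        var body = new byte[length];
        int total = 0;
        while (total < length)
        {
            // Read returns 0 only when the other side has closed the connection.
            int n = stream.Read(body, total, length - total);
            if (n == 0)
                throw new EndOfStreamException("Connection closed before the full body arrived");
            total += n;
        }
        return body;
    }
}
```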

While I would tend to agree with mdma about trying as hard as possible to avoid implementing your own HTTP stack, one trick you might consider is reading from the stream in moderate-sized chunks. If you do a read and give it a buffer that's larger than what's available, it should return the number of bytes it actually read. That should reduce the number of system calls and speed things up significantly. You'll still have to scan the buffers much like you do now, though.
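The scanning step could look roughly like this. It is a sketch only: HeaderScanner and FindBodyStart are made-up names, and `count` is assumed to be the number of valid bytes accumulated in the buffer so far:

```csharp
static class HeaderScanner
{
    // Returns the index of the first body byte (just past "\r\n\r\n"),
    // or -1 if the end of the headers has not been received yet.
    public static int FindBodyStart(byte[] buffer, int count)
    {
        for (int i = 3; i < count; i++)
        {
            if (buffer[i - 3] == 13 && buffer[i - 2] == 10 &&
                buffer[i - 1] == 13 && buffer[i] == 10)
            {
                return i + 1;
            }
        }
        return -1;
    }
}
```

After each Receive, the caller would append the new bytes to its buffer and call FindBodyStart again. Everything before the returned index is header data and everything from it onward already belongs to the body.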

Taking a look at another client's code is helpful (if not confusing): http://src.chromium.org/viewvc/chrome/trunk/src/net/http/

I'm currently doing something like this too. I find the best way to increase the client's efficiency is to use the asynchronous socket functions provided. They're quite low-level and spare you busy waiting and managing threads yourself. All of them have Begin or End in their method names. But first, I would try it with blocking calls, just to get the semantics of HTTP out of the way. Then you can work on efficiency. Remember: premature optimization is evil, so get it working first, then optimize.

Also: Some of your efficiency might be tied up in your use of ToArray(). It's known to be a bit expensive computationally. A better solution might be to store your intermediate results in a byte[] buffer and append them to a StringBuilder with the correct encoding.

For gzipped or deflated data, read in all of the data first. Keep in mind that you might not get all of it the first time you ask, so keep track of how much you have read and keep appending to the same buffer. Then you can decode the data using GZipStream(..., CompressionMode.Decompress).
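A minimal sketch of the gzip case, assuming the complete compressed body is already in a byte array. BodyDecoder and Gunzip are hypothetical names:

```csharp
using System.IO;
using System.IO.Compression;

static class BodyDecoder
{
    // Decompresses a complete gzip-encoded body held in memory.
    public static byte[] Gunzip(byte[] compressed)
    {
        using (var input = new MemoryStream(compressed))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            gzip.CopyTo(output); // pulls decompressed bytes until the gzip stream ends
            return output.ToArray();
        }
    }
}
```

For Content-Encoding: deflate, DeflateStream can be used the same way, but be aware that it expects raw deflate data, while some servers send the zlib-wrapped form, so that case may need extra handling.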

I would say that doing this is not as difficult as some might imply, you just have to be a bit adventurous!

All the answers here about extending Socket and/or TCPClient seem to miss something really obvious - that HttpWebRequest is also a class and can therefore be extended.

You don't need to write your own HTTP/socket class. You simply need to extend HttpWebRequest with a custom connection method. After connecting all data is standard HTTP and can be handled as normal by the base class.

// Pseudocode outline of the idea, not compilable as written
public class SocksHttpWebRequest : HttpWebRequest
{
   public static Create( string url, string proxy_url ) {
   ... setup socks connection ...

   // call base HttpWebRequest class Create() with proxy url
   base.Create(proxy_url);
   }
}

The SOCKS handshake is not particularly complex so if you have a basic understanding of programming sockets it shouldn't take very long to implement the connection. After that HttpWebRequest can do the HTTP heavy lifting.
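To give a feel for the size of that handshake, here is a sketch that builds the two client messages of a SOCKS5 exchange without authentication, following the message layout in RFC 1928. The Socks5 class and its method names are made up for illustration, and sending the bytes and checking the server's replies is left out:

```csharp
using System;
using System.Text;

static class Socks5
{
    // Greeting: version 5, one method offered, 0x00 = "no authentication required".
    public static byte[] Greeting()
    {
        return new byte[] { 0x05, 0x01, 0x00 };
    }

    // CONNECT request that addresses the target by domain name (ATYP 0x03).
    public static byte[] ConnectRequest(string host, int port)
    {
        byte[] name = Encoding.ASCII.GetBytes(host);
        if (name.Length > 255) throw new ArgumentException("Host name too long", "host");

        var req = new byte[7 + name.Length];
        req[0] = 0x05;                 // VER
        req[1] = 0x01;                 // CMD = CONNECT
        req[2] = 0x00;                 // RSV
        req[3] = 0x03;                 // ATYP = domain name
        req[4] = (byte)name.Length;    // length prefix of the name
        Buffer.BlockCopy(name, 0, req, 5, name.Length);
        req[5 + name.Length] = (byte)(port >> 8);    // port, high byte first
        req[6 + name.Length] = (byte)(port & 0xFF);  // port, low byte
        return req;
    }
}
```

The client sends Greeting(), expects a two-byte reply of 0x05 0x00, then sends ConnectRequest(...) and checks that the second byte of the reply is 0x00 (success). From that point on the socket behaves like a direct connection to the target host.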

Why don't you read until you hit two consecutive newlines and then just pick what you need out of the string? Performance might be worse, but it should still be reasonable:

Dim Headers As String = GetHeadersFromRawRequest(ResponseBinary)
If Headers.IndexOf("Content-Encoding: gzip") > 0 Then
    ' The body starts after the headers and the blank line (CR LF CR LF) that follows them
    Dim BodyStart As Integer = Headers.Length + (vbNewLine & vbNewLine).Length
    ' The count must exclude everything before the body, including the blank line
    Dim GzStream As New GZipStream(New MemoryStream(ResponseBinary, BodyStart, ReadByteSize - BodyStart), CompressionMode.Decompress)
    ClearTextHtml = New StreamReader(GzStream).ReadToEnd()
End If

Private Function GetHeadersFromRawRequest(ByVal request() As Byte) As String
    Dim Req As String = Text.Encoding.ASCII.GetString(request)
    Dim ContentPos As Integer = Req.IndexOf(vbNewLine & vbNewLine)

    ' No blank line found: the headers have not been fully received yet
    If ContentPos = -1 Then Return String.Empty

    Return Req.Substring(0, ContentPos)
End Function

You may want to look at the TcpClient class in System.Net.Sockets. It's a wrapper around a Socket that simplifies the basic operations.

From there you're going to have to read up on the HTTP protocol. Also be prepared to do some decompression: HTTP 1.1 supports gzip compression of its content as well as chunked (partial-block) transfer. You're going to have to learn quite a bit to parse these by hand.
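As an illustration of what parsing the chunked format by hand involves, here is a simplified sketch of a decoder. The ChunkedDecoder name is made up, the stream is assumed to be positioned just after the response headers, chunk extensions are ignored, and trailer headers are skipped rather than returned:

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Text;

static class ChunkedDecoder
{
    // Reads a chunked body and returns the reassembled bytes.
    public static byte[] Decode(Stream stream)
    {
        var body = new MemoryStream();
        while (true)
        {
            // Each chunk starts with its size in hexadecimal on its own line.
            string sizeLine = ReadLine(stream);
            int semi = sizeLine.IndexOf(';'); // drop optional chunk extensions
            if (semi >= 0) sizeLine = sizeLine.Substring(0, semi);
            int size = int.Parse(sizeLine.Trim(), NumberStyles.HexNumber, CultureInfo.InvariantCulture);
            if (size == 0) break; // a zero-sized chunk marks the end of the body

            var chunk = new byte[size];
            int total = 0;
            while (total < size)
            {
                int n = stream.Read(chunk, total, size - total);
                if (n == 0) throw new EndOfStreamException("Connection closed inside a chunk");
                total += n;
            }
            body.Write(chunk, 0, size);
            ReadLine(stream); // consume the CRLF that follows the chunk data
        }
        // Skip optional trailer headers up to the terminating blank line.
        while (ReadLine(stream).Length > 0) { }
        return body.ToArray();
    }

    // Reads up to LF, dropping CR; returns "" for a blank line or at end of stream.
    private static string ReadLine(Stream stream)
    {
        var sb = new StringBuilder();
        int b;
        while ((b = stream.ReadByte()) != -1 && b != '\n')
        {
            if (b != '\r') sb.Append((char)b);
        }
        return sb.ToString();
    }
}
```

If the response is also gzip-encoded, the chunks are reassembled first and the result is decompressed afterwards.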

Basic HTTP 1.0 is simple, and the protocol is well documented online, so our friendly neighborhood Google can help you with that one.

I would create a SOCKS proxy that can tunnel HTTP and then have it accept the requests from HttpWebRequest and forward them. I think that would be far easier than recreating everything that HttpWebRequest does. You could start with Privoxy, or just roll your own. The protocol is simple and documented here:

http://en.wikipedia.org/wiki/SOCKS

and in the RFCs that it links to.

You mentioned that you have to have many different proxies -- you could set up a local port for each one.