I am writing an experimental asynchronous web server, and I am wondering about the standard or 'best' way to decode HTTP requests in Python.
Basically, what reading from the socket gives me is a bytes representation of the raw incoming request. How can I turn these bytes into standard datatypes like dictionaries, lists of values, and so on? Is there a good general tutorial on how to do this and what to watch out for (especially regarding encodings and browser specifics)?
This worked for me:
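# Python 2 only (uses the StringIO and httplib modules and the print statement)
import StringIO
import httplib

# your_raw_data is the raw request data read from the socket
ucode_data = unicode(your_raw_data, "utf-8")
# Wrap the text in a file-like object; named header_file to avoid shadowing the built-in str
header_file = StringIO.StringIO(ucode_data)
http_header = httplib.HTTPMessage(header_file, 0)
http_header.readheaders()
print http_header.__dict__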
Note that it does not decode the request line itself (e.g. GET /index.html HTTP/1.1); it will decode the headers that follow for you, though.
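Since the request line is not handled by the header parser, it can be split by hand. Below is a minimal Python 3 sketch, not a complete or validating parser: the function name parse_request_line and the sample request are made up for illustration, and it assumes a well-formed line of the form "METHOD target VERSION". The query string is turned into a dictionary of lists with urllib.parse.parse_qs.

```python
from urllib.parse import urlsplit, parse_qs


def parse_request_line(raw_line):
    """Split a raw HTTP request line (bytes) into a plain dictionary.

    Hypothetical helper for illustration; raises ValueError if the line
    does not have the three space-separated parts.
    """
    # latin-1 maps every byte to a character, so decoding never fails here.
    line = raw_line.decode("latin-1").rstrip("\r\n")
    method, target, version = line.split(" ", 2)
    url = urlsplit(target)
    return {
        "method": method,
        "path": url.path,
        # parse_qs collects repeated keys into lists of values.
        "query": parse_qs(url.query),
        "version": version,
    }


request = parse_request_line(b"GET /index.html?tag=a&tag=b HTTP/1.1\r\n")
print(request["method"], request["path"], request["query"])
```

Percent-encoded paths and form bodies would need further decoding (for example with urllib.parse.unquote), which this sketch leaves out.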
See "20.10.4. HTTPMessage Objects" in the Python 3 documentation:
An http.client.HTTPMessage instance holds the headers from an HTTP response. It is implemented using the email.message.Message class.
http://docs.python.org/py3k/library/http.client.html#httpmessage-objects
You should be able to use HTTPMessage as a standalone class without invoking urllib (or whatever the Python 3 equivalent is).
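As a rough illustration of standalone use in Python 3, the sketch below feeds raw bytes through io.BytesIO and lets http.client.parse_headers build the HTTPMessage. The sample request is invented, and the request line is read off first on the assumption that the header parser should only see header lines. parse_headers is the helper the standard library's own http.server uses, but it is not prominently documented, so treat relying on it as an assumption.

```python
import io
import http.client

# Invented sample request, as it might arrive from a socket.
raw = (
    b"GET /index.html HTTP/1.1\r\n"
    b"Host: example.com\r\n"
    b"Accept: text/html\r\n"
    b"\r\n"
)

stream = io.BytesIO(raw)
# Consume the request line so only header lines remain in the stream.
request_line = stream.readline()
# Reads up to the blank line and returns an http.client.HTTPMessage.
headers = http.client.parse_headers(stream)

# Header lookup is case-insensitive, as with email.message.Message.
print(headers["host"])
# The headers can be copied into a plain dictionary if needed.
print(dict(headers.items()))
```

Note that copying into a plain dictionary keeps only one value per header name; headers.get_all(name) returns every value when a header is repeated.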
Don't deal with sockets; abstract! Try httplib2. It's a comprehensive HTTP client library for Python 2 and 3, and it is very intuitive, although you have to download and install it separately. Read its usage examples for a quick introduction.
Dive Into Python 3 includes a very good chapter on installing and using httplib2, and why it's better than other alternatives, including the standard library; I recommend you read that.