I'm writing a basic HTTP proxy in Python 3, and so far I'm not using prebuilt classes such as those in http.server.
I just open a socket that accepts a connection:
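import socket

# Inside the proxy class: open a listening TCP socket on port 4321
self.listen_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
self.listen_socket.bind((socket.gethostname(), 4321))
self.listen_socket.listen(5)
# Block until a client connects, then read up to 100000 bytes of the request
(a, b) = self.listen_socket.accept()
content = a.recv(100000)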
Now content stores data like:
b'GET http://www.google.com/firefox HTTP/1.1\r\nHost: www.google.com\r\nUser-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2) Gecko/20100207 Namoroka/3.6\r\nAccept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\r\nAccept-Language: en-us,en;q=0.5\r\nAccept-Encoding: gzip,deflate\r\nAccept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7\r\nKeep-Alive: 115\r\nProxy-Connection: keep-alive\r\nCookie: PREF=ID=1ac935f4d893f655:U=73a4849dc5fc23a4:TM=1266851688:LM=1267023171:S=Log1PmXRMlNjX3Of; NID=32=EnrZjTqILuW2_aMLtgsJ96FdEMF3s5FoMJSVq9GMr9dhLhTAd3F5RcQ3ImyVBiO2eYNKKMhzlGg7r8zXmeSq50EigS5sdKtCL9BMHpgCxZazA2NiyB0bTRWhp8-0BObn\r\n\r\n'
How can I run a regular expression on it? Converting it to a string does not work for me.
Ultimately, I need to find out which address is being requested, http://www.google.com/firefox
in this case. Is there a parser I don't know about? How can I achieve this?
Thanks in advance.
You need to specify an encoding when converting bytes to a string. For example:
>>> str(b'GET http://...', 'UTF-8')
'GET http://...'
If you don't specify an encoding then, as you've discovered, you get something a little less helpful:
>>> str(b'GET http://...')
"b'GET http://...'"
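To get at the requested address, here is a minimal sketch of two options: decode first and split the request line, or run a bytes pattern directly on the undecoded data. The shortened `raw` request and the variable names are made up for illustration. The choice of ISO-8859-1 is an assumption on my part (it maps every byte to a character, so decoding cannot fail); UTF-8 as shown above works equally well for plain ASCII headers.

```python
import re

# Shortened stand-in for the bytes returned by recv() (illustrative only)
raw = b'GET http://www.google.com/firefox HTTP/1.1\r\nHost: www.google.com\r\n\r\n'

# Option 1: decode to str, then split the first line into its three parts
text = raw.decode('iso-8859-1')
request_line = text.split('\r\n', 1)[0]
method, target, version = request_line.split(' ', 2)

# Option 2: use a bytes pattern so no decoding is needed at all
match = re.match(rb'(\S+) (\S+) (HTTP/\d\.\d)\r\n', raw)
target_bytes = match.group(2)

print(method, target, version)
print(target_bytes)
```

Note that a bytes pattern can only be matched against bytes, and a str pattern only against str; mixing the two raises a TypeError.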
Also, you might want to look at the *HTTPServer
classes (in Python 3 they live in the http.server module). They wrap the work of acting as an HTTP server and will also parse headers for you.
If you can't use them, at the very least their source code shows how to do it!
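As a rough sketch of reusing that parsing without running a full server, one commonly used trick is to subclass `BaseHTTPRequestHandler` and feed it the raw bytes through an in-memory file. The class name `ParsedRequest` and the shortened `raw` request are hypothetical, and the approach leans on how `parse_request()` happens to be implemented, so treat it as an illustration rather than a supported API.

```python
from http.server import BaseHTTPRequestHandler
from io import BytesIO

class ParsedRequest(BaseHTTPRequestHandler):
    """Hypothetical helper: parse raw request bytes without a socket."""

    def __init__(self, raw_request):
        # Deliberately skip the normal socket-based __init__
        self.rfile = BytesIO(raw_request)
        self.raw_requestline = self.rfile.readline()
        self.error_code = self.error_message = None
        self.parse_request()  # sets command, path, request_version, headers

    def send_error(self, code, message=None, explain=None):
        # Record parse errors instead of writing a response to a client
        self.error_code = code
        self.error_message = message

# Shortened stand-in for the bytes returned by recv() (illustrative only)
raw = b'GET http://www.google.com/firefox HTTP/1.1\r\nHost: www.google.com\r\n\r\n'
req = ParsedRequest(raw)
print(req.command, req.path, req.headers['Host'])
```

If the request line is malformed, `error_code` is set and the other attributes may be missing, so check it before using them.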
Methods are provided to convert between bytes and strings: try str.encode() and bytes.decode().
http://python.about.com/od/python30/ss/30_strings_3.htm
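A small sketch of the round trip, assuming ASCII-only data; the variable names and the sample status line are made up for illustration:

```python
request_line = b'GET http://www.google.com/firefox HTTP/1.1'

# bytes -> str
as_text = request_line.decode('ascii')

# str -> bytes, for example before writing to a socket
back_to_bytes = as_text.encode('ascii')
status_line = 'HTTP/1.1 200 OK\r\n'.encode('ascii')

print(type(as_text).__name__, type(back_to_bytes).__name__)
```

Both methods raise a UnicodeError subclass if the data does not fit the chosen encoding, so pick one that matches what the peer actually sends.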