
Am I parsing this HTTP POST request properly?

Developer https://www.devze.com 2023-01-08 06:56 Source: web

Let me start off by saying, I'm using the twisted.web framework. Twisted.web's file uploading didn't work like I wanted it to (it only included the file data, and not any other information), cgi.parse_multipart doesn't work like I want it to (same thing, twisted.web uses this function), cgi.FieldStorage didn't work ('cause I'm getting the POST data through twisted, not a CGI interface -- so far as I can tell, FieldStorage tries to get the request via stdin), and twisted.web2 didn't work for me because the use of Deferred confused and infuriated me (too complicated for what I want).

That being said, I decided to try and just parse the HTTP request myself.

Using Chrome, the HTTP request is formed like this:

------WebKitFormBoundary7fouZ8mEjlCe92pq
Content-Disposition: form-data; name="upload_file_nonce"

11b03b61-9252-11df-a357-00266c608adb
------WebKitFormBoundary7fouZ8mEjlCe92pq
Content-Disposition: form-data; name="file"; filename="login.html"
Content-Type: text/html

<!DOCTYPE html>
<html>
  <head> 

...

------WebKitFormBoundary7fouZ8mEjlCe92pq
Content-Disposition: form-data; name="file"; filename=""


------WebKitFormBoundary7fouZ8mEjlCe92pq--

Is this always how it will be formed? I'm parsing it with regular expressions, like so (pardon the wall of code):

(Note: I snipped out most of the code to show only what I thought was relevant, namely the regular expressions. This is from the __init__ method, the only method so far, of an Uploads class I built. The full code can be seen in the revision history.)

# The closing boundary marks the end of the request body.
if line == "--{0}--".format(boundary):
    finished = True

# A blank line ends the headers of the current part.
if in_header and not line:
    in_header = False
    # Parts without a Content-Type are plain form fields, not uploads.
    if 'type' not in current_file:
        ignore_current_file = True

if in_header:
    m = re.match(
        "Content-Disposition: form-data; name=\"(.*?)\"; filename=\"(.*?)\"$", line)
    if m:
        input_name, current_file['filename'] = m.group(1), m.group(2)

    m = re.match("Content-Type: (.*)$", line)
    if m:
        current_file['type'] = m.group(1)

# Outside the headers, every line belongs to the part's data.
else:
    if 'data' not in current_file:
        current_file['data'] = line
    else:
        current_file['data'] += line

I start a new "file" dict whenever a boundary is reached, and set in_header to True to say that I'm parsing headers. When I reach a blank line, I switch it to False, but not before checking whether a Content-Type was set for that form value. If it wasn't, I set ignore_current_file, since I'm only looking for file uploads.
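Put together, the loop described above looks roughly like the following self-contained sketch. The function name parse_uploads, the assumption that the body is already-decoded text with CRLF line endings, and the layout of the returned dict are illustrative choices, not the original Uploads class.

```python
import re

# Same patterns as in the snippet above.
DISPOSITION = re.compile(
    r'Content-Disposition: form-data; name="(.*?)"; filename="(.*?)"$')
CONTENT_TYPE = re.compile(r"Content-Type: (.*)$")


def parse_uploads(body, boundary):
    """Return {input name: [file dict, ...]} for parts that carry a Content-Type.

    Assumes `body` is already-decoded text whose lines end in CRLF.
    """
    uploads = {}
    current_file = None   # dict for the part being read, None before the first boundary
    input_name = None
    in_header = False
    data_lines = []

    def flush():
        # Keep the finished part only if it looked like a file upload.
        if current_file is not None and "type" in current_file:
            current_file["data"] = "\r\n".join(data_lines)
            uploads.setdefault(input_name, []).append(current_file)

    for line in body.split("\r\n"):
        if line == "--" + boundary + "--":
            flush()
            break
        if line == "--" + boundary:
            flush()
            current_file, input_name, in_header, data_lines = {}, None, True, []
            continue
        if current_file is None:
            continue  # preamble before the first boundary
        if in_header:
            if not line:
                in_header = False  # a blank line ends the headers
                continue
            m = DISPOSITION.match(line)
            if m:
                input_name, current_file["filename"] = m.group(1), m.group(2)
            m = CONTENT_TYPE.match(line)
            if m:
                current_file["type"] = m.group(1)
        else:
            data_lines.append(line)
    return uploads
```

Called with the request body and the boundary taken from the request's Content-Type header, it returns only the parts that had a Content-Type line; plain fields such as upload_file_nonce are skipped.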

I know I should be using a library, but I'm sick to death of reading documentation, trying to get different solutions to work in my project, and still having the code look reasonable. I just want to get past this part -- and if parsing an HTTP POST with file uploads is this simple, then I shall stick with that.

Note: this code works perfectly for now, I'm just wondering if it will choke on/spit out requests from certain browsers.


My solution to this problem was to parse the content with cgi.FieldStorage, like this:

import cgi

from twisted.web.resource import Resource


class Root(Resource):

    def render_POST(self, request):
        self.headers = request.getAllHeaders()
        # For the parsing part, look at PyMOTW by Doug Hellmann.
        img = cgi.FieldStorage(
            fp=request.content,
            headers=self.headers,
            environ={'REQUEST_METHOD': 'POST',
                     'CONTENT_TYPE': self.headers['content-type'],
                     }
        )

        # Fields are looked up by the name of the form input.
        upload = img["upl_file"]
        print("%s %s %s" % (upload.name, upload.filename, upload.type))
        # Write the uploaded bytes out under the client-supplied filename.
        with open(upload.filename, 'wb') as out:
            out.write(upload.value)
        request.redirect('/tests')
        return ''


You're trying to avoid reading documentation, but I think the best advice is to actually read:

  • RFC 2388: Returning Values from Forms: multipart/form-data
  • RFC 1867: Form-based File Upload in HTML

to make sure you don't miss any cases. An easier route might be to use the poster library.
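If the aim is mainly to avoid hand-rolled parsing, another option that needs no third-party package is Python's standard email package, since multipart/form-data is a MIME format. The following is only a minimal sketch under that assumption, not something the RFCs or poster prescribe; the function name parse_form_data and the keys of the returned dicts are made up for illustration.

```python
from email.parser import BytesParser


def parse_form_data(content_type, body):
    """Parse a multipart/form-data body (bytes) into a list of part dicts.

    `content_type` is the request's Content-Type header value, including
    its boundary parameter.
    """
    # The email parser expects headers first, so prepend the Content-Type.
    prefix = b"Content-Type: " + content_type.encode("ascii") + b"\r\n\r\n"
    message = BytesParser().parsebytes(prefix + body)
    if not message.is_multipart():
        raise ValueError("not a multipart body")

    parts = []
    for part in message.get_payload():
        parts.append({
            "name": part.get_param("name", header="content-disposition"),
            "filename": part.get_filename(),    # None for plain form fields
            "type": part.get("Content-Type"),   # None when the part has none
            "data": part.get_payload(decode=True),
        })
    return parts
```

With the request from the question, the upload_file_nonce part would come back with filename set to None, which makes it easy to tell plain fields from file uploads.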


The content-disposition header has no defined order for fields, plus it may contain more fields than just the filename. So your match for filename may fail - there may not even be a filename!

See RFC 2183 (edit: that one is for mail; for HTTP see RFC 1806, RFC 2616, and maybe more).

Also, in this kind of regexp I would suggest replacing every space with \s* and not relying on character case.
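To illustrate these points, here is a minimal sketch of an order-independent, case-insensitive match on the Content-Disposition line. The helper name parse_content_disposition is hypothetical, and the pattern only handles quoted parameter values without escaped quotes, which is a simplifying assumption rather than the full grammar from the RFCs.

```python
import re

# Header name in any case, optional whitespace around the colon.
HEADER = re.compile(r"\s*Content-Disposition\s*:\s*([^;]+)(.*)$", re.IGNORECASE)
# One ;key="value" parameter, in whatever position it appears.
PARAM = re.compile(r';\s*([^=\s;]+)\s*=\s*"([^"]*)"')


def parse_content_disposition(line):
    """Return (disposition, params dict), or None if the line is another header."""
    m = HEADER.match(line)
    if m is None:
        return None
    disposition = m.group(1).strip().lower()
    params = {key.lower(): value for key, value in PARAM.findall(m.group(2))}
    return disposition, params
```

Because the parameters end up in a dict, the caller can check for "filename" explicitly instead of relying on it following name on the line.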
