So I've been playing around with raw WSGI, cgi开发者_如何学C.FieldStorage and file uploads. And I just can't understand how it deals with file uploads.
At first it seemed that it just stores the whole file in memory. And I thought hm, that should be easy to test - a big file should clog up the memory!.. And it didn't. Still, when I request the file, it's a string, not an iterator, file object or anything.
I've tried reading the cgi module's source and found some things about temporary files, but it returns a freaking string, not a file(-like) object! So... how does it fscking work?!
Here's the code I've used:
import cgi
from wsgiref.simple_server import make_server
def app(environ,start_response):
start_response('200 OK',[('Content-Type','text/html')])
output = """
<form action="" method="post" enctype="multipart/form-data">
<input type="file" name="failas" />
<input type="submit" value="Varom" />
</form>
"""
fs = cgi.FieldStorage(fp=environ['wsgi.input'],environ=environ)
f = fs.getfirst('failas')
print type(f)
return output
if __name__ == '__main__' :
httpd = make_server('',8000,app)
print 'Serving'
httpd.serve_forever()
Thanks in advance! :)
Inspecting the cgi module description, there is a paragraph discussing how to handle file uploads.
If a field represents an uploaded file, accessing the value via the value attribute or the
getvalue()
method reads the entire file in memory as a string. This may not be what you want. You can test for an uploaded file by testing either the filename attribute or the file attribute. You can then read the data at leisure from the file attribute:
fileitem = form["userfile"]
if fileitem.file:
# It's an uploaded file; count lines
linecount = 0
while 1:
line = fileitem.file.readline()
if not line: break
linecount = linecount + 1
Regarding your example, getfirst()
is just a version of getvalue()
.
try replacing
f = fs.getfirst('failas')
with
f = fs['failas'].file
This will return a file-like object that is readable "at leisure".
The best way is to NOT to read file (or even each line at a time as gimel suggested).
You can use some inheritance and extend a class from FieldStorage and then override make_file function. make_file is called when FieldStorage is of type file.
For your reference, default make_file looks like this:
def make_file(self, binary=None):
"""Overridable: return a readable & writable file.
The file will be used as follows:
- data is written to it
- seek(0)
- data is read from it
The 'binary' argument is unused -- the file is always opened
in binary mode.
This version opens a temporary file for reading and writing,
and immediately deletes (unlinks) it. The trick (on Unix!) is
that the file can still be used, but it can't be opened by
another process, and it will automatically be deleted when it
is closed or when the current process terminates.
If you want a more permanent file, you derive a class which
overrides this method. If you want a visible temporary file
that is nevertheless automatically deleted when the script
terminates, try defining a __del__ method in a derived class
which unlinks the temporary files you have created.
"""
import tempfile
return tempfile.TemporaryFile("w+b")
rather then creating temporaryfile, permanently create file wherever you want.
Using an answer by @hasanatkazmi (utilized in a Twisted app) I got something like:
#!/usr/bin/env python2
# -*- coding: utf-8 -*-
# -*- indent: 4 spc -*-
import sys
import cgi
import tempfile
class PredictableStorage(cgi.FieldStorage):
def __init__(self, *args, **kwargs):
self.path = kwargs.pop('path', None)
cgi.FieldStorage.__init__(self, *args, **kwargs)
def make_file(self, binary=None):
if not self.path:
file = tempfile.NamedTemporaryFile("w+b", delete=False)
self.path = file.name
return file
return open(self.path, 'w+b')
Be warned, that the file is not always created by the cgi module. According to these cgi.py
lines it will only be created if the content exceeds 1000 bytes:
if self.__file.tell() + len(line) > 1000:
self.file = self.make_file('')
So, you have to check if the file was actually created with a query to a custom class' path
field like so:
if file_field.path:
# Using an already created file...
else:
# Creating a temporary named file to store the content.
import tempfile
with tempfile.NamedTemporaryFile("w+b", delete=False) as f:
f.write(file_field.value)
# You can save the 'f.name' field for later usage.
If the Content-Length
is also set for the field, which seems rarely, the file should also be created by cgi.
That's it. This way you can store the file predictably, decreasing the memory usage footprint of your app.
精彩评论