So I am trying to write a script to download a picture file with Python. I found this function using Google, but every picture I download with it comes out "corrupt". Any ideas?
def download(url):
    """Copy the contents of a file from a given URL
    to a local file.
    """
    import urllib
    webFile = urllib.urlopen(url)
    localFile = open(url.split('/')[-1], 'w')
    localFile.write(webFile.read())
    webFile.close()
    localFile.close()
Edit: the code tag didn't preserve the indentation very well, but I can assure you it is there; that is not my problem.
You can simply do
urllib.urlretrieve(url, filename)
and save yourself the trouble; urlretrieve opens the local file in binary mode for you.
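In Python 3, urlretrieve moved to urllib.request. A minimal sketch, assuming Python 3; the wrapper name download is only for illustration:

```python
import urllib.request


def download(url: str, filename: str) -> str:
    # urlretrieve copies the resource at `url` into `filename` (opened in
    # binary mode) and returns a (local_path, headers) tuple.
    local_path, _headers = urllib.request.urlretrieve(url, filename)
    return local_path
```

For example, download("https://example.com/picture.jpg", "picture.jpg"), where the URL is a placeholder. Note that the Python 3 documentation lists urlretrieve under its legacy interface, so an urlopen-based version may age better.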
You need to open the local file in binary mode:
localFile = open(url.split('/')[-1], 'wb')
Otherwise, on Windows, text mode translates every LF byte in the stream into CR LF as it is written, which corrupts any binary file such as an image.
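To see why that translation is fatal, here is a small Python 3 sketch using the 8-byte PNG signature, which deliberately contains both CR LF and a lone LF so that newline conversion can be detected. The helper simulate_windows_text_write is hypothetical: it only approximates what a text-mode write does on Windows. The sketch also shows that Python 3 refuses bytes in a 'w' file outright.

```python
import os
import tempfile

# The PNG file signature: 0x89 'P' 'N' 'G' CR LF 0x1A LF.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def simulate_windows_text_write(data: bytes) -> bytes:
    # Hypothetical helper: every LF byte is written out as CR LF,
    # as text mode does on Windows.
    return data.replace(b"\n", b"\r\n")


def binary_roundtrip(data: bytes) -> bytes:
    # Writing with 'wb' and reading back with 'rb' leaves the bytes untouched.
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "wb") as f:
            f.write(data)
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)


def text_mode_accepts_bytes() -> bool:
    # In Python 3 a file opened with 'w' only accepts str; bytes raise TypeError.
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "w") as f:
            f.write(PNG_SIGNATURE)
        return True
    except TypeError:
        return False
    finally:
        os.remove(path)


print(simulate_windows_text_write(PNG_SIGNATURE))  # b'\x89PNG\r\r\n\x1a\r\n'
print(binary_roundtrip(PNG_SIGNATURE) == PNG_SIGNATURE)  # True
print(text_mode_accepts_bytes())  # False
```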
You must include the 'b' flag if you intend to write a binary file. Line 7 of your function becomes:
localFile = open(url.split('/')[-1], 'wb')
It is not necessary for the code to work, but in the future you might consider:
- Importing outside of your functions.
- Using os.path.basename, rather than string parsing to get the name component of a path.
- Using the with statement to manage files, rather than having to manually close them. It makes your code cleaner, and it ensures that they are properly closed if your code throws an exception.
I would rewrite your code as:
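import os.path
import urllib
from contextlib import closing

def download(url):
    """Copy the contents of a file from a given URL
    to a local file in the current directory.
    """
    # Python 2's urlopen() result does not support "with" by itself,
    # so wrap it in closing() to get the same guaranteed cleanup.
    with closing(urllib.urlopen(url)) as webFile:
        with open(os.path.basename(url), 'wb') as localFile:
            localFile.write(webFile.read())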
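On Python 3, urlopen lives in urllib.request and its response object supports the with statement directly. The sketch below is a Python 3 take on the same idea, under a few assumptions: it adds a hypothetical dest_dir parameter, takes the file name from the URL's path component so a query string does not end up in the name, and swaps the single read() for shutil.copyfileobj, which copies in chunks instead of holding the whole file in memory.

```python
import os
import shutil
import urllib.parse
import urllib.request


def download(url: str, dest_dir: str = ".") -> str:
    """Copy the contents of a file from a given URL to a local file in dest_dir.

    Returns the path of the file that was written.
    """
    # Use only the path part of the URL, so "pic.png?size=large" becomes "pic.png".
    name = os.path.basename(urllib.parse.unquote(urllib.parse.urlparse(url).path))
    if not name:
        raise ValueError("URL has no file name: %r" % url)
    local_path = os.path.join(dest_dir, name)
    with urllib.request.urlopen(url) as web_file:
        # 'wb' keeps the bytes exactly as received.
        with open(local_path, "wb") as local_file:
            shutil.copyfileobj(web_file, local_file)
    return local_path
```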
It's coming out corrupt because the function you're using writes the bytes to the file as if they were plain text. What you need to do is open the file in binary mode ('wb'). Here's an idea of what you should do:
import urllib

def Download(url, filename):
    Data = urllib.urlopen(url).read()
    File = open(filename, 'wb')
    File.write(Data)
    #Neatly close off the file...
    File.flush()
    File.close()
    #Cleanup, for you neat-freaks.
    del Data, File
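import subprocess

outfile = "foo.txt"
url = "http://some/web/site/foo.txt"
# Pass the arguments as a list so no shell is involved and nothing needs quoting;
# on Windows, "curl" resolves to curl.exe.
cmd = ["curl", "-f", "-o", outfile, url]
subprocess.check_call(cmd)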
Shelling out may seem inelegant, but once you start running into issues with more sophisticated sites, curl has a wealth of logic for getting you past the barriers web servers put up (cookies, authentication, sessions, etc.).
wget is another alternative.
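If you want to switch between the two tools, a small sketch like this could build the argument list for either one. The helper names build_download_cmd and fetch are hypothetical; the flags are standard ones (curl: -f fails on HTTP errors, -L follows redirects, -o names the output file; wget: -O names the output file and redirects are followed by default).

```python
import subprocess


def build_download_cmd(url: str, outfile: str, tool: str = "curl") -> list:
    # Hypothetical helper: returns an argv list, so no shell quoting is needed.
    if tool == "curl":
        # -f: fail instead of saving an HTTP error page; -L: follow redirects.
        return ["curl", "-f", "-L", "-o", outfile, url]
    if tool == "wget":
        # -O: write the downloaded body to outfile.
        return ["wget", "-O", outfile, url]
    raise ValueError("unsupported tool: %r" % tool)


def fetch(url: str, outfile: str, tool: str = "curl") -> None:
    # Raises subprocess.CalledProcessError if the tool exits with a non-zero status.
    subprocess.check_call(build_download_cmd(url, outfile, tool))
```

For example, fetch("http://some/web/site/foo.txt", "foo.txt", tool="wget") assumes wget is installed and on the PATH.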