I'm trying to split a file into smaller pieces of roughly 300 kilobytes each. This is quite slow for a 300 megabyte file (roughly 1000 pieces).
I'm not using any threading yet; I'm not sure whether that would make it run any faster.
cs = 1
pieces = 1000
# Open the file
f = open(self.file, 'rb')
result = {}
while cs <= pieces:
    # Filename
    filename = str(cs).zfill(5) + '.split'
    # Generate temporary filename
    tfile = filename
    # Open the temporary file
    w = open(tfile, 'wb')
    # Read the next chunk
    tdata = f.read(maxsize)
    # Write the data
    w.write(tdata)
    # Close the file
    w.close()
    # Get the hash of this chunk
    result[filename] = self.__md5(tfile)
    cs += 1
This is the md5 function:
def __md5(self, f, block_size=2**20):
    f = open(f, 'rb')
    md5 = hashlib.md5()
    while True:
        data = f.read(block_size)
        if not data:
            break
        md5.update(data)
    return md5.hexdigest()
So is there any way to speed things up?
You're reading the chunk, saving it to a temporary file, then reading the temporary file back in and computing its md5. That's unnecessary: you can compute the md5 while the chunk is still in memory, so you never have to reopen and re-read the temp file, which should be faster.
I'd also recommend a smaller block size in __md5 - maybe 2**11 or 2**12.
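Here's a minimal sketch of that idea (the function name split_file and the piece_size parameter are placeholders of my own, and it loops until end of file instead of assuming a fixed piece count):

import hashlib

def split_file(path, piece_size=300 * 1024):
    # Split the file at `path` into numbered .split pieces and
    # return a dict mapping piece filename -> md5 of that piece.
    result = {}
    cs = 1
    with open(path, 'rb') as f:
        while True:
            chunk = f.read(piece_size)
            if not chunk:
                # End of file reached
                break
            piece_name = str(cs).zfill(5) + '.split'
            with open(piece_name, 'wb') as w:
                w.write(chunk)
            # Hash the chunk that's already in memory instead of
            # reopening and re-reading the piece from disk
            result[piece_name] = hashlib.md5(chunk).hexdigest()
            cs += 1
    return result

That way each chunk is read from disk exactly once, written once, and hashed straight from memory.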