I have a 9 GB file with 100 million lines (each line is a URL). How can I split it up into X files? What's the easiest way to do this without running out of memory? I tried for f in fileinput.input('...'), but it got "killed" for some reason.
from __future__ import with_statement  # needed on Python 2.5
import itertools as it

YOUR_FILENAME = 'bigfile.log'
SPLIT_NAME = 'bigfile.part%05d.log'
SPLIT_SIZE = 10000  # lines per output file
SPLITTER = lambda t: t[0] // SPLIT_SIZE  # map (line_no, line) -> part number

with open(YOUR_FILENAME, "r") as input_file:
    # enumerate yields (line_no, line) pairs; groupby batches
    # SPLIT_SIZE consecutive lines under one part number, so only
    # one line is held in memory at a time.
    for part_no, lines in it.groupby(enumerate(input_file), SPLITTER):
        with open(SPLIT_NAME % part_no, "w") as out:
            out.writelines(item[1] for item in lines)
Store the correct filename as YOUR_FILENAME. Decide how many lines each part will have (SPLIT_SIZE) and the output name pattern (SPLIT_NAME). Then run it. You are not restricted to plain filenames in YOUR_FILENAME and SPLIT_NAME, obviously; you can use paths.
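Here is a small self-contained demonstration of the same groupby approach on a tiny generated file, so you can see the batching behavior before pointing it at the real 9 GB log. The file names, the temporary directory, and the 10-line sample input are illustrative assumptions, not part of the original question:

import itertools
import os
import tempfile

SPLIT_SIZE = 3  # lines per part (tiny, just for illustration)

# Hypothetical sample input: 10 URLs in a temp directory.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "bigfile.log")
part_name = os.path.join(workdir, "bigfile.part%05d.log")

with open(src, "w") as f:
    for i in range(10):
        f.write("http://example.com/%d\n" % i)

# Same pattern as above: stream the file, group lines by part number.
with open(src) as input_file:
    for part_no, lines in itertools.groupby(
            enumerate(input_file), lambda t: t[0] // SPLIT_SIZE):
        with open(part_name % part_no, "w") as out:
            out.writelines(line for _, line in lines)

parts = sorted(p for p in os.listdir(workdir) if "part" in p)
# 10 lines at 3 per part -> 4 parts with 3, 3, 3, and 1 lines.
print(parts)

Because the file object is iterated lazily, the same code handles 100 million lines with flat memory usage; only SPLIT_SIZE changes the output granularity.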