What's the quickest way to simple-merge files, and what's the quickest way to split an array?

Developer https://www.devze.com 2023-01-19 02:28 Source: Internet
What's the quickest way to take a list of files and the name of an output file, and merge them into a single file while removing duplicate lines? Something like

cat file1 file2 file3 | sort -u > out.file

in Python.

I'd prefer not to use system calls.

AND:

What's the quickest way to split a list in Python into X chunks (a list of lists) that are as equal as possible? (Given a list and X.)


First:

lines = set()
for filename in filenames:
    with open(filename) as inF:
        lines.update(inF)
with open(outfile, 'w') as outF:
    outF.write(''.join(lines))

Second:

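def chunk(bigList, x):
    # Floor division keeps chunklen an int on Python 3; at least 1 so range() gets a nonzero step.
    chunklen = max(1, len(bigList) // x)
    # If len(bigList) is not a multiple of x, the leftover elements form extra chunks.
    for i in range(0, len(bigList), chunklen):
        yield bigList[i:i+chunklen]

listOfLists = list(chunk(bigList, x))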


For the first:

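lines = []
for filename in filenames:
    f = open(filename)
    lines.extend(f.read().splitlines())  # one entry per line, terminators removed
    f.close()
lines = list(set(lines)) #remove duplicates
f = open(outfile_name, 'w')
f.writelines(line + '\n' for line in lines)  # put the line terminators back
f.close()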

This assumes the files are of a reasonable length, since all the data from the files is held in memory simultaneously. If you want to preserve sort's side effect of ordering the lines, just add lines.sort() before the file is written.
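
Putting the deduplication and the optional sort together might look like the sketch below. It assumes Python 3, and the name merge_unique is illustrative rather than taken from either answer. It strips line terminators before deduplicating so that a final line without a trailing newline still matches the same line elsewhere. The ordering is by code point, which may differ from what sort -u produces under a non-C locale.

```python
def merge_unique(filenames, outfile):
    """Merge text files into outfile, dropping duplicate lines and sorting the rest."""
    lines = set()
    for filename in filenames:
        with open(filename) as in_f:
            # Strip the terminator so "a" and "a\n" count as the same line.
            lines.update(line.rstrip("\n") for line in in_f)
    with open(outfile, "w") as out_f:
        # sorted() stands in for the ordering that `sort -u` gives as a side effect.
        out_f.writelines(line + "\n" for line in sorted(lines))
```

As with the snippets above, every distinct line is held in memory at once, so this only suits inputs that fit comfortably in RAM.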

And the second:

step_size = max(1, len(orig_list) // num_chunks)  # floor division so the step is an int on Python 3
split_list = [orig_list[i:i+step_size] for i in range(0, len(orig_list), step_size)]
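
Slicing with a fixed step can yield more than num_chunks pieces when the list length is not an exact multiple of num_chunks. If exactly that many chunks are needed, with sizes differing by at most one, a sketch along these lines may help. The name split_evenly is hypothetical, and the choice to hand the extra elements to the earliest chunks is an assumption, not something the question specifies.

```python
def split_evenly(items, n):
    """Split items into exactly n consecutive chunks whose sizes differ by at most 1."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    base, extra = divmod(len(items), n)
    chunks = []
    start = 0
    for i in range(n):
        # The first `extra` chunks each receive one additional element.
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks

print(split_evenly(list(range(10)), 3))  # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
```

When n is larger than the list, the trailing chunks come back empty, which keeps the result at exactly n lists.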