开发者

python beginner - how to read contents of several files into unique lists?

开发者 https://www.devze.com 2023-04-04 20:35 出处:网络
I\'d like to read the contents from several files into unique lists that I can call later - ultimately, I want to convert these lists to sets and perform intersections and subtraction on them. This mu

I'd like to read the contents from several files into unique lists that I can call later - ultimately, I want to convert these lists to sets and perform intersections and subtraction on them. This must be an incredibly naive question, but after poring over the iterators and loops sections of Lutz's "Learning Python," I can't seem to wrap my head around how to approach this. Here's what I've written:

#!/usr/bin/env python

import sys

OutFileName = 'test.txt'
OutFile = open(OutFileName, 'w')

FileList = sys.argv[1: ]
Len = len(FileList)
print Len

for i in range(Len):
    sys.stderr.write("Processing file %s\n" % (i))
    FileNum = i
    
for InFileName in FileList:
    InFile = open(InFileName, 'r')
    PathwayList = InFile.readlines()
    print PathwayList
    InFile.close()

With a couple of simple test files, I get output like this:

Processing file 0

Processing file 1

['alg1\n', 'alg2\n', 'alg3\n', 'alg4\n', 'alg5\n', 'alg6']

['csr1\n', 'csr2\n', 'csr3\n', 'csr4\n', 'csr5\n', 'csr6\n', 'csr7\n', 'alg2\n', 'alg6']

These lists are correct, but how do I assign each one to a unique variable so that I can call them later (for example, by including the index # from 开发者_JAVA百科range in the variable name)?

Thanks so much for pointing a complete programming beginner in the right direction!


#!/usr/bin/env python

import sys

FileList = sys.argv[1: ]
PathwayList = []
for InFileName in FileList:
    sys.stderr.write("Processing file %s\n" % (i))
    InFile = open(InFileName, 'r')
    PathwayList.append(InFile.readlines())
    InFile.close()

Assuming you read in two files, the following will do a line by line comparison (it won't pick up any extra lines in the longer file, but then they'd not be the same if one had more lines than the other ;)

for i, s in enumerate(zip(PathwayList[0], PathwayList[1]), 1):
    if s[0] == s[1]:
        print i, 'match', s[0]
    else:
        print i, 'non-match', s[0], '!=', s[1]

For what you're wanting to do, you might want to take a look at the difflib module in Python. For sorting, look at Mutable Sequence Types, someListVar.sort() will sort the contents of someListVar in place.


You could do it like that if you don't need to remeber where the contents come from :

PathwayList = []
for InFileName in FileList:
    sys.stderr.write("Processing file %s\n" % InFileName)
    InFile = open(InFileName, 'r')
    PathwayList.append(InFile.readlines())
    InFile.close()  

for contents in PathwayList:
    # do something with contents which is a list of strings
    print contents  

or, if you want to keep track of the files names, you could use a dictionary :

PathwayList = {}
for InFileName in FileList:
    sys.stderr.write("Processing file %s\n" % InFileName)
    InFile = open(InFileName, 'r')
    PathwayList[InFile] = InFile.readlines()
    InFile.close()

for filename, contents in PathwayList.items():
    # do something with contents which is a list of strings
    print filename, contents  


You might want to check out Python's fileinput module, which is a part of the standard library and allows you to process multiple files at once.


Essentially, you have a list of files and you want to change to list of lines of these files...

Several ways:

result = [ list(open(n)) for n in sys.argv[1:] ]

This would get you a result like -> [ ['alg1', 'alg2', 'alg3'], ['csr1', 'csr2'...]] Accessing would be like 'result[0]' which would result in ['alg1', 'alg2', 'alg3']...

Somewhat better might be dictionary:

result = dict( (n, list(open(n))) for n in sys.argv[1:] )

If you want to just concatenate, you would just need to chain it:

import itertools
result = list(itertools.chain.from_iterable(open(n) for n in sys.argv[1:]))
# -> ['alg1', 'alg2', 'alg3', 'csr1', 'csr2'...

Not one-liners for a beginner...however now it would be a good exercies to try to comprehend what's going on :)


You need to dynamically create the variable name for each file 'number' that you're reading. (I'm being deliberately vague on purpose, knowing how to build variables like this is quite valuable and more readily remembered if you discover it yourself)

something like this will give you a start


You need a list which holds your PathwayList lists, that is a list of lists.

One remark: it is quite uncommon to use capitalized variable names. There is no strict rule for that, but by convention most people only use capitalized names for classes.

0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号