Python style question around reading small files

Developer https://www.devze.com 2022-12-19 09:24 Source: Internet
What is the most pythonic way to read in a named file, strip lines that are either empty, contain only spaces, or have # as a first character, and then process remaining lines? Assume it all fits easily in memory.

Note: it's not tough to do this -- what I'm asking is for the most pythonic way. I've been writing a lot of Ruby and Java and have lost my feel.

Here's a strawman:

file_lines = [line.strip() for line in open(config_file, 'r').readlines() if len(line.strip()) > 0]
for line in file_lines:
  if line[0] == '#':
    continue
  # Do whatever with line here.

I'm interested in concision, but not at the cost of becoming hard to read.


Generators are perfect for tasks like this. They are readable, keep concerns cleanly separated, and are efficient in both memory use and time.

def RemoveComments(lines):
    for line in lines:
        if not line.strip().startswith('#'):
            yield line

def RemoveBlankLines(lines):
    for line in lines:
        if line.strip():
            yield line

Now applying these to your file:

filehandle = open('myfile', 'r')
for line in RemoveComments(RemoveBlankLines(filehandle)):
    Process(line)

In this case, it's pretty clear that the two generators can be merged into a single one, but I left them separate to demonstrate their composability.
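For comparison, here is a minimal sketch of what the merged version might look like. The name remove_comments_and_blanks and the sample list are made up for illustration; the filtering rule is the same as in the two generators above (blank lines and lines whose first non-blank character is '#' are dropped, and surviving lines are yielded unmodified).

```python
def remove_comments_and_blanks(lines):
    """Yield only lines that are neither blank nor comments."""
    for line in lines:
        stripped = line.strip()
        # Skip whitespace-only lines and lines whose first
        # non-blank character is '#'.
        if stripped and not stripped.startswith('#'):
            yield line  # the original line, trailing newline included


# Hypothetical sample input standing in for an open file handle.
sample = ["one\n", "# comment\n", "   \n", "  # indented comment\n", "two\n"]
kept = list(remove_comments_and_blanks(sample))
```

Because it accepts any iterable of strings, the same generator works on an open file or on a plain list, which makes it easy to test.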


lines = [r for r in open(thefile) if not r.isspace() and r[0] != '#']

The .isspace() method of strings is by far the best way to test if a string is entirely whitespace -- no need for contortions such as len(r.strip()) == 0 (ech;-).
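One edge case worth knowing if you reuse this test outside of file iteration: isspace() returns False for an empty string, while the strip()-based test treats it as blank. Lines read from a file always carry at least their newline, so the one-liner above is unaffected. A small sketch (is_blank is a hypothetical helper, not part of the answer):

```python
def is_blank(line):
    """True for an empty string or a string made up only of whitespace."""
    # "".isspace() is False, so the empty string needs its own check.
    return not line or line.isspace()


checks = {
    "newline only": "\n".isspace(),
    "spaces and tab": "  \t\n".isspace(),
    "text": "two\n".isspace(),
    "empty string": "".isspace(),
}
```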


for line in open("file"):
    sline = line.strip()
    # keep lines that are non-blank and do not begin with '#'
    if sline and not sline.startswith("#"):
        print(sline)

Output:

$ cat file
one
#
  #

two

three
$ ./python.py
one
two
three


I would use this:

processed = [process(line.strip())
             for line in open(config_file, 'r')
             if line.strip() and not line.strip().startswith('#')]

The only ugliness I see here is all the repeated stripping. Getting rid of it complicates the function a bit:

processed = [process(line)
             for line in (line.strip() for line in open(config_file, 'r'))
             if line and not line.startswith('#')]


This matches the description, i.e.

strip lines that are either empty, contain only spaces, or have # as a first character, and then process remaining lines

So lines that merely start or end with spaces are passed through unaltered.

with open("config_file","r") as fp:
    data = (line for line in fp if line.strip() and not line.startswith("#"))
    for item in data:
        print(repr(item))


I like Paul Hankin's thinking, but I'd do it differently:

from itertools import ifilter, ifilterfalse, imap

with open(r'c:\temp\testfile.txt', 'rb') as f:
    s1 = ifilterfalse(str.isspace, f)
    s2 = ifilter(lambda x: not x.startswith('#'), s1)
    s3 = imap(str.rstrip, s2)
    print "\n".join(s3)

I'd probably only do it this way instead of using some of the more obvious approaches suggested here if I were concerned about memory usage. And I might define an iscomment function to eliminate the lambda.
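As a rough sketch of that last idea, here is how the pipeline might look with a named predicate instead of the lambda. This version assumes Python 3, where ifilter, ifilterfalse and imap no longer exist (filter and map are lazy built-ins, and itertools provides filterfalse). The names is_comment and clean_lines are hypothetical, and the code expects text lines rather than the bytes you would get from 'rb' mode in Python 3.

```python
from itertools import filterfalse


def is_comment(line):
    """Return True if the line has '#' as its very first character."""
    return line.startswith('#')


def clean_lines(lines):
    """Lazily drop whitespace-only lines and comments, then rstrip the rest."""
    non_blank = filterfalse(str.isspace, lines)
    non_comment = filterfalse(is_comment, non_blank)
    return map(str.rstrip, non_comment)


# Hypothetical sample input standing in for an open text file.
sample = ["one\n", "#\n", "  \n", "two  \n"]
result = list(clean_lines(sample))
```

Nothing is read or filtered until the final iterator is consumed, so memory use stays flat regardless of file size.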


The file is small, so performance is not really an issue. I would go for clarity over conciseness:

fp = open('file.txt')
for line in fp:
    line = line.strip()
    if line and not line.startswith('#'):
        pass  # process the line here
fp.close()

If you want, you can wrap this in a function.
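One possible shape for such a function, as a sketch only: read_config_lines is a made-up name, and it swaps the explicit close() for a with block so the file is closed even if processing raises.

```python
def read_config_lines(path):
    """Return the stripped, non-blank, non-comment lines of the file at path."""
    kept = []
    with open(path) as fp:  # the with block closes the file on exit
        for line in fp:
            line = line.strip()
            if line and not line.startswith('#'):
                kept.append(line)
    return kept
```

The caller can then loop over the returned list and do the actual processing there, which keeps the file handling and the processing logic apart.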


Using slightly newer idioms (or, on Python 2.5, with from __future__ import with_statement), you could do this, which has the advantage of cleaning up safely yet is quite concise.

with open('file.txt') as fp:
    for line in fp:
        line = line.strip()
        if not line or line[0] == '#':
            continue

        # rest of processing here

Note that stripping the line first means the check for "#" rejects any line whose first non-blank character is "#", not merely lines that have "#" as their very first character. That is easy enough to change if you are strict about it.
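If you do want the strict reading, one way is to test the unstripped line for the comment marker. A minimal sketch, with keep_line as a hypothetical helper name and a made-up sample list:

```python
def keep_line(line):
    """Strict rule: drop blank lines and lines whose first character is '#'."""
    # The blank test uses the stripped line; the comment test does not,
    # so an indented '#' no longer counts as a comment.
    return bool(line.strip()) and not line.startswith('#')


sample = ["one\n", "# comment\n", "  # indented, kept under the strict rule\n", "\n"]
kept = [line for line in sample if keep_line(line)]
```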
