Error in Python's os.walk?

Developer (https://www.devze.com), 2023-01-02 16:58. Source: the web
The os.walk documentation (http://docs.python.org/library/os.html?highlight=os.walk#os.walk) says I can skip traversing unwanted directories by removing them from the dirs list. The explicit example from the docs:

import os
from os.path import join, getsize
for root, dirs, files in os.walk('python/Lib/email'):
    print root, "consumes",
    print sum(getsize(join(root, name)) for name in files),
    print "bytes in", len(files), "non-directory files"
    if 'CVS' in dirs:
        dirs.remove('CVS')  # don't visit CVS directories

I see different behavior (using ActivePython 2.6.2). Namely for the code:

>>> for root,dirs,files in os.walk(baseline):
...     if root.endswith(baseline):
...             for d in dirs:
...                     print "DIR: %s" % d
...                     if not d.startswith("keep_"):
...                             print "Removing %s\\%s" % (root,d)
...                             dirs.remove(d)
...
...     print "ROOT: %s" % root
...

I get the output:

DIR: two
Removing: two
DIR: thr33
Removing: thr33
DIR: keep_me
DIR: keep_me_too
DIR: keep_all_of_us
ROOT: \\mach\dirs
ROOT: \\mach\dirs\ONE
ROOT: \\mach\dirs\ONE\FurtherRubbish
ROOT: \\mach\dirs\ONE\FurtherRubbish\blah
ROOT: \\mach\dirs\ONE\FurtherRubbish\blah\Extracted
ROOT: \\mach\dirs\ONE\FurtherRubbish\blah2\Extracted\Stuff_1
...

WTF? Why wasn't \\mach\dirs\ONE removed? It clearly doesn't start with "keep_".


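Because you're modifying the list dirs while iterating over it. Each removal shifts the remaining entries one position to the left, so the loop steps past whichever entry followed the removed one. ONE was skipped that way and never got looked at. Compare: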

>>> a = [1, 2, 3]
>>> for i in a:
...     if i > 1:
...         a.remove(i)
...

>>> a
[1, 3]
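
The same effect can be made visible on a list of directory names. This is only a sketch in Python 3 syntax (the question uses Python 2), and the helper name visited_while_removing and the sample names are made up for illustration; they are not the asker's real directory listing.

```python
def visited_while_removing(items, should_remove):
    """Return the elements a for loop actually visits while removing in place."""
    visited = []
    for item in items:              # the loop tracks a position, not the items
        visited.append(item)
        if should_remove(item):
            items.remove(item)      # later entries shift left by one
    return visited


# Hypothetical directory names, not taken from the question's machine.
dirs = ["ONE", "two", "thr33", "keep_me"]
seen = visited_while_removing(dirs, lambda d: not d.startswith("keep_"))

print(seen)   # entries the loop looked at
print(dirs)   # entries still in the list afterwards
```

Here "two" is never examined, so it stays in the list even though it does not start with "keep_".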


You aren't removing it from the dirs list. If you were, you'd see your "Removing" print out, wouldn't you?

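Change "for d in dirs" to "for d in list(dirs)" to safely remove items from the dirs list while iterating over it. The loop then runs over a copy, while remove() still changes the original list that os.walk consults.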
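
A minimal sketch of that fix, assuming Python 3 and a throwaway directory tree built with tempfile. The function name walk_keeping and the directory names are invented for the example. It also compares root to the base path with == rather than endswith, which is a simplification of the question's check.

```python
import os
import tempfile


def walk_keeping(base, prefix="keep_"):
    """Walk base, pruning top-level directories whose names lack the prefix."""
    visited = []
    for root, dirs, files in os.walk(base):
        if root == base:
            for d in list(dirs):        # iterate over a copy of the list
                if not d.startswith(prefix):
                    dirs.remove(d)      # mutate the list os.walk will descend into
        visited.append(os.path.relpath(root, base))
    return sorted(visited)


with tempfile.TemporaryDirectory() as base:
    # Hypothetical layout: only keep_me (and its child) should be visited.
    for sub in ("ONE/inner", "two", "thr33", "keep_me/inner"):
        os.makedirs(os.path.join(base, *sub.split("/")))
    result = walk_keeping(base)

print(result)
```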

Or you could just write:

dirs[:] = [d for d in dirs if d.startswith("keep_")]  # keep only the keep_ directories
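
The slice assignment matters: os.walk only notices changes made to the very list object it handed out. The sketch below (Python 3, temporary directories, hypothetical names such as walk_roots) contrasts slice assignment with plain rebinding, keeping only the directories that start with "keep_" as the question intends.

```python
import os
import tempfile


def walk_roots(base, in_place):
    """Return the sorted relative roots visited when pruning at the top level."""
    visited = []
    for root, dirs, files in os.walk(base):
        if root == base:
            kept = [d for d in dirs if d.startswith("keep_")]
            if in_place:
                dirs[:] = kept      # same list object, new contents: pruning works
            else:
                dirs = kept         # rebinds the local name only: no pruning
        visited.append(os.path.relpath(root, base))
    return sorted(visited)


with tempfile.TemporaryDirectory() as base:
    for name in ("ONE", "two", "keep_me"):   # hypothetical directory names
        os.makedirs(os.path.join(base, name))
    pruned = walk_roots(base, in_place=True)
    unpruned = walk_roots(base, in_place=False)

print(pruned)
print(unpruned)
```

With rebinding, every directory is still visited, because the list os.walk holds was never changed.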