Does Python's glob function support wildcards with variable depth?

Developer https://www.devze.com 2023-03-24 05:42 Source: Internet
I'm writing a Python script that uses this awkward glob syntax.

import glob
F = glob.glob('./www.dmoz.org/Science/Environment/index.html')
F += glob.glob('./www.dmoz.org/Science/Environment/*/index.html')
F += glob.glob('./www.dmoz.org/Science/Environment/*/*/index.html')
F += glob.glob('./www.dmoz.org/Science/Environment/*/*/*/index.html')
F += glob.glob('./www.dmoz.org/Science/Environment/*/*/*/*/index.html')

Seems like there ought to be a way to wrap this in one line:

F = glob.glob('./www.dmoz.org/Science/Environment/[super_wildcard]/index.html')

But I don't know what the appropriate super wildcard would be. Does such a thing exist?


Sorry, it does not. You will probably have to write a few lines of code using os.walk:

import os

# Walk the whole tree and print every file named index.html
for root, dirs, files in os.walk('/starting/path/'):
    for myFile in files:
        if myFile == "index.html":
            print(os.path.join(root, myFile))
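
If a list like the F in the question is needed rather than printed output, the os.walk loop above can be wrapped in a small function. This is only a sketch: the name find_files and its target_name parameter are made up for illustration, and the demo builds a throwaway directory tree instead of assuming the dmoz mirror exists.

```python
import os
import tempfile


def find_files(start_dir, target_name="index.html"):
    """Return paths of every file named target_name under start_dir, at any depth."""
    matches = []
    for root, dirs, files in os.walk(start_dir):
        # 'files' holds the plain file names found directly inside 'root'
        if target_name in files:
            matches.append(os.path.join(root, target_name))
    return sorted(matches)


if __name__ == "__main__":
    # Build a small temporary tree so the sketch runs anywhere.
    with tempfile.TemporaryDirectory() as tmp:
        for sub in ("", "a", os.path.join("a", "b")):
            os.makedirs(os.path.join(tmp, sub), exist_ok=True)
            open(os.path.join(tmp, sub, "index.html"), "w").close()
        for path in find_files(tmp):
            print(os.path.relpath(path, tmp))
```

This works on any Python version, including ones that predate recursive glob support.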


glob CAN do this now: since Python 3.5, passing recursive=True makes '**' match any number of directory levels, including none.

For example,

F = glob.glob('./www.dmoz.org/Science/Environment/**/index.html', recursive=True)
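
To see what the recursive pattern actually matches, here is a small self-contained sketch. The helper name find_index_files and the temporary directory layout are assumptions made for the demo, not part of the original script.

```python
import glob
import os
import tempfile


def find_index_files(base_dir):
    """Return every index.html at or below base_dir using glob's recursive '**'."""
    pattern = os.path.join(base_dir, "**", "index.html")
    # With recursive=True, '**' matches zero or more directory levels.
    return sorted(glob.glob(pattern, recursive=True))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Create index.html at depth 0, 1 and 2.
        for sub in ("", "a", os.path.join("a", "b")):
            os.makedirs(os.path.join(tmp, sub), exist_ok=True)
            open(os.path.join(tmp, sub, "index.html"), "w").close()
        for path in find_index_files(tmp):
            print(os.path.relpath(path, tmp))
```

Two things to keep in mind: without recursive=True, '**' behaves like a plain '*' and matches exactly one level, and by default directories whose names start with a dot are skipped.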


I have just released Formic, which implements exactly the wildcard you need ('**') as part of an implementation of Apache Ant's FileSet and Globs.

The search can be implemented:

import formic
fileset = formic.FileSet(include="/www.dmoz.org/Science/Environment/**/index.html")
for file_name in fileset.qualified_files():
    print(file_name)  # Do something with file_name

This will search from the current directory. I hope this helps.


It's not perfect, but works for me:

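import glob
import os

def find_index(max_depth):
    for i in range(max_depth):
        # Build the pattern with i levels of '*' between the base and the file
        components = ['./www.dmoz.org/Science/Environment'] + ['*'] * i + ['index.html']
        fsearch = os.path.join(*components)
        fs_res = glob.glob(fsearch)
        # Return the match only if it is the single hit at this depth
        if len(fs_res) == 1:
            return fs_res[0]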


10 years later ... pathlib solution

from pathlib import Path
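# '**/index.html' matches only files named exactly index.html; glob() is lazy, so list() collects the results
F = list(Path("./www.dmoz.org/Science/Environment/").glob('**/index.html'))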

Where [super_wildcard] = **.
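
For completeness, a runnable sketch of the pathlib approach. The function name find_index_paths and the temporary tree are assumptions made for the demo. It uses rglob, which is shorthand for glob with a leading '**/', and it shows that a pattern such as '*index.html' also picks up files whose names merely end in index.html.

```python
import tempfile
from pathlib import Path


def find_index_paths(base_dir):
    """Return a sorted list of Path objects for every index.html under base_dir."""
    # rglob(p) behaves like glob("**/" + p); both yield results lazily.
    return sorted(Path(base_dir).rglob("index.html"))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)
        (base / "a" / "b").mkdir(parents=True)
        (base / "index.html").touch()
        (base / "a" / "index.html").touch()
        (base / "a" / "b" / "old_index.html").touch()

        # Exact file name: old_index.html is not included.
        print([p.relative_to(base).as_posix() for p in find_index_paths(base)])

        # Looser pattern: anything ending in index.html is included.
        loose = sorted(base.glob("**/*index.html"))
        print([p.relative_to(base).as_posix() for p in loose])
```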
