How to glob for iterable element

I have a Python dictionary that contains iterables, some of which are lists, but most of which are other dictionaries. I'd like to do glob-style assignment similar to the following:

myiter['*']['*.txt']['name'] = 'Woot'

That is, for each element in myiter, look up all elements with keys ending in '.txt' and then set their 'name' item to 'Woot'.

I've thought about sub-classing dict and using the fnmatch module. But, it's unclear to me what the best way of accomplishing this is.

The best way, I think, would be not to do it -- '*' is a perfectly valid key in a dict, so myiter['*'] has a perfectly well defined meaning and usefulness, and subverting that can definitely cause problems. How to "glob" over keys which are not strings, including the exclusively integer "keys" (indices) in elements which are lists and not mappings, is also quite a design problem.
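Both objections are easy to check in an interpreter. The short sketch below uses a made-up dictionary (its keys and values are purely illustrative) to show that '*' is an ordinary key and that fnmatch refuses a non-string name:

```python
import fnmatch

# '*' is an ordinary, perfectly legal dictionary key
d = {'*': 'literal star', 'a.txt': 'text file', 0: 'integer key'}
star_value = d['*']  # plain lookup, no globbing involved

# fnmatch expects a string (or bytes) name; an integer raises TypeError
try:
    fnmatch.fnmatch(0, '*')
    int_key_ok = True
except TypeError:
    int_key_ok = False

# String keys can be glob-matched as usual
txt_keys = [k for k in d if isinstance(k, str) and fnmatch.fnmatch(k, '*.txt')]
```

So any globbing layer has to decide what to do both with a literal '*' key and with keys that are not strings at all.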

If you nevertheless must do it, I would recommend taking full control by subclassing the abstract base class collections.MutableMapping (collections.abc.MutableMapping in Python 3.3 and later; the old alias was removed in Python 3.10) and implementing the needed methods (__len__, __iter__, __getitem__, __setitem__, __delitem__, and, for better performance, others such as __contains__, which the ABC does implement on the basis of the others, but slowly) in terms of a contained dict. Subclassing dict instead, as per other suggestions, would require you to override a huge number of methods to avoid inconsistent behavior between the methods you do override, which would understand keys containing wildcards, and those you don't.
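A minimal sketch of that approach, assuming Python 3 (where the ABC lives in collections.abc). The class name GlobMapping and its explicit glob method are hypothetical, chosen only for illustration; item access deliberately keeps ordinary mapping semantics, so '*' stays a normal key:

```python
import fnmatch
from collections.abc import MutableMapping


class GlobMapping(MutableMapping):
    """Hypothetical wrapper around a plain dict with an explicit glob() method."""

    def __init__(self, data=None):
        self._data = dict(data or {})

    # The five abstract methods, all delegated to the contained dict
    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        self._data[key] = value

    def __delitem__(self, key):
        del self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)

    # Optional override: faster than the default the ABC derives from __getitem__
    def __contains__(self, key):
        return key in self._data

    def glob(self, pattern):
        """Return the string keys that match the glob pattern."""
        return [k for k in self._data
                if isinstance(k, str) and fnmatch.fnmatch(k, pattern)]


m = GlobMapping({'a.txt': 1, 'b.jpg': 2, '*': 3, 7: 4})
txt_keys = m.glob('*.txt')   # only 'a.txt' matches
star_item = m['*']           # ordinary lookup of the literal '*' key
```

Keeping the wildcard logic in a separately named method sidesteps the question of what wildcard-aware item access should return; it does not answer it.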

Whether you subclass collections.MutableMapping, or dict, to make your Globbable class, you have to make a core design decision: what does yourthing[somekey] return when yourthing is a Globbable?

Presumably it has to return a different type when somekey is a string containing wildcards, versus anything else. In the latter case, one would imagine, just what is actually at that entry; but in the former, it can't just return another Globbable -- otherwise, what would yourthing[somekey] = 'bah' do in the general case? For your single "slick syntax" example, you want it to set a somekey entry in each of the items of yourthing (a HUGE semantic break with the behavior of every other mapping in the universe;-) -- but then, how would you ever set an entry in yourthing itself?!

Let's see if the Zen of Python has anything to say about this "slick syntax" for which you yearn...:

>>> import this
    ...
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.

Consider for a moment the alternative of losing the "slick syntax" (and all the huge semantic headaches it necessarily implies) in favor of clarity and simplicity (using Python 2.7-and-better syntax here, just for the dict comprehension -- use an explicit dict(...) call instead if you're stuck with 2.6 or earlier), e.g.:

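import fnmatch

def match(s, pat):
    # Non-string keys make fnmatch raise TypeError; treat them as non-matching
    try:
        return fnmatch.fnmatch(s, pat)
    except TypeError:
        return False

def sel(ds, pat):
    # Values under every matching key, across all dicts in ds
    return [d[k] for d in ds for k in d if match(k, pat)]

def set_all(ds, k, v):  # named set_all so as not to shadow the built-in set
    for d in ds:
        d[k] = v

so your assignment might become

set_all(sel(sel([myiter], '*'), '*.txt'), 'name', 'Woot')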

(the '*' pattern filters nothing out when every key is a string, so that selection only serves to descend one level). Is this so horrible as to be worth the morass of issues I've mentioned above in order to use instead

myiter['*']['*.txt']['name'] = 'Woot'

...? By far the clearest and best-performing way, of course, remains the even-simpler

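import fnmatch

def match(k, v, pat):
    # True only when key k matches the pattern and value v is a dict
    try:
        return fnmatch.fnmatch(k, pat) and isinstance(v, dict)
    except TypeError:
        # non-string keys cannot be glob-matched
        return False

for k, v in myiter.items():
    if match(k, v, '*'):
        for sk, sv in v.items():
            if match(sk, sv, '*.txt'):
                sv['name'] = 'Woot'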

but if you absolutely crave conciseness and compactness, despising the Zen of Python's koan "Sparse is better than dense", you can at least obtain them without the various nightmares I mentioned as needed to achieve your ideal "syntax sugar".
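For comparison, here is the helper-based one-liner run end to end on made-up sample data. The names match and sel follow the helpers above; the setter is called set_all here so that it does not shadow the built-in set. Treat it as a sketch under those assumptions rather than a finished utility:

```python
import fnmatch


def match(s, pat):
    # Non-string keys (e.g. integers) cannot be glob-matched
    try:
        return fnmatch.fnmatch(s, pat)
    except TypeError:
        return False


def sel(ds, pat):
    # Collect the values under every matching key of every dict in ds
    return [d[k] for d in ds for k in d if match(k, pat)]


def set_all(ds, k, v):
    # Set key k to v in each dict of ds
    for d in ds:
        d[k] = v


# Made-up sample data: two "directories" holding per-file dicts
myiter = {
    'dir1': {'a.txt': {'name': 'Roger'}, 'b.jpg': {'name': 'Carl'}},
    'dir2': {'c.txt': {'name': 'Sally'}},
}

# Descend one level with '*', pick the '*.txt' entries, then set 'name' on each
set_all(sel(sel([myiter], '*'), '*.txt'), 'name', 'Woot')
```

After the call, the 'name' of every '*.txt' entry is 'Woot' while the 'b.jpg' entry is left alone. This works because sel returns the nested dict objects themselves, not copies.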

The best way is to subclass dict and use the fnmatch module.

subclass dict: adding functionality you want in an object-oriented way.
fnmatch module: reuse of existing functionality.

You could use fnmatch to match on dictionary keys, although you would have to compromise on syntax slightly, especially if you wanted to do this on a nested dictionary. Perhaps a custom dictionary-like class with a search method that returns wildcard matches would work well.

Here is a VERY BASIC example that comes with a warning that this is NOT RECURSIVE and will not handle nested dictionaries:

from fnmatch import fnmatch

class GlobDict(dict):
    def glob(self, match):
        """Return a plain dict of the items whose keys match the glob
        pattern `match` (e.g. '*.txt'). Keys must be strings."""
        return {k: v for k, v in self.items() if fnmatch(k, match)}

# Start with a basic dict
basic_dict = {'file1.jpg': 'image', 'file2.txt': 'text', 'file3.mpg': 'movie',
              'file4.txt': 'text'}

# Create a GlobDict from it
glob_dict = GlobDict(basic_dict)

# Then get glob-style results!
globbed_results = glob_dict.glob('*.txt')
# => {'file2.txt': 'text', 'file4.txt': 'text'}

As for what way is the best? The best way is the one that works. Don't try to optimize a solution before it's even created!

Following the principle of least magic, perhaps just define a recursive function, rather than subclassing dict:

import fnmatch

def set_dict_with_pat(it, key_patterns, value):
    if len(key_patterns) > 1:
        # More patterns remain: descend into every matching key
        for key in it:
            if fnmatch.fnmatch(key, key_patterns[0]):
                set_dict_with_pat(it[key], key_patterns[1:], value)
    else:
        # Last pattern: assign the value to every matching key
        for key in it:
            if fnmatch.fnmatch(key, key_patterns[0]):
                it[key] = value

Which could be used like this:

myiter = {'dir1': {'a.txt': {'name': 'Roger'}, 'b.notxt': {'name': 'Carl'}},
          'dir2': {'b.txt': {'name': 'Sally'}}}
set_dict_with_pat(myiter, ['*', '*.txt', 'name'], 'Woot')
print(myiter)
# {'dir1': {'a.txt': {'name': 'Woot'}, 'b.notxt': {'name': 'Carl'}}, 'dir2': {'b.txt': {'name': 'Woot'}}}
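The function above assumes every key is a string and every matched value is a dict. Since the question mentions lists and other element types, a slightly more defensive variant might look like the sketch below. The name set_with_patterns and the sample data are hypothetical, and simply skipping lists and non-string keys is one possible policy, not the only reasonable one:

```python
import fnmatch


def set_with_patterns(node, key_patterns, value):
    """Hypothetical variant: skip non-string keys and non-dict values."""
    if not isinstance(node, dict):
        return  # lists and scalars are left untouched in this sketch
    head, rest = key_patterns[0], key_patterns[1:]
    for key in list(node):
        if not isinstance(key, str) or not fnmatch.fnmatch(key, head):
            continue
        if rest:
            # More patterns remain: descend into the matching entry
            set_with_patterns(node[key], rest, value)
        else:
            # Last pattern: assign to the matching (already existing) key
            node[key] = value


# Made-up data mixing dicts, a list, an integer key and a non-dict value
data = {
    'dir1': {'a.txt': {'name': 'Roger'}, 'notes': ['x', 'y'], 1: {'name': 'Ann'}},
    'dir2': {'b.txt': {'name': 'Sally'}, 'c.txt': 'not a dict'},
}
set_with_patterns(data, ['*', '*.txt', 'name'], 'Woot')
```

Here only the two '*.txt' entries that are dicts get their 'name' changed; the list, the integer key, and the string value under 'c.txt' are left as they were.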