Python: finding files with matching extensions or extensions with matching names in a list

Suppose I have a list of filenames: [exia.gundam, dynames.gundam, kyrios.gundam, virtue.gundam], or [exia.frame, exia.head, exia.swords, exia.legs, exia.arms, exia.pilot, exia.gn_drive, lockon_stratos.data, tieria_erde.data, ribbons_almark.data, otherstuff.dada].

In one iteration, I'd like to have all the *.gundam or *.data files, whereas on the other I'd like to group the exia.* files. What's the easiest way of doing this, besides iterating through the list and putting each element in a dictionary?

Here's what I had in mind:

def matching_names(files):
    '''
    extracts files with repeated names from a list

    Keyword arguments:
    files - list of filenames

    Returns: Dictionary
    '''

    nameDict = {}
    for file in files:
        filename = file.partition('.')
        if filename[0] not in nameDict:
            nameDict[filename[0]] = []
        nameDict[filename[0]].append(filename[2])

    matchingDict = {}
    for key in nameDict.keys():
        if len(nameDict[key]) > 1:
            matchingDict[key] = nameDict[key] 
    return matchingDict

Well, assuming I have to use that, is there a simple way to invert it and have the file extension as key instead of the name?

In my first version, it looks like I misinterpreted your question. So if I've got this correct, you're trying to process a list of files so that you can easily access all the filenames with a given extension, or all the filenames with a given base ("base" being the part before the period)?

If that's the case, I would recommend this way:

from itertools import groupby

def group_by_name(filenames):
    '''Puts the filenames in the given iterable into a dictionary where
    the key is the first component of the filename and the value is
    a list of the filenames with that component.'''
    keyfunc = lambda f: f.split('.', 1)[0]
    return dict( (k, list(g)) for k,g in groupby(
               sorted(filenames, key=keyfunc), key=keyfunc
           ) )

For instance, given the list

>>> test_data = [
...   'exia.frame', 'exia.head', 'exia.swords', 'exia.legs',
...   'exia.arms', 'exia.pilot', 'exia.gn_drive', 'lockon_stratos.data',
...   'tieria_erde.data', 'ribbons_almark.data', 'otherstuff.dada'
... ]

that function would produce

>>> group_by_name(test_data)
{'exia': ['exia.frame', 'exia.head', 'exia.swords', 'exia.legs',
          'exia.arms', 'exia.pilot', 'exia.gn_drive'],
 'lockon_stratos': ['lockon_stratos.data'],
 'otherstuff': ['otherstuff.dada'],
 'ribbons_almark': ['ribbons_almark.data'],
 'tieria_erde': ['tieria_erde.data']}
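In case the sorted() call looks redundant: groupby only merges items that sit next to each other, so the input has to be ordered by the same key first. Here is a small sketch of the difference, using a shortened list of my own rather than the full test data:

```python
from itertools import groupby

# A shortened sample where the two 'exia' files are not adjacent.
names = ['exia.head', 'tieria_erde.data', 'exia.arms']
keyfunc = lambda f: f.split('.', 1)[0]

# Without sorting, 'exia' shows up as two separate groups.
unsorted_groups = [(k, list(g)) for k, g in groupby(names, key=keyfunc)]
print(unsorted_groups)
# [('exia', ['exia.head']), ('tieria_erde', ['tieria_erde.data']), ('exia', ['exia.arms'])]

# Sorting by the same key first brings equal keys together.
sorted_groups = [(k, list(g))
                 for k, g in groupby(sorted(names, key=keyfunc), key=keyfunc)]
print(sorted_groups)
# [('exia', ['exia.head', 'exia.arms']), ('tieria_erde', ['tieria_erde.data'])]
```

If the unsorted result were passed to dict(), the later 'exia' group would silently overwrite the earlier one.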

If you wanted to index the filenames by extension instead, a slight modification will do that for you:

def group_by_extension(filenames):
    '''Puts the filenames in the given iterable into a dictionary where
    the key is the last component of the filename and the value is
    a list of the filenames with that extension.'''
    keyfunc = lambda f: f.split('.', 1)[1]
    return dict( (k, list(g)) for k,g in groupby(
               sorted(filenames, key=keyfunc), key=keyfunc
           ) )

The only difference is in the keyfunc = ... line, where I changed the index from 0 to 1. Example:

>>> group_by_extension(test_data)
{'arms': ['exia.arms'],
 'dada': ['otherstuff.dada'],
 'data': ['lockon_stratos.data', 'tieria_erde.data', 'ribbons_almark.data'],
 'frame': ['exia.frame'],
 'gn_drive': ['exia.gn_drive'],
 'head': ['exia.head'],
 'legs': ['exia.legs'],
 'pilot': ['exia.pilot'],
 'swords': ['exia.swords']}

If you want to get both those groupings at the same time, though, I think it'd be better to avoid the groupby-based generator expression, because each pass can only group the files one way; it can't construct two different dictionaries at once. A plain loop handles both:

from collections import defaultdict
def group_by_both(filenames):
    '''Puts the filenames in the given iterable into two dictionaries,
    where in the first, the key is the first component of the filename,
    and in the second, the key is the last component of the filename.
    The values in each dictionary are lists of the filenames with that
    base or extension.'''
    by_name = defaultdict(list)
    by_ext = defaultdict(list)
    for f in filenames:
        name, ext = f.split('.', 1)
        by_name[name].append(f)
        by_ext[ext].append(f)
    return by_name, by_ext
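To show what comes back, here is a rough usage sketch with a shortened version of the list from the question. The function is repeated so the snippet runs on its own, and the sample list is just my abbreviation of the original data:

```python
from collections import defaultdict

def group_by_both(filenames):
    by_name = defaultdict(list)
    by_ext = defaultdict(list)
    for f in filenames:
        # Split at the first period only; a name without any period
        # would raise ValueError on this unpacking.
        name, ext = f.split('.', 1)
        by_name[name].append(f)
        by_ext[ext].append(f)
    return by_name, by_ext

sample = ['exia.frame', 'exia.head', 'lockon_stratos.data',
          'tieria_erde.data', 'otherstuff.dada']
by_name, by_ext = group_by_both(sample)

print(by_name['exia'])  # ['exia.frame', 'exia.head']
print(by_ext['data'])   # ['lockon_stratos.data', 'tieria_erde.data']
print(sorted(by_ext))   # ['dada', 'data', 'frame', 'head']
```

Since both results are defaultdicts, looking up a key that was never seen returns an empty list (and adds that key) instead of raising KeyError.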

I'm not sure if I entirely get what you're looking to do, but if I understand it correctly something like this might work:

from collections import defaultdict
files_by_extension = defaultdict(list)

for f in files:
    files_by_extension[ f.split('.')[1] ].append(f)

This creates a dictionary keyed by file extension and fills it in a single pass through the list.
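One caveat: f.split('.')[1] assumes every name contains exactly one period. If your list might contain names with no period or with several, a variant along these lines may be safer. This is only a sketch; the function name and the 'archive.tar.gz' and 'README' entries are made up for illustration, and filing dot-less names under an empty-string key is my own choice:

```python
from collections import defaultdict

def group_by_last_extension(files):
    # Hypothetical helper: key each file by the text after its LAST period.
    groups = defaultdict(list)
    for f in files:
        _, sep, ext = f.rpartition('.')
        # rpartition returns an empty separator when there is no period,
        # in which case the whole name ends up in the last slot.
        groups[ext if sep else ''].append(f)
    return groups

files = ['exia.head', 'archive.tar.gz', 'README',
         'lockon_stratos.data', 'tieria_erde.data']
groups = group_by_last_extension(files)

print(groups['data'])  # ['lockon_stratos.data', 'tieria_erde.data']
print(groups['gz'])    # ['archive.tar.gz']
print(groups[''])      # ['README']
```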

Suppose for example that you want as the result a list of lists of filenames, grouped by either extension or rootname:

import os.path
import itertools as it

def files_grouped_by(filenames, use_extension=True):
    def ky(fn): return os.path.splitext(fn)[use_extension]
    return [list(g) for _, g in it.groupby(sorted(filenames, key=ky), ky)]

Now files_grouped_by(filenames, False) will return the list of lists grouping by rootname, while if the second argument is True or absent the grouping will be by extension.
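For example, with a shortened version of the list from the question (the function is repeated so the snippet stands alone; the sample list is my abbreviation):

```python
import os.path
import itertools as it

def files_grouped_by(filenames, use_extension=True):
    # splitext returns (root, ext); the booleans False/True index it as 0/1.
    def ky(fn):
        return os.path.splitext(fn)[use_extension]
    return [list(g) for _, g in it.groupby(sorted(filenames, key=ky), ky)]

sample = ['exia.frame', 'exia.head', 'lockon_stratos.data',
          'tieria_erde.data', 'otherstuff.dada']

by_ext = files_grouped_by(sample)
print(by_ext)
# [['otherstuff.dada'], ['lockon_stratos.data', 'tieria_erde.data'],
#  ['exia.frame'], ['exia.head']]

by_root = files_grouped_by(sample, False)
print(by_root)
# [['exia.frame', 'exia.head'], ['lockon_stratos.data'],
#  ['otherstuff.dada'], ['tieria_erde.data']]
```

The groups come out ordered by their key because of the sort, and within a group the files keep their original relative order.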

If you want instead a dict, the keys being either rootnames or extensions, and the values the corresponding lists of filenames, the approach is quite similar:

import os.path
import itertools as it

def dict_files_grouped_by(filenames, use_extension=True):
    def ky(fn): return os.path.splitext(fn)[use_extension]
    return dict((k, list(g)) 
                for k, g in it.groupby(sorted(filenames, key=ky), ky))
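One thing to keep in mind with the dict form: os.path.splitext keeps the leading period in the extension, so the keys look like '.data' rather than 'data', and a name with no extension lands under the empty string. A quick sketch (the 'README' entry is made up to show that case):

```python
import os.path
import itertools as it

def dict_files_grouped_by(filenames, use_extension=True):
    # splitext returns (root, ext), where ext includes the leading period.
    def ky(fn):
        return os.path.splitext(fn)[use_extension]
    return dict((k, list(g))
                for k, g in it.groupby(sorted(filenames, key=ky), ky))

sample = ['exia.head', 'lockon_stratos.data', 'tieria_erde.data', 'README']

by_ext = dict_files_grouped_by(sample)
print(by_ext['.data'])  # ['lockon_stratos.data', 'tieria_erde.data']
print(by_ext[''])       # ['README']

by_root = dict_files_grouped_by(sample, False)
print(by_root['exia'])  # ['exia.head']
```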