I know there's a better way to do this, but I don't know what it is. I'm sorting through a list of files, and I would like to remove 'the usual suspects' from the names so I can compare one list to another.
From what I understand, name.replace() looks at each and every item in listToClean for the phrases I picked and replaces them if present. There has to be a better way to do this...
def cleanLists(listToClean, extList):
    cleanFileList = []
    for filename in listToClean:
        name = os.path.split(filename)[1]
        ext = os.path.splitext(name)
        if ext[1] in extList:
            name = name.replace(ext[1], '')
            name = name.replace('1080p', '')
            name = name.replace('1080P', '')
            name = name.replace('720p', '')
            name = name.replace('720P', '')
            name = name.replace('HD', '')
            name = name.replace('(', ' ')
            name = name.replace(')', '')
            name = name.replace('.', ' ')
            cleanFileList.append(name)
    cleanFileList.sort(key=lambda x: x.lower())
    return cleanFileList
bad_names = ['1080p', '720p'] # and so on
for bad_name in bad_names:
    name = name.replace(bad_name, '')
Obviously, your declaration of words to clean from each name would happen at the top of the function, not for each iteration over the list of file names.
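For what it's worth, here is a rough sketch of how that could look as a whole function. The names BAD_NAMES and clean_lists are made up for illustration, the list contents are just the strings from the question, and it assumes extList holds extensions with the leading dot (e.g. '.mkv'), since that is what os.path.splitext returns. It also takes the stem from splitext instead of replacing the extension text, which is a small change from the original.

```python
import os

# Hypothetical list of substrings to strip; declared once, not per file.
BAD_NAMES = ['1080p', '1080P', '720p', '720P', 'HD']


def clean_lists(list_to_clean, ext_list):
    """Return cleaned, case-insensitively sorted names for matching files."""
    clean_file_list = []
    for filename in list_to_clean:
        # Split "dir/Movie.1080p.mkv" into stem "Movie.1080p" and ext ".mkv".
        stem, ext = os.path.splitext(os.path.basename(filename))
        if ext not in ext_list:
            continue  # skip files whose extension is not of interest
        name = stem
        for bad_name in BAD_NAMES:
            name = name.replace(bad_name, '')
        # Same punctuation handling as in the question.
        name = name.replace('(', ' ').replace(')', '').replace('.', ' ')
        clean_file_list.append(name)
    clean_file_list.sort(key=str.lower)
    return clean_file_list
```

Called as clean_lists(paths, ['.mkv', '.avi']), this keeps only the files with those extensions. Leftover whitespace is not trimmed, so you may want a final name.strip() if that matters for your comparison.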
# do this once
import re
bad_strings = ['1080p', '720p'] # etc
regex = '|'.join(re.escape(x) for x in bad_strings)
subber = re.compile(regex, re.IGNORECASE).sub
# do this once for each name
name = name.replace(ext[1], '')
# OR maybe better: name = ext[0] # see below
cleanFileList.append(subber('', name))
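Consider what happens when 'csv' is in your list of extensions and you have a file named 'summary_of_csv_files.csv' ... name.replace() removes every occurrence of the string, not just the one at the end, whereas ext[0] already holds the name without the extension.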
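Putting the pieces together, a minimal sketch of the regex version might look like the following. The names BAD_STRINGS, _strip_bad and clean_lists_re are hypothetical, and it again assumes the extension list contains entries with the leading dot. Note that re.IGNORECASE makes 'HD' match 'hd' anywhere in the name, including inside ordinary words such as 'birthday', so you may want to be pickier about what goes in the list.

```python
import os
import re

# Hypothetical list of strings to strip; built and compiled once.
BAD_STRINGS = ['1080p', '720p', 'HD']
_strip_bad = re.compile(
    '|'.join(re.escape(s) for s in BAD_STRINGS), re.IGNORECASE
).sub


def clean_lists_re(list_to_clean, ext_list):
    """Regex-based variant: one substitution pass per file name."""
    cleaned = []
    for filename in list_to_clean:
        # Use the stem from splitext rather than replacing the extension
        # text, so 'csv' inside the name itself is left alone.
        stem, ext = os.path.splitext(os.path.basename(filename))
        if ext in ext_list:
            name = _strip_bad('', stem)
            name = name.replace('(', ' ').replace(')', '').replace('.', ' ')
            cleaned.append(name)
    return sorted(cleaned, key=str.lower)
```

With this, clean_lists_re(['summary_of_csv_files.csv'], ['.csv']) leaves the 'csv' in the middle of the name untouched, because only the trailing extension is dropped.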