we have a hardrive with hundreds of thousands of files
i need to figure out how many of every file extension we have
how can i do this with python?
i need it to go through every directory. this lawyers at my company need this. it can be a total for the entire hardrive it does not have to be broken down by directory
e开发者_C百科xample:
1232 JPEG
11 exe
45 bat
2342 avi
532 doc
Have a look at os.walk
call in the os module and traverse through the entire directory tree. Get the extension using os.path.splitext
. Maintain a dictionary where key the extension.lower() and increment the count of each extension that you encounter.
import os
import collections
extensions = collections.defaultdict(int)
for path, dirs, files in os.walk('/'):
for filename in files:
extensions[os.path.splitext(filename)[1].lower()] += 1
for key,value in extensions.items():
print 'Extension: ', key, ' ', value, ' items'
import os
from os.path import splitext
extensions = {}
for root, dir, files in os.walk('/'):
for file in files:
ext = splitext(file)[1]
try:
extensions[ext] += 1
except KeyError:
extensions[ext] = 1
You'd probably be better served by using a DefaultDict
, you can use that if you like.
You can then print the values like so:
for extension, count in extensions.items():
print 'Extension %s has %d files' % (extension, count)
Use os.walk()
to go through the files, and os.path.splitext()
to get just the extensions. You may want to lower()
the extensions too, because at least in my $HOME I have a bunch of .jpg and a bunch of .JPG.
import os, os.path, collections
extensionCount = collections.defaultdict(int)
for root, dirs, files in os.walk('.'):
for file in files:
base, ext = os.path.splitext(file)
extensionCount[ext.lower()] += 1
#Now print them out, largest to smallest.
for ext, count in sorted(extensionCount.items(), key=lambda x: x[1], reverse=True):
print ext, count
The pattern is simple.
counter = 0
for root, dirs, files in os.walk(YourPath):
for file in files:
if file.endswith(EXTENSION):
counter += 1
You can create an array with the list of EXTENSION and add them. Another faster way would be to create a dictionary growing little by little. The extension is then a key for adding values. {jpeg: 1232, exe: 11}
Update: With many of the solutions we propose is that we assume that the string is a correct representation of the filetype. But I'm not sure there is any other way of doing it. The iteration should be done only once indeed as the comment says below. So it is better to grow the dictionary little by little
The working script will be very simple and I recommend you use os.walk() function. What it does is generates file names across directory tree( http://docs.python.org/library/os.html).
精彩评论