I'd like to create a Python command-line tool that prints a directory tree with the sizes of all subdirectories (starting from a given directory) and their most frequent extensions. Here is an example of the output:
- root_dir (5 GB, jpg (65 %) : avi (30 %) : pdf (5 %))
-- aa (3 GB, jpg (100 %) )
-- bb (2 GB, avi (20 %) : pdf (2 %) )
--- bbb (1 GB, ...)
--- bb2 (1 GB, ...)
-- cc (1 GB, pdf (100 %) )
The format is: nesting level, directory name (size of the directory including all files and subdirectories, most frequent extensions with their size percentages in this directory).
I have a code snippet so far. The problem is that it counts only the file sizes in a directory, so the resulting size is smaller than the real size of the directory. The other problem is how to put it all together and print the tree defined above without redundant computations.
Calculating directory sizes really isn't Python's strong suit, as explained in this post: very quickly getting total size of folder. If you have access to du and find, by all means use them. You can easily display the size of each directory with the following line:
find . -type d -exec du -hs "{}" \;
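If you would rather launch that from Python than from a shell, here is a minimal sketch (assuming du and find are on your PATH, and Python 3.7+ for capture_output):

import subprocess

# Run `du -hs` for every directory under the current one and print the output.
# Assumes the `du` and `find` binaries are available on PATH.
output = subprocess.run(
    ['find', '.', '-type', 'd', '-exec', 'du', '-hs', '{}', ';'],
    capture_output=True, text=True, check=True,
).stdout
print(output)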
If you insist on doing this in Python, you may prefer a post-order traversal over os.walk, as suggested by PableG. But using os.walk can be visually cleaner, if efficiency is not the utmost concern for you:
import os
import sys
from collections import defaultdict


def walkIt(folder):
    for path, dirs, files in os.walk(folder):
        size = getDirSize(path)
        stats = getExtensionStats(files)
        # only report the top 3 extensions
        print('%s (%s, %s)' % (path, size, stats[:3]))


def getExtensionStats(files):
    # get all file extensions
    extensions = [f.rsplit(os.extsep, 1)[-1]
                  for f in files if len(f.rsplit(os.extsep, 1)) > 1]
    # count the extensions
    exCounter = defaultdict(int)
    for e in extensions:
        exCounter[e] += 1
    # convert counts to percentages
    percentPairs = [(e, 100 * ct // len(extensions))
                    for e, ct in exCounter.items()]
    # sort by percentage, most frequent first
    percentPairs.sort(key=lambda i: i[1], reverse=True)
    return percentPairs


def getDirSize(root):
    size = 0
    for path, dirs, files in os.walk(root):
        for f in files:
            size += os.path.getsize(os.path.join(path, f))
    return size


if __name__ == '__main__':
    path = sys.argv[1] if len(sys.argv) > 1 else '.'
    walkIt(path)
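Note that getDirSize() above re-walks every subtree, which is exactly the redundant computation the question complains about. If you want the post-order traversal PableG suggests, here is a minimal sketch (my naming, not PableG's code): os.walk(topdown=False) visits children before their parent, so each directory's total can be cached and reused instead of recomputed:

import os

def dir_sizes(folder):
    # Map of directory path -> total size in bytes, filled bottom-up.
    sizes = {}
    for path, dirs, files in os.walk(folder, topdown=False):
        total = sum(os.path.getsize(os.path.join(path, f)) for f in files)
        # Subdirectories were already visited, so their totals are cached;
        # symlinked directories that were not descended into default to 0.
        total += sum(sizes.get(os.path.join(path, d), 0) for d in dirs)
        sizes[path] = total
    return sizes

With that dict in hand, printing the tree is a second, cheap pass over the cached sizes.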
I personally find os.listdir + a_recursive_function better suited for this task than os.walk:
import os, copy
from os.path import join, getsize, isdir, splitext

frequent_ext = {".jpg": 0, ".pdf": 0}   # extensions to track

def list_dir(base_dir):
    dir_sz = 0                          # total size of this directory
    ext_size = copy.copy(frequent_ext)  # per-extension sizes for this directory
    for file_ in os.listdir(base_dir):
        file_ = join(base_dir, file_)
        if isdir(file_):
            ret = list_dir(file_)       # recurse into the subdirectory
            dir_sz += ret[0]
            for k in frequent_ext:      # add the subdirectory's extension sizes
                ext_size[k] += ret[1][k]
        else:
            file_sz = getsize(file_)
            dir_sz += file_sz
            ext = splitext(file_)[1].lower()  # tracked extension?
            if ext in frequent_ext:
                ext_size[ext] += file_sz
    print(base_dir, dir_sz, end=' ')
    for k, v in ext_size.items():
        print("%s: %5.2f%%" % (k, float(v) / max(1, dir_sz) * 100.0), end=' ')
    print()
    return (dir_sz, ext_size)

base_dir = "e:/test_dir/"
base_dir = os.path.abspath(base_dir)
list_dir(base_dir)
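The question asks for the most frequent extensions per directory rather than a fixed set, so here is a hedged variation of the same recursive idea (a sketch, not the answerer's code) that tallies bytes per extension with collections.Counter and prints the top three. It prints children before their parents, because the print happens after the recursive call; collect the lines into a list if you need top-down order.

import os
from collections import Counter
from os.path import join, getsize, isdir, splitext

def list_dir(base_dir, level=1):
    """Return (total_size, Counter of bytes per extension) for base_dir."""
    dir_sz = 0
    ext_size = Counter()
    for name in os.listdir(base_dir):
        path = join(base_dir, name)
        if isdir(path):
            sub_sz, sub_ext = list_dir(path, level + 1)
            dir_sz += sub_sz
            ext_size += sub_ext          # merge the subdirectory's tallies
        else:
            file_sz = getsize(path)
            dir_sz += file_sz
            ext_size[splitext(name)[1].lower()] += file_sz
    top = ' : '.join('%s (%d %%)' % (ext or '<none>', 100 * sz // max(1, dir_sz))
                     for ext, sz in ext_size.most_common(3))
    print('%s %s (%d bytes, %s)' %
          ('-' * level, os.path.basename(base_dir) or base_dir, dir_sz, top))
    return dir_sz, ext_size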
@Cldy is right, use os.path. For example, os.path.walk will walk through every directory below the argument and hand you the files and folders in each directory. Use os.path.getsize to get the sizes and split (or os.path.splitext) to get the extensions. Store the extensions in a list or dict and count them after going through each directory.
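A minimal sketch of that suggestion, written against os.walk because os.path.walk was removed in Python 3 (the function name and structure here are mine, not from the answer):

import os
from collections import Counter

def extension_stats(root):
    # Count files and total bytes per extension for everything under root.
    counts, sizes = Counter(), Counter()
    for path, dirs, files in os.walk(root):
        for name in files:
            ext = os.path.splitext(name)[1].lower()
            counts[ext] += 1
            sizes[ext] += os.path.getsize(os.path.join(path, name))
    return counts, sizes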
If you are on Linux, I would suggest looking at du instead.