I have a file system containing PNG images. The layout of the filesystem is: ZOOM/X/Y.png
where ZOOM, X, and Y are all integers.
I need to change the names of the PNG files. Basically, I need to convert Y from its current value to 2^ZOOM-Y-1. I've written a bash script to accomplish this task. However, I suspect it can be optimized substantially. (I also suspect that I may have been better off writing it in Perl, but that is another story.)
Here is the script. Is this about as good as it gets? Or can the performance be optimized? Are there tools I can use that would profile the script for me and tell me where I'm spending all my execution time?
#!/bin/bash
tiles=`ls -d */*/*`
for oldPath in $tiles
do
oldY=`basename -s .png $oldPath`
zoomX=`dirname $oldPath`
zoom=`echo $zoomX | sed 's#\([^\]\)/.*#\1#'`
newY=`echo 2^$zoom-$oldY-1|bc`
mv ${zoomX}/${oldY}.png ${zoomX}/$开发者_如何转开发{newY}.png
done
for oldpath in */*/*
do
x=$(basename "$oldpath" .png)
zoom_y=$(dirname "$oldpath")
y=$(basename "$zoom_y")
ozoom=$(dirname "$zoom_y")
nzoom=$(echo "2^$zoom-$y-1" | bc)
mv "$oldpath" $nzoom/$y/$x.png
done
This avoids using sed
. I like basename
and dirname
. However, you can also use bash (and Korn) shell notations such as:
y=${zoom_y#*/}
ozoom=${zoom_y%/*}
You might be able to do it all without invoking basename
or dirname
at all.
REWRITE due to misunderstanding of the formula and the updated var names. Still no subprocesses apart from mv
and ls
.
#!/bin/bash
tiles=`ls -d */*/*`
for thisPath in $tiles
do
thisFile=${thisPath#*/*/}
oldY=${thisFile%.png}
zoomX=${thisPath%/*}
zoom=${thisPath%/*/*}
newY=$(((1<<zoom) - oldY - 1))
mv ${zoomX}/${oldY}.png ${zoomX}/${newY}.png
done
It's likely that the overall throughput of your rename is limited by the filesystem. Choosing the right filesystem and tuning it for this sort of operation would speed up the overall job much more than tweaking the script.
If you optimize the script you'll probably see less CPU consumed but the same total duration. Since forking off the various subprocesses (basename
, dirname
, sed
, bc
) are probably more significant than the actual work you are probably right that a perl
implementation would use less CPU because it can do all of those operations internally (including the mv
).
I see 3 improvements I would do, if it was my script. Whether they have an huge impact - I don't think so.
But you should avoid as hell parsing the output of ls
. Maybe this directory is very predictable, from the things found inside, but if I read your script correctly, you can use the globbing with for directly:
for thisPath in */*/*
repeatedly, $(cmd) is better than cmd
with the deprecated backticks, which aren't nestable.
thisDir=$(dirname $thisPath)
arithmetic in bash directly:
newTile=$((2**$zoom-$thisTile-1))
as long as you don't need floating point, or output is getting too big.
I don't get the sed-part:
zoom=`echo $zoomX | sed 's#\([^\]\)/.*#\1#'`
Is there something missing after the backslash? A second one? You're searching for something which isn't a backslash, followed by a slash-something? Maybe it could be done purely in bash too.
one precept of computing credited to Donald Knuth is, "don't optimize too early." Scripts run pretty fast and 'mv' operations(as long as they're not going across filesystems where you're really copying it to another disk and then deleting the file) are pretty fast as well, as all the filesystem has to do in most cases is just rename the file or change its parentage.
Probably where it's spending most of its time is in that intial 'ls' operation. I suspect you have ALOT of files. There isn't much that can be done there. Doing it another language like perl or python is going to face the same hurdle. However you might be able to get more INTELLIGENCE and not limit yourself to 3 levels(//*).
精彩评论