I have a huge text file with lines like:
-568.563626 159 33 -1109.660591 -1231.295129 4.381508
-541.181308 159 28 -1019.279615 -1059.115975 4.632301
-535.370812 155 29 -1033.071786 -1152.907805 4.420473
-533.547101 157 28 -1046.218277 -1063.389677 4.423696
What I want is to sort the file by the 5th column, so I would get:
-568.563626 159 33 -1109.660591 -1231.295129 4.381508
-535.370812 155 29 -1033.071786 -1152.907805 4.420473
-533.547101 157 28 -1046.218277 -1063.389677 4.423696
-541.181308 159 28 -1019.279615 -1059.115975 4.632301
For this I use:
for i in file.txt ; do sort -k5n $i ; done
I wonder if this is the fastest or most efficient way to do it.
Thanks
Why use a for loop at all? Why not just:
sort -k5n file.txt
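As an aside, -k5n means the sort key runs from field 5 to the end of the line, so sort may compare more text per line than you intend. Limiting the key to field 5 alone gives it less to compare:

sort -k5,5n file.txt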
And which sort is more efficient depends on a number of factors. You could no doubt write a faster sort for a specific data set (given its size and other properties): bubble sort, for example, can outperform other sorts on particular inputs, such as data that is already nearly sorted.
However, have you tested the standard sort and established that it's too slow? That's the first thing you should do. My machine (which is by no means the gruntiest on the planet) can do 4 million of those lines in under ten seconds:
real 0m9.023s
user 0m8.689s
sys 0m0.332s
Having said that, there is at least one trick which may speed it up: transform the file into fixed-length records with fixed-length fields before sorting it. Sorting on a fixed range of characters in fixed-length records can often be much faster than the more flexible handling of variable field and record sizes that sort normally has to do.
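A minimal sketch of that idea; the printf widths are guesses based on the sample lines above, so widen them if your data needs it:

awk '{ printf "%14.6f %5d %5d %14.6f %14.6f %10.6f\n", $1, $2, $3, $4, $5, $6 }' file.txt | sort -k5,5n

The -n flag is kept because negative numbers would sort in the wrong order under a plain lexical comparison; the fixed-width layout just makes the field handling cheaper.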
That way, you add an O(n) operation (the transformation) to speed up what is probably at best an O(n log n) operation (the sort).
But, as with all optimisations, measure, don't guess!
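As a starting point, something like this times the plain sort with the output discarded, so terminal I/O doesn't skew the measurement; run the same for the fixed-width variant and compare:

time sort -k5,5n file.txt > /dev/null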
If you have many different files to sort, you may use a loop; however, since you have only one file, just pass the filename to sort:
$ sort -k5n file
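And if you ever do have several files, a loop would look something like this (the *.txt glob and .sorted suffix are just placeholders):

for f in *.txt ; do
    sort -k5,5n "$f" > "$f.sorted"
done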