I wish to shuffle the lines (rows) of a file at random and then write the result to five different files.
But I keep getting exactly the same order of lines in file1 to file5. The random generation does not seem to work properly. I would be grateful for any advice.
#!/bin/bash
for i in $(seq 1 5)
do
awk 'BEGIN{srand();} {print rand()"\t"$0}' shuffling.txt | sort -k2 -k1 -n | cut -f2- > file$i.txt
done
Input (shuffling.txt):
111 1032192
111 2323476
111 1698881
111 2451712
111 2013780
111 888105
112 2331004
112 1886376
112 1189765
112 1877267
112 1772972
112 574631
If you don't provide a seed to srand, it will use either the current date and time or a fixed starting seed (this may vary with the implementation). In the former case, because the time is typically measured in whole seconds, processes that start quickly enough one after another will all use the same seed and generate the same sequence.
In the latter case, no matter how long you wait, you'll get the same sequence every time you run.
You can get around either problem by using a different seed, provided by the shell:
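A quick way to see the "same seed, same sequence" behaviour is to give two separate awk processes the same explicit seed. The value 42 below is arbitrary and chosen only for illustration.

```shell
#!/bin/bash
# Two independent awk processes, both seeded with the arbitrary value 42.
first=$(awk 'BEGIN { srand(42); print rand(), rand(), rand() }')
second=$(awk 'BEGIN { srand(42); print rand(), rand(), rand() }')

echo "first:  $first"
echo "second: $second"

# The same seed always yields the same sequence, however far apart the runs are.
if [ "$first" = "$second" ]; then
    echo "identical"
else
    echo "different"
fi
```

This prints "identical", which is exactly what happens in the question when every run ends up with the same time-based seed.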
awk -v seed=$RANDOM 'BEGIN{srand(seed);}{print rand()" "$0}' ...
The value of $RANDOM changes on each iteration, so each run of the awk program gets a different seed.
You can see this in action in the following transcript:
pax> for i in $(seq 1 5) ; do
...> awk 'BEGIN{srand();print rand()}'
...> done
0.0435039
0.0435039
0.0435039
0.0435039
0.0435039
pax> for i in $(seq 1 5) ; do
...> awk -v seed=$RANDOM 'BEGIN{srand(seed);print rand()}'
...> done
0.283898
0.0895895
0.841535
0.249817
0.398753
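Applied to the loop from the question, a minimal sketch might look like the following. It assumes bash (for $RANDOM) and, so that it runs on its own, writes the sample rows from the question into shuffling.txt inside a scratch directory created with mktemp. It also sorts on the random key alone (-k1,1n): the question's -k2 key starts at the file's first column, so it would group the lines by that column before the random key is consulted.

```shell
#!/bin/bash
# Work in a scratch directory and recreate the sample input from the question.
cd "$(mktemp -d)" || exit 1
printf '%s\n' '111 1032192' '111 2323476' '111 1698881' '111 2451712' \
    '111 2013780' '111 888105' '112 2331004' '112 1886376' \
    '112 1189765' '112 1877267' '112 1772972' '112 574631' > shuffling.txt

for i in $(seq 1 5); do
    # $RANDOM is re-evaluated on every iteration, so each awk run gets its own seed.
    # Prefix each line with a random key and a tab, sort numerically on that key
    # only, then cut away the key (everything up to the first tab).
    awk -v seed="$RANDOM" 'BEGIN { srand(seed) } { print rand() "\t" $0 }' shuffling.txt |
        sort -k1,1n |
        cut -f2- > "file$i.txt"
done

wc -l file*.txt
```

Since $RANDOM only ranges from 0 to 32767, this approach can produce at most 32768 distinct orderings, which is plenty for five files.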
Awk's pseudo-random numbers are not very random unless you keep reseeding. Using the nanoseconds field of date (%N, a GNU date extension) as the seed should work in most situations; otherwise, you may want to look into Bash's ${RANDOM} or reading /dev/urandom directly:
awk 'BEGIN{"date +%N"|getline rseed;srand(rseed);close("date +%N");print rand()}'
for((i=1;i<=5;i++));do awk 'BEGIN{"date +%N"|getline rseed;srand(rseed);close("date +%N");print rand()}';done
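The /dev/urandom route is mentioned above but not shown. One hedged way to do it is to read four random bytes with od and pass them to awk as the seed; this assumes a system that provides /dev/urandom and a POSIX od, and unlike date +%N it does not depend on GNU date.

```shell
#!/bin/bash
# Read 4 bytes from /dev/urandom as one unsigned 32-bit integer.
# od pads the number with blanks, so keep only the digits.
seed=$(od -An -N4 -tu4 /dev/urandom | tr -dc '0-9')

# Seed awk with that value and draw one number in [0, 1).
value=$(awk -v seed="$seed" 'BEGIN { srand(seed); print rand() }')
echo "seed=$seed rand=$value"
```

Each invocation reads fresh bytes, so running this inside a loop gives every awk process a different seed no matter how quickly the iterations follow one another.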
Alternatively, let shuf (part of GNU coreutils) do the shuffling for you:
#!/bin/bash
for i in {1..5}
do
shuf -o "file$i.txt" shuffling.txt
done
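To check that the five outputs really are different orderings of the same lines, a small self-contained check along these lines could be used. It assumes GNU shuf is installed and uses the numbers 1 to 12 as a stand-in for shuffling.txt.

```shell
#!/bin/bash
# Build a stand-in input in a scratch directory: the numbers 1..12, one per line.
cd "$(mktemp -d)" || exit 1
seq 1 12 > shuffling.txt

for i in 1 2 3 4 5; do
    shuf -o "file$i.txt" shuffling.txt
done

# Every output should contain exactly the input lines, just reordered.
for i in 1 2 3 4 5; do
    if [ "$(sort -n "file$i.txt")" = "$(sort -n shuffling.txt)" ]; then
        echo "file$i.txt: same lines as input"
    else
        echo "file$i.txt: MISMATCH"
    fi
done

# Count how many distinct orderings the five files contain.
distinct=$(for f in file*.txt; do cksum < "$f"; done | sort -u | wc -l)
echo "distinct orderings: $distinct"
```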