How does List::Util 'shuffle' actually work?

Developer https://www.devze.com 2023-02-14 21:02 Source: web

I am currently building a classifier using C5.0. I have a dataset of 8000 entries, and each entry has its own ID number (1-8000). To test the performance of the classifier I have to make 5 sets of 10:90 (training data : test data) splits. Of course, no training case may appear again among the test cases, and duplicates cannot occur in either set.

To pick examples at random for the training data, and to make sure the same example cannot also be picked for the test data, I have developed a horribly slow method:

Fill a file with the numbers from 1-8000 on separate lines.
Randomly pick a line number (from a range of 1-8000) and use the contents of that line as the ID number of the training example.
Write all unpicked numbers to a new file.
Decrement the range of the random number generator by 1.
Repeat.

All numbers left unpicked are then used as test data. It works, but it's slow. To speed things up I could use List::Util 'shuffle' to just 'randomly' shuffle an array of these numbers. But how random is 'shuffle'? It is essential that the same level of accuracy is maintained. Sorry about the essay, but does anyone know how 'shuffle' actually works? Any help at all would be great.
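A minimal sketch of the shuffle-based split described above, assuming the IDs are simply the integers 1 to 8000 and that the first 800 entries of each shuffled list become the training set. The helper name split_ids is hypothetical and is not part of List::Util.

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

# Hypothetical helper: shuffle the IDs once, then cut the shuffled list in two.
# Every ID appears exactly once in the shuffled list, so the two parts cannot
# overlap and neither part can contain a duplicate.
sub split_ids {
    my ($ids, $n_train) = @_;
    my @shuffled = shuffle(@$ids);
    my @train    = @shuffled[0 .. $n_train - 1];
    my @test     = @shuffled[$n_train .. $#shuffled];
    return (\@train, \@test);
}

my @ids = (1 .. 8000);

# Five independent 10:90 splits (800 training IDs, 7200 test IDs each).
for my $run (1 .. 5) {
    my ($train, $test) = split_ids(\@ids, 800);
    printf "run %d: %d training IDs, %d test IDs\n",
        $run, scalar @$train, scalar @$test;
}
```

Each run reshuffles the full ID list, so the five splits are independent of one another, and no intermediate files are needed.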

Here is the shuffle algorithm used in List::Util::PP:

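sub shuffle (@) {
  my @a=\(@_);         # one reference per argument
  my $n;
  my $i=@_;            # number of elements not yet picked
  map {
    $n = rand($i--);   # random slot among the unpicked ones; truncated to an integer when used as an index
    # return the picked value and move the last unpicked reference into the freed slot
    (${$a[$n]}, $a[$n] = $a[$i])[0];
  } @_;
}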

This looks like a Fisher-Yates shuffle.
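For comparison, here is a minimal sketch of the textbook Fisher-Yates shuffle written as an explicit in-place loop. The name fisher_yates is hypothetical and this is not the List::Util source. It only illustrates the same idea as the map version above: pick a random element from the part not yet chosen, then shrink that part by one. Given a uniform rand, every permutation is equally likely, so the quality of the shuffle comes down to the quality of Perl's rand.

```perl
use strict;
use warnings;

# Hypothetical illustration, not the List::Util implementation.
# Shuffles the array referenced by $array in place and returns the reference.
sub fisher_yates {
    my ($array) = @_;
    for (my $i = @$array - 1; $i > 0; $i--) {
        # Pick an index uniformly from 0 .. $i (the part not yet fixed).
        my $j = int rand($i + 1);
        # Swap the picked element into position $i; positions above $i are final.
        @$array[$i, $j] = @$array[$j, $i];
    }
    return $array;
}

my @sample = (1 .. 10);
fisher_yates(\@sample);
print "@sample\n";
```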
