Find All Possible Combinations of Features (Columns) in Tab-Delimited Data

Developer (https://www.devze.com), 2023-01-18 21:45. Source: the web
I have data that looks like this:

1 1:-0.394668 2:-0.794872 3:-1 4:-0.871341 5:0.9365 6:0.75597 
1 1:-0.463641 2:-0.897436 3:-1 4:-0.871341 5:0.44378 6:0.121824 
1 1:-0.469432 2:-0.897436 3:-1 4:-0.871341 5:0.32668 6:0.302529 
-1 1:-0.241547 2:-0.538462 3:-1 4:-0.871341 5:0.9994 6:0.987166 
1 1:-0.757233 2:-0.948718 3:-1 4:-0.871341 5:-0.33904 6:0.915401 
1 1:-0.167147 2:-0.589744 3:-1 4:-0.871341 5:0.95078 6:0.991566 

The first column is the class, and the next 6 columns are features. I am trying to find all possible combinations of features (2 features, 3 features, ... 5 features).

E.g.:

feat1 - feat2
feat1 - feat3
...
feat5 - feat6
...
feat1 - feat2 - feat3 - feat4 - feat5
feat1 - feat2 - feat3 - feat4 - feat6
... etc.

One of the resulting files, feat12.txt, would contain:

1 1:-0.394668 2:-0.794872
1 1:-0.463641 2:-0.897436
1 1:-0.469432 2:-0.897436
-1 1:-0.241547 2:-0.538462
1 1:-0.757233 2:-0.948718
1 1:-0.167147 2:-0.589744

Is there any existing implementation of that in Perl?


There is, of course, Algorithm::Combinatorics and/or Set::CrossProduct, but it is hard to tell from your problem description which one would be more appropriate.

Maybe you can use something like this as a starting point:

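#!/usr/bin/perl

use strict; use warnings;
use Algorithm::Combinatorics qw( combinations );

while ( my $line = <DATA> ) {
    last unless $line =~ /\S/;    # stop at the first blank line
    # Collect the "index:value" feature fields; the class label is not kept
    my $row = [ $line =~  /([1-6]:\S+)/g ];
    # Combination sizes; 2 .. 6 also emits the full six-feature set
    for my $i (2 .. 6) {
        my $it = combinations($row, $i);    # iterator over $i-element subsets
        while ( my $x = $it->next ) {
            print "@$x\n";
        }
    }
}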

__DATA__
1 1:-0.394668 2:-0.794872 3:-1 4:-0.871341 5:0.9365 6:0.75597
1 1:-0.463641 2:-0.897436 3:-1 4:-0.871341 5:0.44378 6:0.121824
1 1:-0.469432 2:-0.897436 3:-1 4:-0.871341 5:0.32668 6:0.302529
-1 1:-0.241547 2:-0.538462 3:-1 4:-0.871341 5:0.9994 6:0.987166
1 1:-0.757233 2:-0.948718 3:-1 4:-0.871341 5:-0.33904 6:0.915401
1 1:-0.167147 2:-0.589744 3:-1 4:-0.871341 5:0.95078 6:0.991566

Sample output (abridged):

C:\Temp> c
1:-0.167147 2:-0.589744 3:-1
1:-0.167147 2:-0.589744 4:-0.871341
1:-0.167147 2:-0.589744 5:0.95078
…
2:-0.589744 3:-1 5:0.95078 6:0.991566
2:-0.589744 4:-0.871341 5:0.95078 6:0.991566
3:-1 4:-0.871341 5:0.95078 6:0.991566
1:-0.167147 2:-0.589744 3:-1 4:-0.871341 5:0.95078
1:-0.167147 2:-0.589744 3:-1 4:-0.871341 6:0.991566
1:-0.167147 2:-0.589744 3:-1 5:0.95078 6:0.991566
1:-0.167147 2:-0.589744 4:-0.871341 5:0.95078 6:0.991566
1:-0.167147 3:-1 4:-0.871341 5:0.95078 6:0.991566
2:-0.589744 3:-1 4:-0.871341 5:0.95078 6:0.991566
1:-0.167147 2:-0.589744 3:-1 4:-0.871341 5:0.95078 6:0.991566
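
If what you want is one file per feature combination with the class label kept in front (as in your feat12.txt example), the loops need to be turned around: pick the combination first, then project every row onto it. Below is a minimal sketch of that idea, not a tested tool. The names k_subsets, parse_row, feature_files and write_files are made up for illustration; k_subsets is a hand-rolled stand-in for combinations from Algorithm::Combinatorics so that the sketch runs on core Perl only. It assumes every row carries the same feature indices as the first row, and that file names are built by concatenating the indices (which becomes ambiguous with 10 or more features).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: every k-element subset of @$items, in input order.
sub k_subsets {
    my ($items, $k) = @_;
    return ([]) if $k == 0;
    return ()   if $k > @$items;
    my ($first, @rest) = @$items;
    return (
        ( map { [ $first, @$_ ] } k_subsets(\@rest, $k - 1) ),  # subsets with $first
        k_subsets(\@rest, $k),                                  # subsets without it
    );
}

# Split "class idx:val idx:val ..." into (class, { idx => val }).
sub parse_row {
    my ($line) = @_;
    my ($class, @pairs) = split ' ', $line;
    my %feat = map { split /:/, $_, 2 } @pairs;
    return ($class, \%feat);
}

# Build { file name => file content } for all combinations of size $min .. $max.
sub feature_files {
    my ($lines, $min, $max) = @_;
    my @rows = map { [ parse_row($_) ] } grep { /\S/ } @$lines;
    return {} unless @rows;
    # Assumption: the first row lists every feature index.
    my @ids = sort { $a <=> $b } keys %{ $rows[0][1] };
    my %files;
    for my $k ($min .. $max) {
        for my $combo (k_subsets(\@ids, $k)) {
            my $name = 'feat' . join('', @$combo) . '.txt';
            $files{$name} = join '', map {
                my ($class, $feat) = @$_;
                join(' ', $class, map { $_ . ':' . $feat->{$_} } @$combo) . "\n";
            } @rows;
        }
    }
    return \%files;
}

# Write each entry of the hash to "$dir/<name>" (not called in the demo below).
sub write_files {
    my ($files, $dir) = @_;
    for my $name (sort keys %$files) {
        open my $fh, '>', "$dir/$name" or die "Cannot write $dir/$name: $!";
        print {$fh} $files->{$name};
        close $fh or die "Cannot close $dir/$name: $!";
    }
}

# Demo on the first two rows of the question's data.
my @sample = (
    '1 1:-0.394668 2:-0.794872 3:-1 4:-0.871341 5:0.9365 6:0.75597',
    '1 1:-0.463641 2:-0.897436 3:-1 4:-0.871341 5:0.44378 6:0.121824',
);
my $demo = feature_files(\@sample, 2, 5);
print scalar(keys %$demo), "\n";    # 56 = 15 + 20 + 15 + 6 combinations
print $demo->{'feat12.txt'};
```

The contents are built in memory first so they are easy to inspect; to actually create the files, read your input lines into an array and pass the result of feature_files to write_files with a target directory. For larger feature counts it would probably be better to switch back to the iterator from Algorithm::Combinatorics and write each file as you go instead of holding everything in a hash.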