Create Files Quickly From Individual Columns

Developer https://www.devze.com 2023-02-03 19:20 Source: Internet

I have data that looks like this:

-1 1:-0.394668 2:-0.794872 3:-1 4:-0.871341 5:0.9365 6:0.75597
1 1:-0.463641 2:-0.897436 3:-1 4:-0.871341 5:0.44378 6:0.121824
1 1:-0.469432 2:-0.897436 3:-1 4:-0.871341 5:0.32668 6:0.302529
-1 1:-0.241547 2:-0.538462 3:-1 4:-0.871341 5:0.9994 6:0.987166
1 1:-0.757233 2:-0.948718 3:-1 4:-0.871341 5:-0.33904 6:0.915401
1 1:-0.167147 2:-0.589744 3:-1 4:-0.871341 5:0.95078 6:0.991566

The first column is the class label, and the next 6 columns are features. I want to create 6 files, one per feature. For example,

my_input_feat1.txt will contain:

-1 1:-0.394668
1 1:-0.463641
...
1 1:-0.757233
1 1:-0.167147

my_input_feat2.txt will contain:

-1 2:-0.794872
...
1 2:-0.589744

and so on. I have Perl code that does this, but it is horribly slow. Is there a faster way? The input files typically contain about 100K lines.

use strict;
use Data::Dumper;
use Carp;
my $input = $ARGV[0] || "myinput.txt";

my $INFILE_file_name = $input;    # input file name

open( INFILE, '<', $INFILE_file_name )
    or croak "$0 : failed to open input file $INFILE_file_name : $!\n";

my $out1 = $input . "_feat_1.txt";
my $out2 = $input . "_feat_2.txt";
my $out3 = $input . "_feat_3.txt";
my $out4 = $input . "_feat_4.txt";
my $out5 = $input . "_feat_5.txt";
my $out6 = $input . "_feat_6.txt";

unlink($out1);
unlink($out2);
unlink($out3);
unlink($out4);
unlink($out5);
unlink($out6);

print "$out1\n";

while ( <INFILE> ) {
    chomp;
    my @els = split(/\s+/,$_);
    my $lbl = $els[0];

    my  $OUTFILE1_file_name = $out1;        # output file name
    open ( OUTFILE1, '>>', $OUTFILE1_file_name )
        or croak "$0 : failed to open output file $OUTFILE1_file_name : $!\n";
    print OUTFILE1 "$lbl $els[1]\n";
    close ( OUTFILE1 );         # close output file

    my  $OUTFILE2_file_name = $out2;        # output file name
    open ( OUTFILE2, '>>', $OUTFILE2_file_name )
        or croak "$0 : failed to open output file $OUTFILE2_file_name : $!\n";
    print OUTFILE2 "$lbl $els[2]\n";
    close ( OUTFILE2 );         # close output file

    # Etc. until OUTFILE 6

}

close (INFILE);


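You should move the opening and closing of the output files outside the while loop: open all six files once before reading the input, print to the already-open handles inside the loop, and close them once after it. Reopening six files for each of 100K input lines is most likely what makes the script slow.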
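The same restructuring can be sketched in plain shell. This is a minimal sketch, assuming exactly six space-separated features per line; the function name `split_features_fd` is hypothetical, and the `_feat_N.txt` naming follows the question's script. Each output file is opened once as a file descriptor before the loop and closed once after it.

```shell
#!/bin/sh
# Sketch of "open once, write many": the six output files are opened a single
# time as file descriptors 3..8 before the loop and closed once after it,
# instead of being reopened for every input line.
# Assumes exactly six features per line (hypothetical helper name).
split_features_fd() {
    infile=$1
    exec 3>"${infile}_feat_1.txt" 4>"${infile}_feat_2.txt" \
         5>"${infile}_feat_3.txt" 6>"${infile}_feat_4.txt" \
         7>"${infile}_feat_5.txt" 8>"${infile}_feat_6.txt"
    # read splits each line on whitespace; "rest" absorbs any extra columns;
    # the -n test also handles a last line without a trailing newline
    while read -r lbl f1 f2 f3 f4 f5 f6 rest || [ -n "$lbl" ]; do
        printf '%s %s\n' "$lbl" "$f1" >&3
        printf '%s %s\n' "$lbl" "$f2" >&4
        printf '%s %s\n' "$lbl" "$f3" >&5
        printf '%s %s\n' "$lbl" "$f4" >&6
        printf '%s %s\n' "$lbl" "$f5" >&7
        printf '%s %s\n' "$lbl" "$f6" >&8
    done < "$infile"
    # close all six descriptors once, after the loop
    exec 3>&- 4>&- 5>&- 6>&- 7>&- 8>&-
}
```

For example, `split_features_fd myinput.txt` writes `myinput.txt_feat_1.txt` through `myinput.txt_feat_6.txt`. A shell `read` loop is still relatively slow per line compared with Perl or awk; the sketch only shows where the opens and closes belong.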


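#!/bin/sh
# usage: sh script.sh NUM_FEATURES INPUT_FILE
# cut splits on tabs by default; the data is space-separated, so use -d ' '.
# Field 1 is the class; feature i is field i + 1.
for i in $(seq 1 "$1"); do
    cut -d ' ' -f 1,$((i + 1)) "$2" > "${2}_$i"
done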
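The number of features can also be read from the data instead of being passed as an argument. This is a minimal sketch, assuming fields separated by single spaces as in the sample (`cut` splits on tabs unless `-d` is given); the function name `split_features_cut` is hypothetical. Like the script above, it writes `FILE_1` through `FILE_N`.

```shell
#!/bin/sh
# Sketch: split FILE into FILE_1 .. FILE_N with one cut pass per feature,
# taking N from the field count of the first line minus the class column.
# Assumes single-space-separated fields (hypothetical helper name).
split_features_cut() {
    infile=$1
    nfeat=$(head -n 1 "$infile" | awk '{ print NF - 1 }')
    i=1
    while [ "$i" -le "$nfeat" ]; do
        # field 1 is the class label; feature i lives in field i + 1
        cut -d ' ' -f "1,$((i + 1))" "$infile" > "${infile}_$i"
        i=$((i + 1))
    done
}
```

For example, `split_features_cut data.txt` creates `data.txt_1` through `data.txt_6` for the six-feature sample. It still reads the whole file once per feature.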

or

#!/usr/bin/perl

use warnings; use strict;

my $input_file = $ARGV[0];
my %handles;

while (<>) {
    my ($class, @features) = split /\s+/;

    for my $i (1 .. @features) {
        # open each output file only the first time its feature is seen
        unless ( exists $handles{$i} ) {
            open $handles{$i}, '>', $input_file . "_$i" or die $!;
        }

        print {$handles{$i}} join( ' ', $class, $features[$i - 1] ), "\n";
    }
}

while (my (undef, $handle) = each %handles) {
    close $handle or die $!;
}


Is a shell script OK?

awk '{print $1" "$2}' data.txt > feat1_file.txt 
awk '{print $1" "$3}' data.txt > feat2_file.txt 
awk '{print $1" "$4}' data.txt > feat3_file.txt 
awk '{print $1" "$5}' data.txt > feat4_file.txt 
awk '{print $1" "$6}' data.txt > feat5_file.txt 
awk '{print $1" "$7}' data.txt > feat6_file.txt 
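Each of these commands reads data.txt in full, so the file is scanned six times. A single awk pass can write all the files at once, in the spirit of the `%handles` answer: awk opens an output file the first time a `>` redirection names it (truncating it then) and keeps it open for later prints. This is a minimal sketch with a hypothetical function name `split_features_awk`, writing `FILE_1`, `FILE_2`, and so on next to the input.

```shell
#!/bin/sh
# Sketch: split FILE into FILE_1 .. FILE_N in one awk pass.
# Field 1 is the class label; field i (i >= 2) is feature i - 1.
# Hypothetical helper name; output naming mirrors the %handles answer.
split_features_awk() {
    awk '
    {
        for (i = 2; i <= NF; i++) {
            out = FILENAME "_" (i - 1)   # output file for feature i - 1
            print $1, $i > out           # opened (and truncated) on first use only
            files[out] = 1
        }
    }
    END {
        # close every output file that was opened
        for (f in files) close(f)
    }' "$1"
}
```

For example, `split_features_awk data.txt` produces `data.txt_1` through `data.txt_6` from a single read of the input.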
