Need help collapsing a list and obtaining totals in Perl

devze.com (开发者) https://www.devze.com 2023-01-26 14:09 Source: Internet
Hi, I have a large list of data: http://paste-it.net/public/y17027d/ It is 67859 rows by 10 columns. The 6th column contains Z-scores from 1 to 6 in 0.01 increments. What I would like to do is total the other column values across all rows that share the same Z-score, but my current code is not working.

What I have now prints out values, but the totals for each Z-score are incorrect.

Here is my code:

#! /usr/bin/perl

use strict;
use warnings;
use POSIX;
use Data::Dumper;



my $input = $ARGV[0];
open (DATAFILE, $input) or die $!;
open(OUT,">>"."final.output.txt");

my($line,$fMeasure,$filename,$recall,$precision,$z_score,$computer_calls,$johns_calls,$false_negatives,$false_positives,$true_positives,$count);
$fMeasure=$filename=$recall=$precision=$z_score=$computer_calls=$johns_calls=$false_negatives=$false_positives=$true_positives=$count = 0;




my %stats=();
my %zscore=();
while($line = <DATAFILE>){ 
     # Chop off new line character, skip the comments and empty lines.                 
     chomp($line); 
     my @temp = split(/\t/, $line);
     $true_positives = $temp[0];
     $false_positives = $temp[1];
     $false_negatives = $temp[2];
     $johns_calls = $temp[3];
     $computer_calls = $temp[4];
     $z_score = $temp[5];
     $fMeasure = $temp[6];
     $precision = $temp[7];
     $recall =  $temp[8];
     $filename = $temp[9];
     $stats{$z_score}{$filename}[0] = $true_positives;
     $stats{$z_score}{$filename}[1] = $false_positives;
     $stats{$z_score}{$filename}[2] = $johns_calls;
     $stats{$z_score}{$filename}[3] = $computer_calls;
     $stats{$z_score}{$filename}[4] = $fMeasure;
     $stats{$z_score}{$filename}[5] = $precision;
     $stats{$z_score}{$filename}[6] = $recall;
     $stats{$z_score}{$filename}[6] = $filename;
     $zscore{$z_score}++;

}


my $false_positives_new = 0;
my $true_positives_new = 0;
my $johns_calls_new = 0; 
my $computer_calls_new = 0;
my $file_name = 0;


foreach $z_score ( sort keys %stats ) {
foreach $filename( keys %{$stats{$z_score}} ){
    my $tp = $stats{$z_score}{$filename}[0];
    my $fp = $stats{$z_score}{$filename}[1];
    my $jc = $stats{$z_score}{$filename}[2];
    my $cc = $stats{$z_score}{$filename}[3];
    my $fn = $stats{$z_score}{$filename}[6];
    #print "$z_score\t$jc\n";
    $false_positives_new = $false_positives + $fp;
    $true_positives_new = $true_positives + $tp;
    $johns_calls_new = $johns_calls + $jc; 
    $computer_calls_new = $computer_calls + $cc;

    #print OUT "$fn\n";
}

print OUT"$true_positives_new\t$false_positives_new\t$johns_calls_new\t$computer_calls_new\t$z_score  \n";
$false_positives_new = 0;
$true_positives_new = 0;
$johns_calls_new = 0;
$computer_calls_new = 0;
$file_name = 0;

}



close(OUT);
close (DATAFILE);

I know that I must be doing something wrong, but I cannot figure out what. Any help would be greatly appreciated. Thank you.


OK. I was able to get the data from pastebin and I think the following code does what you want.

#! /usr/bin/perl

use strict; use warnings;
use Data::Dumper;

my ($input) = @ARGV;
open my $DATAFILE, '<', $input
    or die "Cannot open '$input': $!";

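# Column order as read by the question's code (the Z-score is the 6th column).
# The 10th column, the filename, is popped off before this list is applied.
my @field_names = qw(
    true_positives
    false_positives
    false_negatives
    johns_calls
    computer_calls
    z_score
    fMeasure
    precision
    recall
);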

my @track_fields = qw(
    false_positives
    false_negatives
    johns_calls
    computer_calls
);

my (%stats, %by_zscore);

while ( my $line = <$DATAFILE> ) {
    last unless $line =~ /\S/;
    chomp $line;
    my @temp = split /\t/, $line;
    my $filename = pop @temp;

    my %fields;
    @fields{ @field_names } = @temp;

    my $z_score = $fields{z_score};

    $stats{ $z_score }{$filename} = \@temp;

    for my $f ( @track_fields ) {
        $by_zscore{$z_score}{ $f } += $fields{ $f };
    }
}

print Dumper \%by_zscore;
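
If you would rather have a tab-separated report like the one the question writes to final.output.txt than a Dumper dump, you can walk the totals in numeric Z-score order. This is only a minimal sketch: it assumes %by_zscore has the shape built by the loop above, the sample numbers are invented, and format_totals is a hypothetical helper name, not part of the original code.

```perl
use strict;
use warnings;

# Invented sample with the same shape as %by_zscore in the code above.
my %by_zscore = (
    '2'    => { false_positives => 0, false_negatives => 4, johns_calls => 7,  computer_calls => 3  },
    '1.01' => { false_positives => 3, false_negatives => 1, johns_calls => 10, computer_calls => 12 },
    '1.1'  => { false_positives => 5, false_negatives => 2, johns_calls => 8,  computer_calls => 11 },
);

my @track_fields = qw(false_positives false_negatives johns_calls computer_calls);

# Hypothetical helper: returns one tab-separated line per Z-score,
# ordered numerically, with the totals in the order given by @fields.
sub format_totals {
    my ($totals, @fields) = @_;
    my @lines;
    for my $z ( sort { $a <=> $b } keys %$totals ) {
        # Hash slice pulls the requested totals out in field order.
        push @lines, join "\t", $z, @{ $totals->{$z} }{@fields};
    }
    return @lines;
}

print "$_\n" for format_totals( \%by_zscore, @track_fields );
```

Sorting with `<=>` keeps the rows in numeric order even if the Z-score strings are not all formatted with the same number of decimals.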


I think you want to say

$false_positives_new = $false_positives_new + $fp;
# etc.

instead of

$false_positives_new = $false_positives + $fp;
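
To see why the original line gives wrong totals: $false_positives still holds whatever the last input row left in it, so every pass through the inner loop overwrites the total with "stale value + current row" instead of accumulating. A small sketch with made-up numbers (the counts and the stale value 9 are invented for illustration):

```perl
use strict;
use warnings;

# Made-up per-file false-positive counts for a single Z-score.
my @fp_per_file = ( 2, 5, 1 );

# Stands in for the value left over from the read loop in the question
# (hypothetical: whatever the last input row happened to hold).
my $false_positives = 9;

my ( $buggy, $fixed ) = ( 0, 0 );
for my $fp (@fp_per_file) {
    $buggy  = $false_positives + $fp;    # overwritten each pass: stale value + current row
    $fixed += $fp;                       # running total, same as $fixed = $fixed + $fp
}

print "buggy=$buggy fixed=$fixed\n";     # prints "buggy=10 fixed=8"
```

The buggy version ends up as the stale value plus the last file's count, which is why the question's output had numbers but not the right totals.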