开发者

Best way to assert that multiple values are defined in Perl (0 and "" are defined)

开发者 https://www.devze.com 2023-03-20 15:39 出处:网络
The program that I am writing needs to do this: read every line of a file if the line contains an ordered pair (x,y) store the ordered pair

The program that I am writing needs to do this:

  • read every line of a file
  • if the line contains an ordered pair (x,y) store the ordered pair
  • before the next ordered pair, there will be a line of the file that starts with "Results"
    • store the ordered pair at the end of that line as a "value" and "error"
  • print out the corresponding x, y, value, error in CSV format
  • read the next (x,y) value and so on, the (x,y) lines and (value,error) lines will alternate in the file

This is not a homework assignment. As you can see, I already have code that works down to 17 lines. I'm wondering if I can accomplish this task with any fewer lines or cleaner code, while maintaining at least the level of readability that this version has, and maintaining Perl style (such as the line break between includes and the first executable line).

The line that I am least thrilled with is

if (defined($x) && defined($y) && defined($val) && defined($err))

Is there a better way to do an an assertion to take 开发者_StackOverflow社区care of the alternating data in the file? If I don't use the defined() function, the program does not function as intended, because some of the x and y coordinates are 0 values.

#!/usr/bin/perl
use strict;

print "X,Y,Val\n";
foreach (@ARGV){
    open log,$_ or die $!;
    my ($x,$y,$val,$err);
    while(<log>){
        chomp;
        ($x,$y) = ($1,$2) if (/\((\d*|-\d*),(\d*|-\d*)\)/);
        ($val,$err) = ($1,$2) if (/^Results.*\((.*),(.*)\)$/);
        if (defined($x) && defined($y) && defined($val) && defined($err)){
            print "$x,$y,$val:$err\n";
            ($x,$y,$val,$err) = undef;
        }
    }
}

Thank you everyone for the answers, I'm learning a lot of new Perl syntax. I've figured out how to get this script down to 10 lines. I was challenging myself on the number of lines in which I could write this.

#!/usr/bin/perl 
use strict;

print "X,Y,Val\n";
open LOG,"<@ARGV[0]" or die $!;
while(<LOG>){
    chomp;
    print "$1,$2," if (/\((\d*|-\d*),(\d*|-\d*)\)/);
    print "$1:$2\n" if (/^Results.*\((.*),(.*)\)$/);
}

Another Update. Using the information in the answers, I was able to get this down to 8 lines. I also improved the regex and made sure that the header would only be printed once if multiple files were provided.

#!/usr/bin/perl
use strict;

while(<>){
    print "X,Y,Val\n" if ($. == 1);
    print "$1,$2," if (/.*\((-?\d+),(-?\d+)\)/);
    print "$1:$2\n" if (/^Results.*\((.*)\).*\((.*)\)$/);
}


I would switch to reading two lines, rather than one:

#!/usr/bin/perl

use strict;
use warnings;

use autodie;

print "X,Y,Val\n";
for my $filename (@ARGV) {
    open my $log, "<", $filename;

    while (my $coord_line = <$log>) {
        my ($x, $y) = $coord_line =~ /\((-?[0-9]+),(-?[0-9])\)/
            or die "bad coored line";
        my $results_line = <$log>;
        my ($val,$err) = $results_line =~ /^Results.*\((.*),(.*)\)$/
            or die "bad results line";

        print "$x,$y,$val:$err\n";
    }
}

One of the benefits of this approach is that your variables are now properly scoped. A simpler version of this program is:

#!/usr/bin/perl

use strict;
use warnings;

use ARGV::readonly; #prevent files like "|ls" from breaking us

print "X,Y,Val\n";
while (<>) {
    my ($x, $y) = /\((-?[0-9]+),(-?[0-9]+)\)/
        or die "bad coored line";
    my ($val,$err) = <> =~ /^Results.*\((.*),(.*)\)$/
        or die "bad results line";

    print "$x,$y,$val:$err\n";
}

Another variant that takes into account the possibility of lines between the two lines we care about. It assumes the first coordinate pair is the right one:

#!/usr/bin/perl

use strict;
use warnings;

use ARGV::readonly; #prevent files like "|ls" from breaking us

print "X,Y,Val\n";
while (<>) {
    next unless my ($x, $y) = /\((-?[0-9]+),(-?[0-9]+)\)/;
    my ($val, $err);
    while  (<>) {
        last if ($val, $err) = /^Results.*\((.*),(.*)\)$/;
    }
    die "bad format" unless defined $val;
    print "$x,$y,$val:$err\n";
} 

And this one handles the case where you want the last coordinate line:

#!/usr/bin/perl

use strict;
use warnings;

use ARGV::readonly; #prevent files like "|ls" from breaking us

print "X,Y,Val\n";
my ($x, $y);
while (<>) {
    ($x, $y) = ($1, $2) if /\((-?[0-9]+),(-?[0-9]+)\)/;
    next unless my ($val, $err) = /^Results.*\((.*),(.*)\)$/;
    print "$x,$y,$val:$err\n";
} 


I am a big proponent of readability and not brevity. Perl is pretty good at optimizing your code, so you don't have to worry about it. Don't get overly concerned about the number of lines, and keep your code readable. What ever you save (if you save anything) in CPU time will be wasted in the time and errors generated by trying to maintain a hard to read program.

In that respect:

  • Don't use post-suffix if statements unless it's something very simple such as next if (s/^\s*$/);.
  • Use variable names and don't depend upon $_.
  • Use spaces after commas.

On top of that, I would add:

  • Don't be afraid to add parentheses if they help clarify what you're doing. I tend to use parentheses if functions have more than two arguments just to help hold arguments together:

For example:

open my $foo, "<", $bar or die qq(This is the end!\n);

vs.

open (my $foo, "<", $bar) or die qq(This is the end!\n);

It's now more obvious which part of the line are parameters in the open function.

The line that I am least thrilled with is:

if (defined($x) && defined($y) && defined($val) && defined($err)){

What's wrong with this line? It's perfectly clear what you're trying to say. I'd use a bit more modern syntax and add in some parentheses to help regroup to make it clearer:

if ((defined $x) and (defined $y) and (defined $val) and (defined $err)) {

Looking at what you're doing, I'd rearrange things a bit...

#! /usr/bin/env perl

use strict;
use warnings;
use features qw(say);

say "X, Y, Val";

for my $filename (<>) {
    open (my $log, "<", $filename) or die $!;

    my ($x, $y, $value, $err);
    while (chomp (my $coord_line = <$log>)) {
        if ($coord_line =~ /\((-?[0-9]+),(-?[0-9])\)/) {
           ($x, $y) = ($1, $2);
        }
        elsif ($coord_line =~ /^Results.*\((.*),(.*)\)$/) {
           ($val, $err) = ($1, $2);
           say "$x, $y, $val:$err";
        }
    }
}

}

Notice I am now just checking a line once. And, notice that I am printing when I get a result which eliminates the need for checking if all variables are set.

Also notice that you don't need ARGV::readonly because you're using more than two parameters in the open function. In this case, opening a file ls| won't cause any problems. The problem only happens when you have just two parameters in your open statement.

The above program is assuming that you only have coordinates and results, or garbage lines. However, if you have multiple coordinates, AND you only want the first set, you'll have to track them. I recommend using a separate variable for this purpose, and you can use constants to help clarify what you're doing:

#! /usr/bin/env perl

use strict;
use warnings;
use features qw(say);

use autodie;

use constants {
    SET     => 1,
    NOT_SET => 0,
};

say "X, Y, Val";

for my $filename (<>) {
    if (not open my $log, "<", $filename) {
       warn qq(Cannot open file "$filename": $!);
       next;
    }

    my ($x, $y, $value, $err);
    my $coordinates = NOT_SET;
    while (my chomp($coord_line = <$log>)) {
        if ($coord_line =~ /\((-?[0-9]+),(-?[0-9])\)/) {
           if ($coordinates == NOT_SET)) {
               ($x, $y) = ($1, $2);
               $coordinates = SET;
           }
        }
        elsif ($coord_line =~ /^Results.*\((.*),(.*)\)$/) {
           ($val, $err) = ($1, $2);
           say "$x, $y, $val:$err";
           $coordinates = NOT_SET;
        }
    }
}

By using an if/elsif statement, you are now checking each line only once. It also lets users know that each line is either a coordinate line or a result line, and that a single line isn't both. In your original program, you're checking each line for both, so it wasn't clear whether or not a single line could be both.

I am also not dying if the file can't be opened. Instead, I print a warning and go on to the next. You could do either way. (I died in the first, but keep plowing ahead in the second).

BTW, your preference on whether the first two if statements can be combined instead of nested. I also have a friend who doesn't like using numeric constants because it's way to easy to say:

if ($coordinates = SET) {

rather than

if ($coordinates == SET) {

If you had this:

 use constants {
    SET     => "set",
    NOT_SET => "",
 };

You'd get use to doing this:

if ($coordinates eq SET) {

and don't run into the = vs. == issue.


One improvement you can do is just open the @ARGV files directly like below. You can also skip the if statement when getting values for your four target variables. You can split the checks and pattern matches with an if-else, to save some processing, and also restrict the scope of $val and $err.

Also, you don't need chomp, since you do not use the lines or the line ending.

Not sure it helps much, but it's something.

use warnings;
use strict;

my ($x,$y);
while (<ARGV>) {
    if (defined $x && defined $y) {
        my ($val,$err) = /^Results.*\((.*),(.*)\)$/;
        if (defined $val && defined $err) {
            print "$x,$y,$val:$err\n";
            ($x,$y) = undef;
        }
    } else {
            ($x,$y) = /\((\d*|-\d*),(\d*|-\d*)\)/;
    }
}
0

精彩评论

暂无评论...
验证码 换一张
取 消