Looping through files with Perl

Developer https://www.devze.com 2023-01-09 12:22 Source: web
Okay, I have two files. One holds data that is updated every 10 minutes, and the other holds data that was previously used. What I am trying to do is take each line from the new file, loop through every line of the second file, and see whether it matches one of them. If it does, I don't want to use it; if there is no match, then I want to add it to a string. With what I have so far, the check never seems to find a match even though there is one. Here is my code, along with a sample of the data I have been using from both files. CHECKHAIL and USEDHAIL are the two files.

while(my $toBeChecked = <CHECKHAIL>){
        my $found = 0;
        seek USEDHAIL, 0, 0 or die "$0: seek: $!";
        while(my $hailCheck = <USEDHAIL>){
            if( $toBeChecked == $hailCheck){
                $found += 1;
            }
        }
        print USEDHAIL $toBeChecked;
        if ($found == 0){
            $toEmail .= $toBeChecked;
        }
    }
    print $toEmail;
    return;
}

CHECKHAIL sample data

2226  175   2 NE      LAWRENCE           DEADWOOD         SD    44.4    -103.7  (UNR)

2305  200   2 S       SISKIYOU           GREENVIEW        CA    41.52   -122.9  2 INCH HAIL REPORTED WITH STORM JUST SOUTH OF GREENVIEW. (MFR)

2350  200             DANIELS            E FLAXVILLE      MT    48.8    -105.17 GOLF BALL TO HEN EGG SIZED HAIL (GGW)

2350  175   5 N       DANIELS            RICHLAND         MT    48.89   -106.05 DESTROYED CROPS (GGW)

USEDHAIL sample data

2226  175   2 NE      LAWRENCE           DEADWOOD         SD    44.4    -103.7  (UNR)

2305  200   2 S       SISKIYOU           GREENVIEW        CA    41.52   -122.9  2 INCH HAIL REPORTED WITH STORM JUST SOUTH OF GREENVIEW. (MFR)


It never has an opportunity to succeed because of

while(<USEDHAIL>){
    my $hailCheck = $_;
    if( $toBeChecked eq $hailCheck){
        $found += 1;
    }else{
        return;  ### XXX
    }
}

On the first mismatch, the sub returns to its caller. You may have meant next instead, but for conciseness, you should remove the whole else clause. Remove the other else { return; } (corresponding to when $found is true) for the same reason.

Note that your algorithm has quadratic complexity and will be slow for large inputs. It'd be better to read the used records into a hash and then for each line of CHECKHAIL probe the %used hash to see whether it's been processed.

With those lines removed, I get

$ ./prog.pl 

2305  200   2 S       SISKIYOU           GREENVIEW        CA    41.52   -122.9  2 INCH HAIL REPORTED WITH STORM JUST SOUTH OF GREENVIEW. (MFR)

2350  200             DANIELS            E FLAXVILLE      MT    48.8    -105.17 GOLF BALL TO HEN EGG SIZED HAIL (GGW)

2350  175   5 N       DANIELS            RICHLAND         MT    48.89   -106.05 DESTROYED CROPS (GGW)

As you can see, that still has a bug. You need to rewind USEDHAIL for each line of CHECKHAIL:

seek USEDHAIL, 0, 0 or die "$0: seek: $!";
while(<USEDHAIL>){
...

This produces

$ ./prog.pl 
2350  200             DANIELS            E FLAXVILLE      MT    48.8    -105.17 GOLF BALL TO HEN EGG SIZED HAIL (GGW)
2350  175   5 N       DANIELS            RICHLAND         MT    48.89   -106.05 DESTROYED CROPS (GGW)
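
To illustrate why the rewind is needed, here is a minimal sketch that uses an in-memory filehandle as a stand-in for USEDHAIL. The two-line content and the count_lines helper are hypothetical and exist only for this demonstration. Once a handle has been read to end of file, a second pass finds nothing until seek moves it back to the start.

```perl
use strict;
use warnings;

# In-memory stand-in for the USEDHAIL file (hypothetical two-line content).
my $used_data = "first\nsecond\n";
open my $used, '<', \$used_data or die "$0: open: $!";

# Count the lines that can still be read from the handle's current position.
sub count_lines {
    my ($fh) = @_;
    my $n = 0;
    while (my $line = <$fh>) {
        $n++;
    }
    return $n;
}

my $first_pass  = count_lines($used);    # reads through to end of file
my $second_pass = count_lines($used);    # handle is still at EOF, nothing left

seek $used, 0, 0 or die "$0: seek: $!";  # rewind to the beginning
my $after_seek  = count_lines($used);    # all lines are readable again

print "$first_pass $second_pass $after_seek\n";
```

The second count is zero, which matches the situation in the question's inner loop: without the seek, only the first line of CHECKHAIL is ever compared against the contents of USEDHAIL.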

For an example of a better way to do it, consider

#! /usr/bin/perl

use warnings;
use strict;

sub read_used_hail {
  my($path) = @_;

  my %used;

  open my $fh, "<", $path or die "$0: open $path: $!";

  local $" = " ";  # separator used when "@f" is interpolated
  while (<$fh>) {
    chomp;
    my @f = split " ", $_, 10;
    next unless @f;
    ++$used{"@f"};
  }

  wantarray ? %used : \%used;
}

my %used = read_used_hail "used-hail";
open my $check, "<", "check-hail" or die "$0: open: $!";

while (<$check>) {
  chomp;
  my @f = split " ", $_, 10;
  next if !@f || $used{join " " => @f};
  print $_, "\n";
}

Sample run:

$ ./prog.pl 
2350  200             DANIELS            E FLAXVILLE      MT    48.8    -105.17 GOLF BALL TO HEN EGG SIZED HAIL (GGW)
2350  175   5 N       DANIELS            RICHLAND         MT    48.89   -106.05 DESTROYED CROPS (GGW)


Why wouldn't you just create a hash for the first (used) file?

use strict;
use warnings;

# Remember every line of the used file as a hash key.
my %fromUsedFile;
open USEDFILE, '<', '/the/data/file/that/is/10minutesold'
    or die "$0: open: $!";
$fromUsedFile{$_}++  while <USEDFILE>;
close USEDFILE;

# CHECKHAIL is assumed to be open already, as in the question.
my $toBeEmailed = '';
while (my $toBeChecked = <CHECKHAIL>) {
    if (exists $fromUsedFile{$toBeChecked}) {
        # ... line is in both the new and old file
    } else {
        # ... line is only in the new file
        $toBeEmailed .= $toBeChecked;
    }
}


Using $_ within an inner loop can cause problems. Try naming your lines first like so:

while(my $toBeChecked = <CHECKHAIL>){
    my $found = 0;
    while( my $hailCheck = <USEDHAIL>){

Also, Perl treats numeric comparison and string comparison differently. Here you are using the string operator eq where the numeric operator == is appropriate:

 if ($found eq 0){

Change to:

 if ($found == 0){
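
The same distinction matters when whole lines are compared. The following is a minimal sketch, with two strings abbreviated from the sample data and variable names chosen only for illustration. In numeric context Perl uses just the leading number of each string, so two different reports that both start with 2350 compare equal under == but not under eq.

```perl
use strict;
use warnings;

# Two different records that share the same leading field (abbreviated sample data).
my $line_a = "2350  200  DANIELS  E FLAXVILLE  MT";
my $line_b = "2350  175  DANIELS  RICHLAND     MT";

my $numeric_equal;
{
    # == converts both strings to the number 2350; under "use warnings" this
    # also emits "isn't numeric" warnings, which are silenced for the demo.
    no warnings 'numeric';
    $numeric_equal = ($line_a == $line_b) ? 1 : 0;
}

# eq compares the two strings character by character.
my $string_equal = ($line_a eq $line_b) ? 1 : 0;

print "==: $numeric_equal, eq: $string_equal\n";
```

So counters such as $found are best tested with ==, while lines read from a file are best compared with eq.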


This line sticks out for me:

if ($found eq 0){

Since $found is used as a boolean flag, perform a boolean test on it:

if (not $found) {

It also looks like your logic is a bit reversed -- in the first if, you return if the lines do not match, and then in the second if, you return if there was a match. Do you perhaps intend to say next; to skip out of the innermost loop, instead?
