
Perl script to read and print lines from multiple txt files?

We have 300+ txt files, each of which is basically a copy of an email; each file has the following format:

To: blabla@hotmail.com 
Subject: blabla 
From: bla1@hotmail.com 
Message: Hello World! 

The platform I am to run the script on is Windows, and everything is local (including the Perl installation). The aim is to write a script which crawls through each file (all located within the same directory) and prints out a list of each 'unique' email address in the From field. The concept is very easy.

Can anyone point me in the right direction here? I know how to start off a Perl script, and I am able to read a single file and print all details:

 #!/usr/local/bin/perl
 open (MYFILE, 'emails/email_id_1.txt');
 while (<MYFILE>) {
    chomp;
    print "$_\n";
 }
 close (MYFILE);

So now, I need to be able to read and print line 3 of this file, but perform this activity not just once, but for all of the files. I've looked into the File::Find module; could this be of any use?
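For reference, File::Find would work, although it walks a directory tree recursively, which is more than a single flat directory needs (a plain glob would do). A minimal sketch that collects the .txt files this way, assuming they live under an emails/ subdirectory:

use strict;
use warnings;
use File::Find;

my @files;
find(sub {
    # inside the callback, $_ is the bare filename and
    # $File::Find::name is the path from the start directory
    push @files, $File::Find::name if -f && /\.txt$/i;
}, 'emails');

print "$_\n" for @files;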


What platform? If Linux then it's simple:

foreach my $f (@ARGV) {
    # Do stuff
}

and then call with:

perl mything.pl *.txt

On Windows you'll need to expand the wildcard yourself, as cmd.exe doesn't do it for you (unlike Linux shells):

@ARGV = map { glob } @ARGV;   # expand the wildcards ourselves

foreach my $f (@ARGV) {
    # Do stuff
}

Then extracting the third line is just a matter of reading each line in and counting until you reach line 3, at which point you print it.
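A minimal sketch of that counting loop (the glob line is the Windows expansion from above and is harmless on Linux):

@ARGV = map { glob } @ARGV;      # expands *.txt on Windows, no-op on Linux

foreach my $f (@ARGV) {
    open my $fh, '<', $f or die "$f: $!\n";
    while (my $line = <$fh>) {
        if ($. == 3) {           # $. is the line number of the current handle
            print $line;
            last;                # line 3 found, move on to the next file
        }
    }
    close $fh;
}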


The glob() builtin can give you a list of files in a directory:

chdir $dir or die $!;
my @files = glob('*');

You can use Tie::File to access the 3rd line of a file:

use Tie::File;

for (@files) {
    tie my @lines, 'Tie::File', $_ or die $!;
    print $lines[2], "\n";   # line 3 is index 2 (zero-based)
}
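To get at the unique addresses rather than the raw third line, the same loop could feed a hash; a sketch, assuming line 3 really is always the From: line:

use Tie::File;

my %seen;
for my $file (@files) {
    tie my @lines, 'Tie::File', $file or die $!;
    # line 3 (index 2) should hold the From: header
    $seen{$1} = 1 if defined $lines[2] && $lines[2] =~ /^From:\s*(.*?)\s*$/i;
    untie @lines;
}
print "$_\n" for sort keys %seen;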


Perl one-liner, Windows version:

perl -wE "@ARGV = glob '*.txt'; while (<>) { say $1 if /^From:\s*(.*)/ }"

It will check all the lines, but only print if it finds a valid From: tag.
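Since the aim is a list of unique addresses, a variant that de-duplicates as it goes (a sketch; same cmd.exe quoting assumptions as above):

perl -wE "@ARGV = glob '*.txt'; my %seen; while (<>) { say $1 if /^From:\s*(.*?)\s*$/ and not $seen{$1}++ }"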


Are you using a Unix-style shell? You can do this in the shell without even using Perl.

grep -h "^From:" ./* | sort | uniq -c

The breakdown is as follows:

  1. grep will grab every line that starts with "From:" (-h keeps grep from prefixing each match with its filename), and send it to...
  2. sort, which will alpha sort those lines, then...
  3. uniq, which will filter out dupe lines. The "-c" part will count the occurrences.

Your output would look like:

    3 From: dave@example.com
    5 From: foo@bar.example.com
    etc...

Possible issues: I'm not sure how complex your "From" lines will be, e.g. multiple addresses, different formats, etc.

You could enhance that grep step in a few ways, or replace it with a Perl script that is narrower in scope than your proposed all-in-one script, for example:
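A sketch that swaps the grep for a small Perl filter, so the case of From: and stray trailing whitespace stop mattering (the regex is an assumption about your data):

perl -ne 'print "$1\n" if /^From:\s*(.*?)\s*$/i' ./*.txt | sort | uniq -c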

Please comment if anything isn't clear.


Here's my solution (I hope this isn't homework).

It checks all files in the current directory whose names end in ".txt", case-insensitively (so it will also find "foo.TXT", which is probably what you want under Windows). It also allows for variations in line terminators (at least CR-LF and LF), searches for the From: prefix case-insensitively, and allows arbitrary whitespace after the colon.

#!/usr/bin/perl

use strict;
use warnings;

opendir my $DIR, '.' or die "opendir .: $!\n";
my @files = grep /\.txt$/i, readdir $DIR;
closedir $DIR;
# print "Got ", scalar @files, " files\n";

my %seen = ();
foreach my $file (@files) {
    open my $FILE, '<', $file or die "$file: $!\n";
    while (<$FILE>) {
        if (/^From:\s*(.*?)\r?$/i) {   # non-greedy, so $1 doesn't capture the CR
            $seen{$1} = 1;
        }
    }
    close $FILE;
}

foreach my $addr (sort keys %seen) {
    print "$addr\n";
}
