
Validate email address from a text file?

Developer https://www.devze.com 2023-01-26 14:03 Source: web
I'm trying to search through a text file and find the valid email addresses. I'm doing something like this:

    #!/usr/bin/perl -w

    my $infile = 'emails.txt';

    open IN, "< $infile" or die "Can't open $infile : $!";

    while( <IN> )
    {
        if ($infile =~ /^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,6}$/)
        {
            print "Valid \n";
        }
    }

    close IN;

But it doesn't do anything. Any help?


You are matching the email address regex against the name of the file. In any case, you should not use a regex to validate email addresses; use Email::Valid instead.

    use strict;
    use warnings;
    use Email::Valid;

    my $infile = 'emails.txt';

    open my $in, '<', $infile or die "Can't open $infile : $!";

    while (my $line = <$in>) {
        chomp $line;    # strip the newline before validating
        if (Email::Valid->address($line)) {
            print "Valid \n";
        }
    }

    close $in;


You're trying to match $infile, which contains the name of the text file, i.e. 'emails.txt'.

You should be doing something like

    while (<IN>) {
        print "Valid \n" if $_ =~ /\bYOURREGEX\b/;
    }

This way, \b matches word boundaries instead of the beginning and end of the line, so you can match email addresses embedded in a longer string.
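A minimal sketch of that fix, assuming the question's character classes as the stand-in for YOURREGEX with the /i flag added, and an in-memory filehandle in place of emails.txt. The helper name `find_addresses` and the sample text are hypothetical. In list context, a //g match returns every match, so one line can yield several addresses:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns every address-like substring found in $line.
# The pattern is the question's regex with ^/$ replaced by \b and /i added
# so lowercase addresses match too. It is a rough filter, not a validator.
sub find_addresses {
    my ($line) = @_;
    return $line =~ /\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,6}\b/gi;
}

# Sample input (hypothetical) read through an in-memory filehandle.
my $text = <<'END';
Contact alice@example.com or Bob.Smith@Mail.Example.org today.
no address on this line
END

open my $in, '<', \$text or die "Can't open in-memory file: $!";
while (my $line = <$in>) {
    chomp $line;
    for my $addr (find_addresses($line)) {
        print "Valid: $addr\n";
    }
}
close $in;
```

This is still pattern matching rather than real validation; the Email::Valid answer is the safer route when correctness matters.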

EDIT: Jira's answer is definitely better, though; this one just tells you what's wrong.

Hope this helps!


You'll have problems with this regex unless:

  1. The email address is the only thing in a line of the file
  2. The email address in the file is all caps.

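You should replace every A-Z, which only matches capital letters, with \p{Alpha}, which matches alphabetic characters regardless of case. Where A-Z is combined with 0-9 and _, replace the whole combination with \w (any word character) instead; where it is combined with 0-9 only, as in the domain part, use \p{Alnum}.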

    /^[\w.%+-]+@[\p{Alnum}.-]+\.\p{Alpha}{2,6}$/

This still isn't a fully correct regex for email addresses (see Benoit's comment), but it might do the job in a pinch.
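A quick sketch comparing the question's uppercase-only pattern with the revised one on a few hypothetical sample strings; the variable names `$caps_only` and `$any_case` are illustrative only:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The question's original pattern: A-Z only, so lowercase letters never match.
my $caps_only = qr/^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,6}$/;

# The revised pattern: \w and \p{...} classes accept either case.
my $any_case  = qr/^[\w.%+-]+@[\p{Alnum}.-]+\.\p{Alpha}{2,6}$/;

# Hypothetical sample strings.
my @samples = ('JOHN@EXAMPLE.COM', 'john@example.com', 'not an address');

for my $s (@samples) {
    printf "%-20s caps_only=%s any_case=%s\n",
        $s,
        ($s =~ $caps_only ? 'yes' : 'no'),
        ($s =~ $any_case  ? 'yes' : 'no');
}
```

Only the all-caps address passes the original pattern, while the revised one accepts both spellings. Both are still anchored with ^ and $, so each still expects the address to be the whole line.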


I don't know Perl, but your regular expression is anchored to the beginning and end of the entire string. Unless you set a multi-line flag or have only one email address per file, you won't get any results.

Try removing the ^ (beginning of string) and $ (end of string) anchors and see if that helps.

It might also help to post a sample of your data; without one, I can't help you any further.


Don't you need something like this?

    my @lines = <IN>;
    close IN;

    foreach my $line (@lines)
    {
        ...
    }
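Filled out, the read-everything-first approach might look like the sketch below. Slurping the lines does not by itself fix the original problem; what matters is testing each `$line` rather than `$infile`. The check used here is an assumption (the \w/\p{Alpha} pattern from the earlier answer), and an in-memory filehandle with hypothetical data stands in for emails.txt:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for emails.txt, read through an in-memory filehandle.
my $data = "alice\@example.com\nnot-an-address\nBOB\@EXAMPLE.ORG\n";
open my $in, '<', \$data or die "Can't open in-memory file: $!";

my @lines = <$in>;    # read every line into memory first
close $in;

# Assumed check: the case-insensitive whole-line pattern from the earlier answer.
my $email_re = qr/^[\w.%+-]+@[\p{Alnum}.-]+\.\p{Alpha}{2,6}$/;

my @valid;
foreach my $line (@lines) {
    chomp $line;                                 # strip the trailing newline
    push @valid, $line if $line =~ $email_re;    # test the line, not the filename
}
print "Valid: $_\n" for @valid;
```

For a large file, the original `while (<IN>)` loop does the same work without holding every line in memory at once.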


There is a copy of the regex to validate RFC 5322 email addresses here on SO, you know. It looks like this:

    $rfc5322 = qr{
        # etc
    }x;

It has a thing or two in the # etc elision I’ve made above, which you can check out in the other answer.

By the way, if you’re going to use \b in your regexes, please please be especially careful that you know what it’s touching.

    $boundary_before     =  qr{(?(?=\w)(?<!\w)|(?<=\w))}; # like /\bx/
    $boundary_after      =  qr{(?(?<=\w)(?!\w)|(?=\w))};  # like /x\b/
    $nonboundary_before  =  qr{(?(?=\w)(?<=\w)|(?<!\w))}; # like /\Bx/
    $nonboundary_after   =  qr{(?(?<=\w)(?=\w)|(?!\w))};  # like /x\B/

That’s seldom what people are expecting.
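A small sketch of why that matters: \b in front of a non-word character such as "." only matches when the character before it is a word character. It compares plain /\b\.com/ with the `$boundary_before` spelling above on a few hypothetical strings; 'example.com' matches, while 'example .com' and '.com' do not, under both spellings:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The lookaround spelling of \b-before-something, copied from above.
my $boundary_before = qr{(?(?=\w)(?<!\w)|(?<=\w))}; # like /\bx/

# Hypothetical test strings: the character after \b here is ".", a non-word
# character, so \b needs a word character on its left.
my @cases = ('example.com', 'example .com', '.com');

for my $s (@cases) {
    my $plain   = $s =~ /\b\.com/                  ? 'match' : 'no match';
    my $spelled = $s =~ /${boundary_before}\.com/  ? 'match' : 'no match';
    printf "%-14s \\b\\.com: %-8s lookaround: %s\n", $s, $plain, $spelled;
}
```

In an address pattern, the same rule means a leading \b cannot match directly before an address that begins with a non-word character such as "+".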
