
Checking correctness of an email address with a regular expression in Bash

Developer (https://www.devze.com), 2022-12-18 07:01. Source: the web

I'm trying to make a Bash script to check if an email address is correct.

I have this regular expression:

[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?

Source: http://www.regular-expressions.info/email.html

And this is my bash script:

regex=[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?

i="test@terra.es"
if [[ $i=~$regex ]] ; then
    echo "OK"
else
    echo "not OK"
fi

The script fails and gives me this output:

10: Syntax error: EOF in backquote substitution

Any clue??


You have several problems here:

  • The regular expression needs to be quoted and special characters escaped.
  • The regular expression ought to be anchored (^ and $).
  • The non-capturing group syntax (?:...) is not supported by the POSIX extended regular expressions that Bash uses, so ?: needs to be removed.
  • You need spaces around the =~ operator.
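
The last point is easy to miss because the broken form does not raise an error by itself. A minimal sketch, using a made-up digits-only pattern rather than the email regex: without spaces, Bash sees a single non-empty word, and [[ word ]] is true for any non-empty string.

```shell
#!/bin/bash
# Illustrative pattern and input only; neither comes from the question.
regex='^[0-9]+$'
i="abc"

# Without spaces, "$i=~$regex" is one non-empty word, so the test is always true.
if [[ $i=~$regex ]]; then no_spaces="match"; else no_spaces="no match"; fi

# With spaces, =~ is parsed as the regex-matching operator.
if [[ $i =~ $regex ]]; then with_spaces="match"; else with_spaces="no match"; fi

echo "without spaces: $no_spaces"
echo "with spaces: $with_spaces"
```

The first line reports a match even though "abc" contains no digits; only the second form actually applies the regex.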

Final product:

regex="^[a-z0-9!#\$%&'*+/=?^_\`{|}~-]+(\.[a-z0-9!#$%&'*+/=?^_\`{|}~-]+)*@([a-z0-9]([a-z0-9-]*[a-z0-9])?\.)+[a-z0-9]([a-z0-9-]*[a-z0-9])?\$"

i="test@terra.es"
if [[ $i =~ $regex ]] ; then
    echo "OK"
else
    echo "not OK"
fi
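
If the escaping gets hard to read, one option is to build the same pattern from smaller single-quoted pieces. This is only a sketch: the names atext, label and is_valid_email are made up for illustration, and the pattern is still lowercase-only, as in the original.

```shell
#!/bin/bash
# Hypothetical helper names; only the pattern itself comes from the answer above.
# Single quotes keep $ and ` literal; '\'' inserts the one single quote needed.
atext='[a-z0-9!#$%&'\''*+/=?^_`{|}~-]'
label='[a-z0-9]([a-z0-9-]*[a-z0-9])?'
regex="^${atext}+(\.${atext}+)*@(${label}\.)+${label}\$"

# Succeeds (exit status 0) when the argument matches the anchored pattern.
is_valid_email() {
    [[ $1 =~ $regex ]]
}

for addr in "test@terra.es" "test@terraes" "a..b@example.com"; do
    if is_valid_email "$addr"; then
        echo "$addr: OK"
    else
        echo "$addr: not OK"
    fi
done
```

Only the first address should be reported as OK: the second has no dot in the domain and the third has two consecutive dots in the local part.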


You don't have to create such a complicated regex to check whether an email address is valid. You can simply split on "@" and check that there are exactly 2 items: one in front of the @ and one after it.

i="test@terraes"
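IFS="@"
set -- $i          # split the address into positional parameters at each "@"
unset IFS          # restore default word splitting
if [ "$#" -ne 2 ]; then
    echo "invalid email"
    exit 1
fi
domain="$2"
# dig prints "ANSWER: 0" when the lookup returns no records
dig "$domain" | grep -q "ANSWER: 0" && echo "domain not ok"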

To check the domain further, you can use tools like dig to query the domain. This is better than a regex alone because an address such as @new.jersey gets matched by the regex even though new.jersey is not actually a proper domain.
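
The split itself can also be done without touching IFS. The following is a rough variant using parameter expansion, not the method shown above; has_two_parts is a hypothetical name, and the DNS lookup is left out because it needs network access.

```shell
#!/bin/sh
# Succeeds only if the address has exactly one "@" with text on both sides.
# has_two_parts is an illustrative name, not something from the thread.
has_two_parts() {
    addr=$1
    local_part=${addr%%@*}   # everything before the first "@"
    domain=${addr#*@}        # everything after the first "@"
    case $addr in
        *@*) ;;              # at least one "@" is present
        *) return 1 ;;
    esac
    case $domain in
        *@*) return 1 ;;     # a second "@" appears after the first
    esac
    [ -n "$local_part" ] && [ -n "$domain" ]
}

if has_two_parts "test@terraes"; then
    echo "two parts, domain is $domain"
else
    echo "invalid email"
fi
```

Note that "test@terraes" passes this check, which is exactly why the domain still has to be looked up afterwards.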


Quotes, backticks and some other characters are special in shell scripts and need to be escaped when they are used as in the assignment to regex. You can escape special characters with backslashes, or use single quotes around the regex if you leave out the single quote used in it.

I would recommend using a simpler regular expression like .*@.* because all the complexity is futile: foo@example.com looks perfectly fine and will be accepted by any regular expression, yet the address doesn't exist.


Bash version less than 3.2:

if [[ "$email" =~ "^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}$" ]]
then
    echo "Email address $email is valid."
else
    echo "Email address $email is invalid."
fi

Bash version greater than or equal to 3.2:

if [[ "$email" =~ ^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}$ ]]
then
    echo "Email address $email is valid."
else
    echo "Email address $email is invalid."
fi

The reasons why you shouldn't use a very specific regex, like you have, are explained here.
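
To avoid keeping two variants, a common workaround is to store the pattern in a variable and use it unquoted on the right-hand side of =~, which as far as I know behaves the same before and after 3.2. A sketch, with email_regex and looks_like_email as made-up names:

```shell
#!/bin/bash
# Same pattern as above, kept in a single-quoted variable (hypothetical names).
email_regex='^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}$'

# Succeeds when the argument matches; the variable is deliberately left unquoted.
looks_like_email() {
    [[ $1 =~ $email_regex ]]
}

for email in "user@example.com" "user@example" "user@example.museum"; do
    if looks_like_email "$email"; then
        echo "Email address $email is valid."
    else
        echo "Email address $email is invalid."
    fi
done
```

The last address is rejected only because {2,4} limits the top-level domain to four letters, which is one illustration of how a specific pattern turns away real addresses.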


The immediate problem with your script is that you need to fix the quoting:

regex='[a-z0-9!#$%&'"'"'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'"'"'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?'

However, this regular expression does not accept all syntactically valid email addresses. Even if it did, not all syntactically valid email addresses are deliverable.

If deliverable addresses are what you care about, then don't bother with a regular expression or other means of checking syntax: send a challenge to the address that the user supplies. Be careful not to use untrusted input as part of a command invocation! With sendmail, run sendmail -oi -t and write a message to the standard input of the sendmail process, e.g.,

To: test@terra.es.invalid
From: no-reply@your.organization.invalid
Subject: email address confirmation

To confirm your address, please visit the following link:

http://www.your.organization.invalid/verify/1a456fadef213443
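
A rough sketch of how such a message could be generated in a script. The function name make_confirmation and the way the token is passed in are assumptions for illustration; the one real safeguard here is refusing addresses that contain whitespace or control characters, so the input cannot add header lines of its own.

```shell
#!/bin/sh
# Print a confirmation message for address $1 with verification token $2.
# make_confirmation is a hypothetical helper, not part of sendmail.
make_confirmation() {
    addr=$1
    token=$2
    case $addr in
        ''|*[[:space:][:cntrl:]]*) return 1 ;;   # empty, or could inject headers
    esac
    printf 'To: %s\n' "$addr"
    printf 'From: no-reply@your.organization.invalid\n'
    printf 'Subject: email address confirmation\n\n'
    printf 'To confirm your address, please visit the following link:\n\n'
    printf 'http://www.your.organization.invalid/verify/%s\n' "$token"
}

# Example: print the message instead of sending it.
make_confirmation 'test@terra.es.invalid' '1a456fadef213443'
```

To actually send it, the output would be piped into sendmail -oi -t. The address only ever travels on standard input, never as part of the command line.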


In a moment of madness once, I wrote this Perl subroutine based on the Mastering Regular Expressions book:

sub getRFC822AddressSpec
{
    my ($esc, $space, $tab, $period) = ('\\\\', '\040', '\t', '\.');
    my ($lBr, $rBr, $lPa, $rPa)      = ('\[', '\]', '\(', '\)');
    my ($nonAscii, $ctrl, $CRlist)   = ('\200-\377', '\000-\037', '\n\015');

    my $qtext       = qq{ [^$esc$nonAscii$CRlist] }; # within "..."
    my $dtext       = qq{ [^$esc$nonAscii$CRlist$lBr$rBr] }; # within [...]
    my $ctext       = qq{ [^$esc$nonAscii$CRlist()] }; # within (...)
    my $quoted_pair = qq{ $esc [^$nonAscii] }; # an escaped char
    my $atom_char   = qq{ [^()$space<>\@,;:".$esc$lBr$rBr$ctrl$nonAscii] };
    my $atom        = qq{ $atom_char+     # some atom chars
                          (?!$atom_char)  # NOT followed by part of an atom
                        };
    # rfc822 comments are (enclosed (in parentheses) like this)
    my $cNested     = qq{ $lPa (?: $ctext | $quoted_pair )* $rPa };
    my $comment     = qq{ $lPa (?: $ctext | $quoted_pair | $cNested )* $rPa };

    # whitespace and comments may be scattered liberally
    my $X           = qq{ (?: [$space$tab] | $comment )* };

    my $quoted_str  = qq{ " (?: $qtext | $quoted_pair )* " };
    my $word        = qq{ (?: $atom | $quoted_str ) };
    my $domain_ref  = $atom;
    my $domain_lit  = qq{ $lBr (?: $dtext | $quoted_pair )* $rBr };
    my $sub_domain  = qq{ (?: $domain_ref | $domain_lit ) };
    my $domain      = qq{ $sub_domain (?: $X $period $X $sub_domain )* };
    my $local_part  = qq{ $word (?: $X $period $X $word )* };
    my $addr_spec   = qq{ $local_part $X \@ $X $domain };

    # return a regular expression object
    return qr{$addr_spec}ox;
}

my $spec = getRFC822AddressSpec();
my $address = q{foo (Mr. John Foo) @ bar. example};
print "$address is an email address" if ($address =~ qr{$spec});


I've combined the above examples into a single script that checks the validity of the address with the regexp and checks with dig whether the domain actually exists; if either check fails, it reports an error.

#!/bin/bash
#Regexp
regex="^[a-z0-9!#\$%&'*+/=?^_\`{|}~-]+(\.[a-z0-9!#$%&'*+/=?^_\`{|}~-]+)*@([a-z0-9]([a-z0-9-]*[a-z0-9])?\.)+[a-z0-9]([a-z0-9-]*[a-z0-9])?\$"

#Vars
checkdig=0;
checkreg=0;
address=$1;
maildomain=$(echo "$address" | awk 'BEGIN { FS = "@" } ; { print $2 }');

#Domain Check
checkdns() {
        # maildomain already holds the part after "@"; no second awk pass is needed
        dig "$maildomain" | grep -q "ANSWER: 0" || checkdig=1;
}

#Regexp
checkreg() {
        if [[ $address =~ $regex ]] ;
                then checkreg=1;
        fi
}

#Execute
checkreg;
checkdns;

#Results
if [ $checkreg == 1 ] && [ $checkdig == 1 ];
        then    echo "OK";
        else    echo "not OK";
fi
#End

Nothing special.


Coming late to the party, but I adapted a script to read a file containing emails and filter it using an RFC822 regex, domain typo lists, MX lookup (thanks to eagle1 here) and ambiguous email filtering.

The script can be used like:

./emailCheck.sh /path/to/emailList

and produces two files, the filtered list and the ambiguous list. Both are already cleared of non-RFC822-compliant addresses, email domains that don't have valid MX records, and domain typos.

Script can be found here: https://github.com/deajan/linuxscripts

Corrections and comments are welcome :)
