Using a regular expression to validate email addresses

Developer https://www.devze.com 2023-02-27 22:18 Source: the web
I have just started learning PHP and HTML. I have looked at a few tutorials on regular expressions, but I have a hard time understanding what they mean. I would appreciate any help.

For example, I would like to validate the email address peanuts@monkey.com. I start off with the code below, but I get the message that the email address is invalid.

  1. What am I doing wrong?
  2. I know that metacharacters such as ^ denote the start of a string and $ the end of a string, but what does this mean? What is the start of a string and what is the end of a string?
  3. When do I group regular expressions?

 

$emailaddress = 'peanuts@monkey.com';

if(preg_match('/^[a-zA-z0-9]+@[a-zA-z0-9]+\.[a-zA-z0-9]$/', $emailaddress)) {
    echo 'Great, you have a valid email address';       
} else {
    echo 'boo hoo, you have an invalid email address';      
}


What you have written works with a small modification, if that is what you want to use: you are missing a '+' at the end. The ranges should also read A-Z rather than A-z.

1)

 ^[a-zA-Z0-9]+@[a-zA-Z0-9]+\.[a-zA-Z0-9]+$ 

2) The caret and dollar match positions rather than characters: ^ matches at the beginning of the string and $ at the end (at the beginning and end of each line if you use the m modifier). They are used to anchor your regex. If you write your regex without those two, it will match an email address anywhere in your text, not only when the address is the entire string. With only the ^ (caret) you would match an address only at the very start, and with only the $ (dollar) you would match an address only at the very end.

Blah blah blah someEmail@email.com blah blah

would not give you a match, because the email address is not at the beginning of the line and the line does not end with it either. To match it in this context you would have to drop ^ and $.
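As a minimal sketch of the difference, the snippet below runs the same pattern with and without anchors against the example line. The variable names are only illustrative.

```php
<?php
$text = 'Blah blah blah someEmail@email.com blah blah';

$anchored   = '/^[a-zA-Z0-9]+@[a-zA-Z0-9]+\.[a-zA-Z0-9]+$/';
$unanchored = '/[a-zA-Z0-9]+@[a-zA-Z0-9]+\.[a-zA-Z0-9]+/';

// Anchored: the whole string has to be an address, so there is no match.
$anchoredResult = preg_match($anchored, $text);

// Unanchored: the address may sit anywhere inside the string.
$unanchoredResult = preg_match($unanchored, $text, $m);
$found = $m[0]; // the part of $text that matched

var_dump($anchoredResult, $unanchoredResult, $found);
```

preg_match returns 1 when the pattern matches and 0 when it does not, so the anchored call reports 0 here while the unanchored call reports 1 and captures someEmail@email.com.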

3) Grouping is used for two reasons as far as I know: back-referencing and... grouping. Grouping is used for the same reason as in math: 1 + 3 * 4 is not the same as (1 + 3) * 4. You use parentheses to constrain quantifiers such as '+', '*' and '?', as well as alternation '|'.

You also use parentheses for back-referencing, but since I can't explain it better myself I will link you to: http://www.regular-expressions.info/brackets.html
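Here is a small sketch of both uses of parentheses. The first pattern is the corrected one from above with capture groups added, and the second is a made-up example (finding a doubled word) chosen only to show a back-reference.

```php
<?php
$email = 'peanuts@monkey.com';

// Capturing: each pair of parentheses stores the text it matched.
$ok = preg_match('/^([a-zA-Z0-9]+)@([a-zA-Z0-9]+)\.([a-zA-Z0-9]+)$/', $email, $parts);
// $parts[0] is the whole match, $parts[1] the part before the @,
// $parts[2] the domain name and $parts[3] the part after the dot.

// Back-referencing: \1 must match the same text that group 1 captured.
$doubled = preg_match('/\b(\w+) \1\b/', 'this is is a typo', $dup);

var_dump($ok, $parts, $doubled, $dup[1]);
```

For the address above, $parts[1] is peanuts, $parts[2] is monkey and $parts[3] is com; the second pattern finds the repeated word "is".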

I encourage you to take a look at this book. Even if you only read the first 2-3 chapters you will learn a lot, and it is a great book! http://oreilly.com/catalog/9781565922570


And as the commenters say, this regex is not perfect, but it works and shows you what you had forgotten. You were not far off!


UPDATED as requested:

The '+', '*' and '?' are quantifiers. They are also a good example of where you would group.

  • '+' means match the preceding character or group 1 or more times.
  • '*' means match the preceding character or group 0 or more times.
  • '?' means match the preceding character or group 0 or 1 time.

"Or more" means there is no upper limit.

The reason you use [a-zA-Z0-9]+ is that without the '+' it will only match one character. With + it will match many characters but must match at least one. With * it matches many but also zero, and with ? it matches at most one character but also zero.
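The quantifiers can be tried out in isolation. In this sketch every pattern is anchored so that the quantifier has to account for the whole string; the test strings are arbitrary.

```php
<?php
// No quantifier: exactly one character.
$single        = preg_match('/^[a-z]$/', 'abc');   // too many characters
// '+': one or more.
$plusMany      = preg_match('/^[a-z]+$/', 'abc');
$plusEmpty     = preg_match('/^[a-z]+$/', '');     // needs at least one
// '*': zero or more.
$starEmpty     = preg_match('/^[a-z]*$/', '');
// '?': zero or one.
$optionalOne   = preg_match('/^[a-z]?$/', 'a');
$optionalMany  = preg_match('/^[a-z]?$/', 'abc');  // more than one

// Grouping changes what the quantifier applies to.
$groupRepeated = preg_match('/^(ab)+$/', 'ababab'); // repeats "ab"
$charRepeated  = preg_match('/^ab+$/', 'ababab');   // repeats only "b"

var_dump($single, $plusMany, $plusEmpty, $starEmpty,
         $optionalOne, $optionalMany, $groupRepeated, $charRepeated);
```

The last two lines are where grouping matters: (ab)+ accepts ababab, while ab+ only accepts an "a" followed by one or more "b" characters.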


Your regex doesn't match email addresses. Try this one:

/\b[\w\.-]+@[\w\.-]+\.\w{2,4}\b/

I recommend you read through this tutorial to learn about Regular Expressions.

Also, RegExr is great for testing them out.

As for your second question: the ^ character means that the match must start at the first character of the string you input. The $ means that the match must end at the final character of the string you input. In essence, this means that your regular expression will match the following string:

peanuts@monkey.com

but NOT the following string:

My email address is peanuts@monkey.com, and I love it!

Grouping regular expressions has lots of use cases. Using matching groups will also make your expression cleaner and more readable. It's all explained quite well in the tutorial I linked earlier.


As CanSpice points out, matching all possible email addresses isn't all that easy. Using the RFC2822 Email Validation expression will do a better job:

/[a-z0-9!#$%&'*+\/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+\/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?/i

There are many alternatives, but even the simplest ones will do a fair job, as most email addresses end in .com (or another top-level domain of 2-4 characters).
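One alternative that avoids maintaining a pattern by hand is PHP's built-in filter_var with FILTER_VALIDATE_EMAIL. This is a different technique from the regexes in this thread, shown only as a sketch; the helper name isValidEmail is made up for the example.

```php
<?php
// Hypothetical helper: wraps PHP's built-in email filter.
// filter_var returns the address on success and false on failure.
function isValidEmail(string $candidate): bool
{
    return filter_var($candidate, FILTER_VALIDATE_EMAIL) !== false;
}

$plainAddress = isValidEmail('peanuts@monkey.com');
$sentence     = isValidEmail('My email address is peanuts@monkey.com, and I love it!');
$noDomain     = isValidEmail('peanuts@');

var_dump($plainAddress, $sentence, $noDomain);
```

Like the anchored patterns, the filter checks the whole string, so an address embedded in a sentence is rejected.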


The only reason your original expression doesn't work is that you're limiting the number of characters after the period (.) to 1. Changing your expression to:

/^[a-zA-Z0-9]+@[a-zA-Z0-9]+\.[a-zA-Z0-9]+$/

will allow any number of characters (at least one) after the period. Note that the ranges should read A-Z, not A-z: A-z also matches the punctuation characters that sit between Z and a in ASCII.

/^[a-zA-Z0-9]+@[a-zA-Z0-9]+\.[a-zA-Z0-9]{2,4}$/

will allow 2 to 4 characters after the period. That would match:

name@email.com

name@email.info

but not:

fake@address.suckers
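A quick way to check those three addresses against the {2,4} version (a sketch; the ranges are written as A-Z here, and the variable names are illustrative):

```php
<?php
$pattern = '/^[a-zA-Z0-9]+@[a-zA-Z0-9]+\.[a-zA-Z0-9]{2,4}$/';
$samples = ['name@email.com', 'name@email.info', 'fake@address.suckers'];

$results = [];
foreach ($samples as $sample) {
    // preg_match returns 1 on a match, 0 otherwise.
    $results[$sample] = preg_match($pattern, $sample) === 1;
}

var_dump($results);
```

The first two addresses pass because "com" and "info" are 3 and 4 characters long; "suckers" is 7 characters, so the last one fails.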


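The top-level domain (".com", ".net", ".museum") is not limited to 4 characters: ".museum" already has 6, and newer top-level domains are longer still (a DNS label can be up to 63 characters). So you should be saying at least {2,6} instead of {2,4}, and {2,63} is safer.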

I wrote an extremely good email address regular expression a few years ago:

^\w+([-+._]\w+)@(\w+((-+)|.))\w{1,63}.[a-zA-Z]{2,6}$

A lot of research went into that. But I have some basic tips:

DON'T JUST COPY-PASTE! If someone says "here's a great regex for that," don't just copy paste it! Understand what's going on! Regular expressions are not that hard. And once you learn them well, it'll pay dividends forever. I got good at them by taking a class in Perl back in college. Since then, I've barely gotten any better and am WAY better than the vast majority of programmers I know. It's sad. Anyways, learn it!

Start small. Instead of building a giant regex and testing it when you're done, test just a few characters at a time. For example, when writing an email validator, why not try \w+@\w+\.\w+ and see how good that is? Add in a few more things and re-test, like ^\w+@\w+\.[A-Za-z]{2,6}$
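One way to work in small steps is to keep a short list of sample inputs with the result you expect, and re-run it after every change to the pattern. The sketch below does that for the two patterns mentioned above; the function name failures and the sample list are made up for the example.

```php
<?php
// Sample inputs mapped to whether they should be accepted.
$samples = [
    'peanuts@monkey.com'       => true,
    'no-at-sign.com'           => false,
    'peanuts@monkey'           => false,
    'hello peanuts@monkey.com' => false,
];

// Hypothetical helper: returns the inputs where the pattern disagrees
// with the expected result.
function failures(string $pattern, array $samples): array
{
    $failed = [];
    foreach ($samples as $input => $expected) {
        $matched = preg_match($pattern, $input) === 1;
        if ($matched !== $expected) {
            $failed[] = $input;
        }
    }
    return $failed;
}

$step1 = failures('/\w+@\w+\.\w+/', $samples);             // first attempt
$step2 = failures('/^\w+@\w+\.[A-Za-z]{2,6}$/', $samples); // anchored and tightened

var_dump($step1, $step2);
```

The first attempt still accepts an address embedded in other text, which is exactly the kind of gap this loop makes visible; the second pattern passes all four samples.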


Anchoring the start and end of a regex means that nothing can come before or after the characters you specify. Your regex also needs to account for underscores, and it needs capital Zs in your capital ranges, among other adjustments.

/^[a-zA-Z_0-9]+@[a-zA-Z0-9]+\.[a-zA-Z0-9]{2,4}$/

{2,4} says the top level domain is between 2 and 4 characters.
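To see why the capital Z matters, here is a small sketch comparing the two ranges. In ASCII the characters [ \ ] ^ _ and the backtick sit between Z and a, so A-z accepts them as well.

```php
<?php
$loose  = '/^[a-zA-z]+$/'; // A-z: runs from 'A' all the way to 'z'
$strict = '/^[a-zA-Z]+$/'; // A-Z: capital letters only

$looseCaret       = preg_match($loose, 'foo^bar');
$strictCaret      = preg_match($strict, 'foo^bar');
$looseUnderscore  = preg_match($loose, 'foo_bar');
$strictUnderscore = preg_match($strict, 'foo_bar');

var_dump($looseCaret, $strictCaret, $looseUnderscore, $strictUnderscore);
```

The A-z version accepts both foo^bar and foo_bar, while the A-Z version rejects them, which is why the underscore has to be listed explicitly once the range is corrected.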


This should validate most email addresses (at least, I've tried a lot):

preg_match("/^[a-z0-9._-]{2,}+\@[a-z0-9_-]{2,}+\.([a-z0-9-]{2,4}|[a-z0-9-]{2,}+\.[a-z0-9-]{2,4})$/i", $emailaddress);

Hope it works!


Make sure you ALWAYS escape metacharacters (like dot):

if(preg_match('/^[a-zA-z0-9]+@[a-zA-z0-9]+\.[a-zA-z0-9]$/', $emailaddress)) {