
What's the difference between Word Boundaries and Start of String and End of String Anchors (Regex)?

Developer (https://www.devze.com), 2023-03-20 14:01, source: the web

Why are the two regular expressions evaluating the email differently in this example?

http://codepad.viper-7.com/SEgMzZ

    <?php

    $email = 'ΘΘΘme@gmail.com';
    $regex = '#\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b#i';
    $regex2 = '#^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}$#i';


    if (preg_match($regex, $email)) {
        echo "A match was found.\n";
    } else {
        echo "A match was not found.\n";
    }


    if (preg_match($regex2, $email)) {
        echo "A match was found.\n";
    } else {
        echo "A match was not found.\n";
    }
    ?>

EDIT: I expect neither of these to match.


The problem is with your Θ characters (U+0398, GREEK CAPITAL LETTER THETA). Without the u modifier, PCRE works on raw bytes: each Θ is the two-byte UTF-8 sequence 0xCE 0x98, and neither byte counts as a word character, so PHP sees a word boundary between ΘΘΘ and me@gmail.com.
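
A minimal sketch of what PCRE sees without the u modifier, assuming the script is saved as UTF-8 and PHP runs with its default "C" locale (the variable names are illustrative only):

```php
<?php
// Assumes this file is saved as UTF-8 and PHP uses its default "C" locale.
$email = 'ΘΘΘme@gmail.com';

// Each Θ (U+0398) is stored as the two bytes 0xCE 0x98.
$hex = bin2hex('Θ');

// Without /u, PCRE looks at single bytes; neither 0xCE nor 0x98 is a word character.
$isWordChar = preg_match('/\w/', 'Θ');

// Find the first position where \b holds; PREG_OFFSET_CAPTURE reports byte offsets.
preg_match('/\b/', $email, $m, PREG_OFFSET_CAPTURE);
$firstBoundary = $m[0][1];

echo $hex, "\n";           // ce98
echo $isWordChar, "\n";    // 0
echo $firstBoundary, "\n"; // 6, i.e. right before "me"
```

The first boundary sits at byte offset 6, after the three two-byte Θ sequences.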

The first regex therefore matches: \b is satisfied just before "me", and the rest of the string, me@gmail.com, fits the pattern.
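
Printing what the first pattern actually captured makes this visible. This sketch reuses the question's subject and pattern unchanged and only adds the $m match array:

```php
<?php
// The question's subject and first pattern, unchanged (no /u modifier).
$email = 'ΘΘΘme@gmail.com';
$regex = '#\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b#i';

// $m[0] receives the text matched by the whole pattern.
$found = preg_match($regex, $email, $m);

echo $found, "\n"; // 1
echo $m[0], "\n";  // me@gmail.com: the Θ prefix is simply skipped
```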

The second doesn't match because ^ anchors it to the start of the string, where the Θ characters are, and they are not in the first character class [A-Z0-9._%+-].
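
One way to confirm that only the prefix is at fault is to run the anchored pattern against the address with and without it. This is a small illustrative check, not part of the original answer:

```php
<?php
// The question's second (anchored) pattern, unchanged.
$regex2 = '#^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}$#i';

// ^ forces the character class to start matching at offset 0.
$withPrefix    = preg_match($regex2, 'ΘΘΘme@gmail.com'); // 0: byte 0xCE is not in the class
$withoutPrefix = preg_match($regex2, 'me@gmail.com');    // 1: the same address minus "ΘΘΘ"

var_dump($withPrefix, $withoutPrefix);
```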

As Wrikken points out, you can add the u (PCRE_UTF8) modifier to your regex so that PHP treats the pattern and the subject as UTF-8. In PHP, /u also makes \w and \b Unicode-aware, so Θ counts as a letter and no longer introduces a word boundary; both expressions then fail to match.
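
A minimal sketch of that fix, assuming the file is saved as UTF-8 (with /u, preg_match returns false instead of 0 when the subject is not valid UTF-8):

```php
<?php
// Both patterns from the question with the u modifier added (#...#iu).
$email  = 'ΘΘΘme@gmail.com';
$regex  = '#\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b#iu';
$regex2 = '#^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}$#iu';

// With /u, PHP also enables Unicode properties, so Θ is a word character
// and there is no \b between "ΘΘΘ" and "me".
$r1 = preg_match($regex, $email);
// The anchored pattern still fails: Θ is not in [A-Z0-9._%+-].
$r2 = preg_match($regex2, $email);

var_dump($r1, $r2); // prints int(0) twice
```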
