
Regular expression in JavaScript not the same as in PHP

Developer https://www.devze.com 2023-01-12 14:16 Source: Internet

I have a regular expression to match usernames (which functions in PHP using preg_match):

/[a-z]+(?(?=\-)[a-z]+|)\.[1-9][0-9]*/

This pattern matches usernames of the form abc.124, abc-abc.123, etc.

However, when I take this to JavaScript:

var re = new RegExp("/[a-z]+(?(?=\-)[a-z]+|)\.[1-9][0-9]*/"); 

I receive a syntax error:

SyntaxError: Invalid regular expression: /[a-z]+(?(?=-)[a-z]+|).[1-9][0-9]*/: Invalid group

The (?(?=\-)[a-z]+|) part is meant to say: if [a-z]+ is followed by a -, then require [a-z]+ after it; otherwise, match nothing. This all works great in PHP, but what am I missing about JavaScript that is different?

EDIT: I appreciate the comments, and now I have one last question regarding this:

    var str="accounts pending removal shen.1206";
    var patt= new RegExp("/[a-z]+(?:-[a-z]+)?\.[1-9][0-9]*/"); 
    var result=patt.exec(str);
    alert(result); 

This alert shows null. But if I do the following, it works:

var patt=/[a-z]+(?:-[a-z]+)?\.[1-9][0-9]*/;
var result=patt.exec(str);
alert(result); 

Why does "new RegExp()" not work?


Different regular expression engines support different features. Conditionals are not supported by JavaScript.

In any event, the conditional is unnecessary for your pattern. I would simplify your expression to /[a-z]+(?:-[a-z]+)?\.[1-9][0-9]*/, which is easier to understand and will work both in PHP's PCRE and in JavaScript.
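Both points can be checked with a minimal JavaScript sketch. The helper `compiles` is a hypothetical name used only for illustration, and the sample usernames are the ones from the question: the conditional is rejected with a SyntaxError when the pattern is compiled, while the simplified pattern compiles and matches both forms.

```javascript
// Hypothetical helper: true if the engine accepts the pattern source.
function compiles(source) {
  try {
    new RegExp(source);
    return true;
  } catch (e) {
    // The message varies by engine, e.g. V8 reports "Invalid group".
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

// The PCRE conditional from the question, written as a string.
const conditional = "[a-z]+(?(?=\\-)[a-z]+|)\\.[1-9][0-9]*";
// The simplified pattern: an optional non-capturing group, no conditional.
const simplified = /[a-z]+(?:-[a-z]+)?\.[1-9][0-9]*/;

console.log(compiles(conditional));             // false
console.log(simplified.exec("abc.124")[0]);     // abc.124
console.log(simplified.exec("abc-abc.123")[0]); // abc-abc.123
```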


JavaScript does not use the same regular expression implementation as PHP does. In this case, JavaScript does not support the conditional expression (?(?=regex)then|else) (see comparison of regular expression flavors). But you could use the following regular expression, which does what yours was meant to do:

/[a-z]+(?:-[a-z]+)?\.[1-9][0-9]*/

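And when using the RegExp constructor to create the regular expression (instead of the regular expression literal syntax /…/), you need to leave out the / delimiters, which belong to the literal syntax only (inside the string they become literal slash characters that the input would have to contain), and you need to escape the escaping \ too, since "\." in a string literal is just ".". So:

var re = /[a-z]+(?:-[a-z]+)?\.[1-9][0-9]*/;                 // OR
var re = new RegExp("[a-z]+(?:-[a-z]+)?\\.[1-9][0-9]*");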
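A small sketch of both constructor pitfalls, using a string like the one in the question's edit (the variable names are illustrative). The first call reproduces the question's code; the second applies both fixes; comparing `source` confirms the fixed constructor call and the literal produce the same pattern.

```javascript
const str = "accounts pending removal shen.1206";

// As in the question: the "/" characters become part of the pattern, and
// "\." inside a string literal is just ".", so the backslash never arrives.
const broken = new RegExp("/[a-z]+(?:-[a-z]+)?\.[1-9][0-9]*/");
// Fixed: no delimiters, and the backslash itself is escaped.
const fixed = new RegExp("[a-z]+(?:-[a-z]+)?\\.[1-9][0-9]*");
// The equivalent regex literal, for comparison.
const literal = /[a-z]+(?:-[a-z]+)?\.[1-9][0-9]*/;

console.log(broken.exec(str));                // null: str contains no "/"
console.log(broken.test("/abcX1/"));          // true: literal slashes, and "." matches "X"
console.log(fixed.exec(str)[0]);              // shen.1206
console.log(fixed.source === literal.source); // true
```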


Your conditional doesn't work even in PHP. The lookahead, (?=-), succeeds if the next character is a hyphen, but it doesn't consume the hyphen. Then [a-z]+ tries to match at the same position and fails, because the next character is still -. You would have to match the hyphen again (-[a-z]+), but as the others have said, you shouldn't be using a conditional anyway.
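JavaScript has no conditionals, but it does support lookahead, so the zero-width behavior described here can be observed there directly. A minimal sketch; the input string is illustrative:

```javascript
const input = "abc-abc";

// (?=-) only checks that a hyphen comes next; it does not consume it,
// so the second [a-z]+ still faces "-" and the match fails.
const lookaheadOnly = /[a-z]+(?=-)[a-z]+/;
// Matching the hyphen explicitly succeeds (the lookahead is now redundant).
const hyphenConsumed = /[a-z]+(?=-)-[a-z]+/;

console.log(lookaheadOnly.test(input));     // false
console.log(hyphenConsumed.exec(input)[0]); // abc-abc
```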

Conditionals are seductive; they seem like they should be very useful, but in practice they seldom are. They lure us in by mirroring the way we naturally think about certain problems: "I want to match some letters, and if the character following them is a hyphen, I want to match it and some more letters."

You'll save yourself a lot of hassle if you learn to think a little more like a regex: "I want to match a chunk of letters, optionally followed by a hyphen and some more letters." The regex practically writes itself:

/[a-z]+(?:-[a-z]+)?/

(The \.[1-9][0-9]* part of your regex was fine; I left it out so I could stay focused on the conditional aspect.)
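The optional group can be tried on its own with a short JavaScript sketch (the sample strings are made up for illustration). When a hyphen isn't followed by letters, the group matches nothing and the match simply ends before the hyphen.

```javascript
// "A chunk of letters, optionally followed by a hyphen and more letters."
const usernamePart = /[a-z]+(?:-[a-z]+)?/;

for (const s of ["abc", "abc-def", "abc-"]) {
  console.log(s, "->", usernamePart.exec(s)[0]);
}
// abc -> abc
// abc-def -> abc-def
// abc- -> abc
```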


EDIT: To answer the question in the comment, yes, your regex matches strings of both forms: abc.124 and abc-abc.123. But take a look at exactly which part of the string it's matching:

Array
(
    [0] => Array
        (
            [0] => abc.124
            [1] => abc.123
        )

)

What happens is that the first [a-z]+ initially matches the first abc in abc-abc.123. Then the lookahead matches the - without consuming it, and the second [a-z]+ tries to match the hyphen and fails, as I said earlier.

Having failed to find a match at that position, the regex engine starts bumping ahead one character at a time and trying again. When it gets to the second abc, the first [a-z]+ matches it and hands off to the next part of the regex, the conditional.

The next character in the input string is ., so the lookahead fails. The conditional isn't required to match anything because you didn't provide a subpattern for the else clause. So the conditional matches nothing and control passes to the next part of the regex, \.[1-9][0-9]*, which succeeds.
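The walkthrough can be reproduced in JavaScript under one assumption: a conditional (?(?=X)yes|no) behaves like the alternation (?:(?=X)yes|(?!X)no), which JavaScript does accept. The sketch below applies that rewrite to the question's pattern; the input string is hypothetical, since the original test input isn't shown, and was chosen to contain both username forms.

```javascript
// (?(?=\-)[a-z]+|) rewritten as: "if a hyphen follows, require [a-z]+ here;
// otherwise (no hyphen follows) match nothing".
const emulated = /[a-z]+(?:(?=-)[a-z]+|(?!-))\.[1-9][0-9]*/g;

// Hypothetical input with both username forms.
const text = "abc.124 abc-abc.123";

console.log(text.match(emulated).join(", ")); // abc.124, abc.123
```

As in PHP, the first branch can never succeed because the lookahead leaves the hyphen unconsumed, so the second username is only matched from its last abc onward.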
