
Double anchoring regular expressions

Developer https://www.devze.com 2023-01-09 01:26 Source: Internet

I want to accept an arbitrary regular expression from the user and anchor it on both sides to enforce a full match (^<user's-regex>$). However, I don't know whether I have to account for the possibility that the user has already anchored their regex.

It looks like Perl, C++, .NET and JavaScript all allow doubled (or otherwise repeated) anchors.

"hello" =~ /^h/ # true
"hello" =~ /^^h/ # true
"hello" =~ /^^^h/ # true
"hello" =~ /e/ # true
"hello" =~ /^e/ # false
"hello" =~ /^^e/ # false

Does anyone know if this is specified to work this way? Can I depend on this behaviour or is it an accident that is liable to change in the future?


Edit: The reason we need this is that we're using VBScript's regexes (via COM). We're using match, but this returns all matches, so it's much slower to match the string abc against .*a.* than against ^.*a.*$. By anchoring as suggested by @Tim, we speed up matches (for long strings) by more than a factor of 12.


You can depend on this behavior. The regex engine doesn't mind asserting the same thing once, twice, or a hundred times in a row.
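As a quick check of the claim above, here is a minimal sketch using Python's re module. Python is chosen only for illustration; the question itself is about VBScript, where I'd expect the same results but have not run it. The helper name matches is made up for this example.

```python
import re

def matches(pattern: str, text: str) -> bool:
    """Return True if the pattern matches anywhere in the text (hypothetical helper)."""
    return re.search(pattern, text) is not None

# Anchors are zero-width assertions, so repeating one asserts the same
# position again and does not change what the pattern matches.
patterns = ["^h", "^^h", "^^^h", "e", "^e", "^^e", "o$", "o$$"]
results = {p: matches(p, "hello") for p in patterns}

for pattern, ok in results.items():
    print(f"{pattern!r:8} -> {ok}")
```

The repeated-anchor patterns give the same answer as their single-anchor counterparts: ^h, ^^h and ^^^h all match, while ^e and ^^e both fail.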

However, instead of simply adding anchors around the regex, you should also add a non-capturing group around it:

^(?: - user regex - )$ or preferably, if your regex flavor allows this: \A(?: - user regex - )\Z

Otherwise, you'll trip up if the user uses alternation in the regex. Compare:

user regex:         hello|bye
anchored regex:     ^hello|bye$      // alternation now affects anchors
correctly anchored: ^(?:hello|bye)$
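The difference can be sketched in Python (used here only for illustration, not the asker's VBScript). The function names anchor_naive and anchor_full_match are hypothetical, and the sketch assumes the user's pattern contains no global inline flags such as (?i) and no free-spacing comments, either of which could interact badly with the wrapper.

```python
import re

def anchor_naive(user_regex: str) -> re.Pattern:
    """Naive wrapping: alternation in the user's regex splits the anchors apart."""
    return re.compile("^" + user_regex + "$")

def anchor_full_match(user_regex: str) -> re.Pattern:
    """Wrap the user's regex in a non-capturing group before anchoring it.

    In Python, \\A matches only at the start of the string and \\Z only at
    the very end; other flavors define \\Z slightly differently.
    """
    return re.compile(r"\A(?:" + user_regex + r")\Z")

naive = anchor_naive("hello|bye")         # effectively (^hello)|(bye$)
grouped = anchor_full_match("hello|bye")  # \A(?:hello|bye)\Z

# The naive version accepts strings that merely start with "hello"
# or end with "bye"; the grouped version requires a full match.
naive_hits = [s for s in ["hello world", "goodbye", "bye"] if naive.search(s)]
grouped_hits = [s for s in ["hello world", "goodbye", "bye"] if grouped.search(s)]

# A pattern the user already anchored still works after wrapping.
already_anchored = anchor_full_match("^hello$")
```

Here naive_hits contains all three strings, while grouped_hits contains only "bye". The non-capturing group also leaves the numbering of the user's own capture groups untouched.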