Regexp crashing on iPhone

https://www.devze.com 2023-01-30 22:11 Source: Internet

I have a regular expression which looks like this:

^(\+\d\d)?(?(?<=\+\d\d)((| )\(0\)(| )| |)|(0))(8|\d\d\d?)[-/ ]?\d\d( ?\d){1,4} ?\d\d$

It's used to validate Swedish phone numbers. In other environments, such as .NET, this regular expression works fine, but in Objective-C it causes a crash, saying that the regular expression isn't a valid regexp. I'm far from an expert when it comes to regular expressions, so I'm wondering if someone can help me find the reason this regexp isn't working.

I'm using Reggy to validate the regexp and the problem seems to be this group

(?(?<=\+\d\d)((| )\(0\)(| )| |)|(0))

but I can't figure out why... If I remove (? and ) from the start and end of this group, the crash disappears. Does anyone know what (? does? As far as I know ? is used to specify that a group is optional, but what does it mean when it's used at the very beginning of a group?


It's a condition:

(?(condition)true-expression|false-expression)

NSPredicate uses ICU's Regular Expressions package, which does not support conditionals, so the pattern is rejected as invalid. See:

http://userguide.icu-project.org/strings/regexp

You should use a third-party regex library.
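
To see what such a conditional does in an engine that supports one, here is a minimal Python sketch. As far as I know, Python's re module accepts only the group-reference form (?(1)yes|no) and not the lookbehind condition from the question, so the pattern below is a simplified, hypothetical stand-in: it tests whether group 1 (the +nn prefix) took part in the match. The name is_valid is made up for illustration.

```python
import re

# Hypothetical, simplified pattern (not the one from the question):
# if group 1 (+nn) matched, require a space; otherwise require a leading 0.
pattern = re.compile(r'^(\+\d\d)?(?(1) |0)\d+$')

def is_valid(number):
    """Return True if the string matches the simplified pattern."""
    return pattern.match(number) is not None

print(is_valid('+46 123'))  # True: prefix present, space follows
print(is_valid('0123'))     # True: no prefix, leading 0
print(is_valid('+460123'))  # False: prefix present but no space
print(is_valid('123'))      # False: no prefix and no leading 0
```

The same logic can be written without a conditional as an alternation, (?:\+\d\d |0), which is the route to take for an engine like ICU.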


I've made your regex "legible" by transforming it into verbose form and annotating it, so you can see what it is trying to do. I hope you'll agree that most of this doesn't make much sense:

^                   # Start of string
(\+\d\d)?           # Match + and two digits optionally, capture in backref 1
(?(?<=\+\d\d)       # Conditional: If it was possible to match +nn previously,
 (\s?\(0\)\s?|\s|)  # then try to match (0), optionally surrounded by spaces
                    # or just a space, or nothing; capture that in backref 2
 |                  # If it was not possible to match +nn,
 (0)                # then match 0 (capture in backref 3)
)                   # End of conditional
(8|\d\d\d?)         # Match 8 or any two-three digit combination --> backref 4
[-/\s]?             # match a -, / or space optionally
\d\d                # Match 2 digits, don't capture them
(\s?\d){1,4}        # Match 1 digit, optionally preceded by a space;
                    # do this 1 to 4 times, and capture only the last match --> backref 5
\s?\d\d             # Match an optional space and two digits, don't capture them
$                   # End of string
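
The note about backref 5 reflects how quantified capturing groups generally behave: the group keeps only the text of its last iteration. A small Python sketch of that behaviour (shown with Python's re module; I'd expect ICU to do the same, but treat that as an assumption):

```python
import re

# A quantified capturing group keeps only the text of its final iteration.
m = re.match(r'(\s?\d){1,4}', '1 2 3')
print(repr(m.group(0)))  # whole match: '1 2 3'
print(repr(m.group(1)))  # last iteration only: ' 3'

# If the captured text is not needed, a non-capturing group does the same matching.
n = re.match(r'(?:\s?\d){1,4}', '1 2 3')
print(n.groups())        # ()
```

Since none of the captures are used for validation, every group in the pattern could be non-capturing.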

In its current form, it validates strings like

+46 (0) 1234567
+49 (0) 1234567
+00 1234567
+99 08 11 1 11
01234567
012-34 5 6 7 8 90

and it fails on strings like

+7 123 1234567
+346 (77) 123 4567
+46 (0) 12/34 56 7

So I very much doubt that it is doing what it should. Apart from that, most of the regex can be simplified a lot, dropping the conditional that's tripping up your regex library on the way. It doesn't make much sense to optimize something that's broken, but if your client insists, here is a version that has exactly the same functionality, but without conditionals:

^(?:\+\d\d(?: ?(?:\(0\)\s?)?)?|0)(?:8|\d\d\d?)[-/ ]?\d\d(?: ?\d){1,4} ?\d\d$
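
A quick way to sanity-check the conditional-free pattern is to run it against the sample strings listed earlier. This is only a sketch in Python, whose re module is not ICU, so it shows that the pattern itself is sound rather than proving anything about NSPredicate. The names PATTERN, SHOULD_MATCH, SHOULD_FAIL and matches are made up for illustration.

```python
import re

# Conditional-free pattern, copied verbatim from the answer.
PATTERN = re.compile(
    r'^(?:\+\d\d(?: ?(?:\(0\)\s?)?)?|0)(?:8|\d\d\d?)[-/ ]?\d\d(?: ?\d){1,4} ?\d\d$'
)

# Sample strings from the answer: the first list is expected to match,
# the second is expected to be rejected.
SHOULD_MATCH = [
    '+46 (0) 1234567',
    '+49 (0) 1234567',
    '+00 1234567',
    '+99 08 11 1 11',
    '01234567',
    '012-34 5 6 7 8 90',
]
SHOULD_FAIL = [
    '+7 123 1234567',
    '+346 (77) 123 4567',
    '+46 (0) 12/34 56 7',
]

def matches(s):
    """Return True if the string matches the pattern."""
    return PATTERN.match(s) is not None

for s in SHOULD_MATCH + SHOULD_FAIL:
    print(f'{s!r}: {matches(s)}')
```

One small difference from the original: this pattern uses \s? after (0), so it also accepts a tab in that position, where the original accepted only a space.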