开发者

How does this regular expression work?

开发者 https://www.devze.com 2023-01-08 14:22 出处:网络
From this article, /^1?$|^(11+?)\\1+$/ checks whether a number(its value in unary) is prime or not. Using this, perl -l -e \'(1 x $_) !~ /^1?$|^(11+?)\\1+$/ && print while ++$_;\' returns a

From this article,

/^1?$|^(11+?)\1+$/ checks whether a number(its value in unary) is prime or not.

Using this, perl -l -e '(1 x $_) !~ /^1?$|^(11+?)\1+$/ && print while ++$_;' returns a list of prime numbers.

I do not have enough experience with Perl, but what I understand is that the regular expression will be true for a number that is not prime. So, if we print all numbers that do not produce a true with this expression, we have a list of prime numbers. Thats what the perl query is trying to do.

About the regex part,

^1?$ part is for counting 1 as not prime

^(11+?)\1+$ is for matching not prime numbers starting from 4.


What I do not understand is why is the ? in the regex needed at all. According to me /^1$|^(11+)\1+$/ should be just fine and actually

perl -l -e '(1 x $_) !~ /^1$|^(11+)\1+$/ && print while ++$_;' gi开发者_如何学运维ves me the same set of prime numbers.

Is there any flaw in my understanding of the regular expression? Why are the ?s needed?

Isn't ? supposed to match zero or one occurrence of the expression preceding it?


The first ? is for matching the empty string (i.e. 0) as non-prime. If you don't care whether the regexp matches 0, then it's not necessary.

The second ? is only for efficiency. + is normally "greedy", which means it matches as many characters as are available, and then backtracks if the rest of the regexp fails to match. The +? makes it non-greedy, so it matches only 1 character, and then tries matching more if the rest of the regexp fails to match. (See the Quantifiers section of perlre for more about greedy vs. non-greedy matching.)

In this particular regexp, the (11+?) means it tests divisibility by 2 ('11'), then 3 ('111'), then 4, etc. If you used (11+), it would test divisibility by N (the number itself), then N-1, then N-2, etc. Since a divisor must be no greater than N/2, without the ? it would waste time testing a lot of "potential" divisors that can't possibly work. It would still match non-prime numbers, just more slowly. (Also, $1 would be the largest divisor instead of the smallest one.)


The first ? will make "" (the empty string, unary zero) not be a prime number. Zero is defined as not prime.

The second is different; it stops the regular expression from greedy-matching. It should improve the performance of the match greatly, since the first part of that section ((11+)) won't consume almost the entire string before having to backtrack. If you omit the question mark, you're effectively testing whether odd n is divisible by n-1 and so one down; if you include it, you're testing divisibility by two first and so on upwards. Obviously, numbers tend to be divisible by smaller factors more often, so your match will be faster.

0

精彩评论

暂无评论...
验证码 换一张
取 消