开发者

Regex that defines a regular language with {a,b} without a substring with exactly 3 b's (bbb)

开发者 https://www.devze.com 2023-01-09 20:38 出处:网络
Pretty much what the question says. I came up with (ba)?(a + bb + bbbbb + aba)*(ab)? Is there anything more readable? Or is this incorrect?

Pretty much what the question says. I came up with

(ba)?(a + bb + bbbbb + aba)*(ab)?

Is there anything more readable? Or is this incorrect? I know you shouldn't really be doing this sorta thing with Regex when you can just go !~/bbb/ in your code, but it's a theory exercise.

Thanks.

Edit for Clarification: I'm not using | to represent the OR bit in the Regex and using + it instead. Sorry for the confusion.

Edit 2: {a,b}开发者_如何转开发 is for a language with just 'a' and 'b' characters. Not {mininum, maximum}. Sorry again.

Edit 3: Because this is part of a theory class, we're just dealing with the basics of Regex. The only things you're allowed to use are +, ?, () and *. You cannot use {minimum, maximum).


I think I have a working regex. Let —which is a notation I invented just now—be the regex that matches zero or more b's, except it won't match three of them. This can be replaced by (ε | b | bb | bbbb+), so don't worry that I'm using magic or anything. Now I think that matching strings can be seen as repeating subpatterns of zero or more a's followed by , which could be (a*b°)*, but you need there to be at least one "a" in between sequences of b's. So your final regex is a*b°(a+b°)*.

Since can match the empty string, the initial a* is superfluous as the a+ can pick up the initial a's just fine, so the regex can be optimized down to b°(a+b°)* (thanks, wrikken).


Hmm, something like this?

^(a|(?<!b)b{1,2}(?!b)|b{4,})*$

edit:

Edit 3: Because this is part of a theory class, we're just dealing with the basics of Regex. The only things you're allowed to use are +, ?, () and *. You cannot use {minimum, maximum).

Pfff, talking about tying your hands behind your back... Simple solution: you cannot do it (^ & $ are requirements for it ever to work), and we need the |. So, come up with a better conditions. Dropping the lookbehind & lookahead could be done, but isn't going to be pretty (at least, not without violating DRY):

^(b|bb|bbbb+)?(a+(b|bb|bbbb+)?)*$


You're matching a string without precisely 3 b's in a row. That means you're looking at substrings like "aa", "aba", "abba", and "abbbbb*a", where any of the exterior a's could be the beginning or end of the string, can be overlapped, and can be multiple. This suggests something like:

(a + ab + abb + abbbbb*)*

with appropriate additions to account for the missing a at the beginning of the string. There's a lot of repetitions, but that's how regular expressions work in basic form.

0

精彩评论

暂无评论...
验证码 换一张
取 消