
Create shortest possible regex

开发者 https://www.devze.com 2023-02-14 10:30 Source: the web

I want to create a regex that will match any of these values

7-5

6-6 ((0-99) - (0-99))

6-4

6-3

6-2

6-1

6-0

0-6

1-6

2-6

3-6

4-6

The 6-6 example is a special case. Here are some examples of its values:

6-6 (23-8)

6-6 (4-25)

6-6 (56-34)

Is it possible to make one regex that can do this?

If so, is it possible to further extend that regex for the 6-6 special case such that the difference between the two numbers within the parentheses is equal to 2 or -2?

I could easily write this with procedural code, but I'm really curious if someone can devise a regex for this.

Lastly, if it could be further extended such that the individual digits were in their own match groups I'd be amazed. For example, for 7-5 I could have a match group that just had the value 7, and another that had the value 5. For 6-6 (24-26), however, I'd like a match group for the first 6, a match group for the second 6, a match group for the 24 and a match group for the 26.

This may be impossible, but some of you can probably get this part of the way there.

Good luck, and thanks for the help.


NO. The answer is "We can't," and the reason is that you're trying to use a hammer to dig a hole.

The problem with writing one long "clever" regex (a word that causes a knee-jerk reaction in many people who are far more anti-regex than I am) is that, six months from now, you'll have forgotten the clever regex features you leaned on so heavily. You'll have written six months' worth of unrelated code, you'll come back to your impressive regex to tweak one detail, and you'll say, "WTF?"

This is what (I understand) you want, in Perl:

# data is in $_
if(/7-5|6-[0-4]|[0-4]-6|6-6 \((\d{1,2})-(\d{1,2})\)/) {
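  # $1 and $2 are set only in the 6-6 case; test with defined so that a 0 still counts
  if(defined $1 and defined $2 and abs($1 - $2) == 2) {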
    # we have the right difference
  }
}

Some might say that the given regex is a bit much, but I don't think it's too bad. If the \d{1,2} bit is a little too obscure you could use \d\d? (which is what I used at first, but didn't like the repetition).
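For comparison, here is a rough Python sketch of the same split: the regex only checks the shape, and ordinary code checks the difference. The function name check_score and its return convention are made up for illustration. It uses fullmatch, which anchors the pattern to the whole string, and compares the groups against None rather than relying on truthiness, so a 0 inside the parentheses is handled.

```python
import re

# Same alternation as the Perl pattern; fullmatch requires the whole string to match.
SCORE = re.compile(r"7-5|6-[0-4]|[0-4]-6|6-6 \((\d{1,2})-(\d{1,2})\)")

def check_score(text):
    """Return (is_valid, tiebreak_ok); tiebreak_ok is None unless text is a 6-6 score."""
    m = SCORE.fullmatch(text)
    if m is None:
        return (False, None)
    if m.group(1) is None:
        # One of the plain alternatives matched, so there is nothing to compare.
        return (True, None)
    first, second = int(m.group(1)), int(m.group(2))
    return (True, abs(first - second) == 2)
```

Calling check_score("6-6 (24-26)") reports a valid score with the right difference, while check_score("6-6 (23-8)") reports a valid shape with the wrong difference.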


You can do it like this:

7-5|6-[0-4]|[0-4]-6|6-6 \(\d\d?-\d\d?\)

Just add parens to get your match groups.
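A minimal Python sketch of what "adding parens" could look like, assuming only the scores listed in the question are valid. PATTERN and score_parts are hypothetical names. Each alternative gets its own numbered groups, so only the groups of the alternative that matched are filled in.

```python
import re

# One pair of groups per alternative; the 6-6 alternative adds two more
# groups for the numbers inside the literal parentheses.
PATTERN = re.compile(
    r"(7)-(5)"
    r"|(6)-([0-4])"
    r"|([0-4])-(6)"
    r"|(6)-(6) \((\d\d?)-(\d\d?)\)"
)

def score_parts(text):
    """Return the captured numbers as a tuple of strings, or None if text is not a listed score."""
    m = PATTERN.fullmatch(text)
    if m is None:
        return None
    # Groups belonging to alternatives that did not match are None; drop them.
    return tuple(g for g in m.groups() if g is not None)
```

With this, score_parts("7-5") gives the two digits and score_parts("6-6 (24-26)") gives all four values.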


Off the top of my head (there may be some errors but the principle should be good):

6-6 \(\d+-\d+\)|\d-\d

And like with any regexp, you can surround what you want to extract with parentheses for match groups:

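(6)-(6) \((\d+)-(\d+)\)|(\d)-(\d)

The literal parentheses in the data have to be escaped, and the 6-6 alternative has to come first, otherwise the plain \d-\d alternative matches the leading 6-6 and stops there. In the 6-6 case, the first two groups get the sixes, and the next two get the multi-digit values that come afterwards; a plain score fills the last two groups instead.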


Here is one that will match only the numbers you want and let you get each digit by name:

p = r'(?P<a>[0-4]|6|7)-(?P<b>[0-4]|6|5) *(\((?P<c>\d{1,2})-(?P<d>\d{1,2})\))?'

To get each digit you could use:

values = re.search(p, string).group('a', 'b', 'c', 'd')

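This returns a four-element tuple with the values you are looking for; c and d are None when there is no parenthesized part. If nothing matches at all, re.search itself returns None, so check for that before calling .group.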
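A small sketch of that lookup with the no-match case guarded. The helper name extract is made up for illustration; the pattern is the one given above, unchanged.

```python
import re

p = r'(?P<a>[0-4]|6|7)-(?P<b>[0-4]|6|5) *(\((?P<c>\d{1,2})-(?P<d>\d{1,2})\))?'

def extract(text):
    """Return the (a, b, c, d) tuple, or None when the pattern is not found in text."""
    m = re.search(p, text)
    if m is None:
        # re.search returns None on failure; calling .group on it would raise AttributeError.
        return None
    return m.group('a', 'b', 'c', 'd')
```

For a plain score such as "7-5", the last two elements of the tuple are None because the optional parenthesized group did not take part in the match.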


One problem with this pattern is that it will match the stuff in the parentheses whether or not there was a match to '6-6'. This one will only match the final parentheses if 6-6 is matched:

p = r'(?P<a>[0-4]|(?P<tmp_a>6)|7)-(?P<b>[0-5]|(?P<tmp_b>6))(?(tmp_a)(?(tmp_b) *(\((?P<c>\d{1,2})-(?P<d>\d{1,2})\))?))'


I don't know of any way to look for a difference between the numbers in the parentheses; regex only knows about strings, not numerical values . . .
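That said, because both numbers are limited to 0-99, one possible workaround is to enumerate every pair whose difference is 2 or -2 and join them into one large alternation. The sketch below builds that pattern in Python. The names pairs and TIEBREAK are made up, and this is an illustration of the idea rather than something anyone would want to maintain by hand. It assumes the numbers are written without leading zeros.

```python
import re

# Every "a-b" with 0 <= a, b <= 99 and |a - b| == 2, spelled out as literal text.
pairs = [f"{a}-{b}" for a in range(100) for b in range(100) if abs(a - b) == 2]

# The arithmetic happens while building the pattern; the regex itself
# only compares strings against the enumerated alternatives.
TIEBREAK = re.compile(r"6-6 \((?:" + "|".join(pairs) + r")\)")

def is_valid_tiebreak(text):
    """True if text is a 6-6 score whose parenthesized numbers differ by exactly 2."""
    return TIEBREAK.fullmatch(text) is not None
```

The resulting pattern has 196 alternatives, which is a fair illustration of why the procedural check is usually the better choice.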



(I am assuming Python syntax here; the Perl syntax is slightly different, though Perl supports the Python way of doing things.)
