
Is it feasible to write a regex that can validate simple math?

开发者 (https://www.devze.com), 2022-12-20 23:55. Source: Internet

I'm using a commercial application that has an option to use RegEx to validate field formatting. Normally this works quite well. However, today I'm faced with validating the following strings: quoted alphanumeric codes joined by simple arithmetic operators (+-/*). The issue is that users sometimes add extra spaces (e.g. " FLR01" instead of "FLR01") or make other typos, such as mismatched parentheses, that cause problems in downstream processing.

The first examples all had 5 codes being added:

"FLR01"+"FLR02"+"FLR03"+"FMD01"+"FMR05"

So I started going down the road of matching 5 alphanumeric characters enclosed in double quotes, followed by an operator:

"[0-9a-zA-Z]{5}"[+-*/]

However, the formulas quickly got harder and I don’t know how to get around the following complications:

  1. I need to test for one of the four simple math operators (+-*/) between each code, but not after the last one.
  2. There can be any number of codes being added together, not just five as in the example above.
  3. Parenthesized sub-expressions are okay, e.g. ("X"+"Y")/"2"
  4. Mismatched parentheses are not okay.
  5. No formula (e.g. a blank) is okay.

Valid:

"FLR01"+"FLR02"+"FLR03"+"FMD01"+"FMR05"
"0XT"+"1SEAL"+"1XT"+"23LSL"+"23NBL"  
("LS400"+"LT400")*"LC430"/("EL414"+"EL414R"+"LC407"+"LC407R"+"LC410"+"LC410R"+"LC420"+"LC420R")

Invalid:

" FLR01" +"FLR02"
"FLR01"J"FLR02"
("FLR01"+"FLR02"

Is this not something you can easily do with RegExp? Based on Jeff's answer to 230517, I suspect I'm failing at least the 'matched pairing' issue. Even a partial solution (e.g. flagging extra spaces and invalid operators) would likely be better than nothing, even if I can't solve the parentheses issue. Suggestions welcomed!

Thanks,

Stephen


As you are aware, you can't check for matching parentheses with regular expressions. You need something more powerful, since classic regular expressions have no way of remembering state and counting arbitrarily nested parentheses.

This is a simple enough syntax that you could hand-code a small parser which counts the parentheses, incrementing a counter on each "(" and decrementing it on each ")". You'd simply have to make sure the counter never goes negative and is back at zero when you reach the end of the string.
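A minimal Python sketch of such a counter, assuming parentheses never occur inside the quoted codes (the codes in the examples are purely alphanumeric). The function name parens_balanced is only illustrative. Besides rejecting a negative count, the sketch also requires the count to be zero at the end, so that an unclosed "(" is caught as well.

```python
def parens_balanced(formula: str) -> bool:
    """Return True if every ')' closes an earlier '(' and nothing is left open."""
    depth = 0
    for ch in formula:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                # A ')' appeared before any matching '('.
                return False
    # A positive depth here means at least one '(' was never closed.
    return depth == 0
```

With this, the invalid example ("FLR01"+"FLR02" is rejected because the count ends at 1 instead of 0.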

As for the rest, how about this?

("[0-9a-zA-Z]+"([+\-*/]"[0-9a-zA-Z]+")*)?
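To see how this pattern behaves, here is a small Python check against some of the strings from the question. Because the whole pattern is optional, it has to be applied to the entire string (re.fullmatch here, or ^...$ anchors in other engines); whether the commercial application anchors patterns automatically is an assumption you would need to verify. The names FLAT_FORMULA and is_flat_formula are illustrative.

```python
import re

# The pattern from above: an optional chain of quoted codes separated by one operator each.
FLAT_FORMULA = re.compile(r'("[0-9a-zA-Z]+"([+\-*/]"[0-9a-zA-Z]+")*)?')

def is_flat_formula(text: str) -> bool:
    """True if the whole string is blank or a parenthesis-free chain of codes."""
    return FLAT_FORMULA.fullmatch(text) is not None

if __name__ == "__main__":
    samples = [
        '"FLR01"+"FLR02"+"FLR03"+"FMD01"+"FMR05"',  # valid
        '',                                          # blank is allowed
        '" FLR01" +"FLR02"',                         # stray spaces
        '"FLR01"J"FLR02"',                           # invalid operator
        '"FLR01"+',                                  # trailing operator
    ]
    for s in samples:
        print(repr(s), is_flat_formula(s))
```

Note that this version does not accept parentheses at all, so the third valid example from the question would be rejected by it.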

You could also use the following regular expression to check where the parentheses appear. It wouldn't verify that they're balanced, but it would verify that the open and close parentheses show up in the right places (opening ones only before a code, closing ones only after a code). Add in the counter described above and you'd have a proper validator.

(\(*"[0-9a-zA-Z]+"\)*([+\-*/]\(*"[0-9a-zA-Z]+"\)*)*)?
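Putting the two pieces together might look like the following Python sketch: the pattern checks the overall shape, then a counter checks the pairing. It assumes the pattern is applied to the whole string (re.fullmatch) and that the counter must end at zero as well as never going negative. SHAPE and is_valid_formula are illustrative names.

```python
import re

# The pattern from above: placement of parentheses is checked, pairing is not.
SHAPE = re.compile(r'(\(*"[0-9a-zA-Z]+"\)*([+\-*/]\(*"[0-9a-zA-Z]+"\)*)*)?')

def is_valid_formula(text: str) -> bool:
    """True if the string has the right shape and its parentheses pair up."""
    if SHAPE.fullmatch(text) is None:
        return False
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False  # closing parenthesis with nothing open
    return depth == 0  # no parenthesis left open
```

Under these assumptions the three valid examples from the question pass, and the three invalid ones fail: the first two on the pattern, the third on the counter.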


You can easily use regexes to match your tokens (codes, operators, parentheses), but you cannot match balanced parentheses with them. This isn't too big of a problem though, as you just need to create a state machine that operates on the tokens you match. If you're not familiar with these, think of it as a flow chart within your program where you keep track of where you are and where you can go next. You can also have a look at the Wikipedia page.
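As a rough illustration of that idea, here is a Python sketch with two states ("operand" when a code or "(" is expected, "operator" when an operator or ")" is expected) plus a depth counter for the parentheses. The token pattern, the state names, and the functions tokenize and validate are assumptions made for this example, not something taken from the application in the question.

```python
import re

# One token is a quoted alphanumeric code, a single operator, or a parenthesis.
TOKEN = re.compile(r'"[0-9a-zA-Z]+"|[+\-*/]|[()]')
OPERATORS = {"+", "-", "*", "/"}

def tokenize(text):
    """Split text into tokens; return None if some character fits no token."""
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            return None  # e.g. a stray space or an unknown operator
        tokens.append(m.group())
        pos = m.end()
    return tokens

def validate(text):
    tokens = tokenize(text)
    if tokens is None:
        return False
    if not tokens:
        return True  # a blank formula is allowed
    state, depth = "operand", 0
    for tok in tokens:
        if state == "operand":
            if tok == "(":
                depth += 1
            elif tok.startswith('"'):
                state = "operator"
            else:
                return False  # operator or ')' where a code was expected
        else:  # state == "operator"
            if tok == ")":
                depth -= 1
                if depth < 0:
                    return False
            elif tok in OPERATORS:
                state = "operand"
            else:
                return False  # code or '(' where an operator was expected
    # Must end right after a code or ')', with every '(' closed.
    return state == "operator" and depth == 0
```

One reason to prefer this over a single large pattern is that each rejection happens at a known token, so the validator could be extended to report what was expected and where.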
