I need to validate an expression in C# using a regular expression.
The string to be validated must follow these rules:
- Only three variables can be used in the expression: [chair], [table], [fan] (please note that the square brackets are included).
- Only four mathematical operators may be used: +, -, /, *
- Brackets may be used (we need to ensure that the brackets are placed correctly)
- Float constants may be used, for example 10.55, 22.23, 3, etc.
The first target is to validate that the string entered follows the above rules.
(Later, we may need to parse and calculate the value as well. But the question is basically about validation regex.)
I like long esoteric regular expressions. They make me feel all warm and fuzzy. However, I don't use them in my projects, because they usually don't belong in production code. I'm going to give you a regex that will do almost everything you are asking, and then I am going to explain why this is not a good idea.
It is based on the fact that you can't (or at least I couldn't think of how to) use a regex to validate that an expression is correct, but you can use one to match malformed expressions. So here it is:
(\[[a-z0-9]+\]).*?(?(1)(\[[a-z0-9]+\]).*?)(?(2)(\[[a-z0-9]+\]).*?)(?(3)(\[[a-z0-9]+\]).*?)|[/+*.-]{2,}|[^\[\]0-9a-z/+*.-]|(?>[0-9]+(?:\.[0-9]+)*)[^/+*-]|\[[^\]]+(?:\[|$)|\[[^a-z0-9]*[^a-z0-9\]]
Now take another look at that pattern... a really good look. Because once you commit that to your code, you are responsible for it.
Problem #1 - Maintenance
What happens when you change the spec? Let's say you wanted to allow four variables, or allow boolean operators. How exactly will you alter this pattern? I guarantee that my first attempt at modifying it would be a failure, and this isn't even as complicated as it can get. Still, you should have seen how many times I broke this one when I was writing it.
You could try posting again on SO, and maybe someone could decipher it. Still, you'd have to do quite a bit of testing in order to make sure any revisions are correct. Which leads to...
Problem #2 - Testing
How do you debug any revisions? This is why I said I don't use it in production code. With procedural code, you can set breakpoints and use QuickWatch to figure out what's going on, but regular expressions are like a black box. If there is a bug in this pattern, it will be far more difficult to fix. And even if you understand completely what is going on, good luck ever being able to hand off this monster to a fellow developer.
I'd encourage you to study the expression above, try to figure out how and why it works, even play with modifications. But for your own sanity and that of anyone you work with, do not incorporate this or any of its bastard siblings into your project. aioobe gave you a good place to start looking into the correct way to do this. I'd highly recommend following that path.
3. Brackets may be used (we need to ensure that the brackets are placed correctly)
Keep in mind that the language of words with balanced brackets is not regular.
This means that if you really do want to use regular expressions for the task, you'll have to rely on engine-specific regex extensions.
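As one illustration of such an engine-specific extension, .NET offers balancing groups. The following is a minimal sketch, assuming only round parentheses need to be balanced. The names BracketCheck and HasBalancedParens are made up for this example, and the pattern checks nesting only; it says nothing about operators, variables or numbers.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original question or answers.
public static class BracketCheck
{
    // (?<open>\()   pushes a capture onto the "open" stack for every '('.
    // (?<-open>\))  pops one capture for every ')'; it fails if the stack is empty.
    // (?(open)(?!)) forces a failure at the end if any '(' is still unclosed.
    private static readonly Regex Balanced = new Regex(
        @"^[^()]*(?:(?:(?<open>\()[^()]*)+(?:(?<-open>\))[^()]*)+)*(?(open)(?!))$");

    public static bool HasBalancedParens(string input) => Balanced.IsMatch(input);
}
```

With this sketch, BracketCheck.HasBalancedParens("([chair] + 3) * (2 - [fan])") should be true, while inputs such as "(()" or ")(" should be rejected. It also shows the point made above: the pattern is not a regular expression in the formal sense, and it only works on engines that support this construct.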
Basically, regular expressions are not a good tool for this task. You should have a look at context-free grammars, and presumably at parser generators that can produce parsers for C#.
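For a grammar this small, a hand-written recursive-descent validator is one alternative to a parser generator. The sketch below assumes details the question does not spell out: no unary minus, spaces allowed between tokens, and numbers written as digits with an optional fractional part. ExpressionValidator and its methods are hypothetical names, not an existing API.

```csharp
using System;

// Hypothetical validator; the grammar is an assumption based on the rules in the question.
public static class ExpressionValidator
{
    private static readonly string[] Variables = { "[chair]", "[table]", "[fan]" };

    public static bool IsValid(string input)
    {
        if (input == null) return false;
        int pos = 0;
        if (!ParseExpr(input, ref pos)) return false;
        SkipSpaces(input, ref pos);
        return pos == input.Length; // reject trailing garbage
    }

    // expr := term (('+' | '-') term)*
    private static bool ParseExpr(string s, ref int pos)
    {
        if (!ParseTerm(s, ref pos)) return false;
        while (TryOperator(s, ref pos, "+-"))
        {
            if (!ParseTerm(s, ref pos)) return false;
        }
        return true;
    }

    // term := factor (('*' | '/') factor)*
    private static bool ParseTerm(string s, ref int pos)
    {
        if (!ParseFactor(s, ref pos)) return false;
        while (TryOperator(s, ref pos, "*/"))
        {
            if (!ParseFactor(s, ref pos)) return false;
        }
        return true;
    }

    // factor := '(' expr ')' | variable | number
    private static bool ParseFactor(string s, ref int pos)
    {
        SkipSpaces(s, ref pos);
        if (pos >= s.Length) return false;

        if (s[pos] == '(')
        {
            pos++;
            if (!ParseExpr(s, ref pos)) return false;
            SkipSpaces(s, ref pos);
            if (pos >= s.Length || s[pos] != ')') return false;
            pos++;
            return true;
        }

        foreach (string name in Variables)
        {
            if (string.CompareOrdinal(s, pos, name, 0, name.Length) == 0)
            {
                pos += name.Length;
                return true;
            }
        }

        return ParseNumber(s, ref pos);
    }

    // number := digit+ ('.' digit+)?
    private static bool ParseNumber(string s, ref int pos)
    {
        int start = pos;
        while (pos < s.Length && IsDigit(s[pos])) pos++;
        if (pos == start) return false;
        if (pos < s.Length && s[pos] == '.')
        {
            pos++;
            int fractionStart = pos;
            while (pos < s.Length && IsDigit(s[pos])) pos++;
            if (pos == fractionStart) return false; // "10." is rejected
        }
        return true;
    }

    private static bool IsDigit(char c) => c >= '0' && c <= '9';

    // Consumes one operator from ops if it is next; otherwise only skips spaces.
    private static bool TryOperator(string s, ref int pos, string ops)
    {
        SkipSpaces(s, ref pos);
        if (pos < s.Length && ops.IndexOf(s[pos]) >= 0) { pos++; return true; }
        return false;
    }

    private static void SkipSpaces(string s, ref int pos)
    {
        while (pos < s.Length && s[pos] == ' ') pos++;
    }
}
```

Under these assumptions, ExpressionValidator.IsValid("([chair] + 10.55) * [fan] / 3") should return true, while "[chair] + * 3", "([chair] + 3" and "[desk] + 1" should be rejected. Each method corresponds to one grammar rule, so adding a fourth variable or a new operator is a local change, and the same structure can later be extended to compute a value instead of returning a bool.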
Further SO reading:
- What is a good C# compiler-compiler/parser generator?