I need to validate the contents of a C# method.
I do not care about syntax errors that do not affect the method's scope.
I do care about characters that will invalidate parsing of the rest of the code. For example:
method()
{
/* valid comment */
/* <-- bad
for (i..) {
}
for (i..) { <-- bad
开发者_运维百科}
I need to validate/fix any non-paired characters.
This includeds /* */, { }, and maybe others.
How should I go about this?
My first thought was Regex, but that clearly isn't going to get the job done.
You'll need to scope your problem more carefully in order to get a sensible answer.
For example, what are you going to do about methods that contain preprocessor directives?
void M()
{
#if FOO
for(foo;bar;blah) {
#else
while(abc) {
#endif
Blah();
}
}
This is silly but legal, so you have to handle it. Are you going to count that as a mismatched brace or not?
Can you provide a detailed specification of exactly what you want to determine? As we've seen several times on this site, people cannot successfully build a routine that divides two numbers without a specification. You're talking about analysis that is far more complex than dividing two numbers; the code which does what you're describing in the actual compiler is tens of thousands of lines long.
A regex is certainly not the answer to this problem. Regex's are useful tools for certain types of data validation. But once you get into the business of more complicated data like matching braces or comment blocks a regex no longer gets the job done.
Here is a blog article on the limitations encountered when using a regex to validate input.
- http://blogs.msdn.com/ianhu/archive/2009/11/16/intellitrace-itrace-files.aspx
In order to do this you will have to write a parser of sorts which does the validation.
A regular expression isn't a very convenient thing for such a task. This is often implemented using a stack with an algorithm like the following:
- Create an empty stack S.
- While( there are characters left ){
- Read a character ch.
- If is ch an opening paren (of any kind), push it onto S
- Else
- If ch is a closing paren (of any kind), look at the top of S.
- If S is empty as this point, report failure.
- If the top of S is the opening paren that corresponds to c, then pop S and continue to 1, this paren matches OK.
- Else report failure.
- If at the end of input the stack S is not empty, return failure. Else return success.
for more information check http://www.ccs.neu.edu/home/sbratus/com1101/lab4.html and http://codeidol.com/csharp/csharpckbk2/Data-Structures-and-Algorithms/Determining-Where-Characters-or-Strings-Do-Not-Balance/
If you're trying to "validate" the contents of a string defining a method, then you may be better off just trying to use the CodeDom classes and compile the method on the fly into an in memory assembly.
Writing your own fully-functional parser to do validation will be very, very difficult, especially if you want to support C# 3 or later. Lambda expressions and other constructs like that will be very difficult to "validate" cleanly.
You're drawing a false dichotomy between "characters that will invalidating parsing the rest of the code" and "syntax errors". Lacking a closing curly brace (one of the problems you mention) is a syntax error. It looks like you mean you're looking for syntax errors that potentially break scope boundaries? Unfortunately, there's no robust way to do this short of using a full parser.
As an example:
method()
{ <-- is missing closing brace
/* valid comment */
/* <-- bad
for (i..) {
}
for (i..) {
} <-- will be interpreted as the closing brace for the for loop
There's no general, practical way to infer that it's the for loop that's missing its closing brace, rather than the method.
If you're really interested in looking for these sort of things, you should consider running the compiler programmatically and parsing the results - that's the best approach with the lowest entry threshold.
精彩评论