Use of Goto within lexer/parser

I have a lexer/parser pair (which I cribbed off someone else years ago). I am going to be adding a couple of features and thought I would first standardise on one loop style: either a while(true) containing multiple if/else if/else, or a switch which uses a goto to jump back to just before the switch.

(Before the flames start, I don't normally use goto as it's evil etc. etc.)

The problem with a while(true) and a nested switch is that the break only breaks out of the switch and cannot get outside the while.

I have done some searching here and seen suggestions to use a return from inside the switch. Whilst this would work in some cases, in others there is some processing after the while but before returning. Duplicating this code in multiple places doesn't really appeal.

I could also introduce a boolean flag and use that in the while statement to decide whether to break out of the while but that also doesn't appeal as it adds noise to the code.

The current way in the parser of using if/else if/else instead of an inner switch works but I do have a preference for a switch if possible.

The lexer code in general seems to get around this by removing the while(true), putting a label just before the start of the switch, and using goto to continue the loop. This leaves break meaning stop the loop and, to be honest, seems the cleanest way, but it does involve the dreaded goto.
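
To make the lexer style concrete, here is a rough sketch in C, which may not be the language of the original code. The function name `count_tokens` and its whitespace-only tokenising rule are invented for illustration; only the label-before-switch shape is taken from the description above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: counts whitespace-separated tokens in src. */
size_t count_tokens(const char *src)
{
    assert(src != NULL);
    size_t tokens = 0;
    int in_token = 0;

next_char:                      /* label sits just before the switch */
    switch (*src) {
    case '\0':
        break;                  /* break means "stop the loop" */
    case ' ':
    case '\t':
    case '\n':
        in_token = 0;
        src++;
        goto next_char;         /* goto means "go round again" */
    default:
        if (!in_token) {
            in_token = 1;
            tokens++;
        }
        src++;
        goto next_char;
    }

    /* Any shared post-loop processing lives here, in one place. */
    return tokens;
}
```

With this shape, calling `count_tokens("a b  c")` should give 3, and every case has to say explicitly whether it loops again or stops.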

Going back to the while(true), I can also see a third way. Use a label after the while(true) and let the switch code use goto to get to it when the loop should end. Break would then mean exit the switch but continue the loop.
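
A sketch of this third way, again in C and with an invented function (`sum_leading_digits`) standing in for real parser logic, might look like this:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: sums the decimal digits at the start of src,
   skipping spaces, and stops at the first other character. */
int sum_leading_digits(const char *src)
{
    assert(src != NULL);
    int sum = 0;

    while (1) {
        switch (*src) {
        case ' ':
            break;              /* break leaves the switch; the loop continues */
        case '0': case '1': case '2': case '3': case '4':
        case '5': case '6': case '7': case '8': case '9':
            sum += *src - '0';
            break;
        default:
            goto done;          /* anything else, including '\0', ends the loop */
        }
        src++;
    }

done:
    /* Processing after the while but before returning goes here, once. */
    return sum;
}
```

Here break keeps its usual switch meaning, and the single label marks the only way out of the loop.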

So what are the panel's views on this? Is goto too abhorrent to use? Or is it OK when there is just a single label to jump to, and it reduces indenting and produces otherwise clear code? Should parsers/lexers get special license to use gotos?

I can provide some sample code if it would help.

Use of GOTO in disciplined ways is fine. Languages that don't allow breaking out of arbitrarily nested block structures have caused this question to be raised repeatedly since the 1970s, when people beat the question of "what control flow structures should a language have" to death. (Note: this complaint isn't special to lexers/parsers.)

You don't want the scheme with the boolean; it just adds extra overhead to the loop checks and clutters the code.

I think you have this problem:

    <if/while/loop head> {
        <if/while/loop head> {
            ...
            if <cond>  <want to break out of all blocks>
            ...
        }
    }

The proper cure with a good language is:

    blocks_label:
    <if/while/loop head> {
        <if/while/loop head> {
            ...
            if <cond>  exit blocks_label;
            ...
        }
    }

if the exit construct exists in your language; it exits the blocks labelled by the named label. (There's no excuse for a modern language not to have this, but then, I don't design them.)

It is perfectly satisfactory to write, as a poor man's substitute:

    <if/while/loop head> {
        <if/while/loop head> {
            ...
            if <cond>  goto exit_these_blocks;
            ...
        }
    }
    exit_these_blocks:  // my language doesn't have decent block exits
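
As a concrete instance of the poor man's substitute, here is a minimal sketch in C. The function `find_first`, the fixed `ROWS`/`COLS` grid size and the row-major return value are all assumptions made up for the example; only the label name comes from the schematic above.

```c
#include <assert.h>
#include <stddef.h>

#define ROWS 3
#define COLS 4

/* Hypothetical helper: returns the row-major index of the first cell
   equal to target, or -1 if no cell matches. */
int find_first(int grid[ROWS][COLS], int target)
{
    assert(grid != NULL);
    int found = -1;

    for (int r = 0; r < ROWS; r++) {
        for (int c = 0; c < COLS; c++) {
            if (grid[r][c] == target) {
                found = r * COLS + c;
                goto exit_these_blocks;   /* leave both loops at once */
            }
        }
    }

exit_these_blocks:  /* C has no labelled block exit, so goto stands in */
    return found;
}
```

A plain break at that point would only leave the inner loop, and the outer loop would carry on scanning the remaining rows.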

On occasion you'll find a language that offers

break <exp>

where exp is usually a constant whole number, meaning "break out of exp nested blocks". This is an astoundingly stupid idea, as some poor maintainer may later come along and insert another block somewhere in the stack, and now the code does crazy things. (A closely related mistake, a misplaced break in the C code of a telco switch, is widely blamed for the AT&T long-distance network outage of January 1990.) If you see this construct in your language, use the poor man's substitute instead.

Within parsers the use of GOTO is perfectly reasonable. When you get down to a base level, the loops and conditions etc are all implemented as gotos, because that is what processors can do - "take the next instruction to be executed from here".

The only problem with gotos, and the reason they are so often demonised, is that they can be an indication of unstructured code, coming from unstructured thinking. Within modern high level languages there is rarely any need for gotos, because the facilities are available to structure code well, and well structured code implies at least some structured thinking.

So use gotos if they are needed. Don't use them just because you can't be bothered to think things through properly.