Implementing Brainfuck loops in an interpreter_问答_开发者

I want to build a Brainfuck (Damn that name) interpreter in my freshly created programming language to prove it's turing-completeness.

Now, everything is clear so far (<>+-,.) - except one thing: The loops ([]). I assume that you know the (extremely hard) BF syntax from here on:

How do I implement the BF loops in my interpreter?

How could the pseudocode look like? What should I do when the interpreter reaches a loop beginning ([) or a loop end (])?

Checking if the loop should continue or stop is not the开发者_运维问答 problem (current cell==0), but:

When and where do I have to check?
How to know where the loop beginning is located?
How to handle nested loops?

As loops can be nested I suppose that I can't just use a variable containing the starting position of the current loop.

I've seen very small BF interpreters implemented in various languages, I wonder how they managed to get the loops working but can't figure it out.

When you reach [, you test the data pointer.

If it's false, you can scan for the next matched ] character, counting up how many [ you see and making sure you mark them off as you see each ].

If it's true, you need to keep track of its position so you can jump back to it later. I suggest using a stack. Push the current program position onto the stack, then when you reach ], test the data pointer. If it's true, go to the topmost program position on the stack. If it's false, pop the position off the stack and continue.

As you nest into inner loops, the stack will cleanly record the context of each loop.

See stack (wikipedia). This is analogous to how assembly programs deal with function calls.

Here is my "optimizing" version of interpreter, that pre-compiles the loop jumps.

def interpret2(code):
    data = [0] * 5000   # data memory
    cp = 0              # code pointer
    dp = 0              # data pointer
    # pre-compile a jump table
    stack = []
    jump = [None] * len(code)
    for i,o in enumerate(code):
        if o=='[':
            stack.append(i)
        elif o==']':
            jump[i] = stack.pop()
            jump[jump[i]] = i
    # execute
    while cp < len(code):
        cmd = code[cp]
        if   cmd == '>': dp += 1
        elif cmd == '<': dp -= 1
        elif cmd == '+': data[dp] += 1 
        elif cmd == '-': data[dp] -= 1 
        elif cmd == '.': stdout.write(chr(data[dp]))
        elif cmd == ',': data[dp] = ord(stdin.read(1))
        elif cmd == '[' and not data[dp]: # skip loop if ==0
            cp = jump[cp]
        elif cmd == ']' and data[dp]:     # loop back if !=0
            cp = jump[cp]
        cp += 1

It does a dry run of the code, keeping track of the brackets (in a stack) and marks the goto addresses in parallel jump array which is later consulted during execution.

I compared the execution speed on long-running BF program (calculate N digits of Pi) and this increased the speed 2x compared to an innocent implementation in which source is scanned forward to exit [ and scanned backwards to loop on ].

How do I implement the BF loops in my interpreter?

That’s the point – it entirely depends on your language. For the case of a stack-based programming language (or any language which can use a stack), @rjh has given a good solution. Other languages would use different solutions, such as recursion (i.e. implicit use of a stack).

From the top of my head, might be some errors in it, but something like this should work:

char* interpret(char* instructions){
  char* c = instructions;
  while (*c) {
    if (*c == ".") putchar(*p);
    else if (*c == ",") *p = getchar();
    else if (*c == '+') (*p)++;
    else if (*c == '-') (*p)--;
    else if (*c == '<') p--;
    else if (*c == '>') p++;
    else if (*c == '[') c = interpret(c+1);
    else if (*c == ']') { if (*p) c = instructions else return c; }
    c++;
  }
  return 0;
}

Call interpret with the brainf*ck source code. The pointer p is to the current memory position. Do a recursive call when spotting a [. Return from this recursive call when encountering a ].

I prefer to use a jump table (along with conversion of raw BF to 'bytecode'). Optimizing BF interpreters are clearly the way to go!