I've written a Brainfuck implementation (C++) that works like this:
- Read input brainfuck file
- Do trivial optimizations
- Convert brainfuck to machine code for the VM
- Execute this machine code in the VM
This is pretty fast, but the bottleneck is now the VM. It's written in C++; it reads a token, executes the corresponding action (there aren't many, if you know Brainfuck), and so on.
What I want to do is strip out the VM and generate native machine code on the fly (so basically, a JIT compiler). This can easily be a 20x speedup.
This would mean step 3 gets replaced by a JIT compiler, and step 4 by executing the generated machine code.
I don't really know where to start, so I have a few questions:
- How does this work, how does the generated machine code get executed?
- Are there any C++ libraries for generating native machine code?
Generated machine code is simply jmp-ed to or call-ed like an ordinary function. Sometimes you also need to clear the no-execute flag (NX bit) on the memory containing the generated code. On Linux, this is done with mprotect(addr, size, PROT_READ | PROT_WRITE | PROT_EXEC).
On Windows, NX is called DEP, and the corresponding call is VirtualProtect.
There are some, e.g. http://www.gnu.org/software/lightning/ - GNU Lightning (universal) and https://developer.mozilla.org/En/Nanojit - Nanojit, which is used in Firefox's JavaScript JIT engines. A more powerful and modern option is LLVM: you just need to translate the BF code into LLVM IR, and LLVM can then do the optimisations and code generation for many platforms, or run the LLVM IR in its interpreter (virtual machine) with JIT capabilities. There is a post about BF and LLVM with a complete LLVM JIT compiler for BF: http://www.remcobloemen.nl/2010/02/brainfuck-using-llvm/
Another BF + LLVM compiler is in the LLVM svn repository: https://llvm.org/svn/llvm-project/llvm/trunk/examples/BrainF/BrainF.cpp
LLVM is a complete C++ library (or set of libraries) for generating native code from an intermediate form, complete with documentation and examples, and which has been used to produce JITters.
(It also has a C/C++ compiler which uses the framework - however the framework itself can be used for other languages).
This might be late, but I am posting this answer in case it helps anyone else.
A JIT compiler has all the steps that an AOT compiler has. The main difference is that an AOT compiler writes the machine-dependent code to an executable file (an .exe, for example), while a JIT compiler loads the machine-dependent code into memory at run time (hence the performance overhead: the code has to be compiled and loaded again on every run).
How does a JIT compiler load the machine code into memory at run time?
I will not explain machine code itself, because I assume you already know about it.
For example, the assembly code
mov rax,0x1
is translated to
48 c7 c0 01 00 00 00
You generate the translated code dynamically and store it in a vector like this (this is a C++ std::vector):
std::vector<uint8_t> machineCode{
    0x48, 0xc7, 0xc0, 0x01, 0x00, 0x00, 0x00, // mov rax,0x1
};
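Then you copy this vector into memory. For this you need to know how much memory the code requires, which you can get from machineCode.size(); keep in mind that memory is mapped in multiples of the page size.
To get the memory, call the mmap function, then copy the vector into it (with memcpy, for example). Cast the pointer to the beginning of your code to a function pointer and call it. The code must end with a ret instruction (c3) so that control returns to the caller. After that you are good to go.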
Sorry if anything is not clear. For a simpler walkthrough you can always check out these posts: https://solarianprogrammer.com/2018/01/10/writing-minimal-x86-64-jit-compiler-cpp/ and https://github.com/spencertipping/jit-tutorial
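The steps described above (allocate memory, copy the bytes, make them executable, call them) can be sketched roughly as follows. This is only a minimal sketch, assuming Linux on x86-64. The helper name run_machine_code is made up for illustration, and the code passed in is assumed to end with ret (c3). It maps the memory as writable first and only then switches it to executable, which is one common way to do it rather than the only one.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>
#include <sys/mman.h>
#include <unistd.h>

// Hypothetical helper (not part of any library): copies raw machine code into
// freshly mapped memory, makes it executable, calls it and returns rax.
// Assumes Linux on x86-64 and that the code ends with ret (0xc3).
long run_machine_code(const std::vector<std::uint8_t>& code) {
    if (code.empty()) throw std::invalid_argument("no code to run");

    // Round the allocation up to a whole number of pages.
    const std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    const std::size_t size = (code.size() + page - 1) / page * page;

    // 1. Allocate writable (not yet executable) anonymous memory.
    void* mem = mmap(nullptr, size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) throw std::runtime_error("mmap failed");

    // 2. Copy the generated bytes into it.
    std::memcpy(mem, code.data(), code.size());

    // 3. Switch the pages from writable to executable.
    if (mprotect(mem, size, PROT_READ | PROT_EXEC) != 0) {
        munmap(mem, size);
        throw std::runtime_error("mprotect failed");
    }

    // 4. Treat the buffer as a function taking no arguments and call it.
    //    The integer return value comes back in rax.
    auto fn = reinterpret_cast<long (*)()>(mem);
    const long result = fn();

    munmap(mem, size);
    return result;
}
```

Calling it with the bytes 48 c7 c0 01 00 00 00 c3 (mov rax,0x1 followed by ret) should return 1 on such a system.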
GNU Lightning is a set of macros which can generate native code for a few different architectures. You will need a solid understanding of assembly code because your step 3 will involve using Lightning macros to emit machine code directly into a buffer you will later execute.
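To give a feel for what "emitting machine code into a buffer" means without any library, here is a hand-rolled sketch for a tiny subset of Brainfuck. It does not use Lightning's macros. The function name emit_bf_subset is made up for illustration, and it assumes x86-64 with the System V calling convention, where the tape pointer would arrive in rdi. Loops and I/O are deliberately left out.

```cpp
#include <cstdint>
#include <initializer_list>
#include <string>
#include <vector>

// Hypothetical emitter for a loop-free, I/O-free Brainfuck subset.
// Assumes x86-64 (System V ABI): the tape pointer is the first argument (rdi).
std::vector<std::uint8_t> emit_bf_subset(const std::string& program) {
    std::vector<std::uint8_t> code;
    auto put = [&code](std::initializer_list<std::uint8_t> bytes) {
        code.insert(code.end(), bytes);
    };
    for (char op : program) {
        switch (op) {
            case '+': put({0xfe, 0x07}); break;       // inc byte ptr [rdi]
            case '-': put({0xfe, 0x0f}); break;       // dec byte ptr [rdi]
            case '>': put({0x48, 0xff, 0xc7}); break; // inc rdi
            case '<': put({0x48, 0xff, 0xcf}); break; // dec rdi
            default: break; // '[', ']', '.', ',' and comments are skipped in this sketch
        }
    }
    code.push_back(0xc3); // ret, so control returns to the caller
    return code;
}
```

The resulting bytes would then be copied into executable memory and called through a function pointer such as void (*)(unsigned char*), with the tape as the argument. A real compiler would also have to handle [ and ] with conditional jumps and . and , with calls into the runtime.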