I am implementing a Turing machine (think of it as a virtual machine) in JavaScript. I'm working on a routine that performs the calculations as efficiently as possible (this was not a focus of the project from the outset).
Yes, I should not be thinking about optimizing unless I run into performance issues. But the nature of what I'm doing (where most non-trivial programs have very inefficient asymptotic runtimes) means that there will always be something to gain from optimizing. I want to do the best I can to get as many instructions per second as (reasonably) possible.
The solution would be clear if I were programming in C++, for instance: do some timing, gprof, -O3, etc. I would be studying the architectures I expect the code to be run on, and probably also peeking at the assembly being generated.
Can't do that with JavaScript, though. My first instinct is to reduce the operations in the inner loop to array lookups. It seems to me that it's a good bet I will be able to take advantage of CPU cache performance if the interpreter is able to translate it to a (hopefully short) series of integer operations.
A Turing machine is very simple. It is in fact the simplest formulation of computation that exists (!): it has a finite number of states, a bidirectionally infinite tape, and a tape head that can move one unit in either direction and can read and write a single character to the tape.
A program is encoded in a transition function which takes a state and a character that is read, and with that information provides the character to write, the direction to move the head, and the new state.
This is the logic each step:
// states is an array of arrays of triplets and is the transition function
var trans = states[state][alph_index[tape[cur_tape_pos]]];
tape[cur_tape_pos] = trans[0]; // write
cur_tape_pos += trans[1];      // move
state = trans[2];              // state update
The process happens in a loop. It seems clear to me the tape would be an array. I imagine that storing (appending) values to the end of an array is at least an amortized constant time operation with Javascript arrays. It is much less clear that appending to the front of an array would also have great performance, so I will probably want to use two arrays, one extending left, and one extending right.
Problem is, that would in the naive implementation insert a conditional statement in the inner loop. I'm not liking that. There must already be a conditional check anyway to check if the state is the halting state. So maybe it won't be so bad.
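To make the two-array idea concrete, here is a minimal sketch (the names tapeLeft/tapeRight/readCell/writeCell are mine, and BLANK stands in for whatever the blank symbol is):

```javascript
// Bidirectional tape built from two ordinary arrays:
// tapeRight holds cells 0, 1, 2, ...; tapeLeft holds cells -1, -2, -3, ...
// (cell -1 lives at tapeLeft[0], cell -2 at tapeLeft[1], and so on).
var BLANK = 0;
var tapeRight = [];
var tapeLeft = [];

function readCell(pos) {
  var v = pos >= 0 ? tapeRight[pos] : tapeLeft[-pos - 1];
  return v === undefined ? BLANK : v; // unvisited cells read as blank
}

function writeCell(pos, val) {
  if (pos >= 0) tapeRight[pos] = val;
  else tapeLeft[-pos - 1] = val;
}
```

The sign test is exactly the extra conditional I was worried about, but since the halting-state check means the loop already contains a branch, in practice it may cost little.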
There is also one more potential optimization, which could eliminate indexing into alph_index: store the index into the alphabet, rather than the alphabet value itself, on the tape.
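A sketch of that pre-translation step, with an example alphabet of my own choosing:

```javascript
// Translate the tape from characters to alphabet indices once, up front,
// so the inner loop can do states[state][tape[cur_tape_pos]] directly
// instead of states[state][alph_index[tape[cur_tape_pos]]].
var alphabet = ['_', '0', '1'];   // example alphabet; '_' is the blank
var alph_index = {};
for (var i = 0; i < alphabet.length; i++) {
  alph_index[alphabet[i]] = i;
}

var tapeChars = ['1', '0', '1', '_'];
var tape = tapeChars.map(function (c) { return alph_index[c]; }); // [2, 1, 2, 0]
// For display, map back with alphabet[tape[cur_tape_pos]].
```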
But the main problem is this: what other things could I possibly do to make it faster? Is it possible to make it any faster at all? I don't know what component of the execution will be the bottleneck (CPU or I/O, or something else?), and I don't know how I might go about finding that out. With JavaScript I have hash tables at my disposal too, but it seems like arrays would always be faster.
Perhaps I am prematurely seeking advice. I will come back and edit with performance numbers as I make progress.
As a reward for reading my question I will provide a link to a live work-in-progress version of my project: http://stevenlu.net/tm.html
Its operation has so far been to manipulate a div filled with spans, which represents the tape. It also performs a ton of operations on strings and does a lot of copying around of elements that is entirely unnecessary as far as the actual computation of the Turing machine is concerned. But even so it achieves decent performance: it took about a minute on my machine to calculate 600,000 or so steps (5^4 = 625), which is 10,000 steps per second. That isn't so bad, but I know I can probably achieve more than a million per second with some lower-level programming.
Looking at benchmark performance here for previous-generation CPUs, I'm seeing about 10,000 MIPS per core. I estimate, therefore, that if I can get my inner loop running once in the time it takes to run 50 Dhrystone iterations (which seems very possible with a simple C implementation, even though I have no idea what those synthetic benchmarks actually do), then, barring memory bandwidth limitations, I've got 200 million iterations per second on one thread. My 600k-step calculation would be completed in 3 ms!
Well, if I can get my 5^4 computation to run without the browser reporting to me that it's hung up, I'll be pretty happy...
UPDATE
With a more efficient JavaScript implementation of the algorithm completed, the computation of 9^4 = 6561, taking 58,202,209 steps, took 6173 ms to compute. That's 9.4 million steps per second, almost a 1,000-fold increase over my original DOM-dependent method.
The original 5^4 computation (which took about 30 seconds even without scrolling the tape) is now completed in 84 ms.
With Javascript I have hash tables at my disposal, too, but it seems like arrays would always be faster.
You probably think that "Array" lookup in JavaScript works like this:
[ 1, 2, 3, 4, 5 ]
|------>x
That, given the index 2, you'd simply calculate:
int operator[] (int index) {
    // base address plus (element size * index), then a single load
    return *(int *)((char *)start_of_data + sizeof(data_item) * index);
}
So that your lookup will simply take O(1) time.
Not so, at least traditionally, in JavaScript. Generally, "Array"s are hash maps with numeric keys. So every time you do a look-up, you're actually running your index (which is just treated as a key) through a hash function. So this:
a[1]
is treated more like:
a["1"]
and you're running through some (probably decently good) hash function to try to make bucket distribution regular and minimize collisions. This sounds expensive to me, but I don't know how optimized implementations are (hash map look-up is amortized constant time, but it's probably still not as efficient, and depends on how many collisions you get and what you do when you encounter one).
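The key part is easy to observe: array indices really are property keys, so a[1] and a["1"] name the same slot (whether a given engine hashes them is an implementation detail):

```javascript
var a = [10, 20, 30];
console.log(a[1] === a["1"]);  // true: the numeric index is coerced to the key "1"
console.log(Object.keys(a));   // ["0", "1", "2"]: the indices are string keys
```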
Fortunately, some (if not most) modern JavaScript interpreters understand how to work with dense and sparse collections after they've traced your code to see whether you're using an array as sparse or dense. Double-check the environment you're expecting to use.
What other things could I possibly do to make it faster? Is it possible to make it any faster at all?
One idea I had was to use typed arrays to have a better chance of getting constant-time look-up speed. This helped Fabrice Bellard bring an entire Linux kernel into JavaScript / the browser (jslinux.org).
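A sketch of what a typed-array tape might look like, assuming a fixed upper bound on tape length (my own simplification) so the head can start in the middle and move both ways without negative indices:

```javascript
// Int8Array gives dense, zero-filled, integer-valued storage, so 0 can
// double as the blank symbol and lookups avoid any sparse-array machinery.
var SIZE = 1 << 20;             // assumed maximum number of cells
var tape = new Int8Array(SIZE);
var cur_tape_pos = SIZE >> 1;   // start the head in the middle

tape[cur_tape_pos] = 2;         // write alphabet index 2
cur_tape_pos += 1;              // move right
```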
The solution is clear if I was programming in C++ for instance. Do some timing.
I recommend using jsperf.com if you plan to run things in a browser (which, it seems, you do), as it has a pretty good timer (more fine-grained than anything dealing with time in JavaScript, IIRC), or simply use node.js or Rhino and other command-line profiling tools (you can emulate a DOM in that environment if you really need to...).
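For a quick and rough measurement outside those tools, a coarse wall-clock benchmark works in both the browser and Node (runSteps here is just a stand-in dummy loop, not the actual machine):

```javascript
// Coarse wall-clock benchmark: run N dummy steps, report steps per second.
function runSteps(n) {
  var x = 0;
  for (var i = 0; i < n; i++) x = (x + i) | 0; // integer math, like the real loop
  return x;
}

var N = 1e7;
var t0 = Date.now();
runSteps(N);
var ms = Math.max(1, Date.now() - t0); // avoid divide-by-zero on fast machines
console.log((N / (ms / 1000)).toFixed(0) + " steps/second");
```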
If you want to have both positive and negative integer indexes, why not use a plain object? Are you using any special array methods? The array methods are mostly generic, so they can be called on other native objects (they may not work with host objects, but that's not an issue here, I think).
In JavaScript, arrays are just objects, so I can't see why accessing array[i] should be any faster or slower than object[i] (though of course it might be on different implementations). You just need to keep your own length property (perhaps both positive and negative lengths).
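A sketch of that approach (the write/read helpers and the choice of 0 as the blank value are my own):

```javascript
// Plain object as tape: a negative index is just the key "-3".
var tape = {};
var minPos = 0, maxPos = 0; // track our own extent, since objects have no length

function write(pos, val) {
  tape[pos] = val;
  if (pos < minPos) minPos = pos;
  if (pos > maxPos) maxPos = pos;
}

function read(pos) {
  return pos in tape ? tape[pos] : 0; // 0 as the blank symbol
}
```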
If you provide some example code, regardless of performance, you might get some more concrete advice.