Kolmogorov Complexity Approximation Algorithm

I'm looking for an algorithm that can compute an approximation of the Kolmogorov complexity of a given input string. So if K is the Kolmogorov complexity of a string S, and t represents time, then the function would behave something like this: lim(t→∞) K_approx(t, S) = K.


In theory, a program could converge on the Kolmogorov complexity of its input string as its running time approaches infinity. It could work by running, in parallel, every possible program that is the length of the input string or shorter. Whenever one of those programs halts with the input string as its output, its length is recorded as the shortest length known so far, that length is printed, and no programs of that length or longer are tried from then on. This algorithm will (most likely) run forever, printing shorter and shorter lengths and converging on the exact Kolmogorov complexity given infinite time.
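Below is a rough, runnable sketch of that enumeration idea in Python, for illustration only. The reference machine is a tiny Brainfuck-style interpreter chosen arbitrarily as a stand-in for "every possible program"; the names run_program, k_approx, ALPHABET and the length/step budgets are mine, not something specified in the answer above.

import itertools

ALPHABET = "+-<>.[]"   # instruction set of the toy reference machine (no input command)

def run_program(prog: str, max_steps: int):
    """Run prog on an all-zero tape; return its output string, or None if it
    is malformed or does not halt within max_steps steps."""
    tape, ptr, out, pc, steps = [0] * 1024, 0, [], 0, 0
    stack, jump = [], {}
    for i, c in enumerate(prog):          # match brackets; reject unbalanced programs
        if c == "[":
            stack.append(i)
        elif c == "]":
            if not stack:
                return None
            j = stack.pop()
            jump[i], jump[j] = j, i
    if stack:
        return None
    while pc < len(prog) and steps < max_steps:
        c = prog[pc]
        if c == "+":
            tape[ptr] = (tape[ptr] + 1) % 256
        elif c == "-":
            tape[ptr] = (tape[ptr] - 1) % 256
        elif c == ">":
            ptr = (ptr + 1) % len(tape)
        elif c == "<":
            ptr = (ptr - 1) % len(tape)
        elif c == ".":
            out.append(chr(tape[ptr]))
        elif c == "[" and tape[ptr] == 0:
            pc = jump[pc]
        elif c == "]" and tape[ptr] != 0:
            pc = jump[pc]
        pc += 1
        steps += 1
    return "".join(out) if pc >= len(prog) else None   # None = ran out of step budget

def k_approx(s: str, max_len: int, max_steps: int):
    """Length of the shortest program (up to max_len instructions) that halts
    within max_steps steps and prints s -- an upper bound on K(s) relative to
    this toy machine, which improves as the budgets grow."""
    for n in range(1, max_len + 1):                     # shortest programs first
        for prog in itertools.product(ALPHABET, repeat=n):
            if run_program("".join(prog), max_steps) == s:
                return n
    return None

For example, k_approx("\x00\x00", max_len=4, max_steps=100) returns 2, because the two-instruction program .. prints two zero bytes; anything much beyond toy inputs blows up exponentially, which is exactly the point made above.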

Of course, running an exponential number of programs is highly intractable. A more efficient algorithm is to post a code-golf challenge on Stack Overflow. A few drawbacks:

  • It can take a few days before good results are found.
  • It uses vast amounts of our most valuable computing resources, costing thousands of dollars in productivity loss.
  • Results are produced with less frequency over time as resources are diverted to other computations.
  • The algorithm terminates prematurely for many inputs, meaning it does not work in general.


The Wikipedia page for Kolmogorov complexity has a subsection entitled "Incomputability of Kolmogorov complexity" under the "Basic results" section. This is simply not a measure you can compute, or even approximate productively.

There are certainly better ways of achieving what you want. If it's a measure of randomness you're after, you could try the binary entropy function. Compressibility under one of the standard algorithms might also fit the bill.
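As a minimal illustration of the binary entropy suggestion (a crude randomness proxy, not Kolmogorov complexity), here is a short Python sketch; the function names are mine, not part of the answer above.

import math

def binary_entropy(p: float) -> float:
    """H(p) = -p*log2(p) - (1-p)*log2(1-p): entropy of a coin with bias p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bit_entropy_per_bit(s: bytes) -> float:
    """Empirical entropy (in bits per bit) of the 0/1 frequencies in s."""
    bits = "".join(f"{byte:08b}" for byte in s)
    return binary_entropy(bits.count("1") / len(bits))

print(bit_entropy_per_bit(b"\x00" * 64))        # 0.0 -- perfectly predictable bits
print(bit_entropy_per_bit(bytes(range(256))))   # 1.0 -- perfectly balanced bits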


I think this might work? If somebody sees an error, please point it out.

function KApprox(S : string; t : integer; TapeSizeMax : integer) : TuringMachine;
  begin

    // TMSmallest holds the smallest Turing machine found so far whose output is S
    var TMSmallest : TuringMachine := nil;

    // Largest machine size worth trying: a machine that prints S literally
    // has size about length(S) plus a small constant c
    var MaxSize : integer := length(S) + c;

    var i, j : integer;

    for j := 1 to t do // grow the step budget, so longer-running machines show up later
      begin
        for i := MaxSize downto 1 do // go to the next smaller size of TM
          begin
            foreach TM in MachinesOfSize(i) do // enumerate each TM of size i
              begin
                // run TM for at most j steps on a tape of at most TapeSizeMax cells
                if TM.halts(j, TapeSizeMax) and (TM.output() = S) then
                  if (TMSmallest = nil) or (sizeof(TM) < sizeof(TMSmallest)) then
                    TMSmallest := TM;
              end;
          end;
      end;
    return TMSmallest; // nil if no machine reproducing S was found within the budgets
  end;


It looks like Ray Solomonoff did a lot of work in this field.

Publications of Ray Solomonoff

  • Inductive Inference Theory - A Unified Approach to Problems in Pattern Recognition and Artificial Intelligence
  • Does Algorithmic Probability Solve the Problem of Induction?


The first issue I notice is that "the Kolmogorov complexity" isn't well defined: it depends to some degree on the choice of how programs are represented. (By the invariance theorem, any two universal reference machines give values that differ by at most an additive constant independent of S, but for short strings that constant matters.) So the first thing you would need to do is fix some encoding of programs (for example, Joey Adams' specification that programs be written in J).

Once you have the encoding, the algorithm you are looking for is quite simple. See Joey's answer for that.

But the situation is even worse than having to run exponentially many programs. Each of those programs could run for longer than you could possibly imagine (technically: running time, as a function of input size, can grow faster than any recursive function). What's more, it could be the case that some of the shortest programs are the ones that run the longest. So while the parallel approach will approach the correct value as time goes to infinity, it will do so unimaginably slowly.

You could stop the program prematurely, figuring that the approximation at that point is good enough. However, you have no idea in general how good that approximation is. In fact, there are theorems that show you can never know.

So the short answer is "easy, just use Joey's algorithm", but by any practical measure the answer is "you don't have a chance". As rwong has recommended, you are better off just using a heavy-duty compression algorithm.
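As a hedged sketch of that compression route (an illustration, not a method given in this thread; the function name is mine), one common practical trick is to take the smallest output of several off-the-shelf compressors as an upper bound on description length:

import bz2
import lzma
import os
import zlib

def compressed_size_bound(s: bytes) -> int:
    """Smallest compressed size of s over zlib, bz2 and lzma, in bytes.
    This is only an upper bound relative to these compressors, never K itself."""
    return min(
        len(zlib.compress(s, 9)),
        len(bz2.compress(s, 9)),
        len(lzma.compress(s, preset=9)),
    )

print(compressed_size_bound(b"0123456789" * 100))  # highly regular: compresses well
print(compressed_size_bound(os.urandom(1000)))     # random: stays close to the raw 1000 bytes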
