Algorithmic complexity

https://arbital.com/p/Kolmogorov_complexity

by Eliezer Yudkowsky May 26 2015 updated Jun 14 2016

When you compress the information, what you are left with determines the complexity.


Algorithmic complexity is a formal measure of the minimum amount of information required to specify some message, binary string, computer program, classification rule, etcetera. The algorithmic complexity or Kolmogorov complexity of a message is the minimum number of 1s and 0s required to specify a Turing machine that reproduces the message.
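Stated a bit more formally (a standard sketch; here $~$U$~$ is some fixed universal Turing machine, $~$p$~$ ranges over binary programs, and $~$|p|$~$ is the length of $~$p$~$ in bits):

$~$K(x) = \min \{ |p| : U(p) = x \}$~$

Choosing a different universal machine changes $~$K(x)$~$ by at most an additive constant (the length of an interpreter for one machine written on the other), so the measure is well-defined up to that constant.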

(The relevant Wikipedia article is filed under Kolmogorov complexity. Not to be confused with "computational complexity", which isn't the same thing at all.)

A string of a million 1s has around as much K-complexity as the number 1,000,000, since getting a Turing machine to print out exactly a million 1s and then stop is mostly a matter of encoding the number 1,000,000. The number 1,000,000 can itself potentially be compressed further, since it's an unusually simple number of that magnitude: it's 10 to the 6th power, so if we already have a concept of 'exponentiation' or can encode it simply, we just need to encode the numbers 10 and 6, which are quite small.
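As a rough illustration in Python (a loose analogy to the Turing-machine picture, not a literal encoding): the program below prints a million characters, but its own length is dominated by the cost of writing down how many repetitions to perform.

```python
# A ~20-character program whose output is a million characters long.
# Writing the count as 10**6 rather than 1000000 is itself a small
# compression: we encode only the base 10 and the exponent 6.
print("1" * 10**6)
```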

When we say that a message has high Kolmogorov complexity, we mean that it can't be compressed beyond a certain point (unless the 'compression algorithm' is itself large and contains much of the key information baked in). Things have high Kolmogorov complexity when they're made up of many independent facts that can't be predicted from knowing the previous facts.
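Kolmogorov complexity itself is uncomputable, but any ordinary compressor gives an upper bound on it, so we can at least illustrate the contrast in Python with zlib (a sketch; exact byte counts will vary):

```python
import os
import zlib

structured = b"1" * 10**6         # maximally redundant: one fact, repeated
random_bytes = os.urandom(10**6)  # a million independent random bytes

print(len(zlib.compress(structured)))    # around a thousand bytes
print(len(zlib.compress(random_bytes)))  # ~10**6 bytes: essentially no savings
```

No fixed algorithm can compress typical random strings, because there are more $~$n$~$-bit strings than there are descriptions shorter than $~$n$~$ bits to go around.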

Shakespeare's Romeo and Juliet will compress by a lot under simple algorithms, because many words are used more than once, and some words are much more frequent than others. But we are exceedingly unlikely to find, anywhere among the first trillion Turing machines, one that prints out the exact text of this play. 2^40 is greater than a trillion, and each machine prints at most one string, so the first trillion Turing machines can't even cover all of the 40-bit binary strings. Finding a roughly-40-bit Turing machine that printed out the exact text of Romeo and Juliet, a text containing vastly more than 40 bits of information, would be vastly improbable.
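The underlying counting argument is simple pigeonhole arithmetic, sketched here: the first trillion machines produce at most a trillion distinct outputs, while there are more than a trillion distinct 40-bit strings alone.

```python
num_machines = 10**12    # "the first trillion Turing machines"
num_40_bit_strings = 2**40

print(num_40_bit_strings)                 # 1099511627776
print(num_40_bit_strings > num_machines)  # True: 40-bit strings already outnumber them
```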

On the other hand, it wouldn't be particularly surprising to find a small Turing machine that printed out $~$3\uparrow\uparrow\uparrow3$~$ 1s and then stopped, because the algorithm for this enormous number seems simple, and easy to encode as a computer program or Turing machine.
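A sketch of Knuth's up-arrow notation in Python makes this concrete: the definition fits in a few lines, even though $~$3\uparrow\uparrow\uparrow3$~$ is far too large to ever actually compute or store.

```python
def arrow(a, b, n):
    """Knuth's up-arrow: arrow(a, b, 1) is a**b, and each additional
    arrow iterates the previous operation b times."""
    if n == 1:
        return a ** b
    if b == 0:
        return 1
    return arrow(a, arrow(a, b - 1, n), n - 1)

print(arrow(3, 3, 2))  # 3↑↑3 = 7625597484987: still computable
# arrow(3, 3, 3) is 3↑↑↑3. The description above is a few hundred bits;
# the number it names could never be written out in this universe.
```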

The algorithmic complexity of a system shouldn't be confused with the total number of visible, complicated-looking details. The Mandelbrot set looks very complicated visually - you can keep zooming in and find more and more detail - but there's a very simple rule that generates it, so we say its algorithmic complexity is very low.

As a corollary, a piece of a big-looking object can easily have more Kolmogorov complexity than the whole. If you zoom far down into the Mandelbrot set and isolate a particular piece of the fractal, the information needed to specify that image now includes both the Mandelbrot-generating rule and the exact location of that particular piece.
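An escape-time sketch of the generating rule in Python shows how little information the whole set requires; the extra parameters needed to single out one deep-zoom piece (the center and scale below are illustrative names, not part of the rule) are exactly the 'location' bits described above.

```python
def in_mandelbrot(c, max_iter=100):
    """The entire rule: iterate z -> z*z + c and check for escape."""
    z = 0j
    for _ in range(max_iter):
        z = z * z + c
        if abs(z) > 2:
            return False
    return True

# The full set needs only the rule above. A particular zoomed-in piece
# also needs its coordinates, and at high magnification, specifying
# the center precisely enough takes many additional bits.
center, scale = -0.743643887 + 0.131825904j, 1e-9  # an illustrative location
```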

Similarly, the Earth is much more algorithmically complex than the laws of physics, and if there's a multiverse that developed deterministically out of the laws of physics, the Earth would be much more complex than that multiverse. To print out the whole multiverse, you'd only need to start from the laws of physics and work forward; to print out Earth in particular you'd need a huge number of additional bits to locate Earth inside the multiverse.

Or more simply, we can observe that a program that prints out all possible books in order is much simpler than a program that prints out only Romeo and Juliet. To put it another way, Borges's "Library of Babel" containing every possible book has far lower algorithmic complexity than an Earth library containing only some books. The moral is that the amount of visible stuff and its seeming surface complexity should not be confused with the notion of algorithmic complexity or theoretical compressibility.
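A sketch of the 'all possible books' program in Python: enumerating every string over some alphabet, shortest first, takes only a few lines, while a program that prints only Romeo and Juliet has to carry the entire text of the play inside itself.

```python
from itertools import count, product

def all_books(alphabet="ab"):  # a tiny alphabet, for illustration
    """Yield every finite string over the alphabet, shortest first."""
    for length in count(0):
        for chars in product(alphabet, repeat=length):
            yield "".join(chars)

# Runs forever, eventually emitting every possible "book", Romeo and
# Juliet included, without storing any of them.
for book in all_books():
    print(book)
```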


Comments

Lancelot Verinia

It is not clear to me that the following is true:

Or more simply, we can observe that a program that prints out all possible books in order is much simpler than a program that prints out only Romeo and Juliet. To put it another way, Borges's "Library of Babel" containing every possible book has far lower algorithmic complexity than an Earth library containing only some books.

As far as I can tell, the above statement requires at least one of the following to be true:

  1. The set of all possible books is something more like the set of all possible strings above a certain length (it is not required that those strings be coherent or constitute legitimate books).
  2. The shortest program encoding Romeo and Juliet in our reference UTM does so by explicitly locating it within the set of all possible books.

I see no reason to assume that 2 is true, and I feel that if 1 is assumed, it should be stated explicitly, or the claim is misleading.