A Mandatory Example
Consider the following two binary strings:

    01010101010101010101010101010101
    11001001000011111101101010100010
Which one do you think is easier to remember? Most people would say
the first one, because there is an obvious and simple rule/method
to generate it. So, instead of remembering the string, we can remember
the rule. One can also say that the first string is simple while the
second one is complex. Kolmogorov complexity is a quantified version of
this intuitive notion.
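To make the intuition concrete, here is a minimal sketch in Python; the alternating pattern stands in for the first string of the example:

```python
# Instead of memorizing a long string, we remember a short rule that
# regenerates it on demand.
def alternating(length: int) -> str:
    # The rule: repeat "01" and truncate to the desired length
    return ("01" * length)[:length]

print(alternating(32))  # a 32-character string recovered from a much shorter rule
```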
Let us semi-formalize it. First let us fix a Turing complete
programming language. I will not define what a programming language
is or should be and, frankly, I will be somewhat vague about
it. Hence the prefix semi- in front of formalize.
Though I will make a few assumptions. Programs in our programming
language will be strings over a fixed alphabet $\Sigma$, say, one
containing ASCII or UTF-8. I will also assume that each program accepts
strings from $\Sigma^*$ as input and produces strings from $\Sigma^*$ as
output. So, in particular, we can give a program to another program as
input or a program may produce a program. For ease of reading, I will
use typewriter font when writing a string from $\Sigma^*$.
One technical assumption which will be important later is that our
programs will use decimal representation for natural numbers. Actually
pretty much any positional system would do, but for the sake of
definiteness I will stick with decimal.
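As a quick aside that will matter later, the decimal representation of $n$ uses about $\log_{10} n$ characters; a small Python check (the helper name is mine, for illustration):

```python
import math

def decimal_length(n: int) -> int:
    # Number of characters in the decimal representation of n
    return len(str(n))

# The length grows logarithmically in n
for n in (7, 1000, 123456789):
    assert decimal_length(n) == math.floor(math.log10(n)) + 1
```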
When I write a program, I will use some sort of pseudocode, hoping
that there is no ambiguity. Also I will not exclude partial programs,
which correspond to partial functions. For instance,

    while true: do nothing

is a valid program even though it does not produce any result, as it is
always stuck in the while loop.
Now we can define the Kolmogorov complexity of a string from
$\Sigma^*$: Let $s$ be a string from $\Sigma^*$. The Kolmogorov
complexity of $s$ is the length of the shortest program which generates
$s$ on any input. It is denoted by $K(s)$.
How canonical is this
definition?
The definition seems very intuitive, especially after the mandatory
example. However, there is something fishy here. We made several
arbitrary choices, yet we called $K(s)$ the Kolmogorov complexity of
$s$. It should be clear that it is easier to generate certain strings in
some languages than in others. Indeed, the second string in our example
is just the first few digits of the binary expansion of $\pi$, so it can
be easier to generate in a language designed for numerical
computations.
Even worse, for any string $s$, we can very well imagine a
programming language which has $s$ as a built-in constant. Thus the
Kolmogorov complexity of a single string seems to very heavily depend on
our language of choice. So does this mean that our definition of $K$ is
arbitrary? What happens if we change our programming language?
Let us consider two notions of complexity $K_1$ and $K_2$,
corresponding to two different programming languages $L_1$ and $L_2$,
respectively. Since $L_2$ is Turing complete, we can write a compiler
for $L_1$ in $L_2$. In other words, $L_2$ can simulate $L_1$. Let $s$ be
a string and let $n = K_1(s)$. Then there is a program $p$ in the
language $L_1$ which witnesses the fact that $K_1(s) = n$. That is, $p$
has length $n$ and generates $s$. Now consider the following
(description of a) program in $L_2$:

    Simulate $p$.

The length of this program will be $n + c$ for some $c$ because the
program contains the string $p$, and $p$ has length $n$. The part
corresponding to $c$ is the part that simulates $L_1$ and it does not
depend on $s$. By construction, "Simulate $p$" generates $s$, thus
$K_2(s) \leq n + c = K_1(s) + c$. We proved that there is a constant $c$
such that $K_2(s) \leq K_1(s) + c$ for all $s$. By symmetry, there is a
$c'$ such that $K_1(s) \leq K_2(s) + c'$ for all $s$. Combining these
two we obtain

$$|K_1(s) - K_2(s)| \leq \max(c, c').$$

This shows that if we consider an infinite family of strings
and consider the asymptotic behavior of Kolmogorov complexity on that
family, then the programming language we choose does not matter.
Obviously, this does not imply that this asymptotic behavior is easy to
determine.
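The constant in the argument can be made tangible with a toy sketch; here simulate_L1 is a hypothetical name for the compiler for the first language written in the second:

```python
# Wrapping a program p in a fixed simulation harness adds a number of
# characters that does not depend on p: this is the constant c above.
HARNESS_PREFIX = 'simulate_L1("""'  # hypothetical simulator call, fixed once and for all
HARNESS_SUFFIX = '""")'

def compile_to_L2(p: str) -> str:
    return HARNESS_PREFIX + p + HARNESS_SUFFIX

c = len(HARNESS_PREFIX) + len(HARNESS_SUFFIX)
for p in ("print 0", "a much longer program " * 10):
    assert len(compile_to_L2(p)) == len(p) + c  # same constant overhead for every p
```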
From now on we will stick with a fixed but not explicitly determined
choice of language and denote the complexity function given by that
language by $K$.
At this point I would compute the Kolmogorov complexity of some
concrete strings (or infinite families of strings) but it is tricky. We
can always give an upper bound for Kolmogorov complexity by explicitly
constructing a program and measuring its length but the tricky part is
to show that no shorter program generates the same string. Actually, the
problem is so difficult that there is no general solution. To put it
more concretely, the function $s \mapsto K(s)$ is not computable. The
aim of this post is to give a proof of this
result.
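To illustrate the easy direction, here is a sketch, using Python source text itself as the programming language (an assumption for illustration): any explicit program that outputs $s$ gives an upper bound on $K(s)$.

```python
def literal_program(s: str) -> str:
    # A program that ignores its input and prints s verbatim
    return "print(" + repr(s) + ")"

def repetition_program(unit: str, times: int) -> str:
    # A much shorter program when s happens to be a repeated pattern
    return "print(" + repr(unit) + "*" + str(times) + ")"

s = "01" * 1000
# Both programs generate s, so K(s) is at most the length of the shorter one.
bound = min(len(literal_program(s)), len(repetition_program("01", 1000)))
print(bound)  # → 16, far below the 2000 characters of s itself
```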
Berry Paradox
Before moving on to the actual proof, let us see the underlying idea
of the proof in a linguistic context. Let us consider the following
description of a number:

    The smallest positive integer which cannot be described using fewer than 100 characters

Let $N$ be the natural number defined by this description. But this
description itself has only 87 characters, counting spaces and digits,
so $N$ has a description with only 87 characters. Contradiction! This
argument is known as the Berry paradox. I will not go into a linguistic
or philosophical debate here; there is already a substantial body of
literature on the topic.
Now note that if we start with the following description, we cannot
really say much about the number it describes; nevertheless, we do not
end up with a paradox either:

    The smallest positive integer which cannot be described using fewer than 10 characters

Thus the number in the description is not arbitrary. Here is a
natural question: What values of $n$ give rise to the Berry paradox? To
answer that question we will make the number a parameter. For a natural
number $n$, let $\langle n \rangle$ be the decimal expression of $n$ as
a string. So, for instance, $\langle 7 \rangle$ is the string 7,
$\langle 32 \rangle$ is the string 32, etc. Now define $d(n)$ to be

    The smallest positive integer which cannot be described using fewer than $\langle n \rangle$ characters

Clearly a positive natural number $n$ gives rise to the Berry paradox if
$|d(n)| < n$, where $|s|$ denotes the length of the string $s$. Note
that $|d(n)| = 84 + |\langle n \rangle|$ because $|\langle n \rangle|$
is simply the number of digits of $n$. So the inequality we should solve
is

$$84 + |\langle n \rangle| < n.$$
We can of course find the exact solution set of this
inequality, but what is more interesting is that we can immediately tell
that there are solutions. The reason is that the left hand side grows
logarithmically and the right hand side grows linearly, hence the
right hand side should dominate the left hand side for all sufficiently
large values of $n$. Even better, if we change the phrasing of the
condition or express it in a different language such as French, we would
need to solve the inequality $c + |\langle n \rangle| < n$ for some
constant $c$ and, by the same argument, we would have a solution. Thus
the Berry paradox is about exploiting the fact that our measure of
complexity can be referred to in a not so complex way. Note that if we
were to use the unary system instead of decimal, that is, if we
expressed $n$ as $n$-many $1$'s, then we would have $|d(n)| = c + n$ for
some $c$ and the Berry paradox would not come up.
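We can also find the threshold numerically. A sketch, under the assumption that the fixed part of the description takes 84 characters, so that the description of $n$ has $84$ plus the number of digits of $n$ characters in total:

```python
# Assumption: the description minus the number itself has 84 characters,
# so the full description of n has 84 + (number of digits of n) characters.
def description_length(n: int) -> int:
    return 84 + len(str(n))

# n gives rise to the Berry paradox exactly when its description is shorter than n
paradoxical = [n for n in range(1, 200) if description_length(n) < n]
print(paradoxical[0])  # → 87, the smallest paradoxical value under this assumption
```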
An Incomputability Result
Here is the theorem we will prove: There is no computer program which
takes $s$ as input and produces $K(s)$ as output.
For the proof, we will mimic the Berry paradox.
Suppose that there is such a program $M$; we will obtain a
contradiction. Consider the following program, which I will call $P$:

    read the input $n$
    set $s$ to be the empty string
    while $M(s) \leq n$: replace $s$ with the next string
    print $s$

Note that we use $M$ to check the condition in the while loop. Also note
that the while loop is never stuck because there is no bound on the
Kolmogorov complexity. So this program produces a string of complexity
greater than $n$ if its input is $n$.
Now for each $n$ we can construct a program $P_n$ as follows:

    Simulate $P$ on input $\langle n \rangle$.

Clearly $P_n$ ignores its input and behaves like $P$ with input $n$.
Moreover $|P_n| \leq |\langle n \rangle| + c$ for some constant $c$.
Thus, for a sufficiently large $n$, we have $|P_n| < n$.
Fix such an $n$ satisfying $|P_n| < n$ and let $s_n$ be the string
produced by $P_n$. Here comes the finishing blow: By the construction of
$P$, we have $K(s_n) > n$. On the other hand, $P_n$ also produces $s_n$
(on any input), therefore we must have $K(s_n) \leq |P_n| < n$. But
these two inequalities imply that $n < K(s_n) < n$. Contradiction!
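The construction can be sketched in Python. Since no real $M$ exists (that is exactly the theorem), the sketch substitutes a toy stand-in: the zlib-compressed length of the string, a crude but computable proxy for complexity:

```python
import zlib
from itertools import count, product

def all_strings():
    # Enumerate the strings over {0, 1} in length-lexicographic order
    yield ""
    for length in count(1):
        for bits in product("01", repeat=length):
            yield "".join(bits)

def M(s: str) -> int:
    # Toy stand-in for the assumed complexity computer M: compressed length.
    # The real K is not computable, so this is only a computable proxy.
    return len(zlib.compress(s.encode()))

def P(n: int) -> str:
    # The program from the proof: output the first string whose (proxy)
    # complexity exceeds n.
    for s in all_strings():
        if M(s) > n:
            return s

s = P(10)
assert M(s) > 10  # P indeed produces a string of (proxy) complexity above its input
```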
Let us summarize what we did: We defined a not exactly canonical
notion of complexity which is impossible to compute. So what
can we even do with it? Well, mathematical logic of course! This will be
the topic of the forthcoming
post on Kolmogorov complexity.