## What Is The Purpose of Science? Algorithm Discovery

Consider the trial of Amanda Knox. What’s the purpose of the legal process here?

Well, let’s think about. Here’s how a trial works (at least on television): the prosecution and the defense get up in front of the jury. They present evidence — it could be DNA, surveillance videos, witness testimony, or even a tic-tac-toe playing chicken. Closing arguments follow. Then the jury deliberates and returns a verdict.

Now, the purpose of all this evidence is ostensibly to get at the truth. To figure out what it is that really happened. Did Amanda Knox kill Meredith Kercher? Or not?

We can visualize the jury, then, as a sort of machine. It takes in evidence and then applies that evidence to update two competing hypotheses: guilty or not guilty. At the end of the trial, the jury spits out its verdict.

Science works in the same manner.

## What’s a hypothesis?

Okay, I haven’t been entirely honest. A jury doesn’t have just two hypotheses floating around in its collective head. There are a bunch of different possible explanations. When they consider the most likely explanation (“someone else did it”), they decide guilty or not guilty based on that. So in that box above with the G and NG for guilty or not guilty, it really ought to contain all possible explanations.

What are these explanations, really? They’re scenarios which could have produced the evidence. Amanda Knox murdering Meredith Kercher is one possible scenario. Rudy Guédé murdering her is another, or maybe Raffaele Sollecito did it. Or maybe it was aliens or a government conspiracy.

But what’s a scenario here, really? Consider the plight of physicists. They’re trying to uncover the underlying laws of the universe. They look at the world as it is — the evidence — and ask, “What underlying structure produced this?” Much like a paleontologist who carefully brushes away dirt to reveal a fossil.

Now, what’s a structure that produces data, evidence? An algorithm! Physicists are seeking not the laws of the universe, but the algorithm of the universe — what produced it all.

We can think, then, of science as the process of collecting evidence and then updating the likelihood of possible algorithms that might have produced it. Science is the process of algorithm discovery.

Here the colored circles are algorithms (hypotheses) and their size is their likelihood.

## Creativity, Fan Fiction, and Compression

I’ve written before about the relationship between creativity and compressibility. To recap, a creative work is one that violates expectations, while a compressible statement is one that’s expected.

For instance, consider two sentences:

• Where there’s a will, there’s a way.
• Where there’s a will, there’s a family fighting over it.

I suspect you find the second more creative.

Three more examples of creative sentences:

• When I was a kid, my parents moved a lot. But I always found them.
• Dad always said laughter is the best medicine, which is why several of us died of tuberculosis.
• A girl phoned me the other day and said, “Come on over, there’s nobody home.” I went over. Nobody was home.

Given that less predictable sentences are more creative, and less predictable sentences are less compressible, creative works ought to be less compressible than non-creative ones. And, indeed, I found some evidence for this in a previous experiment.

But that was not too compelling as it compared technical, repetitive works to novels. This time, I decided to compare very creative writing to normal creative writing.

## Methods

The idea then is to compare the compressibility of amateur creative writing with that of experts. To accomplish this, I took 95 of the top 100 most downloaded works from Project Gutenberg. I figure that these count as very creative works given that they’re still popular now, ~100 years later. For amateur writing, I downloaded 107 fanfiction novels listed as “extraordinary” from fanfiction.net.

I then selected the strongest open source text compression algorithm, as ranked by Matt Mahoney’s compression benchmarkpaq8pxd. I ran each work through the strongest level of compression, and then compared the ratio of compressed to uncompressed space for each work.

## Analysis and Results

I plotted the data and examined the outliers, which turned out to be compressed files that my script incorrectly grabbed from Project Gutenberg. I removed these from the analysis, and produced this:

Here the red dots are fanfiction novels, while the blue-ish ones are classic works of literature. If the hypothesis were true, we’d expect them to fall into distinct clusters. They don’t.

Comparing compressibility alone produces this:

Again, no clear grouping.

Finally, I applied a Student’s t test to the data, which should tell us if the two data sets are distinguishable mathematically. Based on the graphs, intuition says it won’t, and indeed it doesn’t:

The p-value here is 0.1755, which is not statistical significance. The code and data necessary to reproduce this are available on GitHub.

## Discussion

I must admit a certain amount of disappointment that we weren’t able to distinguish between literature and fanfiction by compressiblity. That would have been pretty neat.

So, what does this failure mean? There at least six hypothesis that get a boost based on this evidence:

• Creativity and compression are unrelated.
• A view of humans as compressors is wrong.
• Human compression algorithms (the mind) and machine compression algorithms are distinct to the point where one cannot act as a proxy for the other.
• Compression algorithms are still too crude to detect subtle differences.
• Fanfiction is as creative as literature.

And so on and, of course, it’s possible that I messed up the analysis somewhere.

Of all of these, my preferred explanation is that compression technology (and hardware) are not yet good enough. Consider, again, the difference between a creative and a not-creative sentence:

• Honesty is the best policy.
• I want to die peacefully in my sleep, like my grandfather… not screaming and
yelling like the passengers in his car.

The first is boring, right? Why? Because we’ve heard it before. It’s compressible — but how’s a compression algorithm supposed to know that? Well, maybe if we trained it on a corpus of the English language, gave it the sort of experience that we have, then it might be able to identify a cliche.

But that’s not how compression works right now. I mean, sure, some have certain models of language, but nothing approaching the memory that a human has, which is where “human as computer compression algorithm” breaks down. Even with the right algorithm — maybe we already know it — the hardware isn’t there.

Scientific American estimates that the brain has a storage capacity of about 2.5 petabytes, which is sort of hand-wavy and I’d bet on the high side, but every estimate I’ve seen puts the brain at more than 4 gigabytes, by at least a couple orders of magnitude. I don’t know of any compressors that use memory anywhere near that, and certainly none that use anything like 2.5 petabytes. At the very least, we’re limited by hardware here.

But don’t just listen to me. Make up your own mind.

• The idea that kicked off this whole line of inquiry is Jürgen Schmidhuber’s theory of creativity, whichI’ve written up. If you prefer, here’s the man himself giving a talk on the subject.
• To reproduce what I’ve done here, everything is on GitHub. That repository is also a good place to download the top 100 Gutenberg novels in text form, as rate-limiting makes scraping them a multi-day affair.
• I similarly compared technical writing and creative writing in this post and did find that technical writing was more compressible.
• For an introduction to data compression algorithms, try this video.
• Check out the Hutter Prize, which emphasizes the connection between progress in compression and artificial intelligence.
• For a ranking of compressors, try Matt Mahoney’s large text compression benchmark. He’s also written a data compression tutorial.

## What Are Quantum Computers Used For?

The literal answer to the question, “What are quantum computers used for?” is well, nothing, since we can’t build them yet — but maybe you want to know the answer to the question, “What will quantum computers be used for?”

That I can answer.

There is a lot of magical thinking around quantum computers. That they’re the next big thing in computing, that they’ll replace classical computers, that they’ll be impossibly fast and small. There’s at least one quantum physicist who doesn’t know any better. Most problems receive no sort of “quantum speedup.”

Here’s what Scott Aaronson, a theoretical computer science guy who works on quantum computing at MIT, has to say:

To be clear, I think it’s entirely possible that I’ll see practical quantum computers in my lifetime (and also possible, of course, that I won’t see them). And if we do get scalable, universal quantum computers, then they’ll almost certainly find real applications (not even counting codebreaking): mostly, I think, for specialized tasks like quantum simulation, but to a lesser extent for solving combinatorial optimization problems.

Sure, quantum computing will be useful, but no, they won’t enable us to replace classical architecture machines with fingernails that — I don’t know — pass the Turing test.

Here’s Aaronson himself in Scientific American with a bit more on quantum computers for simulation:

If quantum computers ever become a reality, the “killer app” for them will most likely not be code breaking but rather something so obvious it is rarely even mentioned: simulating quantum physics. This is a fundamental problem for chemistry, nanotechnology and other fields, important enough that Nobel Prizes have been awarded even for partial progress.

## Problems fit for a quantum computer

On what sort of specific problems quantum computers will be used for, well, that’s a complexity class: BQP. A quantum computer — should they prove physically realizable — can outperform a classical computer at problems outside of P (another complexity class) but in BQP.

The main suspected problems in that class are:

• Integer factorization (important for crypto)
• Quantum-level simulations of physical stuff.

So, you know, what Aaronson already told us.

So, classical machines are here to stay. Quantum computers are not a silver bullet. Hard problems remain hard, or as Aaronson puts it (again from his Scientific American piece):

I predict that the hardness of NP-complete problems will someday be seen the same way: as a fundamental principle that describes part of the essential nature of our universe. There is no way of telling what theoretical enlightenment or what practical consequences might come from future application of this kind of fundamental principle.

• Scott has a book, Quantum Computing Since Democritus. It’s excellent. I recommend it. You should buy a copy.

## The Ultimate Guide to Simulated Annealing

Imagine that you’re approached by the Greek goddess of discord, Eris and, given that Eris is a cruel goddess, she places you into the mathematical space above. She promises, “If you climb to the highest point, I will release you.”

This would not be too difficult except, like most encounters with Greek gods, there’s a catch. The goddess, in her wickedness, has blinded you. You can only tell whether or not a step will take you upwards or downwards. How might you find the highest peak? You think for a while and decide to climb upwards. From any position, you choose the direction that will increase your elevation.

So you climb and you get to the top of this hill. Nothing happens. What gives? You’ve climbed the top of a peak, but you haven’t made it to the top of the highest peak.

You’re going to have to modify your strategy. There are a couple options. You might stumble around at random. You’d make it to the top eventually, but it would take an eternity, most of which would be wasted exploring areas that you’d seen before.

Instead, you could adopt a systematic approach to exploring the landscape, which is much more efficient. With a memory better than mine, you could keep track of where you’ve visited before and focus on exploring those areas you haven’t. You’d start by checking out one mountain, and then the three surrounding, and then the nine surrounding those three, and so on until you’d found the highest point.

This would work, unless Eris has placed you in an infinite space where the elevation increases forever. If you imagine the edges extending forever, such a space would look like this:

In such a space, Eris is twisted indeed, as there is no way to reach the peak of an infinite climb. Let’s imagine that you toil away at this Sisyphean task for a few billion years until one of the gods, Clementia, takes pity on you. She offers you a new deal, telling you, “I will transport you to a new, finite space. If you can reach one of the highest peaks with sufficient haste, I will free you.” You accept the deal and are teleported here:

This space features a number of valleys filled with sub-optimal plateaus. If you use your original “upwards climb” strategy, you may find yourself stuck for eternity. Alternatively, you could try searching the entire space, but then you run the risk of violating Clementia’s “sufficient haste” clause.

The trick is to modify the initial strategy to sometimes accept a downwards move, which will help prevent you from getting stuck at a sub-optimal plateau. Such a path might look like this:

Still, this is not quite right, as you can see from the path above. The problem is that your strategy never terminates. You quickly reach a high point, but then are forced to accept sub-optimal moves as those are the only possible moves.

To get around this, you need to modify your strategy such that you’re willing to accept a lot of bad moves early on, but fewer and fewer with time, until you eventually will accept no sub-optimal moves. This strategy will eventually reach a plateau.

That’s more like it. With this strategy, you begin to climb. Clementia frees you and you live happily ever after (or at least until Eris decides to visit you again.)

Except for formalizing the details, this is the basic intuition behind simulated annealing, which Wikipedia calls a “generic probabilistic metaheuristic for the global optimization problem of locating a good approximation to the global optimum of a given function in a large search space.” I’m not convinced such a sentence was written with the realization that humans will have to read it. In English, simulated annealing is a method for finding an approximation of the highest (or lowest) point in a space, like the one above.

## Shaking Intuition

Simulated annealing can be understood in terms of body language. It can be felt as motion. You can think of simulated annealing as shaking.

Imagine that you’re holding onto one of the spaces above. (Masterfully illustrated below.) You place a ping pong ball into the space, with the goal of moving the ball to the lowest place possible. The ball will naturally roll downwards thanks to gravity, but sometimes it will get stuck. When it gets stuck, the natural response is to shake the space, dislodging the ping pong ball, allowing it to continue rolling downwards.

There you have it. That’s the core of simulated annealing. You could have invented it and, now, if you ever do come across Eris, you’ll be prepared. (On second thought, if you ever come across Eris and are teleported to a mathematical space, see a psychiatrist.)

• The early strategies mentioned are real search algorithms. The “just climb upwards” algorithm is aptly named hill-climbing. The random exploration method is known as a random walk and the “systematic exploration approach” described is breadth-first search. Its cousin, depth-first search, would have worked equally well.
• For more on simulated annealing, try this paper. For different interesting search algorithms, check out STAGE and beam search.
• Simulated annealing is a heuristic search algorithm, meaning that it attempts to find a “close enough” solution. This makes it well suited for otherwise intractable problems, such as those in NP. There’s some discussion of applications here.
• Simulated annealing was inspired by the natural process of annealing in metallurgy. It’s one of a class of algorithms inspired by nature. Scott Aaronson writes about the relationship between nature and complexity in this paper.
• In The Algorithm Design Manual (recommended), the author writes (of heuristic search algorithms) , “I find [simulated annealing] to be the most reliable method to apply in practice.”

## Relationship Between Incompleteness and The Halting Problem

Yesterday, while googling around for information on hyperoperations, I came across Scott Aaronson’s essay, “Who can name the bigger number?” You should go read it. I’ll wait.

On the halting problem:

The proof is a beautiful example of self-reference. It formalizes an old argument about why you can never have perfect introspection: because if you could, then you could determine what you were going to do ten seconds from now, and then do something else. Turing imagined that there was a special machine that could solve the Halting Problem. Then he showed how we could have this machine analyze itself, in such a way that it has to halt if it runs forever, and run forever if it halts. Like a hound that finally catches its tail and devours itself, the mythical machine vanishes in a fury of contradiction.

What if we could solve the halting problem? On hypercomputation:

Suppose we could endow a Turing machine with a magical ability to solve the Halting Problem. What would we get? We’d get a ‘super Turing machine’: one with abilities beyond those of any ordinary machine. But now, how hard is it to decide whether a super machine halts? Hmm. It turns out that not even super machines can solve this ‘super Halting Problem’, for the same reason that ordinary machines can’t solve the ordinary Halting Problem. To solve the Halting Problem for super machines, we’d need an even more powerful machine: a ‘super duper machine.’ And to solve the Halting Problem for super duper machines, we’d need a ‘super duper pooper machine.’ And so on endlessly.

Wikipedia’s hypercomputation page is silent on the super Halting Problem. Is this the work of the hypercomputation Illuminati, trying to suppress an inconvenient truth?

Then it occurred to me: the structure of the super Halting Problem is virtually identical to the “Contracrostipunctus” dialogue in [Gödel, Escher, Bach] (very recommended, buy a copy), in which the Tortoise tells Achilles about his collection of phonograph-destroying records. The super Halting Problem can be thought of as a record that cannot be played on a super Turing machine, which makes one wonder: what is the relationship between the incompleteness theorems and the halting problem?

My understanding of it is this: Gödel’s incompleteness theorems state that for any sufficiently powerful formal system, there are assertions that cannot be proven either true or false. Concretely, perhaps all the recent work on the twin prime conjecture has been in vain and it is unprovable. One wonders, then: given some proposition, can you decide in advance whether or not it is provable? If we could decide in advance, we could avoid wasting a whole lot of time trying to solve something that cannot be solved.

At this point, we need to step back and consider just how one goes about proving something. You start with a proposition, carry out some operations, and then voila!, it is proved. The difference between a provable statement and an unprovable statement is the number of steps we have to carry out in order to prove it. If a proposition is provable, it can be proven in a finite number of steps. If not, you can use as many steps as you like and you will never prove it.

This is the halting problem!