Why Category Theory Matters

I hope most mathematicians continue to fear and despise category theory, so I can continue to maintain a certain advantage over them.
—John Baez


The above is a graph of the number of times the phrase “category theory” has been used in books, from about 1950 through the present. It speaks for itself.

But why? What’s the big deal? Why does category theory matter? I’m about a quarter of the way through Conceptual Mathematics: A First Introduction to Categories and still not sure why I’m bothering with fleshing out all this theory. Is this just set theory for hipsters?

What category theory is about

Category theory is, essentially, the study of mathematical structure. It’s the study of things and the mappings between those things, the translations of these objects. These are usually called objects and morphisms (or arrows, if you prefer). Objects can be thought of as sets and arrows as functions, though they are not limited to this interpretation.


The subject’s major insight is, in order to understand something, focus on the structure preserving mappings of that something — the legal translations.
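To make the objects-and-arrows picture concrete, here's a toy sketch of my own in Python, treating functions as the morphisms. The asserts spot-check the two laws every category must satisfy: composition has identities and is associative.

```python
# Objects as Python types, morphisms as functions: a category in miniature.
identity = lambda x: x

def compose(g, f):
    """Morphism composition: (g . f)(x) = g(f(x))."""
    return lambda x: g(f(x))

# Two morphisms on the same object (int -> int).
double = lambda x: 2 * x
succ = lambda x: x + 1

# The category laws, spot-checked at a sample point:
x = 7
assert compose(double, identity)(x) == double(x)   # right identity
assert compose(identity, double)(x) == double(x)   # left identity
assert (compose(compose(succ, double), double)(x)
        == compose(succ, compose(double, double))(x))  # associativity
```

Nothing here is specific to functions: any collection of objects and arrows with an associative, unital notion of composition forms a category.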

What the excitement is about

The vast applicability and expressiveness of category theory leads to the observation that most structures in mathematics are best understood from a category theoretic or higher category theoretic viewpoint.

Category theory is one of, if not the most, abstract fields of mathematics. It’s even been dubbed, as one might tease a younger sibling, “abstract nonsense.” After all, the field throws out all the specific properties of objects and instead focuses solely on their translations.

This extreme generality of category theory means that it can say something about anything, but nothing too specific. In other words, part of the growth of category theory is probably because you can use it to talk about damn near anything. (See the applications below for examples.)

In this respect, category theory is like set theory. The popularity of set theory is a result of the fact that, hey, it’s a pretty good language for talking about a lot of different types of mathematics. Most things can be formalized as a combination of sets and first order logic, and it’s not that unnatural to think in terms of sets so, bam, popularity.

In Categories for Software Engineering, the authors put it this way: “The way we like to present category theory is as a toolbox similar to set theory: as a kind of mathematical lingua franca in the sense that it can be used for formalizing concepts that arise in our day-to-day activity.”

This generality mirrors the difference between strong and weak methods in artificial intelligence. General methods, while widely applicable, don’t typically scale up to hard problems. Instead, specialized tools are necessary. In the same way, category theory is more of a tool for elucidating connections between mathematical structures than for solving problems — in contrast with something like linear algebra, or really any field of applied math.

Benefits of category theory over set theory

God made the integers, all the rest is the work of man.
—Leopold Kronecker

To be honest, I don’t like set theory. It’s artificial — the axioms aren’t obviously true, but rather the product of a search for a paradox-free foundation for mathematics. It’s sort of complicated, maybe not at the lowest levels, but definitely once you try to build up something like the real numbers. (A 2008 issue of the Notices of the AMS reports, “…to expand the definition of the number 1 fully in terms of Bourbaki primitives requires over 4 trillion symbols.”)

The whole enterprise is bizarre. As humans, we didn’t start out with sets and then build out mathematics. No, the Egyptians did arithmetic and some algebra. (The oldest extant mathematical records deal with the Pythagorean theorem.) Animals have some notion of magnitude and many can even count. Set theory, rather than a natural extension of mathematical enterprise, seems more like something forced — the difference between English and Esperanto.

As far as I can tell, the mathematical community agrees with me. Paul Cohen is the only person to ever win a Fields medal for work on foundations and today “foundations of mathematics” is code not for mathematics, but philosophy.

So, immediately, category theory has an advantage over set theory in that it’s a less artificial construction, given that it stems directly from work in algebraic topology.

But, beyond that, is there anything else exciting about category theory? The main draw is its ability to connect otherwise disparate fields, a sort of skeleton to hang other knowledge on. Mike Stay and John Baez write about this in their “Physics, topology, logic and computation: a Rosetta Stone”, where they use category theoretic constructs to speak about the similarities between — you guessed it — physics, topology, logic, and computation.

Jocelyn Ireson-Paine puts it this way, “category theory is a great source of unifying concepts and organizing principles.” This is the benefit of all the abstraction — by throwing away all the details, an object’s structure reveals itself.

As a concrete example, consider one of the most profound mathematical achievements, Descartes’s discovery of analytic geometry — the realization that geometry can be translated into Cartesian coordinates and, thus, the power of algebra can be brought to bear on the subject.

With category theory, this discovery can be expressed in what has to be one of the most satisfying formulas of all time:

\( P \xrightarrow{\quad f \quad} \mathbb{R}^{2} \)

Applications of category theory

The above is nice and all, but it’s still just sort of hey-take-my-word-for-it, which is not so satisfying. Here are some actual examples:

I will leave you with the following:

[Category theory] does not itself solve hard problems in topology or algebra. It clears away tangled multitudes of individually trivial problems. It puts the hard problems in clear relief and makes their solution possible.
—“The Last Mathematician from Hilbert’s Göttingen”

Further Reading

Math Jokes

The Notices of the AMS has a 2005 paper “Foolproof: A Sampling of Mathematical Folk Humor” which is — delightfully — filled with math jokes. Excerpts:

Q: What’s sour, yellow, and equivalent to the Axiom of Choice?
A: Zorn’s Lemon.

Q: What is a topologist?
A: Someone who cannot distinguish between a doughnut and a coffee cup.

Theorem. All positive integers are interesting.
Proof. Assume the contrary. Then there is a lowest noninteresting positive integer. But, hey, that’s pretty interesting! A contradiction.

Q: How many light bulbs does it take to change a light bulb?
A: Just one, if it knows its Gödel number.

One day a farmer called up an engineer, a physicist, and a mathematician and asked them to fence in the largest possible area with the least amount of fence. The engineer made the fence in a circle and proclaimed that he had the most efficient design. The physicist made a long, straight line and proclaimed “We can assume the length is infinite…” and pointed out that fencing off half of the Earth was certainly a more efficient way to do it. The mathematician just laughed at them. He built a tiny fence around himself and said, “I declare myself to be on the outside.”

One day the Wiener family was scheduled to move into a new house. Mrs. Wiener, mindful of her husband’s propensity for forgetting, wrote the new address on a slip of paper and handed it to him. He scoffed, saying, “I wouldn’t forget such an important thing,” but he took the slip of paper and put it in his pocket. Later that same day at the university a colleague came by his office with an interesting problem. Wiener searched for a piece of paper and took the slip from his pocket to use to write some mathematical equations. When he finished, he crumpled up the slip of paper and threw it away. That evening, he remembered there was something about a new house but he couldn’t find the slip of paper with the address on it. Without any alternative course of action, he returned to his old home, where he spotted a little girl on the sidewalk. “Say, little girl,” he said, “Do you know where the Wieners live?” The girl replied, “That’s o.k., Daddy, Mommy sent me to get you.”

The full paper is here. For still more jokes, see this MathOverflow question and this page.

Proofs In Math: What’s The Point?

Tyler Cowen pointed me to an article on automated theorem proving. Namely, a computer “has solved the longstanding Erdős discrepancy problem!” This would not be such a big deal, except the 13 gigabyte proof is too complicated for anyone to understand.

So, of course, the Boeotians are in rare form as the clack of keyboards fills the metaphorical air in an attempt to sate the internet’s endless appetite for stupidity. Or: people are saying some really dumb stuff.

The discussions tend to anchor around trust. Can we trust the computer? What if there was a bug in the software? Gil Kalai, a math professor at the Hebrew University of Jerusalem, said, “I’m not concerned by the fact that no human mathematician can check this, because we can check it with other computer approaches.”

The mistake here is to assume that mathematics is about proof. It’s not. Proof is a means to producing insight, and a proof that no one can understand is close to worthless.

But don’t listen to me. Take it from Fields medalist Bill Thurston. He wrote a paper, “On Proof and Progress in Mathematics,” that has profoundly influenced my ideas about what intellectual enterprise in general, and mathematics in particular, are about.

We are not trying to meet some abstract production quota of definitions, theorems and proofs. The measure of our success is whether what we do enables people to understand and think more clearly and effectively about mathematics.

In not too many years, I expect that we will have interactive computer programs that can help people compile significant chunks of formally complete and correct mathematics (based on a few perhaps shaky but at least explicit assumptions), and that they will become part of the standard mathematician’s working environment. However, we should recognize that the humanly understandable and humanly checkable proofs that we actually do are what is most important to us, and that they are quite different from formal proofs.

When I started working on foliations, I had the conception that what people wanted was to know the answers. I thought that what they sought was a collection of powerful proven theorems that might be applied to answer further mathematical questions. But that’s only one part of the story. More than the knowledge, people want personal understanding.

Or as Richard Hamming put it, “The purpose of computing is insight, not numbers.”

The Stable Marriage Problem Explained

You are out in a thunderstorm. You look up, at the rolling thunderheads painting the sky, and wonder, “Why am I here? What’s the point of all of this? What difference can I make in a world of 7 billion?”

Your weekly scheduled existential angst is interrupted by a flash. It’s lightning, six-ish bolts. The yellow branches cross and, for a moment, spell out your name. “What are the odds?” you wonder.

It’s not a sign, though. You’re not falling for that one. Your worn copy of Dawkins has earned you that much. The mind, you know, has a tendency of seeing patterns where there are none.

You soon tire of the diversion and go back to fretting about existence. At least until a boom interrupts your reverie. You look up again. The clouds are parting.

You wonder if this is some sort of freak weather system and why, exactly, you thought it would be a good idea to be out in a thunderstorm.

Light filters through as the clouds continue to part. A shy blue sky reveals herself. Trumpets sound. There are winged creatures in the sky. A man in white floats down from the heavens.

You figure it’s a brain tumor — hopefully benign.

The man approaches you and says, “There’s been an administrative mix up.”

Fast forward an hour. It’s explained to you that Nietzsche was almost right. God isn’t dead, not exactly. He’s disappeared. Maybe on vacation or maybe this is part of one of his weird plans. (“Oh, God. I know the one. Luminous fellow. Always coming up with those crazy schemes.”) The angels, in their wisdom, figure that the best way to choose a new God to serve while the real one is out is via a lottery. They assign a number to every living soul and then choose one at random.

The angels, you see, are nihilists — and you are the chosen one.

First act as God

Now you’re God and, boy, if you thought you had responsibilities before, you’ve really got ’em now, and all the angels are waiting. Or, at least, you feel like they ought to be waiting. That’s how people on Earth behave after electing a new president — waiting for him to do something, anything.

But the angels aren’t really like that. They speak in Zen koans. Stuff like, “Do something. Do nothing. Be one hand clapping,” and “A cow is hanging from a tree by its teeth. A river rushes below. Does the cow mu?”

Mostly they just sit around, shrugging at each other. You figure they are sort of like what cats would be like if they were human-shaped and had wings.

Still, you suppose you ought to make some huge gesture to, you know, your people. Maybe not let them outright know that God is back — save a little mystery for further on in the relationship — but something.

So you ponder for a while and think, “Hey, what about that whole divorce epidemic going on? As God, I ought to be able to do something about that. Solve that romance thing.”

Your first act of God, you announce to the angels, will be to solve the Stable Marriage Problem.

The Stable Marriage Problem

Mathematics is a game of the imagination. There is but one rule: you may not contradict yourself — and I’m not even sure about that rule. Maybe there’s a neat looking branch of math waiting to be discovered where sometimes contradictions are allowed.

For every 100 human girls born, 106 boys are born, but the sex ratio for the global population is 101 males for every 100 females — which means that, even if “true love” were a real thing, fated by God, some people would end up alone.

But with mathematics, I get to be the God, and I decide that I’m doing mathematics in a platonic reality where there is a man for every woman and a woman for every man.

Now consider these men and women living in this platonic mathematical universe. The women have some men they would prefer to be with — the Johnny Depps and George Clooneys — and the men have their own preferences — the Scarlett Johanssons and, well, more Scarlett Johanssons.

However, if God matches me with Scarlett Johansson, there is a problem. It’ll end in divorce. A smarter, better looking, more successful, all-around-wonderful guy will come along and, well, I’m out. And our twelve children will be devastated.

No, this will not do at all.

What I’m after is an equilibrium where no individual has a better option, a stable pairing. Like let’s say I’m with Casey Anthony — who is pretty cute but also maybe a murderess — and some other non-possible-murderess comes along who prefers me over her husband. If I prefer this new woman over the constant threat of death-by-angry-wife, then a better option exists.

An ideal couple is one with no options — people who cannot do any better than each other. If you marry someone, you both probably prefer someone else (at least when not blinded by infatuation), like Brad Pitt or Angelina Jolie. But none of those people want you more than their spouse, so you’re stuck together. And that’s true love.

Is there a stable pairing?

But it’s not obvious that there is always a stable pairing. Maybe there is some way to set things up so that there’s an infinite cycle — one where people keep getting divorced and then remarried and then divorced again. I can sort of imagine such a scenario. Look at all the people getting divorced and then remarried today.

Every introduction to this problem that I’ve read papers over this concern. It says, “Well, there is a stable pairing for every set of preferences. Just analyze this algorithm I’m about to give.” This is totally unsatisfying — I don’t want to just know something. I want to know how I could reinvent it.

But alas. I’m not that clever, so you will have to put up with the traditional style of presentation.

A first algorithm

Since you’re God, you can just use the nicest algorithm that you can invent, which would go like this:

  1. Given all men and women, generate all possible matches.
  2. Filter out all the non-stable matches.
  3. Pick your favorite one.
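The three steps above can be sketched directly in Python. This is my own illustration, with made-up names and tiny preference lists, using brute-force enumeration via `itertools.permutations`:

```python
from itertools import permutations

def is_stable(match, men_prefs, women_prefs):
    """True if no man and woman both prefer each other over their partners."""
    wife = dict(match)                        # man -> woman
    husband = {w: m for m, w in match}        # woman -> man
    for m, prefs in men_prefs.items():
        for w in prefs:
            if w == wife[m]:
                break                         # m prefers nobody above his wife
            # m prefers w to his wife; does w prefer m to her husband?
            if women_prefs[w].index(m) < women_prefs[w].index(husband[w]):
                return False                  # blocking pair: unstable
    return True

def all_stable_matches(men_prefs, women_prefs):
    """Step 1: generate all matchings. Step 2: filter out the unstable ones."""
    men = list(men_prefs)
    return [list(zip(men, ws))
            for ws in permutations(women_prefs)
            if is_stable(list(zip(men, ws)), men_prefs, women_prefs)]

# Step 3: pick your favorite (hypothetical preferences, best choice first).
men_prefs = {'fred': ['wilma', 'betty'], 'barney': ['wilma', 'betty']}
women_prefs = {'wilma': ['barney', 'fred'], 'betty': ['fred', 'barney']}
print(all_stable_matches(men_prefs, women_prefs))
# → [[('fred', 'betty'), ('barney', 'wilma')]]
```

The `permutations` call is exactly what blows up: n couples means n! matchings to check.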

So let’s say you whip this up and you recognize that it has some seriously nasty algorithmic complexity. The number of matchings grows factorially and, on a conventional computer, checking them all will take longer than the age of the universe for inputs larger than ~50.

But that’s fine, right? Now that you’re omnipotent, you’d think that you wouldn’t need to deal with any of this computational complexity nonsense. Except one of those damned angels chimes in and lets you know that even God isn’t allowed to use exponential algorithms for large inputs.

So, if we want to have stable marriages and more than 50 happy souls in our platonic mathematical reality, we’re going to have to come up with something better.

Gale-Shapley algorithm

For inspiration, consider how dating actually works. Bob approaches Alice and says something like, “Let’s hang out sometime.” or “Want to get dinner sometime?” or, for college students, the always popular, “We should watch Netflix together.”

Alice says “yes” if she finds Bob acceptable and “no” otherwise. Then they date until someone better comes along, at which point one of them “falls out of love,” which is really evolution nudging them to go pursue someone else. And the relationship ends. Usually, the woman is the rejecter and the man the rejectee.

So in the real world, the algorithm looks sorta like:

  1. A man asks out a woman he likes.
  2. The woman accepts or rejects the man.
  3. If the woman accepts the man, the woman dates the man until a suitor she
    prefers comes along. At this point, the relationship ends.
  4. The woman dates the new man and the old man asks out a new woman.

Rinse and repeat.

This is basically what the solution to The Stable Marriage Problem — the Gale-Shapley algorithm — looks like.

  1. Each man proposes to his first choice.
  2. Each woman (provisionally) accepts the best man who proposed to her. We could say that they’re engaged.
  3. Each non-engaged man proposes to his next-best choice.
  4. Each woman accepts the best man who proposed to her and rejects the
    rest. This may entail breaking off her current engagement and “trading up.”
  5. Repeat until everyone pairs off.
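The five steps above translate into Python almost line for line. This is my own sketch of the male-proposing version, with hypothetical preference lists:

```python
from collections import deque

def gale_shapley(men_prefs, women_prefs):
    """Male-proposing Gale-Shapley. Returns a dict: woman -> man."""
    # rank[w][m]: woman w's opinion of man m (lower is better).
    rank = {w: {m: i for i, m in enumerate(ms)}
            for w, ms in women_prefs.items()}
    next_choice = {m: 0 for m in men_prefs}  # index of m's next proposal
    fiance = {}                              # woman -> man (engagements)
    free_men = deque(men_prefs)
    while free_men:
        m = free_men.popleft()
        w = men_prefs[m][next_choice[m]]     # m's best not-yet-asked woman
        next_choice[m] += 1
        if w not in fiance:
            fiance[w] = m                    # she provisionally accepts
        elif rank[w][m] < rank[w][fiance[w]]:
            free_men.append(fiance[w])       # she trades up; her ex is free
            fiance[w] = m
        else:
            free_men.append(m)               # rejected: on to his next choice
    return fiance

men_prefs = {'fred': ['wilma', 'betty'], 'barney': ['wilma', 'betty']}
women_prefs = {'wilma': ['barney', 'fred'], 'betty': ['fred', 'barney']}
print(gale_shapley(men_prefs, women_prefs))
# → {'wilma': 'barney', 'betty': 'fred'}
```

Note the asymmetry baked into the code: men walk down their lists from the top, while women only ever trade up from whoever asks.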

Looks sorta like dating in real life, huh?

Proving stability

How do we know that this algorithm produces a stable match?

Let’s consider two couples: Fred + Wilma and Barney + Betty. Is it possible that there is a better option? That Barney prefers Wilma over Betty and Wilma prefers Barney over Fred?

No. If Barney preferred Wilma over Betty, he would have proposed to Wilma first, since each man proposes in order of preference. In that case, either Wilma accepted him and later dumped him to trade up, or she rejected him because she was already engaged to someone she preferred. Either way, women only ever trade up, so her final partner, Fred, must rank above Barney in her eyes. It’s impossible that Wilma prefers Barney over Fred.

Smashing the Patriarchy

But wait! We’ve discovered a sexist algorithm. All the men are matched with the best possible woman from their point of view, while the women get their worst stable match! It’s stable only because all the men are too happy to agree to swap with any woman.

If we reverse the algorithm so that the women ask the men out, then we get a female optimal algorithm — “proving” that there’s more than one stable match.

How many possible stable matches are there?

Lots. It turns out that the upper bound is something like \(O(n!^{\frac{2}{3}})\) and the lower bound is \(\Omega(2.28^n)\). Or, you know, lots.

This means that we can probably find a more egalitarian one than the male-optimal version. And indeed we can, except the algorithm is fairly complicated, relying on more knowledge of graph theory than I currently possess.

But as long as you’re omniscient, surely you can figure it out.

Further Reading

The Boy Girl Paradox Explained

Probability theory is notorious for violating human intuition. Consider the Boy Girl Paradox:

Mr. Smith has two children. At least one of them is a boy. What is the probability that both children are boys?

Answer: Of course, the brain thinks, it must be one half. Except it isn’t. It’s one in three.

(Edit: note that the problem can be interpreted in two ways, making it ambiguous as formulated. See the comments for details.)

Trust, but Verify

If the whole Monty Hall debacle and my sister’s reaction (“That’s bullshit!”) are any indication, though, you will not believe me. We must force the intuition to see the error of its ways.

Let’s consider a set of 100 pairs of children and assume that it’s a perfect, representative sample. This means that \(\frac{1}{4}\) is \((Boy, Boy)\), \(\frac{1}{4}\) is \((Boy, Girl)\), \(\frac{1}{4}\) is \((Girl, Boy)\), and \(\frac{1}{4}\) is \((Girl, Girl)\).


"Picture of pairs in the Boy Girl Paradox."

We don’t need to consider double girl sets, since the problem specifies at least one child is a boy.

"Picture of pairs in the Boy Girl Paradox, without girl pairs."

From the image, we can see that there are twice as many boy-girl pairs as double-boy pairs — giving us \(\frac{1}{3}\).

"Picture of solution to the Boy Girl Paradox."
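If the counting argument doesn’t convince your intuition, a Monte Carlo check might. This is a sketch of my own, assuming boys and girls are equally likely and independent:

```python
import random

random.seed(0)
families = [(random.choice('BG'), random.choice('BG')) for _ in range(100_000)]

# Condition on "at least one boy", then count the two-boy families.
at_least_one_boy = [f for f in families if 'B' in f]
both_boys = sum(f == ('B', 'B') for f in at_least_one_boy)
print(both_boys / len(at_least_one_boy))  # close to 1/3, not 1/2
```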

Examining this problem suggests a more general heuristic:

Heuristic: When considering a probability problem, consider all possible permutations. Draw a picture.

The Other Problem

Usually, the problem is presented as a pair of problems, the second of which is:

Mr. Jones has two children. The older child is a girl. What is the probability that both children are girls?

Answer: This time it is one half. Can you see why?

Consider the original image again:

"Picture of pairs in the Boy Girl Paradox."

Then eliminate all pairs where the eldest child is not a girl:

"Picture of eldest girl child pairs in the Boy Girl Paradox."

\(\frac{1}{2}\) of the pairs are \((Girl,Girl)\).
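The same kind of simulation (again my own sketch, assuming equally likely, independent sexes, with index 0 as the older child) lands on one half for Mr. Jones:

```python
import random

random.seed(0)
families = [(random.choice('BG'), random.choice('BG')) for _ in range(100_000)]

# Condition on "the older child is a girl" (index 0 is the older child).
older_is_girl = [f for f in families if f[0] == 'G']
both_girls = sum(f == ('G', 'G') for f in older_is_girl)
print(both_girls / len(older_is_girl))  # close to 1/2
```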

Debugging the Intuition

When MBA students were presented with the first problem, 83% of them gave an answer of \(\frac{1}{2}\). Why is this so tempting? Why does intuition lead us astray?

I’ve heard that teachers consider wrong answers on tests to be valuable feedback. They can see the sort of errors that children make and iron out the conceptual bugs, so to speak. Consider the process of coming up with a wrong answer to the first problem. When I worked through it the first time, I thought, “Well, I know that one child is a boy, so there’s a 50% chance that the other child is a boy.” The trouble with this is that it implicitly fixes the first child as a boy — like the second problem does — but this is not a valid move.

Fox and Levav argue that people have a faulty heuristic in their heads that works like this:

Faulty Heuristic: People split up the sample space in the simplest way necessary to accommodate the problem’s parameters. In the boy or girl paradox, the simplest way of splitting up the problem space — whether or not there are two boys — is by halving it.

Finding the correct answer to the problem feels different. It’s more of a, “Wait, what are all the possibilities?” followed by listing them out, \((Boy, Boy)\), \((Boy, Girl)\), \((Girl, Boy)\), \((Girl, Girl)\). Once that’s done, the final step is taking care not to eliminate \((Girl, Boy)\) by falsely assuming that only those in which \(Boy\) comes first are valid, at which point the solution is a matter of counting. This is much easier to avoid when everything is out on paper than it is when considering it mentally.

The trouble with the intuition may just be that it is too quick to compress the space of possibilities — “It can either be \(Boy\) or \(Girl\)!” thinks the fast, system one processor. By slowing it down, listing out all the possibilities, and only throwing out those that don’t apply — \((Girl, Girl)\) — we avoid that failure mode and arrive at the right answer.

Further Reading

The Secretary Problem Explained: Dating Mathematically


I was, to put it mildly, something of a mess after my last relationship imploded. I wrote poems and love letters and responded to all of her text messages with two messages and all sorts of other things that make me cringe now and oh god what was I thinking.

I learned a few things, though, like when you tell strangers that your long-term relationship has just been bulldozed as thoroughly as the Romans salted Carthage, they do this sorta Vulcan mind-meld and become super empathy machines. Even older folk, who usually treat me not exactly as a non-person but something sorta like it. At the time, I had this gruff, Russian psychiatrist I’d see once a month, and he was all like, “Been there, man. Have some Diazepam and relax.” — except, you know, he said it in Russian-accented doctorspeak.

This was surprising to me then but isn’t now. Live long enough and you’ll have your heart thrashed about a fair bit, along with the rest of you. Mention heartbreak and everyone has their own private story — maybe more than one. It’s not Vietnam. They’ve been there and they understand.

I sometimes wonder — if I could go back in time, what could I say to comfort my former self? What can you say to someone that will pull them out of the throes of hormone-induced suffering? Probably nothing. The remarkable thing about words is not that they sometimes move people, but that they so seldom do.

Still, I think I’d say something like, “My boy, evolution is a motherfucker and you need a new woman in your life.” He would probably protest that women were the problem and that he’s pretty sure the last thing he needs is another one. Then, I would let out the most condescending sigh imaginable, the sort of sigh that says I-have-unimaginable-wisdom-born-of-experience-and-am-from-the-future, and say, “Not that sort of woman. You need the Queen. You need mathematics –”

“Let me tell you about the secretary problem.”

The Secretary Problem

Consider the plight of John. John’s 25. He lives in Utah and likes country music, hunting, and four wheelers. You probably see where I’m going with this. That’s right, ladies and gentlemen. John is gay.

The recent court decision overturning Utah’s gay-marriage ban has him thinking. He’d like to settle down one day — maybe adopt a child with the right man. He has a couple short-term relationships going on right now, but married to Bruce or Sidney? No way.

How can he guarantee that he snags, if not Mr. Right, at least Mr. Close Enough? He figures he ought to date at least a few different men, and then… what?

Imagine he meets this guy, Jim. They make out at a party, hang out a few times, and realize that they’re kinda already dating, and decide to label it by making it Facebook official. Things progress. John asks Jim to move in with him.

Then there’s a snag. Valentine’s Day rolls around, and John finds himself, at the last minute, at Walmart, looking to pick up some chocolate and cheap Champagne, wondering, “Is this really what love feels like?”

What should John do in such a situation? Should he next Jim and take his chances on the dating market? Or should he settle and settle down?

John’s predicament is an example of the secretary problem — so named because we can imagine the same situation, except instead of a man searching for a husband, it’s a man interviewing potential secretaries. When is the candidate good enough? What’s the stopping criteria?

Formalizing the Secretary Problem

We can abstract away the specifics of John’s plight and formalize the problem. Let’s consider each man that John dates as an integer — the integer representing his “husbandness factor.” Thus, a sequence of lovers like \((Sidney, Bruce, Jim, Todd, Keith, Bruno, Terrence, Cecil, Nigel)\) would translate to the integer sequence \((1, 3, 7, 5, 8, 3, 1, 9, 4)\).

This problem would be trivial — just pick the max element — if it weren’t for two properties.

  1. There’s no look-ahead. When I’m dating any one person, I’m unable to look forward into the future and consider who I’ll date in the future. I have no crystal ball.
  2. There’s no undo. If I date a great girl for a while, but leave her in a misguided attempt to find someone better, there’s a good chance she’ll be unavailable in the future, married to some douche named Trevor who played lacrosse in high school.

We can think of it visually as a machine which is fed a tape of integers. It has two actions: it can either stop or it can consider the next integer. The machine’s objective is to stop on the highest integer.


Real World Examples of the Secretary Problem

At the heart of the secretary problem is conflict. Do I reject the current possibility in hopes of landing something better if I keep looking, or do I stick with what I have?

Examples of the secretary problem:

  • This is the case with dating. I could commit to the woman I’m with right now or I could start texting her best friend.
  • It applies to hiring not just secretaries, but anyone. Is the current candidate the right person for the job or should I hold out for someone better? What if no one comes along?
  • When buying a house — should I put an offer on some house, or should I hope that something better comes along in the future? How many houses ought I look at before deciding?
  • The opposite side of the interviewing problem: should I accept this job offer or should I keep looking?
  • Alligator hunting, at least in Louisiana. Each year, you’re allotted a set number of tags based on the size of your property — that’s the amount of alligators you’re allowed to “harvest” under the law. When you stumble across an alligator, you’re forced to decide: should I kill this one or save my tag and hope a bigger one comes along?
  • When selling a house or a car or, well, any big ticket item. When presented with an offer, you’re forced to decide: should I accept this offer or hope something better comes along?
  • Deciding whether or not to buy something at the supermarket. Is this bread cheap enough or should I hope for a sale next week? The same goes for clothing and, well, anything.

Solving the Secretary Problem

The following contains the formal details for solving the secretary problem analytically. It can safely be skimmed.

As a general problem solving strategy, I often find it useful to first come up with a horrible solution to a problem and then iterate from there. I call this the dumbest-thing-that-could-work heuristic.

In the case of the secretary problem, our horrible solution can be the lucky number seven rule: In an integer sequence, always choose the 7th item.

If we follow this rule, we’re essentially picking an integer at random. The probability, then, of picking the best element from an integer sequence of length \(N\) with this rule is \(\frac{1}{N}\).

To beat this, let’s consider how people go about solving secretary problems in the real world. I don’t know anyone whose dating strategy is, “I’m going to date seven women and pick the seventh one — no matter what.” One would have to be a staunch nihilist to adopt such a strategy.

Instead, the strategy most adults adopt — insofar as they consciously adopt a strategy — is to date around for a while, gain some experience, figure out one’s options, and then choose the next best thing that comes around.

In terms of the secretary problem, such a strategy would be: Scan through the first \( r \) integers and then choose the first option that is greater than any of the integers in \( [1,r] \).


How does this new strategy compare to our old one? The above image is a prop to help understand the discussion that follows. Assume that \(i\), the greatest integer, occurs at \(n + 1\).

In order for this strategy to return the maximum integer, two conditions must hold:

  1. The maximum integer cannot be contained in \([1,r]\). Our strategy is to scan through \([1,r]\) without committing, so if the maximum lies in \([1,r]\), we necessarily lose. This can also be stated as \(n \geq r\).
  2. Our strategy selects the first integer, \( i \), in \([r,N]\) that’s greater than \(max([1,r])\). For that integer to be the maximum, sitting at position \(n + 1\), no integer between positions \(r\) and \(n\) can exceed \(max([1,r])\); otherwise the strategy commits too early and loses. Alternatively put, the condition \(max([1,r]) == max([1,n])\) must be true.

Thus, to calculate the effectiveness of our strategy, we need to know the
probability that both of these will hold. For some given \(n\), this is:

$$ \frac{r}{n}\frac{1}{N} $$

\(\frac{1}{N}\) is the probability that \(i\) occurs at position \(n + 1\) (remember, this is the probability for one particular \(n\), not for all \(n\)), while \(\frac{r}{n}\) is a consequence of the second condition: the probability that \(max([1,r]) == max([1,n])\) is true.

To calculate the overall probability of success for a given \(r\), \(P(r)\), we need to sum over all \(n \geq r\):

$$ P(r) = \frac{1}{N}\left(\frac{r}{r}+\frac{r}{r+1}+\frac{r}{r+2}+\cdots+\frac{r}{N-1}\right) = \frac{r}{N}\sum_{n=r}^{N-1}\frac{1}{n} $$

This sum is a Riemann approximation of an integral, so we can rewrite it. By letting \(\lim_{N \rightarrow \infty}\frac{r}{N} = x\) and \(\lim_{N \rightarrow \infty}\frac{n}{N} = t\), we get:

$$ P(r) = \lim_{N \rightarrow \infty}\frac{r}{N}\sum_{n=r}^{N-1}\frac{N}{n}\frac{1}{N}=x\int_{x}^{1}\frac{1}{t}dt=-x \ln x $$

Now, we can find the optimal \(r\) by solving for \(P'(r) = 0\). By plugging \(r_{optimal}\) back into \(P(r)\), we will find the probability of success.

$$ P'(r) = -\ln x - 1 = 0 \Rightarrow x = \frac{1}{e} $$
$$ P(\frac{1}{e}) = \frac{1}{e} \approx .37 $$
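The derivation can be sanity-checked numerically. Below is a minimal Monte Carlo sketch of the cutoff strategy (the function names and the choice \(N = 100\) are mine, not from the analysis above):

```python
import math
import random

def secretary_trial(n, r):
    """One run of the cutoff strategy: scan the first r candidates,
    then commit to the first candidate better than all of them.
    Returns True if we end up with the overall best candidate."""
    candidates = list(range(n))
    random.shuffle(candidates)
    best_seen = max(candidates[:r]) if r > 0 else -1
    for c in candidates[r:]:
        if c > best_seen:
            return c == n - 1  # did we commit to the maximum?
    return False  # never committed: we lose

def success_rate(n, r, trials=20000):
    return sum(secretary_trial(n, r) for _ in range(trials)) / trials

n = 100
r = round(n / math.e)  # the 1/e cutoff: r = 37
print(f"cutoff r={r}: success rate ~ {success_rate(n, r):.3f}")
```

Running this lands near the predicted \(0.37\), and trying other cutoffs confirms that \(r \approx N/e\) is the sweet spot.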

What The Math Says

How can this math help John? Well, the optimal solution is for him to estimate how many people he believes he might reasonably date in the future, say \(20\). We plug this into the equation \(\frac{N}{e}\) with \(N=20\): \(\frac{20}{e} \approx 7\).

This result says that, if John wants to maximize his probability of ending up with the best possible man, he should date 7 men and then marry the next man who is better than all of those men.

However, we have sneaked some probably untenable assumptions into our analysis. The typical secretary problem maximizes the chance of landing the best man and considers all other outcomes equally bad. Most people on the dating market are not thinking this way — they want to maximize the probability that they end up with a pretty good spouse. It’s not all or nothing.

Maximizing the Probability of a Good Outcome

Fear not, there’s a modification of the secretary problem that maximizes the probability of finding a high-value husband or wife. I’m not going to cover the derivation for this flavor of the secretary problem in this post (for technical details, see Bearden 2005), but suffice it to say, the strategy is the same except we use a cutoff of \(\sqrt{N}\) rather than \(\frac{N}{e}\).

Consider dating for the average American. Assuming one wants to settle down by the age of 35, one has the opportunity for somewhere between 7 and 30 sorta serious relationships. Taking the geometric mean, we get about 14. Johannes Kepler famously considered 11 women for his second wife, so this is, at the very least, not absurd.

The square root of 14 is about 4. Thus, according to the math, one should have four kinda serious relationships and then marry the next person that comes along who is better than all of those four.
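The gap between the two cutoffs is easy to see in simulation. The sketch below is my own illustration, with each candidate’s quality drawn uniformly at random; it compares the average quality of the partner you end up with under the \(\sqrt{N}\) and \(\frac{N}{e}\) cutoffs. If the strategy never commits, it is stuck with the last candidate:

```python
import math
import random

def expected_quality(n, r, trials=20000):
    """Average quality of the chosen candidate under a cutoff-r strategy.
    Quality is uniform on [0, 1]; if we never commit, we take the last one."""
    total = 0.0
    for _ in range(trials):
        quality = [random.random() for _ in range(n)]
        best_seen = max(quality[:r]) if r > 0 else 0.0
        choice = quality[-1]  # default: stuck with the final candidate
        for q in quality[r:]:
            if q > best_seen:
                choice = q
                break
        total += choice
    return total / trials

n = 100
for r in (round(math.sqrt(n)), round(n / math.e)):
    print(f"cutoff r={r:2d}: average quality ~ {expected_quality(n, r):.3f}")
```

With \(N = 100\), the \(\sqrt{N}\) cutoff yields a noticeably higher average outcome, even though the \(\frac{N}{e}\) cutoff wins the all-or-nothing game more often.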

How Human Behavior Compares to the Mathematics

The median number of premarital sexual partners is unclear, with different sources reporting markedly different numbers. I’m inclined to place the number between 1 and 4. Using this number as a rough proxy for the number of kinda serious relationships before marriage, reality conflicts with the results of the secretary problem.

Most people aren’t dating even four other people before marriage. What gives?

At its core, the conflict implies that either the solution to the secretary problem does not apply or that humans are not gathering enough information before getting married.

A number of experimental studies (here, here, here, and here) support the second view. When undergraduates are asked to participate in a secretary problem in its pure form — that is, like the tape discussed in the beginning — they almost always stop searching too early.

One might argue that evolution ought to know what it’s doing — especially when it comes to human mating — and that we should have a strong prior that dating is, in some sense, optimal. Such a view ignores that we’re no longer in small tribes of 50 to 200. Humans did not evolve to deal with modern society and the horror that is dating today. A preference for sugar was adaptive 50,000 years ago, but we have since invented the Twinkie.

Indeed, pre-civilized human life probably didn’t look a whole lot like a secretary problem. Back then, one might have had to choose from a half dozen possible mates — mates one had already known for many years. This looks more like a game of pick the maximal element from a set than a bona fide secretary problem.

What Sort of Optimal?

If I use the results of the secretary problem to find a wife, I will almost certainly end up worse off than a strictly rational agent who pursues the same goal, but probably better than those who have no strategy at all. At the end of the day, the secretary problem is a mathematical abstraction and fails to take into account much of the complexity of, you know, reality.

The secretary problem assumes, for instance, that our only means of finding out about the distribution of potential mates and our preferences for them is via dating. This isn’t remotely true. We can observe the actions of others, introspect, read about human mate preferences, discuss our experiences with friends, and otherwise share information.

It’s also not the case that we’re dating men or women at random. There are a huge number of filters that go into deciding whether or not someone is marriage material. Are we of similar ages and interests? Do we speak the same language? Do I feel any attraction for this person?

A theory of optimal dating would need to take this and much more into account. There are a near unlimited number of paths to strategically choosing who to spend the rest of your life with, and a lot of that strategy consists of things other than choosing. You might try getting fit, earning more money, adopting interesting hobbies, honing social skills, meeting lots of the opposite sex, taking voice acting or improv classes, and so on. An optimal theory of dating would, I have no doubt, emphasize some subset of these skills.

All Together Now


  • The secretary problem is the problem of deciding whether to stick with what you have or take your chances on something new.
  • Examples of secretary problems include finding a husband or wife, hiring a secretary, and alligator hunting.
  • The solution to the good-outcome variant of the secretary problem suggests that the optimal dating strategy is to estimate the maximum number of people you’re willing to date, \(N\), then date \(\sqrt{N}\) people and marry the next person who is better than all of those.
  • In laboratory experiments, people often stop searching too soon when solving secretary problems. This suggests that the average person doesn’t date enough people prior to marriage.
  • At the end of the day, the secretary problem is a mathematical abstraction and there is more to finding the “right” person than dating a certain number of people.

Further Reading

Worldbuilding, Worldbuilders, and Mathematics

This week, I was introduced to the hobby of worldbuilding — inventing imaginary places, making maps, elaborating histories. (The platonist in me prefers to think of worldbuilding as the discovery of fictional universes, rather than an act of invention.)


There is a (perhaps apocryphal) tale that J. R. R. Tolkien got into a fight with his publisher over using the words “elves” and “dwarves” instead of “elfs” and “dwarfs”. The publisher argued that the latter was how the dictionary did it. “I know,” Tolkien responded, “I wrote the dictionary.” Tolkien had, in fact, spent several years as an assistant working on the Oxford English Dictionary.

Although best known for his fiction, Tolkien was a linguist at heart, even inventing some eleven fictional languages along with a number of variations on those languages.

The remarkable internal consistency of Tolkien’s language use in The Lord of the Rings is perhaps unsurprising, then. The prefix “mor” in his work translates literally to black and is used consistently — Mordor is black land, Moria is black pit, and Morannon is black gate. The “dor” in Mordor means land. Gondor, as you might expect, means stone land.

Of his languages and Lord of the Rings, Tolkien wrote “The invention of languages is the foundation. The ‘stories’ were made rather to provide a world for the languages than the reverse.”


It’s very far away,
It takes about half a day
to get there
if we travel by my, uh, dragonfly.
—Jimi Hendrix, “Spanish Castle Magic”

I have no such patience for languages. I have little interest in learning a new one, except perhaps Lojban, and less still in attempting to invent my own, unless we’re speaking of programming.

But it does suggest a question: Where would I start with discovering a fictional universe? If I wanted to maximize the believability of a fictional universe, on what rock would I put it?

I’m thinking maybe economics. Robin Hanson recently pointed out that the movie Her, while enjoyable, is far from realistic. Humans invent human-level “strong” AI and use it… to chat with.

I doubt that’s how it’s going to play out in the real world. Why bother hiring a human to do any sort of computer work when an AI can do it faster, better, and cheaper? Newspaper reporter? Computer has it covered. Secretary? Computer. Author? Computer. Researcher? Computer. Teacher? Computer. Customer support? Computer.

Talk about a fuck up. In the real world, everything makes a perverted sort of sense. A causes B causes C, ad infinitum. In the real world, physics makes the rules, and we, well, we’re an expression of them. When worldbuilding, there is no physics ensuring that what you write is consistent. It’s up to you.

Discovering a coherent world draws on the same skills that are necessary for understanding our own world. When the movie Gravity was released, it was criticized for fucking up a whole lot of physics — vehicles in impossible orbits, backpacks with unlimited fuel supplies, and indestructible space suits.

That’s all ignoring messy details like relationship mechanics, attraction, and how language works. The movie Ted drives me up the wall — not because there is a magical, talking teddy bear, but because Mark Wahlberg and Mila Kunis’s relationship strikes me as absurd. A chick like that, decent career, and she’s with this unemployed man-child? Ri-ght.

All of which is to suggest that the skills and knowledge necessary to understand this world are the same needed to build your own: economics, game theory, physics, social dynamics, and so on. The converse is true, too: to understand this world, consider the sort of things you’d need to know to build your own. Or, as Feynman put it, “What I cannot create, I do not understand.”

The Fractal Nature of Worldbuilding

Worldbuilders often go out of their way to produce natural looking maps — like by spilling coffee on paper. Here’s an example:


Another technique for generating maps: taking pictures of rusted fire hydrants.


But there’s a whole branch of mathematics for this sorta thing, fractals! Ever notice the self-similar nature of trees? Each branch looks like a small tree unto itself. Or coastlines — each “crevice” of a coastline itself looks like a coastline. Branches, snowflakes, crystals — all like this.

Indeed, ever since reading Benoit Mandelbrot’s The Fractal Geometry of Nature, I sometimes catch myself seeing fractals superimposed on clouds when I unfocus my eyes.
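Fractal terrain is also easy to generate directly. Here’s a minimal midpoint-displacement sketch in Python (the parameters are my own; worldbuilding tools use fancier 2D variants of the same idea):

```python
import random

def coastline(depth=8, roughness=0.5, seed=42):
    """Midpoint displacement: start with a straight segment and repeatedly
    jitter each midpoint, shrinking the jitter at every level. The result
    is self-similar: zooming in on any stretch looks like the whole."""
    rng = random.Random(seed)
    pts = [0.0, 0.0]
    scale = 1.0
    for _ in range(depth):
        nxt = []
        for a, b in zip(pts, pts[1:]):
            nxt += [a, (a + b) / 2 + rng.uniform(-scale, scale)]
        nxt.append(pts[-1])
        pts = nxt
        scale *= roughness
    return pts  # 2**depth + 1 heights along the "coast"

profile = coastline()
print(len(profile))  # 257 points
```

Plot the heights and you get a jagged profile whose crevices look like small coastlines unto themselves — the same self-similarity as the trees and snowflakes above.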

Mathematics and Worldbuilding

Okay, confession: I wasn’t 100% honest with you. While this is my first brush with groups of other worldbuilders, I’ve toyed with the idea in the past — after colliding with Tegmark’s mathematical universe hypothesis and reading Permutation City.

I wasn’t thinking about making maps. I was wondering: How could I model the essentials of emergent behavior? Is it Conway’s game of life — or something simpler? How could I simulate a universe? What do the fundamental laws of our universe look like?

And I started wondering if mathematics wasn’t a sort of world unto itself. A set of axioms with implications of the sort we could never anticipate, implications we are still discovering; who knows where they could lead? And not one system of axioms, but infinitely many, each with different definitions, objects, theorems, branches, and applications.

In short, I started to think of the work of a mathematician as being a whole lot like worldbuilding. Discovering some object that obeys certain rules of logic and nothing more, and asking, “What does it do? Is such and such true of this thing? How does it behave here?”

I’m reminded of János Bolyai. Of non-Euclidean geometry, he wrote, “Out of nothing I have created a strange new universe.”

What Makes Something Interesting?

Francis Galton, cousin of Charles Darwin and maybe best known for his work on intelligence, was a bit obsessed with the idea that people have certain innate traits. You know the movie Minority Report, where a special police department tries to predict crime before it happens? He sorta tried to invent it — in 1883.

He had this idea, see, that you could predict whether or not someone was a criminal based on the structure of their face. He devised a technique of composite photography, which allowed him to create averages of many images. While he didn’t manage to identify criminals, he did find that the average of several faces tended to be more attractive than any of the individual faces he used as input.

More than 100 years later, it turns out Galton was on to something — regarding both crime and attractiveness. Men with wider faces are more aggressive hockey players, less trustworthy in laboratory games, engage in more aggressive behavior, and are more successful CEOs. Computer averages of faces are more attractive than the people used as inputs, and this result holds not only for faces, but for averages of cars, fish, and birds. A wide face is a dangerous face and an average fish is an attractive fish, it seems.

The Beautiful is the Compressible


We can think of human beings as agents who take in information from the environment, run that information through a compressor module, and then store that information in long-term memory. This is not rocket science. Our brains can’t hold all of the information in the world. We forget. We are forced to compress experience down to a few relevant details and store those. Indeed, a fair amount of evidence now supports the hypothesis that memories are reconstructed during recall. Each time you remember something, you’re modifying that memory. The brain is not a high-fidelity recorder.

In our man-as-compressor model, what sets the beautiful, averaged face apart from a typical face? It’s easier to compress. Consider all the information the brain has to store about a hideous face: a giant nose, a lazy eye, a unibrow, scars, maybe a teardrop tattoo. When the brain encounters a beautiful face, though, the compressor says something like, “Ah, a face so face-like that I need not spend any more processing time on it. I can relax.”

This idea is taken to its logical extension in low-complexity art. The aim of low-complexity art is to create images that can be described by a short computer program — a measure of complexity known as Kolmogorov complexity. The picture at the beginning of this section is an example of this style of art.
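True Kolmogorov complexity is uncomputable, but an off-the-shelf compressor gives a usable proxy. A quick sketch of the compressibility idea (zlib standing in for the brain’s compressor, which is my analogy, not a claim from the literature):

```python
import os
import zlib

def complexity(data: bytes) -> int:
    """Compressed size as a rough, computable stand-in for Kolmogorov
    complexity: highly patterned data shrinks, patternless data doesn't."""
    return len(zlib.compress(data, 9))

patterned = b"ABAB" * 1000   # regular: a short description suffices
noise = os.urandom(4000)     # no structure for the compressor to exploit

print(complexity(patterned))  # tiny
print(complexity(noise))      # close to the original 4000 bytes
```

In the man-as-compressor picture, the patterned input is the averaged face, and the random bytes are the giant nose, lazy eye, and teardrop tattoo.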

The Interesting is the Unexpected

Consider two facts:

When I speak to people, they find the second fact a lot more interesting than the first. This is, I think, because it violates their model of the world. They think of evolution as pushing us toward ever increasing complexity, but this is not true. Consider venereal sarcoma, which is today an infectious cancer, but used to be a dog.

This notion of surprise, the violation of expectation, is at the core of interestingness. If you already know something, if you anticipated it, it’s boring. The first time you hear a joke, it’s funny. The second time, not so much.

But not all unexpected data is interesting. If I published random sequences of numbers instead of words on this blog, well, no one would read it, and I wouldn’t blame them. What separates the interesting from the uninteresting?

If we consider our man-as-compressor model, interesting facts are those that improve the future performance of the compressor. Here’s an example: marriages are more likely to dissolve during periods of unemployment, but this only holds for unemployed husbands. For someone unaware of this fact, it improves their compressor — in this case, predicting when a couple will get divorced. (If you can predict something, you can compress it. They’re the same construct.) Depending on the person, this fact might further propagate through their compressor, updating beliefs about human mate preferences.

Indeed, a discovery is “just” a large improvement in the compressor. Consider Darwin’s theory of evolution. It connects and explains a huge amount of the phenomena around us. Where did humans come from? Why do fats taste good? Why do whales have organs similar to those of humans — and not fish? Talk about a compression upgrade.

We can even tie this into curiosity. After all, what is curiosity if not the pursuit of one’s interests? Given that what is interesting are those things that upgrade our model of the world, curiosity can be thought of as a drive to improve the compressor — a drive to improve our understanding of how things work.

Creativity, too, can be understood through the compressor model. Creativity is the consistent violation of other people’s expectations. Consider this poem:

Roses are red,
And ready for plucking,
You’re sixteen,
And ready for high school.
—Kurt Vonnegut, Breakfast of Champions

Notice how it violates the expectations of the compressor? That’s creativity.

All together, then:

  • Humans can be thought of as agents who take in information from the environment, run it through a compressor, and store the result in long-term memory.
  • Something is beautiful insofar as it can be compressed. Example: an average of faces is more beautiful than any individual face.
  • How interesting something is depends on how much it improves the performance of the compressor. When a fact violates expectations and improves one’s model of the world, that’s interesting. It improves the compressor.
  • Curiosity is the pursuit of the interesting — action designed to improve the compressor.
  • Creativity is the consistent violation of the expectations of the compressor.

Further Reading

The Science of Problem Solving

Mathematics is like the One Ring in the Lord of the Rings. Once you’ve loaded a problem into your head, you find yourself mesmerized, unable to turn away. It’s an obsession, a drug. You dig deeper and deeper into the problem, the whole time unaware that the problem is digging back into you. Gauss was wrong. Mathematics isn’t a queen. She’s a python, wrapping and squeezing your mind until you find yourself thinking about the integer cuboid problem while dreaming, on waking, while brushing your teeth, even during sex.

Lest you think I exaggerate, Feynman’s second wife wrote in the divorce complaint, “He begins working calculus problems in his head as soon as he awakens. He did calculus while driving in his car, while sitting in the living room, and while lying in bed at night.” Indeed, the above is a picture of Feynman’s blackboard at the time of his death. It says on it, “Know how to solve every problem that has been solved.” I like this sentiment, this idea of man as problem solver. If I were running things, I think I would have sent Moses down the mountain with that as one of the ten commandments instead of two versions of “thou shalt not covet.”

That’s what this post is about: How do humans solve problems, and what, if anything, can we do to become more effective problem solvers? I don’t think this needs any motivating. I spend too much time confused and frustrated, struggling against some piece of mathematics or attempting to understand my fellow man to not be interested in leveling up my general problem-solving ability. I find it difficult to imagine anyone feeling otherwise. After all, life is in some sense a series of problems, of obstacles to be overcome. If we can upgrade from a hammer to dynamite to blast through those, well, what are we waiting for? Let’s go nuclear.

A Computational Model of Problem Solving

Problem solving can be understood as a search problem. You start in some state, there’s a set of neighbor states you can move to, and a final state that you would like to end up in. Say you’re Ted Bundy. It’s midnight and you’re prowling around. You’re struck by a sudden urge to kill a woman. You have a set of moves you could take. You could pretend to be injured, lead some poor college girl to your car, and then bludgeon her to death. Or you could break into a sorority house and attack her there, along with six of her closest friends. These are possible paths to the final state, which in this macabre example is murder.

Similarly, for those who rolled lawful good instead of chaotic evil, we can imagine being the detective hunting Ted Bundy. You start in some initial state — the Lieutenant puts you on the case (at least, that’s how it works on television.) Your first move might be to review the case files. Then you might speak to the head detective about the most promising leads. You might ask other cops about similar cases. In this way, you’d keep choosing moves until reaching your goal.

Both of these processes can be modeled as a graph. (Not to be confused with the graph of a function, which you learned about in algebra. This sort of graph — pictured below — is a set of objects with links between them.) The nodes of the graph are states of the world, while the links between the nodes are possible actions.


Problem solving, then, can be thought of as, “Starting at the initial state, how do I reach the goal state?”


On this simple graph, the answer is trivial:


On the sort of graph you’d encounter in the real world, though, it wouldn’t be so easy. The number of possible moves in a chess match — itself a simplification when compared to, you know, actual war — is \( 10^{120} \), while the number of atoms in the observable universe is a mere \( 10^{81} \). It’s a near certainty, then, that the human mind doesn’t consider an entire graph when solving a problem, but somehow approximates a graph search. Still, it’s sorta fun to imagine what a real world problem might look like.
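The search framing is concrete enough to run. Here’s a toy version of the detective example as a breadth-first graph search (the states and actions are invented for illustration):

```python
from collections import deque

def solve(graph, start, goal):
    """Breadth-first search: problem solving as finding a path from the
    initial state to the goal state in a graph of world-states."""
    frontier = deque([[start]])
    seen = {start}
    while frontier:
        path = frontier.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(path + [nxt])
    return None  # goal unreachable

# A toy detective graph: nodes are states, edges are possible moves
case = {
    "assigned": ["case files", "head detective"],
    "case files": ["similar cases"],
    "head detective": ["promising lead"],
    "similar cases": ["promising lead"],
    "promising lead": ["arrest"],
}
print(solve(case, "assigned", "arrest"))
```

Breadth-first search finds the shortest such path, but only by touching every state near the start — exactly the exhaustive exploration a \(10^{120}\)-state graph rules out for human minds.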



A change in perspective is worth 80 IQ points.
—Alan Kay

Insight. In the shower, thinking about nothing much, it springs on us, unbidden and sudden. No wonder the Greeks thought creativity came from an outside source, one of the Muses. It’s like the heavens open up and a lightning bolt implants the notion into our heads. Like we took an extension cord, plugged it into the back of our necks, and hooked ourselves into the Way, the Tao, charging ourselves off the zeitgeist and, boom, you have mail.

It’s an intellectual earthquake. Our assumptions shift beneath us and we find ourselves reoriented. The problem is turned upside down — a break in the trees and a new path is revealed.

That’s what insight feels like. How does it work within the mind? There are a number of different theories and no clear consensus in the literature. With that said, I have a favorite: insight is best thought of as a change in problem representation.

Consider how often insight is accompanied by the realization, “Ohmygod, I’ve been thinking about everything wrong.” This new way of thinking about the problem is a new representation of the problem, which suggests different possible approaches.

Consider one of the problems that psychologists use to study insight:

You enter a room in which two strings are hanging from the ceiling and a pair of pliers is lying on a table. Your task is to tie the two strings together. Unfortunately, though, the strings are positioned far enough apart so that you can’t grab one string and hold on to it while reaching for the other. How can you tie them together?

(The answer is below the following picture if you want to take a second and try to figure it out.)


The trick to this problem is to stop thinking of the pliers as pliers and instead think of them as a weight. (This is sometimes called overcoming functional fixedness.) With that realization in hand, just tie the pliers to one rope and swing it. If you stand by the other rope, the pliers-rope should eventually swing back to you, and then you can tie them together.

In this case, the insight is changing the representation of pliers as tool-to-hold-objects-together to pliers as weight. More support for this view comes from another famous insight problem.

You are given the objects shown: a candle, a book of matches, and a box of tacks. Your task is to find a way to attach the candle to the wall of the room, at eye level, so that it will burn properly and illuminate the room.


The key insight in this problem is that the box that the tacks are contained in is not just for holding tacks, but can be used as a mount, too — again, a change in the representation.


In fact, the rate at which people solve this problem depends on how it’s presented. If you put people in a room with the tacks in the box, they’re less likely to solve it than if the tacks and box are separate.

The way we frame problems makes them more or less difficult. Insight is the spontaneous reframing of a problem. This suggests that we can increase our general problem solving ability by actively thinking of new ways to represent and think about a problem — different points of view. There are a couple of ways to accomplish this. Translating a problem into another medium is a cheap way of producing insight. Often, creating a diagram for a math problem, for example, can be enough to make the solution obvious, but we need not limit ourselves to things we can draw. We can ask ourselves, “How does this feel in the body?” or imagine the problem in terms of a fable.

Further, we can actively retrieve and create analogies. George Pólya in his How to Solve It writes (paraphrased), “You know something like this. What is it?” The history of science, too, is filled with instances of reasoning by analogy. Visualize an atom. What does it look like? If you received an education anything like mine, you think of it as like a solar system, with subatomic particles rotating a nucleus. This is not really what an atom looks like, though, but it has stuck with us by way of Rutherford.

Indeed, we can often gain cheap insights into something by borrowing the machinery from another discipline and thinking about it in those terms. Social interaction, for instance, can be thought of as a market, or as the behavior of electrons that think. We can think of the actions of people in terms of evolutionary drives, as those of a rational agent, and so on.

This perhaps explains the ability of some scientists to contribute original insights to different disciplines. I’m reminded of Feynman’s work on the Connection Machine, where he analyzed the computer’s behavior with a set of partial differential equations — something natural for a physicist, but strange for a computer scientist, who thinks in discrete rather than continuous terms.


We can think of problem solving like a walnut, a metaphor that comes to me by way of Grothendieck. There are two approaches to cracking a walnut. We can, with hammer and chisel, force it open, or we can soak the walnut in water, rubbing it from time to time, but otherwise leaving it alone to soften. With time, the shell becomes flexible and soft and hand pressure alone is enough to open it.

The soaking approach is called incubation. It’s the act of letting a problem simmer in your subconscious while you do something else. I find difficult problems easier to tackle after I’ve left them alone for a while.

The science validates this phenomenon. A 2009 meta-analysis found significant interactions between incubation and problem-solving performance, with creative problems receiving more of a boost. Going further, the authors also found that the more time spent struggling with a problem, the more effective incubation was.


Keep your subconscious starved so it has to work on your problem, so you can sleep peacefully and get the answer in the morning, free.
—Richard Hamming, You and Your Research


A 2004 study published in Nature examined the role of sleep in the process of generating insight. They found that sleep, regardless of time of day, doubled the number of subjects who came up with the insight solution to a task. (Presented graphically above.) This effect was only evident in those who had struggled with the problem, so it was the unique combination of struggling followed by sleep, and not sleep alone, that boosted insight.

The authors write, “We conclude that sleep, by restructuring new memory representations, facilitates extraction of explicit knowledge and insightful behaviour.”

The Benefits of Mind Wandering

Individuals with ADHD tend to score higher than neurotypical controls on laboratory measures of creativity. This jibes with my experience. I have a cousin with ADHD. He’s a nice guy. He likes to draw. Now, I’ve never broken out a psychological creativity inventory at a family reunion and tested him, but I’d wager he’s more creative than normal controls, too.

There’s a good reason for this: mind-wandering fosters creativity. A 2012 study (results pictured below) found that any sort of mind-wandering will do, but the kind elicited during a low-effort task was more effective than even that of doing nothing.


This, too, is congruent with my experience. How much insight has been produced while taking a shower or mowing the lawn? Paul Dirac, the Nobel Prize winning physicist, would take long hikes in the woods. I’d bet money that this was prime mind-wandering time. I know walking without a goal is often a productive intellectual strategy for me. Rich Hickey, known as the inventor of the Clojure programming language, has sorta taken the best of both worlds — sleep and mind wandering — and combined them into what he calls hammock-driven development.

But how does it work?

As is often the case in the social sciences, there is little consensus on why incubation works. One possible explanation, as illustrated by the Hamming quote, is that the subconscious keeps attacking the problem even when we’re not aware of it. I’ve long operated under this model and I’m somewhat partial to it.

Within cognitive science, a fashionable explanation is that during breaks we abandon approaches that are ineffective. Thus, the next time we view a problem, we are prone to try something else. There is something to this, I feel, but some sources go too far when they propose that this is all incubation consists of. I have noticed significant qualitative changes to the structure of my own beliefs that occur outside of conscious awareness. Something happens to knowledge when it ripens in the brain, and forgetting is not all of that something.

In terms of our initial graph, I have a couple of ideas. We still do not have a great grasp on why animals evolved the need to sleep, but it seems to be related to memory consolidation. Also note the dramatic change that thought processes undergo on the edge of sleep and while dreaming. This suggests that there are certain operations, certain nodes in our search graph, that can only be processed and accessed during sleep or rest. Graphically, it might look like:


This could be combined with a search algorithm like tabu search. During search, the mind makes a note of where it gets stuck. It then starts over, but uses this information to inform future search attempts. In this manner, it avoids getting stuck in the same way that it was stuck in the past.
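A minimal Python sketch of that restart-with-memory idea (the toy graph, the state names, and the greedy move choice are all hypothetical, just to make the mechanism concrete — real tabu search also uses tenure and aspiration rules):

```python
def tabu_search(start, goal, neighbors, max_restarts=10):
    """Greedy search that records dead-end states in a 'tabu' set across
    restarts, so later attempts avoid where earlier ones got stuck."""
    tabu = set()  # states where previous attempts dead-ended
    for _ in range(max_restarts):
        state, seen = start, {start}
        while state != goal:
            options = [n for n in neighbors(state)
                       if n not in seen and n not in tabu]
            if not options:      # stuck: remember this state, then restart
                tabu.add(state)
                break
            state = options[0]   # take the first legal move
            seen.add(state)
        else:                    # loop ended because state == goal
            return state
    return None                  # gave up after max_restarts attempts
```

Run on a toy graph where the first greedy path dead-ends, the search marks the dead end as tabu, restarts, and eventually routes around it — getting stuck differently each time, never the same way twice.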

Problem Solving Strategies

It is really a lot of fun to solve problems, isn’t it? Isn’t that what’s interesting in life?
—Frank Offner

There may be no royal road to solving every problem with ease, but that doesn’t mean that we are powerless in the face of life’s challenges. There are things you can do to improve your problem solving ability.


The most powerful, though somewhat prosaic, method is practice. It’s figuring out the methods that other people use to solve problems and mastering them, adding them to your toolkit. For mathematics, this means mastering broad swathes of the stuff: linear algebra, calculus, topology, and so on. For those in other disciplines, it means mastering different sorts of machinery. Dan Dennett writes about intuition pumps in philosophy, for instance, while a computer scientist might work through complexity theory or algorithmic analysis.

It is, after all, much easier to solve a problem if you know the general way in which such problems are solved. If you can retrieve the method from memory instead of inventing it from scratch, well, that’s a big win. Consider how impossible modern life would be if you had to reinvent everything: all of modern science, electricity, and more. The discovery of calculus took thousands of years. Now, it’s routinely taught to kids in high school. In terms of imagery, we can think of solving a problem from scratch as a complicated graph search, and retrieving a method from memory as a look-up in a hash table. The difference looks something like this:


All of this is to say that it’s very important that you familiarize yourself with the work of others on different problems. It’s cheaper to learn something that someone else already knows than to figure it out on your own. Our brains are just not powerful enough. This is, I think, one of the most powerful arguments for the benefits of broad reading and learning.
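The graph-search-versus-lookup contrast above can be sketched in code. This is a minimal illustration, not a cognitive model; the toy state graph and the `known_solutions` cache are hypothetical stand-ins for “working it out” versus “already knowing the method”:

```python
from collections import deque

# Solving from scratch: breadth-first search over a state graph.
def solve_from_scratch(start, goal, neighbors):
    frontier, seen = deque([[start]]), {start}
    while frontier:
        path = frontier.popleft()
        for n in neighbors(path[-1]):
            if n not in seen:
                if n == goal:
                    return path + [n]
                seen.add(n)
                frontier.append(path + [n])
    return None  # no route from start to goal

# Retrieving a known method: a single hash-table look-up.
known_solutions = {('A', 'G'): ['A', 'C', 'G']}

def solve_from_memory(start, goal):
    return known_solutions.get((start, goal))
```

The search explores the graph state by state; the look-up is a constant-time dictionary access. Same answer, vastly different effort — which is the whole argument for studying other people’s solutions.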


Moods can be thought of as mental lenses, like colored sunglasses, that encourage different sorts of processing. A “down” mood encourages focus on detail, while an “up” mood encourages focus on the greater whole.

Indeed, multiple meta-analyses suggest that those in happier moods are more creative. If you’ve ever met someone who is bipolar, you’ll notice that their manic episodes tend to look a lot like the processing of creative individuals. As someone once told me of his manic episodes, “There’s no drug that can get you as high as believing you’re Jesus Christ.”

This suggests that one ought to think about a problem while in different moods. To become happy, try dancing. To be sad, listen to sad music or watch a sad film. Think about the problem while laughing at stand-up comedy. Discuss it over coffee with a friend. Think about it while fighting, while angry at the world. The more varied states that you are in while considering your problem, the higher the odds you will stumble on a new insight.

Rubber Ducking

Rubber ducking is a technique for debugging that’s famous in the programming community. The idea is that simply explaining your problem to another person is often enough to lead to the eureka! In fact, the theory goes, you don’t even need to describe it to another person. It’s enough to tell it to a rubber duck.

I have noticed this a number of times. I’ll go to write up something I don’t understand, some problem I have on StackOverflow, and then bam, the answer will punch me in the face. There is something about describing something to someone else that solidifies understanding. Why do you think I’m going through the trouble of writing all of this up, after all?

The actual science is a bit mixed. In one study, describing current efforts on a problem reduced the likelihood that one would solve the problem. The theory goes that this forces one to focus on easy-to-verbalize parts of the problem, which may be irrelevant, and thus entrenches the bad approach.

In a different study, though, forcing students to learn something well enough to explain it to another person increased their future performance on similar problems. A number of people have remarked that they never really understood something until they had to teach it, and this perhaps explains some of the success of the researchers-as-teachers paradigm we see in the university system.

Even with the mixed research, I’m confident that the technique works, based on my own experience. If you’re stuck, try describing the problem to someone else in terms they can understand. Blogging works well for this.

Putting it All Together

In short, then:

  • Problem solving can be thought of as search on a graph. You start in some state and try to find your way to the solution state.
  • Insight is distinguished by a change in problem representation.
  • Insight can be facilitated by active seeking of new problem representations, for example via drawing or creating analogies.
  • Taking breaks while working on a problem is called incubation. Incubation enhances problem solving ability.
  • A night’s sleep improves problem solving ability to a considerable degree. This may be related to memory consolidation during sleep.
  • Mind-wandering facilitates creativity. Low effort tasks are a potent means of encouraging mind-wandering.
  • To improve problem solving, one should study solved problems, attack the problem while in different moods, and try explaining the problem to others.

Bill Thurston on Reading Hard Things

I was really amazed by my first encounters with serious mathematics textbooks. I was very interested and impressed by the quality of the reasoning, but it was quite hard to stay alert and focused. After a few experiences of reading a few pages only to discover that I really had no idea what I’d just read, I learned to drink lots of coffee, slow way down, and accept that I needed to read these books at 1/10th or 1/50th standard reading speed, pay attention to every single word and backtrack to look up all the obscure numbers of equations and theorems in order to follow the arguments.
—Bill Thurston, from the foreword to Teichmüller Theory and Applications to Geometry, Topology, and Dynamics

He goes on to talk about the importance of using one’s whole mind in understanding mathematics. You might want to check it out.