How the 2013 Boston Marathon Bombings Affected This Year’s Attendance

A year ago, I logged a prediction (at 60% confidence) that this year’s Boston Marathon attendance would be lower than the previous year’s as a result of the 2013 bombings.

Well, the numbers are in, and I wasn’t even close: Last year, 26839 people entered the race. This year? 35671 runners, about 33% growth. (In hindsight, what was I even thinking?)

I wanted to quantify just what sort of effect the bombings had on attendance, so I gathered all the data that’s readily available online, and plotted it:

boston-marathon-data

You’ll notice that 2014 seems like a pretty clear outlier. Fitting a line to the data allows us to quantify what normal growth probably would have looked like in an alternate universe where there were no bombings:

boston-marathon-data-plus-line

Running the numbers, the difference between the predicted turnout and the observed turnout is an additional 6087 runners. You might wonder: what kind of economic windfall is that? Well, the 2012 marathon generated $137.5 million in revenue, some $5123 per runner. This means that the additional 6087 runners should generate an additional $31.2 million in revenue, or about a tenth of the cost of the bombings’ damage (at least according to one NBC estimate).
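
If you want to check the arithmetic, here is a rough sketch in Python. It only uses the figures quoted in this post. The historical attendance series isn't reproduced here, so the `fit_line` helper (my own name, a plain least-squares fit) is included to show the shape of the calculation and is not run on the real data.

```python
# Figures quoted in the post.
ENTRANTS_2013 = 26_839
ENTRANTS_2014 = 35_671
EXTRA_RUNNERS = 6_087        # observed 2014 turnout minus the trend-line prediction
REVENUE_2012 = 137_500_000   # dollars


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept.

    Hypothetical helper: pass it years and entrant counts to get the
    trend line that the 2014 prediction is read from.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Year-over-year growth in entrants.
growth = ENTRANTS_2014 / ENTRANTS_2013 - 1

# The trend-line prediction implied by the stated 6087-runner gap.
trend_prediction = ENTRANTS_2014 - EXTRA_RUNNERS

# The post's per-runner figure matches dividing the 2012 revenue
# by the 26,839 entrants quoted above.
revenue_per_runner = REVENUE_2012 / ENTRANTS_2013
extra_revenue = EXTRA_RUNNERS * revenue_per_runner

print(f"growth: {growth:.1%}")
print(f"trend-line prediction for 2014: {trend_prediction}")
print(f"revenue per runner: ${revenue_per_runner:,.0f}")
print(f"extra revenue: ${extra_revenue / 1e6:.1f} million")
```

To redo the fit yourself, you would pass the years and entrant counts from the data file to `fit_line` and evaluate the line at 2014.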

On Bad Publicity

But we should go back and ask, “What was wrong with my intuition a year ago such that I expected marathon attendance to decrease?” I suspect I underestimated just how compelling a message like, “2014: Let’s race against terror” or “Racing in loving memory of Martin William Richard” would be.

I think I was thinking along the lines of, “Well, people died. People will think it’s dangerous, so they won’t go.” But of course that didn’t happen. Maybe I overestimated just how irrational people are. They probably figured the odds of a second attack were tiny.

Or maybe the sort of media coverage you get when someone attacks your marathon is just surprisingly effective advertising. The Boston Marathon was not even on my radar a year ago, but I’m sitting here and talking about it now, and I’m confident I wouldn’t be otherwise, so it certainly got my attention.

The most relevant comparison I can think of is the 2012 theater shooting, which marred the release of The Dark Knight Rises. Given the growth in Boston Marathon attendance, we might expect — perversely — that the shootings would be good for sales.

This doesn’t appear to be the case. The movie brought in 30 million less in sales than expected, and a 2013 analysis in the Journal of Criminal Analysis reports that the “Aurora theater shooting resulted in striking declines for Cinemark (the targeted theater) as well as major US competitors, but had no impact on overseas theater chains.”

The most salient difference between the two is, I expect, timeframe. A year’s passing makes the bombings feel distant (at least to someone not directly involved) while the bulk of the expected ticket sales took place soon after the theater shootings.

Finally, you might wonder: is there any truth to this whole notion of “no such thing as bad publicity”? Well, sorta: a 2004 study found that any reviews, positive and negative, increased book sales. A 2010 sorta replication found that negative reviews increased sales, but only of mostly unknown authors. Bad reviews of well-known authors, in contrast, hurt sales.

The paper itself offers a few interesting tidbits, too, including:

A wine described “as redolent of stinky socks,” for example, saw its sales increase by 5% after it was reviewed by a prominent wine website (O’Connell 2006). Similarly, although the movie Borat made relentless fun of the country of Kazakhstan, Hotels.com reported a “300 percent increase in requests for information about the country” after the film was released (Yabroff 2006, p. 8).

But, in general, bad publicity is bad publicity and we should stop paying too much attention to questionable adages:

Negative publicity often hurts. When a rumor circulated that McDonald’s used worm meat in its hamburgers, sales decreased by more than 25% (Greene 1978). Coverage of musician Michael Jackson’s bizarre behavior and brushes with the law destroyed his career. Viacom Inc. Chairman Sumner Redstone estimated that negative publicity cost Mission Impossible 3 more than $100 million in ticket sales (Burrough 2006), and film pundits have suggested that it is “almost impossible to recover from bad buzz” (James 2006).

Academic research corroborates this sentiment and casts further doubt on the old adage that “any publicity is good publicity.” Negative publicity about a product has been shown to hurt everything from product and brand evaluation (Tybout et al. 1981, Wyatt and Badger 1984) to firm net present value and sales (Goldenberg et al. 2007, Reinstein and Snyder 2005). Negative movie reviews, for example, decrease box office receipts (Basuroy et al. 2003).

Further Reading

  • If you want to use the data for anything, it’s available here.

Why I Like Surprises and You Should Too

I love surprises.

Imagine a man — oh, I’ll just pick a name at random, let’s call him James Randi. He’s a staunch materialist. And not the “I like to buy a lot of stuff” kind of materialist, but the sort that believes everything is made out of atoms and quarks (and whatever quarks are made out of) and that magic is physically impossible.

Now, imagine that this man, fed up with arguing with hippies and magical thinkers generally, snaps and declares, “I’ll bet anyone a million dollars that they can’t come into my laboratory and produce obviously supernatural phenomena.”

He declares this in a moment of exasperation, but he gets to thinking: y’know, this is a pretty good idea. A million dollars for the impossible. If someone is psychic, it’s free money for them. And if someone won’t take free money just to demonstrate their claimed power, well, they must be a fraud.

free-money

So he gets on the phone with the New York Times and he tells them about his brand new “One Million Dollar Paranormal Challenge”.

In the lab

Let’s fast forward a couple of months, to the point where money-seeking cranks have started slithering out of the Earth like worms in a storm. Imagine that this definitely fictional James Randi is in the lab with one of these worms. This worm, I’ll pick another name at random, let’s call her Theresa Caputo.

the-clairvoyant-crawler

And this Theresa, who is fictional and definitely not a real life charlatan, claims that she can communicate with the dead. So James sighs, “Okay, Theresa. Tell me something that only a dead person could know. Something I know and they knew, but you have no way of knowing.”

Here’s where the universe diverges. In one branch, the usual happens. Theresa says something about James’s mother loving the water and boats, while James makes agreeable sounding noises — except he’s leading her on. His mom couldn’t swim and the water terrified her. (She had seen one too many “World’s Deadliest Sharks” specials on Animal Planet.)

But in the other branch, the surprising happens. Theresa details the family dog James had as a child. A fluffy, white contraption named Houdini, who loved milk so much that Randi’s mom would share her bowl of cereal with him after she had finished with it.

And on and on, like the time Randi had tried to grow his first beard and everyone called him Fuzz Aldrin. Or when he asked Sally Banks to prom and she turned him down and it crushed him, but actually she was just a lesbian and how could that be a reflection on him?

She’s saying this stuff, anecdote after anecdote that no one could know, and James grows more and more freaked out, the color of his face shifting from its typical rosy hue to a pale-moon grey, as if someone had opened up a picture of him in Photoshop and slowly moved the saturation from 100% to zero — until finally he bellows, “Okay, stop, enough! I get it.”

Theresa stops for a moment, a pause, silence, and then she chirps, “…but we haven’t even gotten to the mind reading bit yet.”

Contrasting consequences

Okay, now, consider how the lives of each James will unfold over the next couple of months. The James in the first branch will continue going about his life as he’s been doing it. Giving an interview here and there, maybe working on a book about skepticism (dubbed Citation Needed), and occasionally debunking so-called psychics in his lab. Business as usual, you know, the grind.

The other James has just had his belief system wrecked. Assuming that he’s not hallucinating and that this woman’s abilities hold up to further scrutiny, our best models of the universe are just wrong. The supernatural exists! Consciousness continues on after death.

This second James will be tasked with picking up the pieces of his belief system after this intellectual earthquake, which has not only shaken but toppled it and proved that the foundations were air all along. He’ll have to ask himself stuff like, “Is this proof of the existence of God? Should I be converting to some religious movement? Which one? If I’m wrong about this, what else am I wrong about?”

And that’s just James’s personal struggle. Consider the far reaching implications such a discovery would have. Proof of the supernatural! This would be a larger scientific discovery than anything before. More than Newton discovering classical mechanics, more than special relativity, even bigger than the ancient Greeks’ discovery that, yes, the universe contains regularities that can be described by simple mathematical equations.

We’ll want to know: how does this woman communicate with the dead? Where are they? What’s the causal mechanism here? Is there some as-of-yet undiscovered physical phenomenon taking place here? Or is a mysterious, inexplicable force somehow fundamental to the universe? Maybe it’s not so much that we don’t know, but that we can’t know.

Plus a billion more mundane questions: Can humans use their psychic powers to communicate faster than light? What about to communicate with the long dead to write history books? How about to make crops grow faster or to find oil? Can every human do this or just one? Can you train this ability? Is it located in a region of the brain? Can psychics predict the stock market?

And, of course, there would be a religious superstorm, a mad rush to claim that we called it, we knew it all along, that this woman was a product of our god. (Like terrorists taking responsibility for an attack, if you will.)

Information content

The value of information is usually defined as the amount someone would pay to know something prior to making a decision, but I like to think of it as the amount your behavior would change if you had the answer. That is, if you were clairvoyant (you know, like you could somehow plug your brain into the heavens and have always-on access to a line of the best kind of credit: pure truth).

For instance, you might consider a firm doing medical research on new drugs. The economic incentive here is massive: the total revenue of Lipitor alone is something like $141 billion.

What’s the value of information here? Well, what if you knew before doing a bunch of expensive research and development that a certain chemical was going to be bust? The median cost of research and development per drug is something like 1 billion USD. Since 19 out of 20 medications in experimental development fail, the value of knowing ahead of time is going to be worth at least half a billion (and probably a lot more).

This can be converted to degrees, too. Reducing uncertainty from 38 to 27 percent might be worth something like 55+ million.
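
Here is one way to reproduce those figures in Python. The working isn't spelled out above, so treat this as a guess at the arithmetic: I'm assuming the value scales linearly with the percentage points of uncertainty removed, and that the "half a billion" floor is the value of removing all of it. The function name is made up for illustration.

```python
COST_PER_DRUG = 1_000_000_000      # "something like 1 billion USD"
FAILURES, CANDIDATES = 19, 20      # 19 out of 20 candidates fail
VALUE_OF_CERTAINTY = 500_000_000   # the "at least half a billion" floor

# Expected spend on candidates that go bust, per candidate started.
# This is why the floor is "probably a lot more" than half a billion.
expected_wasted_spend = COST_PER_DRUG * FAILURES // CANDIDATES


def value_of_reduction(before_pct, after_pct, full_value=VALUE_OF_CERTAINTY):
    """Value of cutting uncertainty from before_pct to after_pct.

    Assumption: each percentage point removed is worth full_value / 100.
    """
    return (before_pct - after_pct) * full_value // 100


print(f"expected wasted spend: ${expected_wasted_spend:,}")
print(f"38% -> 27% uncertainty: ${value_of_reduction(38, 27):,}")
```

Under that linear assumption, the 11-point drop comes out at 55 million, which matches the figure above.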

From this view, surprising information is more valuable than unsurprising information: it leads to greater shifts in behavior. For the first James, he’s just found out that his model of the world was right, that psychics really are tricking people and whatever, and he gets to go on with his life as usual. Not too valuable. It maybe shifted his confidence from 99.99 percent to 99.990001 percent.
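
Information theory has a standard way to put a number on this, called Shannon surprisal: an outcome with probability p carries -log2(p) bits. I haven't used that framing above, so take the sketch below as a swapped-in illustration. Treating James's 99.99 percent confidence as the probability of the "no psychic powers" outcome is my assumption.

```python
import math


def surprisal_bits(probability):
    """Shannon surprisal: bits of information in seeing an outcome
    that you had assigned this probability."""
    if not 0 < probability <= 1:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(probability)


P_NO_PSYCHIC = 0.9999  # assumed: James's confidence before the lab session

expected_branch = surprisal_bits(P_NO_PSYCHIC)        # the usual happens
surprising_branch = surprisal_bits(1 - P_NO_PSYCHIC)  # Theresa is for real

print(f"expected outcome:   {expected_branch:.6f} bits")
print(f"surprising outcome: {surprising_branch:.2f} bits")
```

The expected branch carries a tiny fraction of a bit, while the surprising one carries more than thirteen bits, which is the same asymmetry described above.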

The second James has his world — well, at least his model of the world — torched. He found out that he’s been wrong about damn near everything. And what’s the value of this information? “What does it matter?,” you might ask. “Wouldn’t he be happier oblivious?”

Maybe not — consider the ramifications again. What if James had continued to live oblivious to the existence of the supernatural, remaining a staunch atheist, and the rapture comes along and, whoops, no heaven for James.

genesis-snake-garden

Or, even more prosaically, consider the Earthly value of information here. James, as a result of his lab interview, might learn how to harness his own psychic powers, and maybe he has some prescient abilities. He can see the future and foresees a typhoon in India. After googling, he realizes that India is one of the world’s largest producers of coconuts, and this typhoon is going to coincide with the harvest. So he buys up coconut futures and sells them for a cool billion when the typhoon hits.

Or, you know, he could totally use that information to save a bunch of lives.

Surprise as an indicator of incorrect models

As you’re probably noticing and, if not, the header should have given it away, surprise indicates that your model of the world is incorrect — that there’s something that you’ve failed to take into account.

I don’t know if you’ve seen the movie Groundhog Day but, basically, Bill Murray repeats the same day over and over again, and he’s the only one who’s aware that it’s happening. There’s a scene where he abuses his near omniscience to duck in behind an untended armored vehicle and steal a bag of money.

groundhog-day-armored-truck-heist

He’s able to do this because he can anticipate everything that’s going to happen (he’s lived it before). His model of the world is perfect and, indeed, if you look at the original script, the author intended him to have spent like 10,000 years going through this same day. He was supposed to be godlike. There’s even a line in the movie where Murray says, “Well maybe the real God uses tricks, you know? Maybe he’s not omnipotent. He’s just been around so long he knows everything.”

This is to say that, when you have a perfect model of the world, you can anticipate everything that’s going to happen. Nothing should be surprising. When someone faceplants directly into their wedding cake, you saw it coming.

Surprise is all about the violation of expectations, and if an expectation has been violated, an implicit model has been violated. And that means that you’re wrong about something.

Every surprise indicates something you’ve failed to take into account. Like when I found out that cats are lactose intolerant and shouldn’t be given milk, I was surprised. I had failed to connect some pieces of knowledge in my head, namely, that most animals don’t drink milk past infancy, so why wouldn’t they be lactose intolerant? In this respect, humans are a bit of an anomaly (and it’s a relatively recent anomaly, too, at about 7500 years old.)

Recovering from surprise

Which brings me to my next point, which is that, given surprise indicates something wrong with your model of the world, whenever you’re surprised, you should fix your model.

Ask yourself, “How could I have anticipated this?” and, usually, once you’ve answered that question, you’ve fixed everything.

As an example, I like Moravec’s paradox, which was the discovery that it’s much easier to teach a computer chess than it is to teach a computer to walk or recognize faces, while the reverse is true for humans. In a sense, what’s easy for computers is hard for humans, and vice versa.

Why should this be the case? Well, recall that the prefrontal cortex is a relatively new part of the brain. Evolution has not spent too many computational cycles optimizing our ability to reason, and virtually none on chess (modern chess is only about 550 years old.)

On the other hand, motor skills, sight, object recognition, facial recognition, that sort of stuff — that’s gone through a lot of iterations, something like 410 million years of iterations. So, yeah, that one is going to be a bit harder for humans to reverse engineer and implement on a general purpose computer.

Moravec’s paradox, then, can be anticipated by considering how long, on evolutionary timescales, a “feature” has existed. Duplicating human sight? Probably difficult. Mathematics? Easy, especially stuff like multiplication, division, whatever. (Trickier when you get to, say, proving the Riemann hypothesis.)

The sheer superiority of surprise over other forms of noticing wrongness

Okay, so we’ve covered:

  • Surprises contain more information than the expected.
  • Surprises indicate incorrect models.
  • To fix incorrect models, ask how you could have anticipated a surprise.

Finally, let me point out the sheer superiority of surprise over the alternatives. Say you want to improve your model of the world, what are you going to do? Well, you could try to notice any tiny, nagging doubts you have about something, the sort that your mind quietly brushes over and you never even notice.

This is hard. I’ve spent more than a year meditating daily and I’m still not very good at it, and most of the time I don’t even notice those doubts until afterwards, when I’m like, “Huh, should have noticed that sound was his wooden foot and not rationalized it away as a funny brand of shoe.”

Surprise, on the other hand, demands your attention. You don’t even have to think about it. It turns out that your eyes and attached-frontal-brain-region-plus-amygdala automatically filter out nonsurprising information and direct your visual system towards surprising stuff in your environment. So you don’t even have to make an effort to notice something surprising. You just will.

why-i-like-surprises

You could try to enjoy being wrong. Then you’ll naturally seek out opportunities for wrongness and wrongitude, chances to actively test your beliefs. There are people who say they can actually do this.

I suspect these people are just lying. I don’t like being wrong about anything. I have to make a conscious decision when someone disagrees with me to be like, “Wait, maybe they have a point and are objectively right, even though my monkey brain is too worried about status to admit to it.”

Surprise, on the other hand, is easy. Want to know some surprising information? Hell yeah, I want to know some surprising information. And, hopefully, you do, too, now that I’ve told you why I love surprises.

Further Reading

Web Roundup: Links For June

Prolonged Eye Contact and Attraction: What The Science Tells Us

 

 

valentines-day-squid

Belladonna means “beautiful woman” in Italian, but it’s also the name of a type of plant. The origins of the term belladonna are uncertain, but date back to at least 1554.

It’s been suggested (and this is my favorite theory) that the name might be related to belladonna’s use as a cosmetic. Women would consume the plant in order to dilate their pupils, in an attempt to enhance beauty.

The only problem? Belladonna (sometimes called nightshade) is poisonous.

Richard Pultney’s 1757 paper, “A brief botanical and medical history of the Solanum Lethale, Bella-donna, or Deadly Nightshade,” recounts this tale:

Its relaxing quality is very surprising, as appears by that memorable case… of a lady’s applying a leaf of it to a little ulcer, suspected to be of the cancerous kind, a little below her eye, which rendered the pupil so paralytic, that it lost all its motion for some time afterward: and that this event was really owing to that application, appears from the experiment’s being repeated with the same effect three times.

belladonna-in-eye

But they were really onto something! This is the craziest part of the whole thing. (Suffering for fashion is passé.) Hess (1965) took two pictures of the same woman, presented them to male subjects, and asked them to describe the woman in the picture. The researchers altered the photos so that one had slightly larger pupils. By and large, the male subjects preferred the woman with the larger pupils.

Try it:

woman-small-pupils

woman-large-pupils

(The one on the bottom is the one that you’re supposed to find more attractive, although I’ve just terrifically biased you by telling you that.)

This has since been replicated at least five times.

Let’s just take a minute and reflect on this. Women in 16th century Italy anticipated the findings of modern scientific research by about 400 years. They not only discovered that belladonna reliably increases pupil size, but they also noticed that men were attracted to that.

I propose a hypothesis similar to the efficient markets hypothesis. We’ll call it the efficient beauty hypothesis: if a beauty-increasing cosmetic intervention exists, some enterprising individual somewhere will discover it.

You might wonder, then: are women interested in men with large pupils? Tombs and Silverman’s 2004 paper, “Pupillometry: A sexual selection approach” tried to answer this question. The paper includes this graph:

The relationship between prolonged eye contact and attraction.

You’ll notice that women find average pupil sizes (on men) the most attractive, while men subscribe to the Texan, bigger-is-better philosophy. The authors additionally report that, “Further investigation revealed that females attracted by large pupils also reported preferences for proverbial bad boys as dating partners.”

At this point, you might wonder why men find large pupils attractive. And, of course, evolution has good reason for that, as confirmed by a 2007 study:

We found an increase in mean pupil diameter for sexually significant stimuli during the fertile phase and this pupillary change was also specific to pictures of the participants’ actual sexual partners. Moreover, this effect was only seen for women who did not use oral contraceptives. These findings confirm that women’s attention for sexually significant stimuli is higher during their fertile phase of the menstrual cycle, and that changes in sexual interest are implicitly measurable using pupillometry.

Or, in plain English, fertile women tend to have larger pupils.

Motivation

In Elana Clift’s Honors thesis, “Picking Up and Acting Out: Politics of Masculinity in the Seduction Community,” she argues that the “pick up artist” movement is the result of the lack of available dating scripts for young men. Back in, say, Victorian England, everyone knew how this whole relationship thing worked. Today, we’re all horribly confused.

I was sorta convinced by that for a while, and I think that explains some of it, but now I’m plagued by doubt. Lots of pick-up strikes me as actively toxic. I mean, yeah, especially to women (there is a disproportionate number of vocal misogynists associated with the “manosphere” generally), but I mean to men, too: Pick-up is an advertiser’s wet dream. Nothing sells better than insecurity, and what more poignant insecurity than masculine identity and status anxiety about attractiveness? (Whenever you hear the phrase “real” men, ask what they’re selling.)

Of course, my concerns here are hardly limited to men, although I’m more familiar with the struggles of young men everywhere. Cosmopolitan magazine is the female-equivalent of pick-up, telling young women that they need to fit into some sort of mold in order to attract a guy — that they shouldn’t answer the phone on the first ring or whatever — and I’m sure lots more nonsense which isn’t even on my radar, but probably ought to be.

Which brings me to the topic at hand: eye contact. These unsavory actors sell prolonged eye contact as some sort of panacea. An actual example I found with 10 seconds of googling: “Master These Eye Contact Techniques To Create Powerful Attraction,” complete with tips that the author promised “will blow my mind.” (Hint: they didn’t.) Another blog targeted at “Helping men reclaim their masculinity and their relationships,” (gag) includes this gem: “…strong eye contact is difficult to maintain if you do not have the confidence to back it up (thus making it an honest signal).”

Yeah, right. Because if you don’t maintain strong eye contact, it’s because you lack confidence, and definitely not because you haven’t yet mastered the serial killer’s thousand yard stare.

Frankly, this all smacks of the purest bullshit. Evolution has spent billions of years and computational cycles optimizing male-female relations. If maintaining eye contact with your crush is so effective, why don’t people just do it naturally? Could advising people to maintain strong eye contact be harmful? Maybe unnaturally strong eye contact just comes off as creepy.

I decided to find out.

The Evidence on Prolonged Eye Contact

An interlude during which the author does a lot of research.

My (somewhat begrudging) subjective feeling after reading through 5 or 6 relevant papers is that, yes, the pick-up artists are right, the majority of men ought to be making more eye contact. The case for women is less clear. As far as I can tell, too much eye contact is always better than too little, and eye contact combined with a smile is difficult to get wrong.

My neat evolution-has-optimized-eye-contact argument has at least one damning flaw: children learn the association between eye contact and liking. It’s not innate.

The association between gaze and liking appears to be learned. Children do not use eye contact to judge affiliation and friendship until about age 6 (Abramovitch & Daly, 1978; Post & Hetherington, 1974).

Now, is there such a thing as too much gaze? Yes. Moderate gaze is better than constant gaze:

Gaze also influences people’s liking for each other, with moderate amounts of gaze generally preferred over constant or no gaze (Argyle, Lefebvre, & Cook, 1974; Exline, 1971).

Bu-u-u-ut constant eye contact is still better than no eye contact:

British college students rated a same-sex peer they met in an experiment as more pleasant and less nervous when the person gazed at them continuously rather than not at all (Cook & Smith, 1975).

Compare that with a mock interview study, which had students either exhibit low, natural, or high gaze. Notably, researchers defined high gaze here as near-constant eye contact. They found no difference in likability between normal and high gaze:

High-levels of gaze do not differ from normative gaze patterns in earning more favorable endorsements for hiring from an interviewer, in conferring greater credibility, in increasing attraction and in receiving favorable relational communication interpretations.

Indeed, there were even some benefits to near-constant gaze. Interviewers labeled near-constant gazers (not to be confused with goats) as more attractive, more intimate, and more dominant than those who displayed normal levels of eye contact. So, again, more evidence that too much eye contact is way better than too little.

Those who make lots of eye contact are even judged to be more intelligent (!):

Wheeler, Baron, Michell, and Ginsburg (1979) reported a positive correlation between an interviewee’s eye contact with an interviewer and estimates made by observers of the interviewee’s intelligence.

And it’s not even confined to those you look at. If someone sees you making a lot of eye contact with someone, they’ll like you more than if you didn’t:

The positive feelings associated with gaze generalize to observers, who favor people when they gaze at moderate rather than low levels while approaching others (Gary, 1978a) or in social interactions (Abele, 1981; Shrout & Fiske, 1981).

Of course, people like it most of all when you look at them, which a 2005 study, “The look of love: gaze shifts and person perception,” verified.

Ratings of likability were elevated when social attention was directed toward rather than away from the raters.

In the same study, men rated women who paid attention to them not only as more likable, but more attractive, too:

Whereas gaze cues elevated ratings of likability among both male and female participants, only the men displayed gaze-related effects on person evaluation when the physical attractiveness of the targets was assessed.

Here’s another belief I held that turns out to be wrong. I’ve observed that people look at the speaker while listening, and look away while speaking. But this turns out to be totally okay to violate (surprise!) and you can stare all the time if you want (or, at least, high status people do it):

Equivalent amounts of gazing while speaking and listening were found with research participants who were given high status or who were discussing issues on which they had expertise (Ellyson, Dovidio, & Corson, 1981; Ellyson, Dovidio, Corson, & Vinicur, 1980).

And more eye contact makes you more powerful:

Dovidio and Ellyson (1982) reported that high gazing-while-speaking ratios were directly related to ratings of power in an interaction.

Want to make friends? Have you tried staring at people?

College women gazed more at a female confederate when they were trying to make friends (Pellegrini, Hicks, & Gordon, 1970), and college men gazed more at a woman when they wanted to interest her in a social conversation (Lefebvre, 1975).

It even holds for imaginary friends!

Mehrabian (1968a, 1968b) reported that research participants gazed more when they approached an imaginary person they liked rather than disliked.

And real ones, too:

Russo (1975) reported greater amounts of eye contact between elementary school children who were friends rather than nonfriends.

What does eye contact mean, though?

While doing keyword research for this, I noticed that a lot of men and women are confused about what prolonged eye contact means. Does it indicate sexual interest? Well, it definitely can!:

Participants in a study by Griffitt, May, and Veitch (1974) gazed more at opposite-sex peers when they had previously been exposed to sexually arousing slides.

It might even imply that you’re smokin’ hot (and trust me, gentle reader, you totally are):

Coutts and Schneider (1975) reported positive correlations between gaze directed by research participants toward opposite-sex peers and experimenter ratings of the peers’ physical attractiveness.

But not always. People will look at you more even if you’re just plain nice to them:

People gazed more after receiving positive evaluations (Coutts, Schneider, & Montgomery, 1980; Exline & Winters, 1965; Walsh et al., 1977) or warm nonverbal responses (Ho & Mitchell, 1982).

Is eye contact ever bad?

Even if you’re hitchhiking, more eye contact is better:

Drivers were more likely to stop for gazing hitchhikers (M. Snyder, Grether, & Keller, 1974), pedestrians were more likely to help a gazing experimenter pick up dropped coins (Valentine, 1980) and dropped questionnaires (Goldman & Fordyce, 1983), and bystanders were more likely to help an injured gazing jogger (Shotland & Johnson, 1978).

Or when you’re buying cereal, according to the 2014 study, “Why Is Cap’n Crunch Looking Down at My Child?”:

We showed that eye contact with cereal spokes-characters increased feelings of trust and connection to the brand, as well as choice of the brand over competitors

Now, you might wonder: are there ever times where you shouldn’t make so much eye contact? Well, when waiting for a green light:

Ellsworth et al., (1972) and Greenbaum and Rosenfeld (1978) had experimenters stand on street corners and gaze constantly or not at all at pedestrians and motorists who were waiting for a red light. When the light changed to green, pedestrians and drivers crossed the intersection significantly faster when they had received constant gaze from the experimenter.

But just dress nice and you’re okay:

For example, pedestrians did not cross the street as fast to escape a staring experimenter when the experimenter was dressed and made up to be physically attractive (Kmiecik, Mausar, & Banziger, 1979)

Or add a smile:

People were also less likely to avoid a staring experimenter when the experimenter smiled (Elman, Schulte, & Bukoff, 1977).

Sex Differences

It turns out, though, that there are sex differences. Women (on average) respond positively to lots of eye contact, while men prefer less. For instance, if you want a female friend to reveal all her secrets, eye contact is good:

Female speakers disclosed more personal information about themselves to listeners who gazed. Female speakers also liked gazing listeners more than nongazing listeners. (Ellsworth and Ross 1975)

For men, though, the opposite is true:

Male speakers, in contrast, disclosed more and felt greater liking when the listener did not gaze.

A similar phenomenon holds with asking for help when picking up coins:

For example, women gave more help in picking up dropped coins to a female experimenter who gazed at them (Valentine & Ehrlichman, 1979). Men gave more help to a male experimenter who did not gaze at them.

Women even like it when they’re told that a man looked at them an unusually high amount:

Kleinke et al. (1973) introduced college men and women in pairs and left them in a room to get acquainted. After their conversation, an experimenter told participants that one person (whose gaze was supposedly recorded through a one-way mirror) had gazed at the other person an unusually high, an average, or an unusually low amount of the time. Women were most favorable toward men whose gaze had ostensibly been high.

But not men:

Men’s reactions were exactly opposite. Men were most favorable toward women when they were told the woman’s gaze or their own gaze had been low.

I wonder if this is just male insecurity? If I was told some chick had been staring at me, I might wonder, “Is there something wrong with my hair? Has one of my legs grown two legs and walked off of its own volition?”

Does eye contact cause love?

To see is to devour.

—Victor Hugo, Les Misérables

Finally, though, what you really want to know: if I maintain eye contact with my crush, will they fall madly and deeply in love with me? Well, sorta. If you convince someone to maintain eye contact with you for ~2 minutes, they’ll (on average) be more attracted to you. The experimenters in this study told their subjects to maintain eye contact in order to “tune their extra-sensory abilities” and, afterwards, they rated their partners as significantly more attractive than controls. Hey, worth a shot, right?

Actually, it turns out, just tricking your crush into thinking they look at you a lot is enough. (“Hey, Maria, why do you keep looking at me? Is it because you’re in lo-o-o-ove with me?”)

In one of these, Kleinke, Bustos, Meeker, and Staneski (1973) did not actually induce their subjects to gaze at their partners. Instead the subjects were told that they had done so. This produced modest increases in attraction for the partner.

Further Reading

  • If you want to settle down with a book on relationships, the best scientific overview I’ve read is the Handbook of Relationship Initiation. For lighter fare, The Moral Animal is pretty entertaining.
  • If you liked this, you’ll love the Social Issues Research Centre’s “Guide to Flirting.”
  • If you want to dive into the original sources for yourself (or look up references), start with “Gaze and eye contact: a research review,” which is where the bulk of this information came from. (Where it didn’t, I’ve indicated in the text.)
  • One of the most useful bits of research to come out of the study of human relationships is the notion of the “mere exposure effect” which suggests that the more you see someone (or something), the more you’ll come to like them.

100+ Interesting Data Sets for Statistics

If we have data, let’s look at data. If all we have are opinions, let’s go with mine.

—Jim Barksdale

I’m not too fond of the phrase “information age.” It sounds like someone sat down and was like, “Hey, there’s a ton of information today… what should we call it? How about the information age?”

First of all, that’s just lazy and, second of all, it doesn’t capture how overwhelming it all is, the sort of angst and helplessness you feel when confronted with… everything. Just all of it.

A phrase that captures it a bit better is “drinking from the firehose.” I haven’t ever tried to drink from an actual firehose, but the metaphor certainly seems apt.

firehose

Maybe instead of information age, we could call it the saturation age, you know, because our brains are full to bursting. Or maybe just the overload age. Or how about the age of inundation?

One thing is certain, anyways. Some of us are drowning in data, most of us are oblivious, and some lucky few are surfing on it. We can do things that we couldn’t in the past (e.g. without Project Gutenberg, neither of my two analyses of the relationship between creativity and compression would have been possible).

And that got me wondering: just what other interesting data sets are out there? As part of my research, I decided to put together this sort of guided tour, a curated list if you will — adding a bit of structure to the firehose’s deluge.

Here’s my attempt at making it all just a bit more manageable.

Interesting Data Sets

interesting-data-sets-for-statistics

  • If, tomorrow, you get an email congratulating you on your new status as future Jeopardy contestant, how are you going to prepare? Well, one approach might be to download this archive of 216,930 past Jeopardy questions and plug them into your favorite spaced repetition system. Combine that with reading up on Jeopardy betting strategies, and you’re well on your way to becoming the next Arthur Chu (except hopefully nicer).

  • Ever get a morbid curiosity about what it’s like to be on death row? (Yeah, me neither.) But in case you ever have, Texas has graciously placed the last words of every inmate executed since 1984 online. So… sentiment analysis, anyone? (“How upbeat are death row inmates days before execution? With a little help from some data, we found out!”)

  • Speaking of prison, there’s more data on prisoners, including information about “their current offense and sentence, criminal history, family background and personal characteristics, prior drug and alcohol use and treatment programs, gun possession and use, and prison activities, programs, and services” available here.

  • How about reading other people’s emails? Ever wanted to do that, but can’t be bothered to train l33t hacking skills (and never mind the legality of it)? (Okay, this one I have thought about.) Well, I’ve got you covered. Check out the Enron corpus. It contains more than half a million emails from about 150 users, mostly senior management of Enron, organized into folders. Wikipedia calls it “unique in that it is one of the only publicly available mass collections of ‘real’ emails easily available for study.” Business idea: figure out what sort of information gets leaked in the emails that will later harm the execs at trial or whatever, then build a software system to automatically mine those out of real email. Either sell it to law enforcement or to corporate executives as the finest cover-your-ass email system.

  • Wondering what the internet really cares about? Well, I don’t know about that, but you could answer an easier question: What does Reddit care about? Someone has scraped the top 2.5 million Reddit posts and then placed them on GitHub. Now you can figure out (with data!) just how much Redditors love cats. Or how about a data-backed equivalent of r/circlejerk? (The original use case was determining what domains are the most popular.)

  • Speaking of cats, here are 10,000 annotated images of cats. This ought to come in handy whenever I get around to training a robot to exterminate all non-cat lifeforms. (Or, if you’re Google, you could just train a cat recognition algorithm and then send those users cat-specific advertising.)

bad-dalek-drawing

  • If you’re interested in building financial algorithms or, really, just predicting arbitrage opportunities for one of America’s largest cash crops, check out this data set, which tracks the price of marijuana from September 2nd, 2010 until about the present.

  • Who’s using what drugs and how often?

  • The earliest recorded chess match dates back to the 10th century, played between a historian from Baghdad and a student. Since then, it’s become a tradition for moves to be recorded – especially if a game has some significance, like a showdown between two strong players. As a consequence, today, students of the game benefit from one of the richest data sets of any game or sport. Perhaps the best freely available data set of games is known as the “Million Base,” boasting some 2.2 million matches. You can download it here. I can imagine an app that calculates your chess fingerprint, letting you know what grandmaster your play is most similar to, or an analysis of how play style has changed over time.

  • On the topic of games, for soccer fans, I recently came across this freely available data set of soccer games, players, teams, goals, and more. If that’s not enough, you can grab even more data via this Soccermetrics API Python wrapper. I imagine that this could come in handy for coaches attempting to get an edge over opposing teams and, more generally, for that cross-section of geeks and gamblers attempting to build analytic models to make better bets.

  • Google has made all their Google Books n-gram data freely available. An n-gram is an n-word phrase, and the data set includes 1-grams through 5-grams. The data set is “based originally on 5.2 million books published between 1500 and 2008.” I can imagine using it to determine the most overused, cliche phrases, and those phrases that are in danger of becoming cliched. (Quick! Someone register the domain clichealert.com!)

  • Amazon has a number of freely available data sets (although I think you need to run your analysis on top of their cloud, AWS), including more than 2.8 billion webpages courtesy of Common Crawl. The possibilities are endless, but an old business idea I had: analyze the Common Crawl data and determine cheap or not-currently-registered domains which are, for whatever reason, linked to by many websites. Buy these up and then resell them to people involved in SEO. (Or you could, you know, try to build the next Google.)

  • How well do minorities do on the computer science advanced placement exam? You can find out and tell me.

  • There’s the Million Song data set, which contains information about a million different songs, including a metric “danceability.” Might be nice to pair that with a media player specialized for parties — start with “conversation” music, and slowly shift to more danceable stuff as the night drags on. The data could also be used for a clustering algorithm (automatic genre detection, maybe), but I’m not sure how useful that’d be. A number of people have tried to build recommendation algorithms based on the data, including Kagglers and a team from Cornell. One possible use: analyzing music by year — How danceable, fast, etc. were the 70s? 80s? 90s? (Or how about looking for a follow-the-leader effect. If one song goes viral with a unique style, do a bunch of copycats follow?)

  • Speaking of music data sets, last.fm has music data available. Collected from ~360,000 users, it’s in the form of “user, artists, # of plays”. This would be good for clustering algorithms that automatically determine genre, or for recommender systems. (Even a “this artist is most similar to” thing would be sorta cool.)

  • When I think geeks, I think math and computer geeks, but there are many more. Terry Pratchett geeks (dated one!), Whovians, anime geeks, theater geeks and, with some relevance to this next data set, comic book geeks. Cesc Rosselló, Ricardo Alberich, and Joe Miro have put together a “social graph” of the Marvel Universe, and the data is freely available. Ideas for use: Maybe it could be overlaid on Facebook’s social graph to produce a new take on the “What superhero are you?” quiz.

  • Yelp has a freely available subset of their data, including restaurant rankings and reviews. One business idea: use tweets to predict restaurant star ratings. This would enable you to build out a Yelp competitor without requiring an active user base — you could just mine Twitter for data!

  • If you’re interested in data about data (metadata!), Jürgen Schwärzler, a statistician from Google’s public data team, has put together a list of the most frequently searched for data. The top 5 are school comparisons, unemployment, population, sales tax, and salaries. I was surprised that school comparisons were number 1 but, then again, I don’t have any brats running around (yet?). This list would be a good first step in researching what sort of data comparisons people actually care about.

  • Some of my readers are, no doubt, evil geniuses. Others want to save the world. There’s a subset of both of these groups who are interested in superintelligent robots. But to build such a robot, you’re going to have to teach it facts. All the things we take for granted, like that every person has one father. It would be a pain to insert those 10 million facts by hand (and, at a fact a minute, take more than 19 years). Thankfully, Freebase has done part of the job for you, making more than 1.9 billion facts freely available.

  • Maybe your plans are slightly less ambitious. You don’t want to build a superintelligent machine, just one smarter than your run of the mill mathematician. If that’s the case, you’re going to need to teach your machine a lot about mathematics, probably in the form of proofs and theorems. In that case, check out the Mizar project, which has formalized more than 9400 definitions and 49000 theorems.

  • And let’s say you build this mathematician and, sure, it can help you with proofs, but so what? You long for someone you can connect with on a deeper level. Someone who can summarize any topic imaginable. In that case, you might want to feed your robot on Wikipedia data. While all of Wikipedia is freely available, DBpedia is an attempt to synthesize it into a more structured format.

  • Now, you get tired of mathematics and Wikipedia. It turns out that proofs don’t pay the bills, so instead you decide to become a software engineer. Somehow, though, you’ve managed to build these machines without even a rudimentary understanding of programming, and you want a machine that will teach it to you. But where to find the data for such a thing? You might start with downloading all 7.3 million StackOverflow questions. (Actually, all the StackExchange data is freely available, so you could feed it more math information from both MathOverflow and the other math stackexchange. Plus statistics from Cross Validated, and so on.)

  • Ever wanted to study true friendship? (C’mon! Free your inner child social scientist.) Y’know, genuine, platonic love, like the kind embodied by dolphins? Well, now you can! All thanks to your humble author and Mark Newman, who’s placed online a network of “frequent associations between 62 dolphins in a community living off Doubtful Sound, New Zealand.” Business idea: Flippr. It’s like Facebook, but for dolphins, with plans to expand into emerging whale and sea turtle markets. Most revenue will come from sardine sales.

  • Do left-leaning blogs more often link to other left-leaning blogs than right-leaning ones? Well, I don’t know, but it sounds reasonable. And, thanks to permission from Lada Adamic, you can download her network of hyperlinks between weblogs on US politics, recorded in 2005. (Or you could just read her paper. Spoilers: conservatives more freely link to other conservatives than liberals link to liberals so, if you’re interested in link building, maybe you should register Republican.)[1]

  • Who’s friendlier: the average jazz musician or the average dolphin? You could find out by combining the dolphin data set mentioned earlier with Pablo M. Gleiser and Leon Danon’s jazz musicians network data set.

  • What about 1930s southern women or prisoners? Who’s friendlier? How about fraternity members or HAM radio operators? All this and more can be figured out with these network data sets.

  • How about dolphins or Slashdotters?

  • Web 2.0 websites (like Reddit) are sometimes gamed by “voting rings,” which are groups of people that intentionally vote up each other’s content, regardless of quality. I’ve often wondered if the same thing happens in academic circles. Like, you know, one night during your first year in grad school, you’re kidnapped in the middle of the night and made to swear a blood oath that you’ll cite every other member of the club. Or something. Well, Stanford has put online Arxiv’s High Energy Physics paper citation network, so you could find out.

  • You read this blog, so you’re pretty smart, right? And maybe you’d like to be rich, you know, so you can found the next Bill and Melinda Gates Foundation and save the world. (Because that’s why you want to be rich, right?) Well, then maybe you ought to develop some new-fangled trading algorithm and pick up like a trillion pennies from in front of the metaphorical steam-roller that is the market. (Quantitative finance!) But, in such a case, you’d better at least test your strategy on historical market data. Market data which you can get here.

  • The Open Product Data website aims to make barcode data available for every brand for free. Business idea: a specialty tattoo parlor that only does barcode tattoos, but lets customers pick whatever product they want. Think about it: “What’s your tattoo mean?” “It’s a Twinkie barcode, because Twinkies last forever, man, just like my faith.”

  • The European Center for Medium-Range Weather Forecasts has an impressive looking collection of weather data. Why, you ask, does the weather matter? The economic incentives for predicting the weather are absurd. When should you plant crops? Plan a big event? Launch a space shuttle? Go deep sea fishing? But I want to talk about the most fun application of weather data I’m aware of: The financial industry. I have a lot of respect for finance, mostly because of the crazy stuff they do. The only practical application of neutrinos I’ve heard of, for instance, is “because finance.” Should your algorithm buy Indonesian sesame seed futures? With weather data, it might know.

complaints-about-weather
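To make the n-gram definition from the Google Books entry earlier in this list concrete, here’s a minimal sketch in Python. As far as I understand it, the Google data already comes as pre-counted n-grams, so you wouldn’t need this to use their files; this only shows what a 1-gram through 5-gram looks like on text of your own. The function name `ngrams` and the sample sentence are made up for illustration, and splitting on whitespace is a simplifying assumption (real tokenization is fussier).

```python
from collections import Counter


def ngrams(text, n):
    """Return every run of n consecutive words in `text` as a tuple.

    Assumption: words are whatever you get by lowercasing and splitting
    on whitespace. Punctuation is not handled.
    """
    words = text.lower().split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


# Hypothetical sample text, not taken from any data set.
sample = "to be or not to be"

# Count how often each 2-gram (pair of adjacent words) shows up.
bigram_counts = Counter(ngrams(sample, 2))

for phrase, count in bigram_counts.most_common():
    print(" ".join(phrase), count)
```

A cliche detector along the lines described above would do the same counting, just over millions of books, and then look for phrases whose counts are climbing year over year.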

  • If you need nutrition data about food, the USDA has you covered. Business idea: A phone application called, “Am I allergic to that?” Then, lobby for your state to pass some law requiring each school to buy a license for every student.

  • For a wordsmith, a good dictionary is indispensable, and when it comes to word data, you could do a lot worse than check out the freely available WordNet. WordNet has significant advantages over your run of the mill dictionary as it focuses on the structure of language, grouping words into “sets of cognitive synonyms (synsets), each expressing a distinct concept.” It also has some information about relationships, such as “a chair has legs.”

  • We’ve already established that some of you are evil geniuses, in which case, where are you going to build your secret lair? I mean, a volcano is pretty cool, but is it evil and genius enough for competing in today’s modern world? You know what the other evil geniuses don’t have? A secret base on a planet outside of the solar system. With NASA’s list, you can get busy commissioning someone to build you a base on KOI-3284.01.[2]

  • The Federal Railroad Administration keeps a list of “railroad safety information including accidents and incidents, inventory and highway-rail crossing data.” Someone (like the NY Times) could overlay this on a map of the United States and figure out if people in poor regions are more likely to be hit by trains or something.

  • If you need a database of comprehensive book data, perhaps to build a competitor to Goodreads or an online digital library, the Open Library allows people to freely download their entire database.

  • Who is the United States killing with drones? If you’re content with Pakistan-specific data, there is a list of drone strikes available here.

  • If you’re interested in building a Papers2 competitor with support for automatically importing citation data (please do this), CrossRef metadata search might be a good place to check out.

  • Mnemosyne is a virtual flash card program that takes advantage of spaced repetition to maximize learning. (As you might recall, I’m a big fan of spaced repetition.) The project has been collecting user data for years, and gwern has graciously agreed to freely host the data for a few months. Perhaps one could run some sort of unsupervised learning algorithm over it and try to discover heretofore unknown information about human memory.

  • How much would it cost to hire Justin Bieber to play at your wedding? The fine lads at Priceonomics have figured out how much it would cost to hire your favorite band. You could take this data and calculate some sort of popularity-to-price ratio: what’s the most fame for your buck?

  • I’ve mentioned in a few of the other data sets just how lucrative it is to be able to better predict the stock market than everyone else. In 2011, researchers found that they could use data from Twitter to do just that: they went through tweets, found ones related to publicly traded companies, and then calculated a mood score. With this, they write, “We find an accuracy of 86.7% in predicting the daily up and down changes in the closing values of the DJIA.” A number of Twitter data sets are freely available here.

  • A 2014 paper by Clifford Winston and Fred Mannering reports that vehicle traffic costs the United States 100 billion dollars each year.[3] There’s money to be made, then, in routing traffic more efficiently. One way to do this would be to feed an algorithm historical traffic data and then use that to predict hotspots, which you would route people around. Lots of that data is available on data.gov.

  • On the other hand, if you’re building an app to track current traffic data, you’ll need a different data source.

  • If you want to launch a spam-fighting service, or maybe just analyze what type of emails spammers are sending, you’ll need data. UC Irvine has you covered.

  • But maybe you want to extend your spam-fighting service to text messages. Still got you covered.

  • There is a wealth of data sets available for R and all you have to do is install a package. Ecdat is one of those packages, containing gobs of econometric data. How about an analysis of how math levels correlate with number of cigarettes smoked? I’d read that.

the-erdos-peak

  • Ever wondered about how one person will be on the board of directors of several companies and it’s like, hey, maybe Condoleezza Rice with her ties to government surveillance isn’t the best choice for Dropbox? What if you could analyze those connections? Well, with this data set, you can. But only for Norway — it’s a network of the board members of public companies in Norway.

  • Ever seen a TV show where a government determines that someone is a terrorist based on their social ties? I always figured that data would be locked down tight somewhere, y’know, classified. But it turns out it isn’t. You, too, can analyze the social networks of terrorists.

  • There’s been a fair bit of controversy around all the bureaucracy of Wikipedia. But how does one become a bona fide Wikipedia big shot? Who’s the ideal Wikipedia administrator? Well, they’re voted for, and the data is available for download.

  • Harvard has opened up its set of “over 12 million bibliographic records for materials held by the Harvard Library, including books, journals, electronic resources, manuscripts, archival materials, scores, audio, video and other materials.”

  • If you need small data sets for students, check out DASL. One at random: does sterilizing dominant males in a wild mustang population reduce the population?

  • GET-Evidence has put up public genomes for download. I think Steven Pinker’s data is in there somewhere. Maybe you could make yourself a clone?

  • Oh, and speaking of genomes, the 1000 Genomes project has made ~260 terabytes of genome data downloadable.

  • In what is the smallest data set on this list, you’ll find the survival rates of men and women on the Titanic. Female passengers were ~4 times more likely to survive than male passengers.

  • Want a super-specific breakdown of the contents of your food? You’re in luck. (Thanks Canada!)

  • There’s a similar database of the metabolites in the human body. I’m not sure what you could do with it, but it might come in handy in some sort of dystopian future where humans are raised like cattle for their nutrients. (Maybe someone could use this to build a viral marketing campaign along the lines of, “How nutritious is your mom?”)

calories-in-a-human

  • The Reference Energy Disaggregation Data Set has about 500 GB of compressed data on home energy use. Obvious use candidates: improving home efficiency or creating a visualization of just where people’s energy bills are going.

computer-declares-war

  • Did you know that you can download all the PDFs on Arxiv? Once we manage to teach machines natural language, we can just have a computer read it all and give us the cliff notes (and the scientific breakthroughs).

  • If you need economic census data on any industry, check out census.gov’s industry statistics portal. If finance is really evil, you ought to be able to find something damning in the data.

  • For those unfamiliar with Usenet, it’s sort of like a huge, text-only forum. It was much more popular before the rise of the world wide web. Anyways, you can download a huge data set of postings to Usenet here. It might be pretty good for some kind of textual analysis project or for training a machine learning algorithm (maybe a spellchecker?). You could use the data to build out a Google Groups competitor, too.

  • Nick Bostrom has a very interesting paper called “Existential Risk Prevention as Global Priority.” The basic intuition is that preventing even small risks of human extinction is worthwhile if we consider all the human generations it would save. One way to start saving all those future lives might be by digging into this data set of every recorded meteor impact on Earth from 2500 BCE to 2012.

  • How do gender and mental illness affect crime? This data set was collected explicitly with that question in mind.

  • Speaking of mental health, if you’re interested in how it affects minorities specifically, try this.

  • There are a lot of lonely men and women out there, and some of those lonely men and women have excellent analytical skills. For those lonely people, I suggest using this data set, which “surveyed how Americans met their spouses and romantic partners, and compared traditional to non-traditional couples” to determine the best way to meet that special someone.

drawing-of-world-news-tonight

  • Tons of data on what is called “adolescent health” is available here, though it actually covers more than that, including a bunch of relationship data and biomarkers. (Not creatine levels, unfortunately.)

  • Here’s a question: Are modern jobs worse than those of the past? My grandparents built tires at Firestone. Today, people rarely have that level of control and visceral experience of the finished product of their work. This set of five surveys regarding how different groups experience employment could answer that question. I can see the article now — “Is everything getting slightly worse? We found out.”

  • Stanford has 35 million Amazon reviews available for download. Lots of stuff you could do with this: use it to improve recommendation algorithms, or figure out whether or not there’s a follow-the-leader effect with reviews (i.e. do early positive reviews beget more positive reviews?).

  • Based on some of my research prior to writing this, the Google keyword “data sets on serial killers” is 1) really specific and 2) weirdly popular, but I guess there’s no accounting for taste. And, of course, we’ve got data for that, thanks to the Serial Killer Information Center.[4]

  • In this gruesome vein, the University of Maryland has a “Global Terrorism Database,” which is a set of more than 113,000 terror incidents. You can download it after filling out a form. Ideas for use: visualization of terror incidents by location over time, predicting and preventing terror attacks, and creating early alert systems for vulnerable areas.

  • The MNIST Database is a classic in the field of machine learning. It’s a set of labeled handwritten digits, the kind of training data that OCR algorithms need. Today, some algorithms are actually more accurate than human judges! This would have been nice to have back when I was in grade school. I distinctly recall once arguing with a teacher over missing a question because she insisted that I had written the letter j when it was clearly a d. In the future, we’ll let the machines decide.

  • UCI has a poker hand data set available. My poker-fu is fairly weak, but I’m sure there’s some interesting analysis to be done there. I’ve heard second hand that humans still maintain some advantage over machines when it comes to poker, but I’m unable to verify that via Google. Machines have won in at least one tournament.

  • Another data set from UCI: images labeled as either advertisements or non-advertisements. This is good for building up classification algorithms that decide whether a new image is an ad, which might be good for, say, automatic ad blocking or spam detection. Or maybe a Google Glass application that filters out real life advertisements. That’d be cool. Look at a billboard and instead see a virtual extension of the natural landscape.

  • Remember the whole Star Wars Kid debacle? Wikipedia informs me that Attack of the Show rated it the number 1 viral video of all time. Andy Baio, one of the guys who was in on it before it was cool and who coined the phrase “Star Wars Kid,” has made his server logs from the time publicly available. Someone could take this data and produce a map-based visualization of who saw it when, along with annotations of where the traffic was coming from.

  • Who’s linking to who (and what) on WordPress? (Tidbit: most of the links to this site come from WordPress blogs.) With this WordPress crawl, you can find out. Visualizing the network might be sorta cool, but it’d be cooler still to uncover some information about “supernodes” that either are linked to often or put out a lot of links (or maybe both). Or maybe clustering people by interest.

  • Is Obama in bed with big oil? Or extremist environmentalists? Or the corn lobbies? And who was backing that Herman Cain dude, anyways? The 2012 Presidential Campaign Finance data is available for download. It would be neat to see an analysis of what industries prefer what candidates.

  • Which private colleges are the best value?

  • Which public colleges are the best value?

  • Cigarette data by state. Kentucky smokes the most, with West Virginia as a close second. Given the massive social harm of tobacco, a good analysis could very well save a lot of lives.

superhero

Footnotes

  1. With apologies to JFK: “Let us seek not the Democrat link or the Republican link, but the right link.”

  2. Wikipedia says: “KOI-3284.01 is believed to be the most Earth like exoplanet to be found so far by the Kepler space probe. It is predicted to have a radius 1.5 times that of Earth’s. It is predicted to be located at the proper distance from the sun to sustain liquid water.”

  3. “The Texas Transportation Institute’s latest Urban Mobility Report puts the annual cost of congestion to the nation, including both travel delays and expenditures on fuel, at more than $100 billion.”

  4. If that’s not enough, there seems to be a fair amount of research around “murder topology” which is not, as you might naively expect, a super badass branch of mathematics, but rather concerned with the movement patterns of serial killers.

Surprisingly Dangerous Jobs In America

You can’t avoid danger.

—Jeannette Walls, Half Broke Horses

Yeah, you can. Don’t get one of these jobs, for instance.

—me

David Henderson has rightly earned the title contrarian with his latest post which, to kick off National Police Week, points out that it’s more dangerous to be a farmer than a policeman — “For every 100,000 police, the annual fatality rate is 20. For every 100,000 farmers, it is 40% higher, at 28.” (Source.)

Now, on this blog, we’re good empiricists, and nothing warms the heart of an empiricist more than refuting a well-known, common sense “truth” with, you know, observations and data.

So that got me thinking: What jobs are more (or less) dangerous than one might naively suspect?

I present to you this delightful graph, taken from the Bureau of Labor Statistics:

jobs-by-danger

Notice that the data here for farmers agrees with David’s. He has 28 per 100,000 versus the chart’s 25.3 per 100,000. (And given the endemic underreporting, the 28 number might well be more accurate.) Police officer isn’t included on the chart but David’s data would make it about as dangerous as… taxi driver. That’s right, folks. The brave folks keeping the peace of our nation? Just as brave as your local cabby. (Actually, given that police deaths dropped 20 percent in 2012, cab drivers might be braver.)
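The comparisons above are simple ratios of fatality rates. As a quick sanity check, here is a small sketch using only the three numbers quoted so far; the helper name `percent_higher` is my own.

```python
# Annual fatalities per 100,000 workers, as quoted in the post.
police = 20.0
farmers_henderson = 28.0   # David Henderson's figure
farmers_bls_chart = 25.3   # figure read off the BLS chart

def percent_higher(rate, baseline):
    """How much higher `rate` is than `baseline`, in percent."""
    return (rate / baseline - 1) * 100

print(round(percent_higher(farmers_henderson, police), 1))  # 40.0
print(round(percent_higher(farmers_bls_chart, police), 1))  # 26.5
```

Either way, farming comes out more dangerous than policing; the two sources only disagree on by how much.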

I’m going to propose we replace Police Week with Fisherman Week, because it’s about 6 times more dangerous to be a fisherman than a police officer. (And who doesn’t love a good tuna steak?)

Or maybe we should keep Police Week, but dedicate 6 weeks to celebrating fishermen. It’s only fair. And, of course, three weeks to pilots and people involved with flying, along with a solid two weeks for garbage men.

Some other fun facts

Digging a bit further into the data, we find this somewhat troubling statistic: 92% of workplace fatalities are men. (Do we blame the patriarchy for this one?)

And if you were wondering what state is the most dangerous: North Dakota. It’s about as dangerous to work in North Dakota as it is to be a police officer. From the AFL-CIO’s “Death on the Job” report:

Among all of the states, North Dakota stands out as an exceptionally dangerous and deadly place to work. The state’s job fatality rate of 17.7 per 100,000 workers is alarming. It is more than five times the national average and is one of the highest state job fatality rates ever reported for any state.

So probably we should have a week celebrating North Dakotans, too.

Further Reading

Web Roundup: More Links For May

Curiosity is only vanity. Most frequently we wish to know but to talk.

—Blaise Pascal, Pensées

The Unreasonable Effectiveness of Checklists

checklist-sales

Dr. Peter Pronovost had a problem. People were dying and, to borrow a line from Fight Club, not in the Sylvia Plath, Tibetan Buddhist, we’re-all-dying-so-get-used-to-it sense of the word.

No, this is the hospital kind of death we’re talking about. I mean death in all of its macabre horror. You know, the horror we cover up with euphemisms like “passing away” and pretend that white sheets and a sterile environment somehow make the notion of oblivion no longer panic-inducing. That kind of death.

And not the inevitable sort. Not of the “his body just gave out” or “there was nothing we could do” kind of death, although I’m sure plenty of attendings fell back on that convenient cliche. No, I mean preventable death. Death of the there-but-for-the-grace-of-unwashed-hands-now-I’m-dead kind. I mean the kind where you’re in the hospital for a routine procedure and some dumbass with 15 years of schooling forgets to wear gloves so now you’re profoundly, absolutely dead. That kind. The sort of death where if the average doctor had one more percentile of conscientiousness you wouldn’t be dead because he wouldn’t have killed you.

The sort of deaths that define why hospitals are a dangerous place.

That sort of death was Dr. Pronovost’s problem. Mistakes were killing people at his hospital. Not some podunk care center, either, but critical care at Johns Hopkins.

So he did the obvious, boring thing. He implemented a checklist for one basic-but-still-error-prone-and-infectious procedure, inserting a central venous catheter, and everyone had to follow it. And this checklist of his wasn’t complicated. These weren’t instructions where, in order to understand them, you need to rack up the equivalent of the GDP of a small nation state in medical school debt. There were five whole things, and they boil down to two: clean yourself and the patient, wear a mask and gloves. Not super tricky, only-clever-people-know steps.

These were the hospital equivalent of brushing your teeth before bed and wearing deodorant. The absolute basics. Stuff everyone is supposed to do, but sometimes people forget. Except when you forget to wear deodorant at a hospital, it’s a lot worse than spending a day fretting over whether or not your crush has discovered that your natural smell is not coffee-cinnamon-woodland, but something decidedly funky. When you forget at a hospital, someone catches Legionnaires’ disease and dies forever.

And maybe you’re skeptical: “A checklist for five things? I can remember five things no problem. How many mistakes could doctors possibly be making?” (And that’s how I know you’re not a programmer.) But you wouldn’t be alone. Dr. Gawande, a surgeon at Brigham and Women’s Hospital in Boston, told the New York Times, “It seemed silly to make a checklist for something so obvious.”

Except, you know, this stupid checklist of five whole things totally worked. After a year, the rate of infection on this specific procedure dropped from 11 percent to zero. By two years, it had saved the hospital 2 million dollars, prevented 8 unnecessary deaths, and avoided 43 infections. Consequently, the hospital implemented still more checklists – reducing the average ICU stay by half and saving 21 lives.

A 2009 study duplicated this success in 8 other hospitals: “its use improved compliance with standards of care by 65% and reduced the death rate following surgery by nearly 50%.”

Checklists are awesome.

How awesome are they? A brief review

The bulk of the evidence for the effectiveness of checklists comes from medicine and is relatively recent. While other disciplines, such as aviation and engineering, have long used checklists, they haven’t bothered to actually vet that they work. A 2002 study puts it this way:

Aviation safety … was not built on evidence that certain practices reduced the frequency of crashes (but) relied on the widespread implementation of hundreds of small changes in procedures, equipment and organization (to produce) an incredibly strong safety culture and amazingly effective practices. These changes made sense; were usually based on sound principles, technical theory or experience; and addressed real-life problems, but few were subjected to controlled experiments

This is less surprising when we consider how long ago the pre-flight checklist was introduced: it has been a constant in the airline industry since 1937.

Even newer and emerging disciplines, like software engineering and quality assurance, have done little to empirically verify the effectiveness of checklists. A 2007 paper, “Best Practices in Code Inspection for Safety-Critical Software,” is a typical example. Though it focuses solely on the use of checklists to improve software quality, it presents no evidence on the actual effectiveness of checklists. Similarly, a 1999 review further calls using checklists to inspect software a “best practice,” but again assumes their efficacy.

Reassuringly, though, the evidence from medicine is near overwhelming. Checklists have been found effective in scenarios as diverse as oxytocin administration to pregnant mothers (which decreased the rate of cesarean delivery by a quarter), actually giving patients medication, measuring lipid levels, screening stroke patients, and improving the reporting quality of RCTs.

This and more is captured in a 2012 review and meta-analysis, which finds that checklists cut the risk of mortality by roughly 43 percent:

This review shows that with the use of the checklist the relative risk for mortality is 0.57 and for any complications 0.63.
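To turn those relative risks into “how much lower” figures, subtract them from one. A minimal sketch, assuming the relative risk compares the checklist group against the no-checklist group; the function name is my own.

```python
def relative_risk_reduction(relative_risk):
    """Percent reduction in risk implied by a relative risk below 1."""
    return (1 - relative_risk) * 100

# Relative risks quoted from the review.
mortality_rr = 0.57
complications_rr = 0.63

print(round(relative_risk_reduction(mortality_rr)))      # 43
print(round(relative_risk_reduction(complications_rr)))  # 37
```

So the review's numbers work out to about a 43% drop in mortality risk and a 37% drop in the risk of any complication.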

It should come as no surprise, then, that checklists are cost-effective, with the ability to save hospitals anywhere between $103,829 and $2,671,253.

If you’re still not convinced, not only do checklists save lives and money, but they may also improve process efficiency and productivity:

Use of a “preflight checklist” in Kaiser Permanente Southern California’s operating rooms resulted in improved nurse retention as turnover decreased from 23% to 7%. Also, after implementation of Kaiser Permanente’s checklist, there was a decrease in the number of operative cases that were canceled or delayed.

So, if the question is “How awesome are checklists?” I’d say: pretty awesome.

Further Reading

Web Roundup: Links for May

Why Category Theory Matters

I hope most mathematicians continue to fear and despise category theory, so I can continue to maintain a certain advantage over them.

—John Baez

growth-of-category-theory

The above is a graph of the number of times the phrase “category theory” has been used in books, from about 1950 through the present. It speaks for itself.

But why? What’s the big deal? Why does category theory matter? I’m about a quarter of the way through Conceptual Mathematics: A First Introduction to Categories and still not sure why I’m bothering with fleshing out all this theory. Is this just set theory for hipsters?

What category theory is about

Category theory is, essentially, the study of mathematical structure. It’s the study of things and the mappings between those things, the translations of these objects. These are usually called objects and morphisms (or arrows, if you prefer). Objects can be thought of as sets and arrows as functions, though they are not limited to this interpretation.
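To make “objects and arrows” a little less abstract, here is a rough sketch that treats Python types as objects and plain functions as arrows. The two rules a category demands are that arrows compose associatively and that every object has an identity arrow. The arrows `f`, `g`, and `h` are made up for illustration, and the checks only spot-test a few sample values rather than prove anything.

```python
def compose(g, f):
    """Return the arrow 'g after f': first apply f, then g."""
    return lambda x: g(f(x))

def identity(x):
    """The identity arrow: leaves its input untouched."""
    return x

# Three hypothetical arrows: int -> int -> str -> int
f = lambda n: n + 1    # int -> int
g = lambda n: str(n)   # int -> str
h = lambda s: len(s)   # str -> int

samples = [0, 9, 99, 12345]

# Associativity: h after (g after f) equals (h after g) after f.
assoc = all(
    compose(h, compose(g, f))(x) == compose(compose(h, g), f)(x)
    for x in samples
)

# Identity laws: composing with identity on either side changes nothing.
ident = all(
    compose(f, identity)(x) == f(x) == compose(identity, f)(x)
    for x in samples
)

print(assoc, ident)  # True True
```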

category-theory

The subject’s major insight is that, in order to understand something, you should focus on the structure-preserving mappings of that something: the legal translations.
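A small example of a structure-preserving mapping: `len` sends strings to integers and respects the way each is combined, since the length of a concatenation is the sum of the lengths and the empty string goes to zero. The sketch below spot-checks that property on a few sample words; `preserves_structure` and `count_plus_one` are names I made up, and passing the check on samples is evidence, not proof.

```python
import operator

def preserves_structure(h, op_a, unit_a, op_b, unit_b, samples):
    """Spot-check that h maps unit to unit and h(x op_a y) == h(x) op_b h(y)."""
    unit_ok = h(unit_a) == unit_b
    op_ok = all(
        h(op_a(x, y)) == op_b(h(x), h(y))
        for x in samples
        for y in samples
    )
    return unit_ok and op_ok

words = ["", "cat", "egory", "theory"]

# len: (str, +, "") -> (int, +, 0) is a "legal translation".
print(preserves_structure(len, operator.add, "", operator.add, 0, words))  # True

# A map that ignores the structure fails: the empty string does not go to 0.
count_plus_one = lambda s: len(s) + 1
print(preserves_structure(count_plus_one, operator.add, "", operator.add, 0, words))  # False
```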

What the excitement is about

The vast applicability and expressiveness of category theory leads to the observation that most structures in mathematics are best understood from a category theoretic or higher category theoretic viewpoint.

nLab

Category theory is one of the most abstract fields of mathematics, if not the most abstract. It’s even been dubbed, as one might tease a younger sibling, “abstract nonsense.” After all, the field throws out all the specific properties of objects and instead focuses solely on their translations.

This extreme generality of category theory means that it can say something about anything, but nothing too specific. In other words, part of the growth of category theory is probably because you can use it to talk about damn near anything. (See the applications below for examples.)

In this respect, category theory is like set theory. The popularity of set theory is a result of the fact that, hey, it’s a pretty good language for talking about a lot of different types of mathematics. Most things can be formalized as a combination of sets and first order logic, and it’s not that unnatural to think in terms of sets so, bam, popularity.

In Categories for Software Engineering, the authors put it this way: “The way we like to present category theory is as a toolbox similar to set theory: as a kind of mathematical lingua franca in the sense that it can be used for formalizing concepts that arise in our day-to-day activity.”

This generality mirrors the difference between strong and weak methods in artificial intelligence. General methods, while widely applicable, don’t typically scale up to hard problems. Instead, specialized tools are necessary. In the same way, category theory is more of a tool for elucidating connections between mathematical structures than for solving problems — in contrast with something like linear algebra, or really any field of applied math.

Benefits of category theory over set theory

God made the integers, all the rest is the work of man.

—Leopold Kronecker

To be honest, I don’t like set theory. It’s artificial — the axioms aren’t obviously true, but rather the product of a search for a paradox-free foundation for mathematics. It’s sort of complicated, maybe not at the lowest levels, but definitely once you try to build up something like the real numbers. (A 2008 issue of the AMS reports, “…to expand the definition of the number 1 fully in terms of Bourbaki primitives requires over 4 trillion symbols.”)

The whole enterprise is bizarre. As humans, we didn’t start out with sets and then build out mathematics. No, the Egyptians did arithmetic and some algebra. (The oldest extant mathematical records deal with the Pythagorean theorem.) Animals have some notion of magnitude and many can even count. Set theory, rather than a natural extension of mathematical enterprise, seems more like something forced — the difference between English and Esperanto.

As far as I can tell, the mathematical community agrees with me. Paul Cohen is the only person to ever win a Fields medal for work on foundations and today “foundations of mathematics” is code not for mathematics, but philosophy.

So, immediately, category theory has an advantage over set theory in that it’s a less artificial construction, given that it stems directly from work in algebraic topology.

But, beyond that, is there anything else exciting about category theory? The main draw is its ability to connect otherwise disparate fields, a sort of skeleton to hang other knowledge on. Mike Stay and John Baez write about this in their “Physics, topology, logic and computation: a Rosetta Stone”, where they use category theoretic constructs to speak about the similarities between (you guessed it) physics, topology, logic, and computation.

Jocelyn Ireson-Paine puts it this way, “category theory is a great source of unifying concepts and organizing principles.” This is the benefit of all the abstraction — by throwing away all the details, an object’s structure reveals itself.

As a concrete example, consider one of the most profound mathematical achievements, Descartes’s discovery of analytic geometry: the realization that geometry can be translated into Cartesian coordinates and, thus, the power of algebra can be brought to bear on the subject.

With category theory, this discovery can be expressed in what has to be one of the most satisfying formulas of all time:

\( P \xrightarrow{\quad f \quad} \mathbb{R}^{2} \)

Applications of category theory

The above is nice and all, but it’s still just sort of hey-take-my-word-for-it, which is not so satisfying. Here are some actual examples:

I will leave you with the following:

[Category theory] does not itself solve hard problems in topology or algebra. It clears away tangled multitudes of individually trivial problems. It puts the hard problems in clear relief and makes their solution possible.

“The Last Mathematician from Hilbert’s Göttingen”

Further Reading