
The Correlation That Killed People (And the One That Didn't)

How tobacco companies weaponized 'correlation is not causation' to delay cancer warnings for 40 years—and why chocolate consumption predicts Nobel prizes.

Hyle Editorial

When Statistics Become Weapons

The tobacco industry used the phrase 'correlation is not causation' to delay cancer warnings for 40 years. The same sentence appears in every introductory statistics textbook. One use killed people. The other is correct.

In 1950, Ernst Wynder and Evarts Graham published a landmark study showing that 96.5% of lung cancer patients in their sample were smokers. The correlation was stark, undeniable—and according to the tobacco industry, meaningless. For four decades, cigarette manufacturers deployed a sophisticated statistical defense: merely showing that two variables move together, they argued, proves nothing about whether one causes the other.

They were technically right. They were also morally catastrophic.

The Statistical Battlefield: 1950–1998

The correlation between smoking and lung cancer was one of the strongest ever observed in epidemiology. British statistician Richard Doll's 1950 study found that heavy smokers were 50 times more likely to develop lung cancer than non-smokers. By 1964, the U.S. Surgeon General's report had compiled evidence from over 7,000 studies.

Yet the tobacco industry's defense hinged on a legitimate statistical principle. Correlation measures the degree to which two variables move together, quantified by Pearson's correlation coefficient:

$$r = \frac{\sum(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum(x_i - \bar{x})^2 \sum(y_i - \bar{y})^2}}$$

Values range from $r = -1$ (perfect negative correlation) to $r = +1$ (perfect positive correlation). Smoking and lung cancer showed correlations consistently above $r = 0.9$.
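To make the formula concrete, here is a minimal Python sketch that computes $r$ term by term. The function name `pearson_r` and the three small data sets are made up for illustration; they are not taken from any smoking study.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient, computed directly from the formula."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Numerator: sum of products of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator: square root of the product of the sums of squared deviations.
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / math.sqrt(ss_x * ss_y)

# Invented illustrative data, not real measurements.
x = [1, 2, 3, 4, 5]
y_up = [2, 4, 6, 8, 10]     # rises in lockstep with x
y_down = [10, 8, 6, 4, 2]   # falls in lockstep with x
y_noisy = [2, 1, 4, 3, 5]   # rises with x, but imperfectly

print(pearson_r(x, y_up))     # 1.0
print(pearson_r(x, y_down))   # -1.0
print(pearson_r(x, y_noisy))  # 0.8
```

Note that nothing in the calculation refers to which variable comes first or why the two move together. The coefficient summarizes co-movement only.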

But correlation alone cannot distinguish causation from confounding. Perhaps a genetic factor caused both smoking behavior and lung cancer. Perhaps environmental pollutants were the true culprit. The industry funded the Council for Tobacco Research to explore these alternative explanations—$300 million between 1954 and 1997.

[!INSIGHT] The phrase "correlation is not causation" is neither false nor trivial. It represents one of statistics' most important cautionary principles. The tobacco industry's sin was not mathematical error but strategic exploitation of scientific uncertainty.

Bradford Hill's Answer

In 1965, British epidemiologist Austin Bradford Hill proposed nine criteria for establishing causation from correlational data:

  1. Strength: Stronger associations are more likely causal
  2. Consistency: Results replicate across studies and populations
  3. Specificity: One cause leads to one effect
  4. Temporality: Cause must precede effect
  5. Biological gradient: Dose-response relationship
  6. Plausibility: Mechanism makes biological sense
  7. Coherence: Fits with existing knowledge
  8. Experiment: Intervention changes outcome
  9. Analogy: Similar causes produce similar effects

Smoking met all nine criteria. By the 1980s, the causal case was overwhelming. Yet legal and regulatory battles continued until the 1998 Master Settlement Agreement—48 years after Wynder and Graham's original study.
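Of the nine criteria, the biological gradient is the easiest to illustrate numerically. The sketch below computes relative risk for each exposure level in a hypothetical cohort and checks that risk rises with dose. Every count, group label, and function name here is invented for illustration and does not come from any real study.

```python
# Hypothetical cohort: (exposure group, people followed, cases observed).
# These counts are invented for illustration only.
cohort = [
    ("none", 10000, 10),
    ("1-14 per day", 10000, 80),
    ("15-24 per day", 10000, 140),
    ("25+ per day", 10000, 250),
]

def relative_risks(groups):
    """Risk in each group divided by risk in the first (unexposed) group."""
    _, base_n, base_cases = groups[0]
    base_risk = base_cases / base_n
    return [(label, (cases / n) / base_risk) for label, n, cases in groups]

def is_strictly_increasing(values):
    """True when each value is larger than the one before it."""
    return all(a < b for a, b in zip(values, values[1:]))

rr = relative_risks(cohort)
for label, value in rr:
    print(f"{label:>14}: relative risk = {value:.1f}")

has_gradient = is_strictly_increasing([value for _, value in rr])
print("dose-response gradient:", has_gradient)
```

A strictly rising pattern like this is what Hill meant by a gradient: a hidden third factor would have to track the dose itself, not merely the habit, to explain it away.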

*"It is not enough to demonstrate that there is an association between smoking and lung cancer; one must demonstrate that smoking causes lung cancer."*
Tobacco Institute position, 1979

The delay had a human cost. The U.S. Centers for Disease Control estimates that 16 million Americans live with smoking-related diseases, and cigarette smoking causes more than 480,000 deaths annually in the United States alone.

The Chocolate Nobel Prize Correlation: $r = 0.791$

In 2012, Franz Messerli published a paper in the New England Journal of Medicine showing a remarkable correlation: countries with higher chocolate consumption per capita won more Nobel prizes per capita. The correlation coefficient was $r = 0.791$—stronger than many accepted medical findings.

Switzerland topped both lists: 11.9 kg of chocolate per person annually and 31.4 Nobel prizes per 10 million population. China ranked near the bottom on both measures.

Messerli wrote: "[T]he total number of Nobel prizes won by each country could be predicted from the country's per capita chocolate consumption with a coefficient of correlation of 0.791."

Was chocolate causing scientific brilliance? Flavonoids in cocoa do improve cognitive function in short-term studies. The biological pathway was plausible—dopamine enhancement, improved cerebral blood flow.

[!NOTE] Messerli's paper was published in the journal's "Occasional Notes" section and has been interpreted as a satirical commentary on inferring causation from ecological correlations. The author later clarified this interpretation.

But the correlation was almost certainly spurious. Wealthy countries consume more chocolate and invest more in scientific research, so a third variable, national wealth, plausibly drives both observed variables. This is the classic confounder problem.
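A small simulation shows how a confounder can manufacture a strong correlation between two variables that have no direct link. The sketch below is synthetic throughout: `wealth`, `chocolate`, and `nobels` are random numbers in arbitrary units, not country data, and the noise level of 0.5 is an arbitrary assumption. Each of the two observed variables depends only on `wealth` plus independent noise. The partial correlation then asks how much association remains once the linear effect of `wealth` is removed.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)

def partial_r(xs, ys, zs):
    """Correlation of xs and ys after linearly controlling for zs."""
    r_xy = pearson_r(xs, ys)
    r_xz = pearson_r(xs, zs)
    r_yz = pearson_r(ys, zs)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5000

# Synthetic confounder and two variables that each depend only on it.
wealth = [rng.gauss(0, 1) for _ in range(n)]
chocolate = [w + rng.gauss(0, 0.5) for w in wealth]
nobels = [w + rng.gauss(0, 0.5) for w in wealth]

raw = pearson_r(chocolate, nobels)
adjusted = partial_r(chocolate, nobels, wealth)

print(f"raw correlation:          {raw:.3f}")       # strong, roughly 0.8
print(f"controlling for wealth:   {adjusted:.3f}")  # close to zero
```

In this toy setup the raw correlation is high even though neither variable influences the other, and it collapses once the common cause is accounted for. Real data is messier: confounders are often unmeasured, which is why adjustment alone rarely settles a causal question.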

Why Two Very Different Correlations Use the Same Language

Both the tobacco-lung cancer correlation and the chocolate-Nobel correlation invite the same statistical question: does correlation imply causation?

In the first case, the answer was "no, but the weight of evidence overwhelmingly supports causation through multiple converging lines of research." In the second case, the answer was "no, and the correlation likely reflects confounding rather than any direct causal relationship."

The critical difference lies not in the statistics but in the totality of evidence:

| Criterion | Smoking-Lung Cancer | Chocolate-Nobel Prizes |
| --- | --- | --- |
| Correlation strength | $r > 0.9$ | $r = 0.791$ |
| Temporality | Smoking precedes cancer | Unknown |
| Dose-response | Clear gradient | Weak |
| Mechanism | Carcinogens identified | Plausible but unproven |
| Experimental evidence | Animal studies positive | None |
| Alternative explanations | Tested and rejected | Obvious (wealth) |

[!INSIGHT] The tobacco industry exploited the fact that statistical correlation alone can never prove causation. But science doesn't require mathematical proof; it requires converging evidence that makes alternative explanations untenable. By 1965, the smoking-cancer link had met this standard. As of 2024, the chocolate-Nobel link has not.

The Harm of Uncritical Skepticism

There's a danger in applying "correlation is not causation" too broadly. When every correlation is dismissed as potentially spurious, we lose the ability to act on probabilistic evidence.

The philosopher Elizabeth Anscombe captured this tension: "Causation is not something that can be established by statistics alone. But statistics can make certain causal hypotheses so improbable that we are justified in rejecting them."

The tobacco industry's strategy worked precisely because it demanded certainty in a domain where only probability was available. It treated the absence of mathematical proof as evidence that no causal link existed, a logical fallacy that delayed public health action for decades.

The Lessons

Key Takeaway: "Correlation is not causation" is a warning, not a conclusion. The tobacco industry transformed a valid statistical principle into a weapon of delay by demanding impossible standards of proof. The chocolate-Nobel correlation reminds us that some correlations are spurious—but distinguishing the meaningful from the meaningless requires careful investigation, not reflexive dismissal.

The difference between a correlation that kills and one that merely amuses lies not in the statistics but in the evidence that surrounds them. The Bradford Hill criteria and modern causal inference methods (including randomized controlled trials and natural experiments) provide tools for distinguishing causation from coincidence.

When you hear "correlation is not causation," ask: What is the total evidence? What alternative explanations have been tested? What is the cost of waiting for certainty?

Sources: Doll, R. & Hill, A.B. (1950). Smoking and carcinoma of the lung. BMJ. Messerli, F.H. (2012). Chocolate consumption, cognitive function, and Nobel laureates. NEJM. Proctor, R. (2012). Golden Holocaust: Origins of the Cigarette Catastrophe. Brandt, A. (2007). The Cigarette Century. U.S. Surgeon General (1964). Smoking and Health.
