The average misleads more than it informs. From Bill Gates skewing income to wartime survival bias, discover why median and mode reveal the truth.
Hyle Editorial
The average American has less than one ovary and less than one testicle. The average tells you almost nothing about any individual — and yet it drives almost every policy decision. In 2023, the U.S. Census Bureau reported a mean household income of $74,580, but that number describes exactly zero actual households. When Bill Gates walks into a bar, everyone inside becomes a millionaire on average — but no one's bank balance changes. So why do we keep building systems, policies, and predictions on a foundation that collapses under the weight of real-world data?
The arithmetic mean — what we colloquially call "the average" — is seductively simple. Sum all values, divide by count. For symmetric distributions, it works beautifully. But here's what your statistics textbook may have glossed over: most real-world distributions aren't symmetric.
Consider income distribution in the United States. The mean household income sits at approximately $74,580, but the median — the middle value when all incomes are sorted — hovers around $54,000. That $20,000 gap isn't an error. It's a mathematical signature of right-skewness, where extreme high values pull the mean away from the bulk of the data.
[!INSIGHT] The mean is not a "center"; it's a balance point. In a skewed distribution, the mean is pulled toward the tail, away from where most data points cluster.
When Bill Gates enters that hypothetical bar of 10 people, the mean net worth explodes from perhaps $50,000 to over $10 billion. The median? It moves by approximately zero. The median is robust to outliers; the mean is held hostage by them.
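A minimal Python sketch of the bar thought experiment. The ten patrons at $50,000 each and the $110 billion stand-in for Gates's net worth are assumed round numbers chosen for illustration, not current data; `bar_stats` is a hypothetical helper name.

```python
from statistics import fmean, median

# Assumed round figures for illustration only (not current net-worth data).
PATRON_NET_WORTH = 50_000
GATES_NET_WORTH = 110_000_000_000


def bar_stats(net_worths):
    """Return (mean, median) net worth for the people in the bar."""
    return fmean(net_worths), median(net_worths)


patrons = [PATRON_NET_WORTH] * 10
before = bar_stats(patrons)
after = bar_stats(patrons + [GATES_NET_WORTH])  # Gates walks in: 11 people

print(f"before: mean ${before[0]:,.0f}, median ${before[1]:,.0f}")
print(f"after:  mean ${after[0]:,.0f}, median ${after[1]:,.0f}")
```

Under these assumptions the mean jumps by a factor of about 200,000, while the median does not move at all: the sixth of eleven sorted values is still $50,000.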
The Statistical Formula Behind the Distortion
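For a dataset $x_1, x_2, \ldots, x_n$, let $x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)}$ denote the same values sorted in ascending order:
Mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
Median: $\tilde{x} = x_{((n+1)/2)}$ for odd $n$, or $\tilde{x} = \frac{1}{2}\left(x_{(n/2)} + x_{(n/2+1)}\right)$ for even $n$ (the average of the two middle values)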
The mean incorporates every data point's magnitude. Add a value of $10^9$ to a dataset of values around $10^2$, and the mean shifts dramatically. The median simply notes the new ordering and adjusts minimally.
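A one-line derivation makes "dramatically" concrete. Appending a single value $x_{n+1}$ to $n$ values whose mean is $\bar{x}_n$ gives

$$\bar{x}_{n+1} = \frac{n\bar{x}_n + x_{n+1}}{n+1} = \bar{x}_n + \frac{x_{n+1} - \bar{x}_n}{n+1}$$

so a single point's pull on the mean grows without bound as it moves away from the rest. Assuming, for simple arithmetic, 99 values all equal to $10^2$ followed by one value of $10^9$:

$$\bar{x}_{100} = 100 + \frac{10^9 - 100}{100} = 10{,}000{,}099$$

The mean jumps roughly a hundred-thousand-fold, while the median (the average of the 50th and 51st sorted values, both still 100) does not move.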
Survival Bias: When the Average Hides the Dead
During World War II, the U.S. military faced a critical problem: too many bombers were being shot down. The Statistical Research Group at Columbia University, including the legendary Abraham Wald, was tasked with determining where to add armor to improve survival rates.
Military commanders examined returning aircraft and cataloged bullet hole locations. The data showed concentrated damage on wing tips, tail sections, and fuselage centers. The obvious conclusion? Armor those areas.
Wald's counterintuitive insight: armor the places where the returning planes had NO bullet holes.
"You are looking at the holes that didn't matter. The planes with holes in the engines never came back."
Abraham Wald, 1943 (attributed)
The commanders had computed an "average damage pattern" from survivor data alone. They weren't seeing where bullets hit planes; they were seeing where bullets hit planes that survived. Read at face value, the damage pattern on returning aircraft pointed away from the lethal zones rather than toward them.
This is survival bias in its purest form: when your sample excludes the outcome you're trying to understand, your average is a lie.
[!INSIGHT] Survival bias corrupts averages whenever the selection mechanism correlates with the variable being measured. In wartime, destroyed planes don't report their damage locations.
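A small simulation sketches Wald's reasoning. It assumes, purely for illustration, that hits land uniformly across four sections and that an engine hit is far more likely to down a plane than a hit elsewhere; the `LETHALITY` values are invented, not Wald's estimates, and `simulate` is a hypothetical helper.

```python
import random
from collections import Counter

SECTIONS = ("engine", "fuselage", "wings", "tail")
# Hypothetical chance that a single hit in each section downs the aircraft.
# Illustrative numbers only, not Wald's estimates.
LETHALITY = {"engine": 0.6, "fuselage": 0.05, "wings": 0.05, "tail": 0.05}


def simulate(n_planes=10_000, hits_per_plane=3, seed=1943):
    """Return the share of hits per section for all planes and for survivors."""
    rng = random.Random(seed)
    all_hits, survivor_hits = Counter(), Counter()
    for _ in range(n_planes):
        # Hits land uniformly at random across sections.
        hits = [rng.choice(SECTIONS) for _ in range(hits_per_plane)]
        all_hits.update(hits)
        # The plane returns only if it independently survives every hit.
        if all(rng.random() >= LETHALITY[s] for s in hits):
            survivor_hits.update(hits)

    def shares(counter):
        total = sum(counter.values())
        return {s: counter[s] / total for s in SECTIONS}

    return shares(all_hits), shares(survivor_hits)


all_share, survivor_share = simulate()
for s in SECTIONS:
    print(f"{s:9s} all planes: {all_share[s]:.2f}   survivors only: {survivor_share[s]:.2f}")
```

With these assumptions the engine takes about a quarter of all hits but only around an eighth of the hits visible on returning planes, so a survivor-only tally steers armor toward the sections that matter least.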
Modern example: A 2022 study of "successful entrepreneurs" found that 67% worked over 60 hours per week. Conclusion? Hard work drives success. The missing data? The millions of failed entrepreneurs who also worked 60+ hours but weren't surveyed because they'd returned to corporate jobs.
Simpson's Paradox: When the Average Reverses Reality
In 1973, UC Berkeley was accused of gender discrimination in graduate admissions. The aggregate data seemed damning: 44% of male applicants were admitted versus only 35% of female applicants.
But when statisticians dissected the data by department, a surprising pattern emerged. In four of the six largest departments, women were admitted at HIGHER rates than men. How could the aggregate show discrimination when most of its components showed the opposite?
The answer lay in application patterns:
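| Department | Female Applicants | Female Admit Rate | Male Applicants | Male Admit Rate |
|---|---|---|---|---|
| A | 108 | 82% | 825 | 62% |
| B | 25 | 68% | 560 | 63% |
| C | 593 | 34% | 325 | 37% |
| D | 375 | 35% | 417 | 33% |
| E | 393 | 24% | 191 | 28% |
| F | 341 | 7% | 373 | 6% |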
Women disproportionately applied to competitive departments with low admission rates for everyone. Men concentrated in departments with higher overall acceptance rates.
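The reversal can be checked directly. This sketch uses the admitted/applied counts for the six largest departments from Bickel et al. (1975), the same figures distributed as R's `UCBAdmissions` dataset; `admit_rates` is a hypothetical helper name. Pooling just these six departments widens the gap beyond the campus-wide 44% vs 35%.

```python
# Fall 1973 admissions for the six largest Berkeley departments
# (Bickel et al., 1975; also R's UCBAdmissions dataset).
# dept: (men_admitted, men_applied, women_admitted, women_applied)
DEPARTMENTS = {
    "A": (512, 825, 89, 108),
    "B": (353, 560, 17, 25),
    "C": (120, 325, 202, 593),
    "D": (138, 417, 131, 375),
    "E": (53, 191, 94, 393),
    "F": (22, 373, 24, 341),
}


def admit_rates(departments):
    """Return per-department (men, women) admit rates and the pooled rates."""
    per_dept = {d: (ma / mn, wa / wn) for d, (ma, mn, wa, wn) in departments.items()}
    # Column totals: men admitted, men applied, women admitted, women applied.
    totals = [sum(row[k] for row in departments.values()) for k in range(4)]
    pooled = (totals[0] / totals[1], totals[2] / totals[3])
    return per_dept, pooled


per_dept, (pooled_men, pooled_women) = admit_rates(DEPARTMENTS)
women_higher = [d for d, (m, w) in per_dept.items() if w > m]

print(f"pooled: men {pooled_men:.1%}, women {pooled_women:.1%}")
print(f"departments where women's rate is higher: {women_higher}")
```

It reports pooled rates of about 44.5% for men and 30.4% for women, even though women's rate is higher in departments A, B, D, and F.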
[!NOTE] Simpson's Paradox occurs when a trend appears in several groups of data but disappears or reverses when these groups are combined. It's a warning that aggregated averages can obscure or invert underlying relationships.
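Formally, let $A$ denote admission, $B$ membership in one group (here, female applicants), and $C$ the stratifying variable (department) with levels $c_i$. In its strongest form, where the reversal holds in every stratum (Berkeley shows a weaker version in which it holds in most), the condition for Simpson's Paradox is: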
$$P(A|B) < P(A|B^c) \text{ yet } P(A|B, C=c_i) > P(A|B^c, C=c_i) \text{ for all } i$$
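The reversal is possible because, by the law of total probability, each group's aggregate rate is a weighted average of its per-stratum rates, and each group supplies its own weights:

$$P(A|B) = \sum_i P(A|B, C=c_i)\,P(C=c_i|B), \qquad P(A|B^c) = \sum_i P(A|B^c, C=c_i)\,P(C=c_i|B^c)$$

If $P(C=c_i|B)$ piles up on strata where admission is hard for everyone, while $P(C=c_i|B^c)$ concentrates on easy ones, $B$ can lead within every stratum and still trail in the aggregate.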
This isn't rare. A 2021 analysis of COVID-19 mortality rates found that overall, younger patients had higher mortality in some hospitals — until researchers stratified by comorbidity status. Within each health category, older patients consistently fared worse. The aggregate average had inverted the truth.
The Honest Alternatives: Median and Mode
If the mean lies, what tells the truth?
The Median: Robust to Extremes
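The median income ($54,000) describes the typical American household far better than the mean ($74,580). It's the value that splits the distribution exactly in half. Mathematical property: the median $\tilde{x}$ minimizes the sum of absolute deviations, just as the mean $\bar{x}$ minimizes the sum of squared deviations:
$$\tilde{x} \in \arg\min_{c} \sum_{i=1}^{n} |x_i - c|, \qquad \bar{x} = \arg\min_{c} \sum_{i=1}^{n} (x_i - c)^2$$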
For decision-makers, this means the median is the "best guess" if your cost of being wrong is proportional to distance from truth, regardless of direction.
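A brute-force check of both properties, sketched in Python with a made-up, right-skewed set of nine incomes (illustrative values, not Census data). Searching candidate "typical values" in $1,000 steps, the absolute-error winner is the median and the squared-error winner is the grid point nearest the mean.

```python
from statistics import fmean, median

# Hypothetical, right-skewed household incomes (illustrative only).
incomes = [22_000, 31_000, 38_000, 47_000, 54_000, 61_000, 75_000, 98_000, 410_000]


def total_abs_error(c):
    """Cost when being wrong is proportional to distance, in either direction."""
    return sum(abs(x - c) for x in incomes)


def total_sq_error(c):
    """Cost when large misses are penalized quadratically."""
    return sum((x - c) ** 2 for x in incomes)


# Brute-force search over candidate typical values in $1,000 steps.
candidates = range(0, 500_001, 1_000)
best_abs = min(candidates, key=total_abs_error)
best_sq = min(candidates, key=total_sq_error)

print(best_abs, median(incomes))        # 54000 54000
print(best_sq, round(fmean(incomes)))   # 93000 92889
```

The single $410,000 household drags the squared-error answer almost $39,000 above the median, which is exactly the gap a skewed income distribution produces.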
The Mode: Where the Data Actually Lives
The mode — the most frequently occurring value — tells you where observations cluster. In multimodal distributions (like income, which peaks near $25,000 and again near $150,000), multiple modes reveal subpopulations that averages flatten into meaninglessness.
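A minimal illustration with Python's `statistics.multimode` (available since Python 3.8), using hypothetical weekly hours for a workforce that mixes part-time and full-time staff. Both the mean and the median land in the empty gap between the two groups; only the modes point at where people actually are.

```python
from statistics import fmean, median, multimode

# Hypothetical weekly hours: a part-time cluster near 10 and a full-time cluster near 40.
hours = [10, 10, 11, 10, 40, 41, 40, 40]

print(multimode(hours))  # [10, 40]
print(fmean(hours))      # 25.25
print(median(hours))     # 25.5
```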
When to Use What
| Distribution Type | Best Central Tendency |
|---|---|
| Symmetric | Mean = Median (both work) |
| Right-skewed (income, home prices) | Median |
| Categorical/Nominal | Mode |
| Multimodal | Report multiple modes |
[!INSIGHT] The "best" measure of central tendency isn't mathematical; it's contextual. Always ask: "What decision will this number inform?" If outliers would mislead the decision, the mean is the wrong tool.
The Real-World Cost of Average-Driven Decisions
Policy built on means creates systematic failures:
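Flood Planning: A "100-year flood" has a 1% chance of occurring in any given year; the 100 years is an average recurrence interval, not a schedule. Treating that average as a promise leads developers to build in zones that flood every 20 years, because the mean recurrence interval ignores clustered extreme events.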
Medical Treatment: Drug dosing based on mean pharmacokinetics fails pediatric and elderly patients whose metabolisms sit far from average.
Economic Policy: Tax cuts designed for mean households ($74,580 income) provide minimal benefit to median households ($54,000) while directing windfalls to the right-tail outliers.
Education Reform: School funding formulas using mean property values systematically underfund districts with bimodal distributions — wealthy enclaves adjacent to working-class neighborhoods.
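The flood example has its own mean-versus-median gap. In the simplest model, assumed here for illustration (independent years, a constant 1% annual chance, and therefore none of the clustering described above), the wait for a "100-year flood" is geometric: its mean is 100 years, but its median is only 69, and a 30-year span carries roughly a one-in-four chance of at least one such flood. The helper names are hypothetical.

```python
import math


def prob_at_least_one(annual_p: float, years: int) -> float:
    """Chance of at least one exceedance in `years`, assuming independent years."""
    return 1 - (1 - annual_p) ** years


def median_wait_years(annual_p: float) -> int:
    """Median of the geometric waiting time: smallest t with P(T <= t) >= 0.5."""
    return math.ceil(math.log(0.5) / math.log(1 - annual_p))


p = 0.01  # a "100-year flood": 1% chance in any given year
print(f"mean wait:   {1 / p:.0f} years")                           # 100
print(f"median wait: {median_wait_years(p)} years")                # 69
print(f"P(at least one in 30 years): {prob_at_least_one(p, 30):.0%}")  # 26%
```

Real flood records can be worse than this sketch, since the independence assumption is exactly what clustered extremes violate.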
A 2024 Federal Reserve analysis found that 61% of Americans earn less than the mean household income. That isn't because they're "below average" in any meaningful sense; in a right-skewed distribution the mean typically sits above the median, which leaves more than half of all values below it.
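One way to see why more than half can sit below the mean: assume, purely for illustration, that incomes follow a lognormal distribution (often used as a rough stylized model for income), $X = e^{\mu + \sigma Z}$ with $Z$ standard normal. Since $E[X] = e^{\mu + \sigma^2/2}$, we get $P(X < E[X]) = P(Z < \sigma/2) = \Phi(\sigma/2)$, which exceeds one half for any $\sigma > 0$. The sketch evaluates this closed form and checks it by simulation; $\sigma = 0.8$ and $\mu = 11$ are arbitrary choices, not values fitted to any data, and the function names are hypothetical.

```python
import random
from statistics import NormalDist, fmean


def share_below_mean_lognormal(sigma: float) -> float:
    """Closed form: P(X < E[X]) = Phi(sigma / 2) for a lognormal with shape sigma."""
    return NormalDist().cdf(sigma / 2)


def simulated_share_below_mean(mu: float, sigma: float, n: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the fraction of draws below the sample mean."""
    rng = random.Random(seed)
    xs = [rng.lognormvariate(mu, sigma) for _ in range(n)]
    m = fmean(xs)
    return sum(x < m for x in xs) / n


sigma = 0.8  # arbitrary illustrative shape parameter, not fitted to income data
print(f"closed form: {share_below_mean_lognormal(sigma):.3f}")
print(f"simulated:   {simulated_share_below_mean(11.0, sigma):.3f}")
```

The share below the mean depends only on the spread $\sigma$, not on the income level $\mu$: the more skewed the distribution, the larger the majority that falls "below average."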
Conclusion
The average is not evil. It's a tool. But it's a tool we misuse with stunning regularity. We compute means for skewed distributions, aggregate across heterogeneous groups, and ignore the survivor bias that corrupts our samples.
Key Takeaway
The mean is mathematically valid but contextually treacherous. Before reporting or consuming an average, ask three questions: Is the distribution symmetric? Are there hidden subgroups? Does the sample exclude failures? If any answer is "yes," the median or mode will serve you better — and might completely change your conclusion.
The average American has less than one ovary and less than one testicle. That statistic is mathematically precise and completely useless. The same applies to most averages we encounter. Numbers don't lie, but the people computing them often choose the wrong operation for the wrong distribution, and then build policies on the resulting fiction.
Sources: U.S. Census Bureau, Current Population Survey (2023); Wald, A. (1943). A Method of Estimating Plane Vulnerability. Statistical Research Group; Bickel, P.J. et al. (1975). Sex Bias in Graduate Admissions. Science; Federal Reserve Distributional Financial Accounts (2024)