
[!INSIGHT] The mean is not a "center"; it's a balance point. In a skewed distribution, the mean sits in the direction of the tail, not where most data points cluster.

When Bill Gates enters that hypothetical bar of 10 people, the mean net worth explodes from perhaps $50,000 to over $10 billion. The median? It moves by approximately zero. The median is robust to outliers; the mean is held hostage by them.

The Statistical Formula Behind the Distortion

For a dataset $x_1, x_2, \ldots, x_n$:

Mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
Median: The middle value when data is ordered (or average of two middle values for even $n$)

The mean incorporates every data point's magnitude. Add a value of $10^9$ to a dataset of values around $10^2$, and the mean shifts dramatically. The median simply notes the new ordering and adjusts minimally.
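The bar scenario can be sketched in a few lines of Python. The ten net worths and the billionaire's figure below are invented round numbers, chosen only so the arithmetic is easy to follow; they are not real data.

```python
from statistics import mean, median

# Hypothetical net worths (in dollars) for ten bar patrons; values are made up.
patrons = [20_000, 30_000, 35_000, 40_000, 45_000,
           50_000, 55_000, 60_000, 70_000, 95_000]

# Assumed round figure for one billionaire's net worth (illustrative only).
billionaire = 120_000_000_000

before_mean, before_median = mean(patrons), median(patrons)

# The same bar after one extreme value walks in.
after = patrons + [billionaire]
after_mean, after_median = mean(after), median(after)

print(f"mean:   {before_mean:,.0f} -> {after_mean:,.0f}")
print(f"median: {before_median:,.0f} -> {after_median:,.0f}")
```

Under these assumed numbers the mean jumps from $50,000 to roughly $10.9 billion, while the median moves only from $47,500 to $50,000.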

Survival Bias: When the Average Hides the Dead

During World War II, the U.S. military faced a critical problem: too many bombers were being shot down. The Statistical Research Group at Columbia University, including the legendary Abraham Wald, was tasked with determining where to add armor to improve survival rates.

Military commanders examined returning aircraft and cataloged bullet hole locations. The data showed concentrated damage on wing tips, tail sections, and fuselage centers. The obvious conclusion? Armor those areas.

Wald's counterintuitive insight: armor the places where the returning planes had NO bullet holes.

"You are looking at the holes that didn't matter. The planes with holes in the engines never came back."

Attributed to Abraham Wald, paraphrasing his 1943 analysis

The commanders had computed an "average damage pattern" from survivor data alone. They weren't seeing where bullets hit planes; they were seeing where bullets hit planes that survived. The mean damage location on returning aircraft told them nothing about the lethal zones.

This is survival bias (more commonly called survivorship bias) in its purest form: when your sample excludes the outcome you're trying to understand, your average is a lie.

[!INSIGHT] Survival bias corrupts averages whenever the selection mechanism correlates with the variable being measured. In wartime, destroyed planes don't report their damage locations.
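A small simulation makes the selection effect visible. This is a sketch under invented assumptions: every plane takes exactly one hit, hits land uniformly across four sections, and the loss probabilities in `LOSS_PROB` are made-up numbers, not historical estimates.

```python
import random
from collections import Counter

random.seed(0)  # fixed seed so the sketch is reproducible

SECTIONS = ["engine", "fuselage", "wings", "tail"]
# Assumed probability that a single hit to each section downs the plane.
# These numbers are invented for illustration, not historical estimates.
LOSS_PROB = {"engine": 0.6, "fuselage": 0.1, "wings": 0.05, "tail": 0.05}

all_hits = Counter()       # hits on every plane, including those lost
survivor_hits = Counter()  # hits observed on planes that returned

for _ in range(20_000):
    # Each plane takes one hit, uniformly distributed over the sections.
    section = random.choice(SECTIONS)
    all_hits[section] += 1
    if random.random() >= LOSS_PROB[section]:
        survivor_hits[section] += 1  # only returning planes get inspected

total_all = sum(all_hits.values())
total_surv = sum(survivor_hits.values())
for s in SECTIONS:
    print(f"{s:9s} all planes: {all_hits[s] / total_all:.2f}   "
          f"survivors: {survivor_hits[s] / total_surv:.2f}")
```

Although engines receive about a quarter of all hits in this toy model, they account for a much smaller share of the damage seen on survivors, which is exactly the gap a survivor-only average conceals.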

Modern example: A 2022 study of "successful entrepreneurs" found that 67% worked over 60 hours per week. Conclusion? Hard work drives success. The missing data? The millions of failed entrepreneurs who also worked 60+ hours but weren't surveyed because they'd returned to corporate jobs.

Simpson's Paradox: When the Average Reverses Reality

In 1973, UC Berkeley was accused of gender discrimination in graduate admissions. The aggregate data seemed damning: 44% of male applicants were admitted versus only 35% of female applicants.

But when statisticians dissected the data by individual departments, a shocking pattern emerged. Most departments actually admitted women at HIGHER rates than men. How could the overall average show discrimination when most components showed the opposite?

The answer lay in application patterns:

Department	Female Applicants	Female Admit Rate	Male Applicants	Male Admit Rate
A (Easy)	108	82%	825	62%
B (Easy)	25	68%	560	63%
F (Hard)	341	7%	373	6%

Women disproportionately applied to competitive departments with low admission rates for everyone. Men concentrated in departments with higher overall acceptance rates.

[!NOTE] Simpson's Paradox occurs when a trend appears in several groups of data but disappears or reverses when these groups are combined. It's a warning that aggregated averages can obscure or invert underlying relationships.
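The reversal can be reproduced with two departments of very different selectivity, labelled A and F here. The applicant counts are the widely reported Berkeley figures for those departments; the admitted counts are back-calculated from rounded percentages, so treat them as approximate and the whole block as an illustrative sketch.

```python
# (applicants, admitted) per department and sex. Applicant counts are the
# widely reported Berkeley figures; admitted counts are back-calculated from
# rounded percentages, so they are approximate.
data = {
    "A": {"women": (108, 89), "men": (825, 512)},  # high admit rate overall
    "F": {"women": (341, 24), "men": (373, 22)},   # low admit rate overall
}

def rate(applicants, admitted):
    """Admission rate for one group."""
    return admitted / applicants

def pooled(sex):
    """Admission rate for one sex after combining all departments."""
    applicants = sum(groups[sex][0] for groups in data.values())
    admitted = sum(groups[sex][1] for groups in data.values())
    return admitted / applicants

for dept, groups in data.items():
    w = rate(*groups["women"])
    m = rate(*groups["men"])
    print(f"Dept {dept}: women {w:.1%}, men {m:.1%}")

print(f"Pooled: women {pooled('women'):.1%}, men {pooled('men'):.1%}")
```

Women have the higher rate inside each department, yet the pooled rate favors men, because most of the women in this subset applied to the department that rejects nearly everyone.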

The mathematical condition for Simpson's Paradox:

$$P(A|B) < P(A|B^c) \text{ yet } P(A|B, C=c_i) > P(A|B^c, C=c_i) \text{ for all } i$$

This isn't rare. A 2021 analysis of COVID-19 mortality rates found that overall, younger patients had higher mortality in some hospitals — until researchers stratified by comorbidity status. Within each health category, older patients consistently fared worse. The aggregate average had inverted the truth.

The Honest Alternatives: Median and Mode

If the mean lies, what tells the truth?

The Median: Robust to Extremes

The median household income ($74,580 in 2022, according to the Census Bureau's Current Population Survey) describes the typical American household far better than the mean, which sits well above it. The median is the value that splits the distribution exactly in half. Mathematical property: the median minimizes the sum of absolute deviations:

$$\tilde{x} = \arg\min_m \sum_{i=1}^{n} |x_i - m|$$

For decision-makers, this means the median is the "best guess" if your cost of being wrong is proportional to distance from truth, regardless of direction.
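The minimization property is easy to check numerically. The sketch below uses a made-up five-value sample and a brute-force grid search (not an efficient method, just a transparent one) to find the guess that minimizes absolute error, and for contrast the guess that minimizes squared error.

```python
from statistics import mean, median

data = [1, 2, 3, 4, 100]  # small made-up sample with one extreme value

def abs_loss(m):
    """Sum of absolute deviations from the guess m."""
    return sum(abs(x - m) for x in data)

def sq_loss(m):
    """Sum of squared deviations from the guess m."""
    return sum((x - m) ** 2 for x in data)

# Brute-force search over candidate guesses 0.0, 0.1, ..., 100.0.
candidates = [i / 10 for i in range(0, 1001)]
best_abs = min(candidates, key=abs_loss)
best_sq = min(candidates, key=sq_loss)

print(best_abs, median(data))  # the absolute-loss minimizer matches the median
print(best_sq, mean(data))     # the squared-loss minimizer matches the mean
```

The squared-error criterion is what lets the single value of 100 drag the best guess up to the mean, while the absolute-error criterion leaves it at the median.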

The Mode: Where the Data Actually Lives

The mode — the most frequently occurring value — tells you where observations cluster. In multimodal distributions (like income, which peaks near $25,000 and again near $150,000), multiple modes reveal subpopulations that averages flatten into meaninglessness.
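As a rough illustration, Python's standard `statistics.multimode` returns every value tied for most frequent. The salaries below are invented (in thousands of dollars) to mimic two separate groups.

```python
from statistics import mean, median, multimode

# Made-up salaries (in thousands) from two distinct groups of workers.
salaries = [24, 25, 25, 25, 26, 27, 148, 150, 150, 150, 152]

modes = multimode(salaries)  # every value tied for most frequent
avg = mean(salaries)

print("modes:", modes)
print("mean:", avg)
print("median:", median(salaries))
```

Here the two modes (25 and 150) point at the two clusters, while the mean of 82 lands in a gap where no observation exists.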

When to Use What

Distribution Type	Best Central Tendency
Symmetric	Mean = Median (both work)
Right-skewed (income, home prices)	Median
Categorical/Nominal	Mode
Multimodal	Report multiple modes

[!INSIGHT] The "best" measure of central tendency isn't mathematical; it's contextual. Always ask: "What decision will this number inform?" If outliers would mislead the decision, the mean is the wrong tool.

The Real-World Cost of Average-Driven Decisions

Policy built on means creates systematic failures:

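Flood Planning: The "100-year flood" label denotes a 1% chance of flooding in any given year, not one flood per century. Treated as an average interval, it leads developers to build in zones that flood every 20 years, because the mean recurrence interval ignores clustered extreme events.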
Medical Treatment: Drug dosing based on mean pharmacokinetics fails pediatric and elderly patients whose metabolisms sit far from average.
Economic Policy: Tax cuts designed for mean households provide minimal benefit to median households, whose income is substantially lower, while directing windfalls to the right-tail outliers.
Education Reform: School funding formulas using mean property values systematically underfund districts with bimodal distributions — wealthy enclaves adjacent to working-class neighborhoods.
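To make the flood example concrete, here is a minimal sketch of why a mean recurrence interval misleads. It assumes a "100-year flood" means a 1% chance in any given year and that years are independent; the function name `prob_at_least_one` is ours. Independence is a simplifying assumption, and clustered events would make matters worse.

```python
def prob_at_least_one(annual_prob: float, years: int) -> float:
    """Chance of one or more events in `years`, assuming independent years."""
    return 1 - (1 - annual_prob) ** years

# Horizons: a single year, a typical 30-year mortgage, and a full century.
for horizon in (1, 30, 100):
    p = prob_at_least_one(0.01, horizon)
    print(f"{horizon:3d} years: {p:.1%}")
```

Under these assumptions the chance of at least one such flood is about 26% over 30 years and about 63% over a century, which is a long way from "once every hundred years."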

A 2024 Federal Reserve analysis found that 61% of Americans earn less than the mean household income. That is not because they're "below average" in any meaningful sense, but because in a right-skewed distribution a few very large values pull the mean above the median, so more than half of the observations typically fall below it.
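The effect can be reproduced with synthetic data. The sketch below draws incomes from a log-normal distribution, a common stand-in for right-skewed income data; the parameters 11.0 and 0.8 are illustrative assumptions, not values fitted to any real survey.

```python
import random
from statistics import median

random.seed(1)  # fixed seed for reproducibility

# Assumed log-normal shape; mu=11.0 and sigma=0.8 are illustrative only.
incomes = [random.lognormvariate(11.0, 0.8) for _ in range(100_000)]

avg = sum(incomes) / len(incomes)
mid = median(incomes)

# Fraction of simulated households earning less than the mean.
share_below = sum(x < avg for x in incomes) / len(incomes)

print(f"mean {avg:,.0f}, median {mid:,.0f}")
print(f"share below the mean: {share_below:.1%}")
```

With this amount of skew, roughly two thirds of the simulated households fall below the mean, even though nothing is unusual about any of them.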

Conclusion

The average is not evil. It's a tool. But it's a tool we misuse with stunning regularity. We compute means for skewed distributions, aggregate across heterogeneous groups, and ignore the survival bias that corrupts our samples.

Key Takeaway: The mean is mathematically valid but contextually treacherous. Before reporting or consuming an average, ask three questions: Is the distribution skewed? Are there hidden subgroups? Does the sample exclude failures? If any answer is "yes," the mean alone will mislead you; the median or mode will often serve you better, and might completely change your conclusion.

The average American has one breast and one testicle. That statistic is mathematically precise and completely useless. The same applies to most averages we encounter. Numbers don't lie — but the people computing them often choose the wrong operation for the wrong distribution, and then build policies on the resulting fiction.

Sources: U.S. Census Bureau, Current Population Survey (2023); Wald, A. (1943). A Method of Estimating Plane Vulnerability Based on Damage of Survivors. Statistical Research Group, Columbia University; Bickel, P.J., Hammel, E.A., & O'Connell, J.W. (1975). Sex Bias in Graduate Admissions: Data from Berkeley. Science, 187(4175), 398-404; Federal Reserve, Distributional Financial Accounts (2024).

Why Average Is Almost Always Wrong
