Digital Humanities

Garbage In, Discrimination Out

When an algorithm trained on biased arrest data deemed Black defendants 'high risk' at twice the rate of whites, it exposed how 'objective' AI amplifies injustice.

Hyle Editorial

The Sentence That Changed Everything

A US judge sentenced a man to prison based on an algorithm trained on the arrest records of a justice system that arrested Black men at twice the rate of white men. The algorithm was "objective." The outcome was not.

In 2016, the investigative journalism organization ProPublica published a study that would shake the foundations of criminal justice AI. They analyzed the COMPAS algorithm—a proprietary risk assessment tool used in courts across the United States to predict the likelihood of recidivism. Their finding? Black defendants were 77% more likely to be flagged as higher risk of violent recidivism and 45% more likely to be flagged for general recidivism than white defendants, even after controlling for prior crimes, age, and gender. The algorithm didn't just reflect historical bias—it amplified it, wrapped it in mathematical legitimacy, and served it to judges as neutral truth.

What Is COMPAS and Why Does It Matter?

COMPAS (Correctional Offender Management Profiling for Alternative Sanctions) is a risk assessment algorithm developed by Northpointe, Inc. (now Equivant). Since 1998, it has been used to assess more than one million offenders, informing judges' decisions about bail, sentencing, and parole. The algorithm takes 137 data points about a defendant—criminal history, age at first arrest, education level, employment status, family background—and produces a risk score from 1 to 10.

The appeal is understandable. Human judges are subject to fatigue, implicit bias, and inconsistency. An algorithm, its proponents argue, applies the same standards to everyone. It doesn't get tired. It doesn't discriminate based on skin color because it never sees skin color—it only sees data.

[!INSIGHT] The claim of algorithmic objectivity rests on a fundamental category error: confusing procedural consistency with substantive fairness. An algorithm can be perfectly consistent while consistently perpetuating injustice.

But here's the problem: COMPAS doesn't make predictions from first principles. It learns from historical data. And American criminal justice history is, to put it mildly, not a neutral dataset.

The Architecture of Bias Inheritance

When COMPAS was trained on criminal justice data, it learned patterns from a system with well-documented racial disparities. Consider these foundational facts:

Arrest disparities: Black Americans are arrested for drug offenses at rates 2 to 5 times higher than white Americans, despite using and selling drugs at comparable rates. This isn't because Black people commit more drug crimes—it's because policing concentrates in certain neighborhoods, and those neighborhoods are disproportionately Black due to historical housing policies like redlining.

Prosecutorial discretion: Black defendants are 75% more likely to have charges that carry mandatory minimum sentences. Charge bargaining—the practice of reducing charges in exchange for guilty pleas—happens less frequently for Black defendants.

Historical context: The "War on Drugs" and "tough on crime" policies of the 1980s and 1990s created a massive increase in arrests and incarcerations that fell disproportionately on Black communities.

When COMPAS treats "prior arrests" as a neutral predictor, it imports this entire history into its risk scores. An 18-year-old Black man from an over-policed neighborhood and an 18-year-old white man from an under-policed suburb may have identical behavior patterns—but vastly different arrest records. The algorithm sees only the arrests.

"The past is never dead. It's not even past."
William Faulkner

ProPublica's Investigation: The Numbers Behind the Claims

In their 2016 investigation, ProPublica obtained data on over 7,000 people arrested in Broward County, Florida, between 2013 and 2014. They tracked these individuals for two years to see who actually reoffended. Then they compared reality to COMPAS predictions.

The results were damning:

| Metric | Black defendants | White defendants |
| --- | --- | --- |
| Labeled higher risk, didn't reoffend | 44.9% | 23.5% |
| Labeled lower risk, did reoffend | 28.0% | 47.7% |

This is what statisticians call a false positive disparity. Black defendants were nearly twice as likely to be wrongly classified as high risk, while white defendants were more likely to be wrongly classified as low risk even though they went on to reoffend.

The Counter-Argument and Why It Fails

Northpointe responded to ProPublica's analysis with a technical defense: COMPAS is calibrated, they argued. At any given risk score, Black and white defendants have similar actual recidivism rates. A score of 7 means roughly the same probability of reoffending regardless of race.

This is technically true—and spectacularly misses the point. Calibration measures whether predicted probabilities match observed frequencies. But it says nothing about the distribution of errors. The algorithm can be calibrated and still produce more false positives for one group.
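To make the distinction concrete, here is a minimal sketch with invented numbers (not real COMPAS data): a score that is perfectly calibrated for both groups can still flag far more non-reoffenders in one group, simply because the score distributions differ.

```python
# Invented numbers for illustration. Each group maps a risk score
# to (number of people, number who actually reoffend).
group_a = {3: (500, 150), 7: (500, 350)}   # more people land in the high bin
group_b = {3: (800, 240), 7: (200, 140)}   # more people land in the low bin

def false_positive_rate(bins, threshold=5):
    """Share of non-reoffenders who were labeled high risk."""
    flagged_non_reoffenders = sum(n - r for s, (n, r) in bins.items()
                                  if s >= threshold)
    non_reoffenders = sum(n - r for n, r in bins.values())
    return flagged_non_reoffenders / non_reoffenders

# Both groups are calibrated: a score of 3 means 30% reoffend,
# a score of 7 means 70% reoffend, regardless of group.
print(false_positive_rate(group_a))  # 0.3
print(false_positive_rate(group_b))  # ~0.097
```

Identical calibration, yet one group's non-reoffenders are flagged roughly three times as often: the two fairness properties come apart.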

[!INSIGHT] Algorithmic fairness isn't a single mathematical property that can be optimized. Researchers have proven that it's impossible to simultaneously satisfy all intuitive notions of fairness when base rates differ between groups. The choice of which metric to prioritize is not technical—it's ethical and political.
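The trade-off can also be seen algebraically. Chouldechova (2017) showed that the false positive rate is pinned down by the base rate, the positive predictive value (what calibration controls), and the true positive rate. The sketch below uses illustrative numbers: if two groups share the same PPV and TPR but differ in base rate, their false positive rates cannot be equal.

```python
def implied_fpr(base_rate, ppv, tpr):
    """Chouldechova's identity: FPR = p/(1-p) * (1-PPV)/PPV * TPR."""
    return base_rate / (1 - base_rate) * (1 - ppv) / ppv * tpr

# Equal predictive parity (PPV = 0.6) and equal TPR (0.7),
# but different base rates force different false positive rates:
print(implied_fpr(0.5, 0.6, 0.7))  # ~0.467
print(implied_fpr(0.3, 0.6, 0.7))  # 0.2
```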

The fundamental issue is what computer scientists call "proxy variables." Even if an algorithm never directly uses race, it can reconstruct race through correlated variables: zip code, education level, income, family structure. COMPAS asks about residential stability, employment, and social ties—all variables that are themselves shaped by historical discrimination.
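A toy simulation makes the proxy problem visible. The population and numbers below are invented, but the mechanism is real: under residential segregation, a model that never sees race can recover it from zip code alone.

```python
import random
random.seed(0)

population = []
for _ in range(10_000):
    group = random.choice(["A", "B"])
    # Segregated housing: group A mostly lives in zips 1-3, group B in 4-6.
    zip_code = random.choice([1, 2, 3] if group == "A" else [4, 5, 6])
    if random.random() < 0.10:          # 10% live outside the dominant pattern
        zip_code = random.randint(1, 6)
    population.append((group, zip_code))

# A trivial "model" that never sees group membership still recovers it:
guesses = ["A" if z <= 3 else "B" for _, z in population]
accuracy = sum(g == truth
               for (truth, _), g in zip(population, guesses)) / len(population)
print(f"group membership recovered from zip code alone: {accuracy:.0%}")
```

Any model given zip code as a feature has access to essentially the same signal, whether or not race appears in the data.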

The Feedback Loop: How Bias Amplifies Itself

The most insidious aspect of predictive policing and risk assessment is the feedback loop it creates.

Step 1: Historical over-policing of Black neighborhoods creates skewed arrest data.

Step 2: Algorithm trains on this data and learns that people from these neighborhoods are "higher risk."

Step 3: Risk scores influence sentencing, leading to longer prison terms for Black defendants.

Step 4: Longer criminal records (from longer sentences) feed back into the algorithm as stronger risk signals.

Step 5: Next generation of algorithm is even more confident that Black defendants are high-risk.
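The five steps above can be sketched as a toy simulation (every factor is an assumption for illustration, not an empirical estimate): the recorded "risk signal" for the over-policed group compounds faster every generation, even though the true offense rates are identical.

```python
def simulate(generations=5, policing_bias=2.0):
    true_offense_rate = 0.10                      # identical in both areas
    # The recorded arrest signal starts out inflated only by policing intensity.
    signal = {"over_policed": true_offense_rate * policing_bias,
              "under_policed": true_offense_rate}
    ratios = []
    for _ in range(generations):
        # The risk score mirrors the recorded signal; a higher score lengthens
        # sentences and records, inflating the next generation's signal
        # (the 1 + v growth factor is an assumed stand-in for this mechanism).
        signal = {k: v * (1 + v) for k, v in signal.items()}
        ratios.append(signal["over_policed"] / signal["under_policed"])
    return ratios

ratios = simulate()
print([round(r, 2) for r in ratios])  # the disparity widens every generation
```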

This is what scholar Virginia Eubanks calls the "digital poorhouse"—automated systems that don't just reflect inequality but actively deepen it. And because the algorithm is proprietary, defendants cannot examine how the score was calculated. In some cases, defense attorneys were told they couldn't challenge the algorithm's accuracy because the company's intellectual property rights trumped due process.

[!NOTE] The Wisconsin Supreme Court eventually ruled in State v. Loomis (2016) that judges could consider COMPAS scores but must be aware of their limitations. However, the court also held that defendants cannot challenge the algorithm's scientific validity due to its proprietary nature—a legal precedent that shields algorithmic decision-making from meaningful scrutiny.

Beyond COMPAS: The Broader Pattern

COMPAS is not an anomaly. It's a warning sign for every domain where AI meets high-stakes decision-making.

In healthcare, algorithms trained on healthcare spending data systematically underestimated the medical needs of Black patients because Black patients historically receive less care for the same conditions. One widely-used algorithm assigned identical risk scores to Black patients who were considerably sicker than their white counterparts.

In hiring, AI systems trained on successful employee data learned to prefer candidates resembling current employees—reproducing existing demographic imbalances. Amazon scrapped an internal AI recruiting tool after it systematically downgraded resumes containing the word "women's" or from women's colleges.

In lending, algorithms trained on historical loan approval data perpetuate redlining in digital form, with fintech algorithms charging higher interest rates to minority borrowers even when legally prohibited from using race as a variable.

The pattern is consistent: when historical data encodes historical injustice, algorithms trained on that data will learn to reproduce it. The more sophisticated the algorithm, the better it becomes at discovering the hidden correlations that serve as proxies for protected characteristics.

What Would Real Algorithmic Justice Look Like?

The COMPAS controversy forced a reckoning in the AI ethics community. What emerged was a set of principles for more equitable algorithmic systems:

  1. Transparency over opacity: Defendants should have the right to examine and challenge algorithmic scores. Trade secrecy cannot override due process.

  2. Audits and accountability: Independent researchers must be able to test algorithms for disparate impact before and during deployment. ProPublica's investigation should have been standard practice, not investigative journalism.

  3. Human-in-the-loop safeguards: Algorithms should inform, not replace, human judgment. Judges should receive training on algorithmic limitations.

  4. Data curation as justice work: Before training any high-stakes algorithm, we must ask whether the training data itself reflects a world we want to reproduce.

  5. Multiple fairness metrics: Accept that no single metric captures "fairness." Systems should be evaluated on multiple dimensions—calibration, false positive rates, and disparate impact—with explicit choices about trade-offs.
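In code, an audit along these lines might compute several metrics side by side for each group. The function and dataset below are illustrative sketches, not a standard library API:

```python
def audit(records):
    """records: list of (group, predicted_high_risk, actually_reoffended)."""
    metrics = {}
    for group in {g for g, _, _ in records}:
        rows = [(p, y) for g, p, y in records if g == group]
        flagged = [y for p, y in rows if p]        # outcomes of flagged people
        neg_flags = [p for p, y in rows if not y]  # flags of non-reoffenders
        pos_flags = [p for p, y in rows if y]      # flags of reoffenders
        metrics[group] = {
            "calibration (PPV)": sum(flagged) / len(flagged),
            "false positive rate": sum(neg_flags) / len(neg_flags),
            "false negative rate": 1 - sum(pos_flags) / len(pos_flags),
            "flag rate": sum(p for p, _ in rows) / len(rows),
        }
    return metrics

# Tiny invented dataset: two groups of ten defendants each.
records = (
    [("X", True, True)] * 2 + [("X", True, False)] * 2 +
    [("X", False, True)] * 1 + [("X", False, False)] * 5 +
    [("Y", True, True)] * 3 + [("Y", True, False)] * 1 +
    [("Y", False, True)] * 2 + [("Y", False, False)] * 4
)
for group, m in sorted(audit(records).items()):
    print(group, {k: round(v, 2) for k, v in m.items()})
```

Reporting all four numbers per group, rather than a single "fairness score," forces the trade-offs between metrics into the open.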

Some jurisdictions have responded. In 2019, Idaho became the first state to require that pretrial risk assessment tools be transparent and open to public validation. In 2020, California voters rejected Proposition 25, which would have replaced cash bail with algorithmic risk assessments statewide, partly over fears of exactly this kind of bias. New York City now requires bias audits of automated hiring tools, and the European Union's AI Act classifies criminal justice AI as "high-risk," subject to strict transparency and oversight requirements.

Key Takeaway The COMPAS algorithm didn't invent racism—it inherited it. Its failure reveals a fundamental truth about AI systems: algorithms trained on historical data will learn historical patterns, including patterns of discrimination. The choice to deploy such systems is never purely technical. It's a choice about what kind of future we want to build: one that automatically reproduces the past, or one that consciously intervenes to create something more just. The algorithm was "objective" only in the narrowest sense—it applied biased rules consistently. True justice requires more than consistency. It requires confronting the garbage at the input.

Sources: ProPublica, "Machine Bias" (2016); Virginia Eubanks, "Automating Inequality" (2018); Cathy O'Neil, "Weapons of Math Destruction" (2016); Wisconsin Supreme Court, State v. Loomis (2016); Obermeyer et al., "Dissecting Racial Bias in an Algorithm Used to Manage the Health of Populations" (Science, 2019); Partnership on AI, "Report on Algorithmic Risk Assessment Tools in the U.S. Criminal Justice System" (2019).
