How Bias Gets In — and Stays In
The mechanics of algorithmic bias: from training data to real-world harm
Learning Objectives
By the end of this module you will be able to:
- Explain the mechanisms by which biased training data produces biased model outputs.
- Describe how proxy variables and indirect markers enable discrimination even when protected attributes are excluded.
- Identify the stages in the algorithm lifecycle at which bias can be introduced or amplified.
- Explain why removing race or gender features from an algorithm does not guarantee fairness.
- Analyze at least two real-world examples of high-stakes algorithmic bias and trace the mechanisms responsible.
Core Concepts
1. Algorithms Learn from History — Including Its Mistakes
Machine learning systems do not reason from first principles. They detect patterns in data and optimize to reproduce those patterns. This means that whatever social dynamics are baked into historical data get baked into the model too.
When an algorithm is trained on data that reflects past discrimination, it learns to replicate that discrimination. This is not a glitch — it is the system working exactly as designed.
Research on hiring algorithms is explicit on this: models trained on historically biased hiring data learn and amplify those disparities. Barocas and Selbst's foundational work "Big Data's Disparate Impact" frames it clearly — data mining algorithms discover statistical regularities that encode preexisting patterns of exclusion. Unthinking reliance on historically biased data perpetuates discrimination against vulnerable populations.
This is sometimes called historical bias amplification: the algorithm doesn't just inherit bias passively; optimization pressure can actually intensify existing disparities, because the model is rewarded for fitting the discriminatory pattern more precisely.
2. Representation Gaps Are Not Neutral
A related but distinct mechanism is representation bias: when certain groups are underrepresented in training data, the algorithm learns patterns that work well for the majority and poorly for everyone else.
This isn't an error in any conventional sense. It's the direct mathematical consequence of how supervised learning works: systems optimize their performance for populations that dominate training data. As one study of face recognition datasets found, when training sets are overwhelmingly composed of lighter-skinned subjects — 79.6% in IJB-A and 86.2% in Adience — algorithms structurally deprioritize accurate classification of darker-skinned individuals.
In healthcare, this has direct clinical consequences. Studies on medical AI document that algorithms trained predominantly on White patient cohorts show reduced diagnostic accuracy when applied to racial and ethnic minorities. Convolutional neural networks trained on chest X-ray datasets from academic facilities underdetect disease in Black patients, Hispanic patients, and women. Melanoma detection algorithms trained on light-skin-tone images perform worse on darker skin tones. Representation bias is cited as a dominant source of reduced generalizability in healthcare AI across populations.
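The mechanism can be made concrete with a toy sketch (all numbers here are invented for illustration, not drawn from the studies above): a classifier fit to data that is 90% group A learns a decision threshold tuned to A, and its accuracy on group B, whose distribution differs, drops accordingly.

```python
import random

random.seed(0)

def sample(group, label, n):
    # Group A: class means at 0 and 2; group B shifted to 1 and 3,
    # so B's optimal decision boundary differs from A's.
    center = (0.0 if label == 0 else 2.0) + (1.0 if group == "B" else 0.0)
    return [(random.gauss(center, 0.5), label) for _ in range(n)]

# Training set dominated by group A (90% / 10%).
train = (sample("A", 0, 900) + sample("A", 1, 900)
         + sample("B", 0, 100) + sample("B", 1, 100))

def class_mean(label):
    values = [x for x, y in train if y == label]
    return sum(values) / len(values)

# "Model": a single threshold halfway between the pooled class means.
threshold = (class_mean(0) + class_mean(1)) / 2

def accuracy(group):
    test = sample(group, 0, 1000) + sample(group, 1, 1000)
    return sum((x >= threshold) == (y == 1) for x, y in test) / len(test)

acc_a, acc_b = accuracy("A"), accuracy("B")
# The threshold sits near group A's optimum and far from group B's:
# accuracy is high for the majority group and markedly lower for B.
```

No one programmed the model to treat group B worse; optimizing average performance on a majority-A dataset produces the disparity on its own.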
3. Proxy Variables: Discrimination Without Protected Attributes
One of the most persistent misconceptions about algorithmic fairness is that removing race, gender, or other protected attributes from a model is enough to prevent discrimination. It is not.
The reason is proxy variables: features that are facially neutral but highly correlated with protected characteristics. When those features are present in the model, they act as stand-ins — the algorithm learns to use them to sort people by race or gender, even though neither label appears anywhere in the input.
Removing race from an algorithm is not the same as removing the signal that race carries. Other variables may carry the same signal just as well.
Zip code is a canonical example from credit lending. It appears neutral, but it is heavily correlated with race and has a long history of use in discriminatory redlining. Algorithms trained on historical lending data pick up this correlation and reproduce its effects — producing what researchers call algorithmic redlining — even when race is explicitly excluded from the model inputs.
This is documented empirically. Studies on automated underwriting systems show that minority applicants — particularly Black and Hispanic borrowers — remain less likely to receive algorithmic approval even in systems designed to be "blind" to race. Black and Hispanic applicants' approval rates are approximately 1.5 percentage points lower even after controlling for creditworthiness.
The practice of removing protected features while hoping for fairness has a name in the research literature: fairness through unawareness. The consensus finding is that it does not work. Proxy variables, or redundant encoding, allow structural bias to survive feature exclusion.
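A toy simulation makes the failure concrete (synthetic data; the group labels, zip codes, and approval rates are all invented for illustration): a "group-blind" model that sees only zip code still reproduces the historical disparity, because residential concentration lets zip code carry the excluded signal.

```python
import random
from collections import defaultdict

random.seed(42)

# Invented history: group A was approved 80% of the time, group B 40%;
# each group is concentrated (90%) in one zip code. The group label is
# deliberately NOT stored with the features the model will see.
records = []
for _ in range(10_000):
    group = random.choice("AB")
    home = "10001" if group == "A" else "10002"
    other = "10002" if group == "A" else "10001"
    zip_code = home if random.random() < 0.9 else other
    approved = random.random() < (0.8 if group == "A" else 0.4)
    records.append((zip_code, approved))

# "Fairness through unawareness": the model sees only zip code.
# Here the model is simply the historical approval rate per zip.
by_zip = defaultdict(list)
for zip_code, approved in records:
    by_zip[zip_code].append(approved)
rate = {z: sum(v) / len(v) for z, v in by_zip.items()}

# The excluded attribute's signal survives through the proxy:
# zip 10001 is approved far more often than zip 10002.
```

The group label never reaches the model, yet the per-zip approval rates differ by roughly the same margin as the underlying group rates did. That gap is the redundant encoding the research literature describes.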
4. Indirect Markers: The Algorithm Finds Its Own Proxies
Proxy discrimination does not require a human to consciously select a correlated variable. Machine learning models find their own proxies, including ones that a human auditor might not think to check.
Amazon's hiring algorithm (covered in depth in the case study below) learned to use language patterns as indirect markers for gender. After engineers removed explicitly gendered language from the model, it adapted: it penalized résumés using verbs statistically more common in female engineers' applications and flagged the names of all-women's colleges, along with words like "women's" appearing in candidate profiles. According to the ACLU's analysis and MIT Technology Review, the algorithm had learned to infer gender from statistical associations in language patterns, then systematically screened those candidates out.
This illustrates why examining which features a model explicitly includes is insufficient. What matters is what statistical relationships the model has learned — and those can involve features that seem entirely innocuous.
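A minimal sketch of this dynamic (a tiny invented corpus, not Amazon's data): a word-weighting model fit to historically skewed outcomes assigns a negative weight to "women's" without ever seeing a gender label.

```python
from collections import Counter
from math import log

# Hypothetical résumé fragments with hire/reject labels (all invented).
# The labels reflect a historically male-skewed outcome; no gender
# label appears anywhere in the features.
resumes = [
    ("executed captained chess club", 1),
    ("executed led robotics team", 1),
    ("captained coding club", 1),
    ("captained women's chess club", 0),
    ("led women's robotics team", 0),
    ("executed women's coding project", 0),
]

# Count each word's appearances under positive and negative labels.
pos, neg = Counter(), Counter()
for text, label in resumes:
    for word in text.split():
        (pos if label else neg)[word] += 1

def weight(word):
    # Per-word log-odds of a positive outcome, add-one smoothed.
    return log((pos[word] + 1) / (neg[word] + 1))

# Without ever seeing gender, the model gives "women's" a strongly
# negative weight: an indirect marker it discovered on its own.
```

No auditor checking the feature list would find a gender variable here; the discrimination lives in the learned weights, which is exactly why inspecting inputs alone is insufficient.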
5. Pre-Trained Models Inherit and Transmit Bias
Most modern AI applications don't train their models from scratch. They use pre-trained models — large models trained on broad datasets — as foundation components, then adapt or build on top of them. This adds a hidden layer to the bias problem.
Research on algorithmic hiring audits documents that pre-trained models used for facial analysis, voice analysis, speech transcription, and natural language processing carry their own bias characteristics, which propagate downstream into applications that depend on them. An organization can audit its own data and decision logic while remaining unaware that a borrowed model component is introducing discrimination through a pathway they haven't examined.
This extends the scope of responsible bias assessment beyond the immediate model to every upstream component the system relies on.
6. Bias Can Enter at Every Stage of the Lifecycle
It is tempting to think of algorithmic bias as a training-data problem — fix the data and you fix the bias. But research across healthcare AI systems documents that bias can be introduced at every stage of the algorithm lifecycle:
- Problem formulation — how the prediction task is defined, which outcomes are chosen as targets
- Data selection and preparation — which populations are included or excluded, what historical records are used
- Algorithm development and validation — which metrics are optimized, which validation sets are used
- Deployment and integration — how the model interacts with human decision-makers, who uses it and on whom
- Monitoring and maintenance — whether real-world performance disparities are tracked after launch
Biases introduced during problem formulation and data selection are significantly harder to address after the fact than those caught during development. Post-hoc fixes cannot substitute for upstream care.
This lifecycle view matters because it distributes responsibility — no single actor (data team, model developer, deploying organization) holds all the leverage.
7. Feedback Loops: Bias That Grows Over Time
In certain high-stakes applications, deployed algorithms shape the data that will be used to retrain future versions. This creates feedback loops that can amplify bias over time.
Predictive policing offers the clearest example. If an algorithm sends more police to certain neighborhoods, those neighborhoods generate more reported crime — not because crime is actually higher, but because detection increases with police presence. That additional recorded crime is fed back into the training data, reinforcing and amplifying the algorithm's initial predictions.
Simulation studies of PredPol documented this directly: when additional crime data was simulated to reflect increased police presence in targeted areas, the algorithm's predicted crime rate in those neighborhoods escalated from approximately 25% to over 70%. The algorithm's own deployment had created the pattern it was now detecting.
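The dynamic can be reproduced in a few lines (invented districts, rates, and dispatch rule, not PredPol's actual model): two areas with identical true crime rates, a slightly skewed historical record, and patrols that follow the record.

```python
# Two districts with IDENTICAL true crime rates, but records that
# start slightly skewed. All numbers are invented for illustration.
true_rate = {"north": 0.3, "south": 0.3}
recorded = {"north": 52.0, "south": 48.0}

for day in range(50):
    # Greedy dispatch: patrols go wherever the records predict more crime.
    target = max(recorded, key=recorded.get)
    # Crime only enters the records where patrols are present to detect it.
    recorded[target] += 10 * true_rate[target]

share_north = recorded["north"] / (recorded["north"] + recorded["south"])
# share_north started at 0.52 and has grown far past it, even though
# the true rates never differed between the districts.
```

The true rates never change; only the records do, and they diverge because detection tracks patrol presence. Retraining on those records would harden the loop.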
8. Intersectional Bias: Where Single-Axis Analysis Fails
Most bias analyses examine one protected characteristic at a time: race or gender, but not both simultaneously. Yet individuals hold multiple identities, and discrimination compounds.
Research on AI resume screening systems found that resumes with white-associated names were preferred 85% of the time versus 9% for Black-associated names. But intersectional analysis revealed something that single-axis analysis would miss entirely: resumes with Black male-associated names were selected 14.8% of the time — compared to 0% for certain name combinations. The harm is not simply additive. It operates differently for people at the intersection of multiple marginalized identities.
Brookings research describes this as intersectional bias: disparities that exceed what you would predict by examining race bias and gender bias separately. New York City's Local Law 144 on automated employment decision tools explicitly requires auditors to measure bias across intersectional categories, precisely because single-category analysis systematically underestimates harm.
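The measurement point can be shown with invented counts (illustrative only, not the study's data): both single-axis rates are nonzero, yet one intersectional subgroup sits at exactly zero.

```python
# Invented selection counts, keyed by (race, gender): (selected, total).
counts = {
    ("white", "male"):   (60, 100),
    ("white", "female"): (40, 100),
    ("black", "male"):   (20, 100),
    ("black", "female"): (0,  100),
}

def rate(cells):
    selected = sum(s for s, _ in cells)
    total = sum(t for _, t in cells)
    return selected / total

# Single-axis views average over the hidden dimension.
by_race = {r: rate([v for (cr, _), v in counts.items() if cr == r])
           for r in ("white", "black")}
by_gender = {g: rate([v for (_, cg), v in counts.items() if cg == g])
             for g in ("male", "female")}

# Single-axis rates: black 10%, female 20%. Neither view reveals the
# subgroup whose selection rate is exactly zero.
worst_subgroup = min(s / t for s, t in counts.values())
```

An audit that reports only `by_race` and `by_gender` would see two moderate disparities and miss the subgroup with no selections at all, which is why intersectional categories must be measured directly.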
Annotated Case Studies
Case Study 1: Amazon's Resume Screening Algorithm
What happened. Starting in 2014, Amazon developed a machine learning tool to automatically screen résumés for technical roles. By 2017, the company had scrapped it entirely after discovering it systematically discriminated against women.
The mechanism. The algorithm was trained on ten years of historical hiring data from a male-dominated industry. It learned to identify the characteristics of successful hires, who were overwhelmingly men. The model did not receive the label "prefer men." It inferred that preference from statistical patterns in the data.
Engineers tried to correct this by removing explicitly gendered language. The model adapted. It learned to use indirect markers — verb choices statistically more common on male-authored résumés, references to all-women's colleges, the word "women's" as in "captain of the women's chess team" — to infer gender and downweight those candidates.
Why the fix didn't work. This is a textbook illustration of why fairness through unawareness fails. Protected attribute removal does not eliminate the information the model has learned to use. As long as correlated signals remain in the data, the model can reconstitute the discrimination through different pathways.
What it illustrates. This case demonstrates historical bias amplification, indirect marker discrimination, and the fundamental limitation of feature exclusion as a fairness strategy — all in a single real-world example.
Case Study 2: COMPAS in Criminal Sentencing
What happened. COMPAS (Correctional Offender Management Profiling for Alternative Sanctions) is a risk assessment algorithm used across the United States criminal justice system to estimate the likelihood that a defendant will reoffend. ProPublica's 2016 investigation found that the algorithm was nearly twice as likely to falsely flag Black defendants as future criminals compared to white defendants — and nearly twice as likely to incorrectly flag white defendants as low risk.
The stakes. This is not a consumer product. COMPAS scores have been cited directly in sentencing decisions in multiple U.S. states. In a documented Wisconsin case, a judge cited the algorithm's score in imposing an 8.5-year sentence. Algorithmic bias here translates directly into years of incarceration.
The mechanism. Criminal justice algorithms inherit bias from historical data because police records and arrest databases reflect discriminatory policing practices, not actual crime patterns. Barocas and Selbst's analysis puts this directly: data mining algorithms discover regularities that encode preexisting patterns of exclusion. An algorithm trained on who was arrested — rather than who committed a crime — learns the biases of policing, not the distribution of offending.
The disagreement. Northpointe, the vendor, disputed ProPublica's findings and challenged their methodology. This dispute itself reveals something important: there is no single definition of "fairness" that all stakeholders agree on. Multiple independent peer-reviewed studies later validated ProPublica's core finding of racial disparities, while acknowledging genuine complexity in how fairness criteria can conflict. The dispute was about statistical definitions, not about whether disparate outcomes existed.
What it illustrates. COMPAS demonstrates data inheritance bias, the feedback-loop risks of deploying biased historical data in a system that influences future conditions, and the real-world consequence of algorithmic bias in high-stakes decision-making.
Case Study 3: Healthcare Cost as a Proxy for Health Need
What happened. A landmark 2019 study published in Science analyzed a widely used commercial algorithm designed to identify patients who would benefit from additional care coordination. The algorithm was used across U.S. health systems and was estimated to affect millions of patients.
The mechanism. The algorithm used healthcare costs and insurance payouts as a proxy for health need: a common design choice, since actual health metrics are harder to compile than billing data. The problem: Black patients with equal health needs systematically incur lower healthcare costs, because structural racism in the healthcare system has historically resulted in Black patients receiving less care. The algorithm learned this spending pattern and interpreted it as a signal of lower need.
The consequence. The study found that Black patients assigned the same risk score as white patients were demonstrably sicker. The algorithm was systematically undertriaging Black patients — not through any explicit racial variable, but through a proxy that encoded historical healthcare inequality.
The confirmation of mechanism. When researchers reformulated the algorithm to remove cost-based proxies for health need, racial bias was reduced by 84%. This is unusually direct evidence that the proxy variable itself was the mechanism — not the underlying data quality or some other confound.
What it illustrates. This case demonstrates proxy variable discrimination at its most consequential: a facially neutral design choice (using cost data) encoding structural inequality into a clinical tool, with direct implications for patient care.
Common Misconceptions
"If we remove race and gender from the model, it can't discriminate."
This is the most common misunderstanding in algorithmic fairness. As documented above, other variables in the model — zip code, language patterns, historically correlated behavioral features — can carry the same signal. Fairness through unawareness does not produce fairness; it just makes the discrimination harder to find.
"Bias is a data quality problem. Better data solves it."
Better data helps, but it does not solve the problem on its own. Bias can enter at problem formulation (what question you're training the model to answer), at feature engineering (what proxies are included), at deployment (how the model is used), and through feedback loops over time. The algorithm lifecycle framework shows that bias has many entry points, and addressing one does not close the others.
"This algorithm was used in criminal sentencing — it must have been validated."
Legal use does not imply fairness validation. Under U.S. law (Title VII of the Civil Rights Act), employers are liable for disparate impact discrimination even when it results from seemingly neutral algorithmic practices, and cannot escape liability by delegating the algorithm to a vendor. But legal accountability and technical validation are different things — systems can be deployed at scale before their disparate impacts are independently measured. The legal framework creates incentive for rigorous independent auditing but does not guarantee it occurs.
"Algorithmic bias means someone programmed the algorithm to be biased."
In the cases documented here, no one explicitly programmed a preference for white candidates, or against Black defendants, or against Black patients. The discrimination emerged from optimization on biased data, proxy variable selection, and indirect learned associations. As documented in Amazon's case, engineers actively tried to remove bias and failed — not due to bad faith but because the structural source of bias in the training data was not addressed. Discrimination can be the output of a system that no individual designed to discriminate.
"Measuring bias by race or gender is enough."
Single-axis analysis systematically underestimates harm for people holding multiple marginalized identities. Empirical studies show that intersectional disparities exceed what single-axis analysis predicts. Resumes with certain name combinations received 0% selection rates in audits — a disparity invisible to analyses that examine race and gender independently.
Key Takeaways
- Training data encodes history. Algorithms trained on historically biased data don't produce neutral outputs — they learn to reproduce and often amplify those patterns. This is called historical bias amplification.
- Removing protected attributes is not a fix. Proxy variables, indirect markers, and redundant encoding allow discrimination to persist even when race, gender, and other protected characteristics are explicitly excluded. Fairness through unawareness is a false solution.
- Bias enters across the full algorithm lifecycle. From how the problem is formulated through to ongoing monitoring after deployment, bias has multiple entry points. Addressing one stage does not guarantee fairness at others.
- Deployment creates feedback loops. In high-stakes domains like policing, model outputs shape future data, which retrains future models. Biased predictions can compound over time, not merely persist.
- Intersectionality matters for measurement. Single-axis bias analysis misses harms that fall on people at the intersection of multiple marginalized identities. Meaningful bias auditing requires intersectional measurement.
Further Exploration
Foundational reading
- Big Data's Disparate Impact — Barocas & Selbst — The foundational legal-academic paper on how data mining encodes discrimination. Dense but worth the effort.
- Machine Bias — ProPublica — The original investigation into COMPAS. Readable, well-documented, and still the clearest public account of the sentencing algorithm story.
On proxy discrimination and fairness
- Unlawful Proxy Discrimination: A Framework for Challenging Inherently Discriminatory Algorithms
- Equalizing Credit Opportunity in Algorithms — On aligning algorithmic fairness research with fair lending regulation — a good bridge between technical and legal perspectives.
On intersectional bias
- Gender, race, and intersectional bias in AI resume screening — Brookings — Accessible summary of recent empirical evidence on compounded disparities in hiring AI.
- Gender Shades — Joy Buolamwini & Timnit Gebru — The landmark study that documented how face recognition algorithms fail disproportionately on darker-skinned women.
On the algorithm lifecycle
- Bias recognition and mitigation strategies in AI healthcare applications — npj Digital Medicine — A clear framework for when and how bias enters across the full lifecycle, with practical mitigation guidance.
On Amazon's hiring algorithm
- Why Amazon's Automated Hiring Tool Discriminated Against Women — ACLU — Clear lay explanation of the mechanism.
- Amazon ditched AI recruitment software because it was biased against women — MIT Technology Review — The contemporaneous news account with technical detail.