who promised to change the world with a drop of blood,
who raised billions on a test that never worked?
No more needles. No more vials. No more waiting.
Investors believed. Walgreens believed. The Pentagon believed.
They valued her company at $9 billion.
The test was wrong. The baby was healthy.
But how many women, receiving the same news, made different decisions?
and the lie was dressed in certainty,
and no one questioned the numbers."
This is why we study Diagnostic Test Accuracy.
there are only four possible truths.
Two are blessings. Two are curses.
Every Test Result Has a Reality Behind It
Test: Positive → True Positive (TP)
Sick person correctly identified. The test told the truth.

Test: Positive → False Positive (FP)
Healthy person wrongly alarmed. The test lied.

Test: Negative → False Negative (FN)
Sick person wrongly reassured. The deadliest lie.

Test: Negative → True Negative (TN)
Healthy person correctly cleared. The test told the truth.
The 2x2 Confusion Matrix
| | Disease Present | Disease Absent |
|---|---|---|
| Test Positive | TP (True Positive) | FP (False Positive) |
| Test Negative | FN (False Negative) | TN (True Negative) |
Know them by name.
TP, TN: the test spoke true.
FP, FN: the test lied."
Sensitivity asks: Can it find the sick?
Specificity asks: Can it spare the healthy?
High sensitivity = few false negatives = few missed cases.
High specificity = few false positives = few false alarms.
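Both measures fall straight out of the 2x2 table. A minimal sketch, with made-up counts for illustration:

```python
# Hypothetical 2x2 counts (illustrative only, not from any study)
TP, FN = 90, 10    # 100 people who truly have the disease
TN, FP = 950, 50   # 1000 people who truly do not

sensitivity = TP / (TP + FN)  # share of the sick the test finds
specificity = TN / (TN + FP)  # share of the healthy the test spares

print(f"Sensitivity: {sensitivity:.0%}")  # 90%
print(f"Specificity: {specificity:.0%}")  # 95%
```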
Lower the threshold to catch more sick people? You'll alarm more healthy people.
Raise the threshold to spare healthy people? You'll miss more sick people.
This is the threshold effect—the seesaw of diagnosis.
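A quick simulation makes the seesaw visible. The biomarker distributions below are hypothetical, chosen only so that sick people score higher on average:

```python
import random

random.seed(0)
# Hypothetical biomarker values: sick people score higher on average
healthy = [random.gauss(50, 10) for _ in range(10000)]
sick    = [random.gauss(70, 10) for _ in range(10000)]

results = {}
for threshold in (55, 60, 65):
    sens = sum(x >= threshold for x in sick) / len(sick)      # catch the sick
    spec = sum(x < threshold for x in healthy) / len(healthy)  # spare the healthy
    results[threshold] = (sens, spec)
    print(f"cutoff {threshold}: sensitivity {sens:.2f}, specificity {spec:.2f}")
```

Raising the cutoff drives specificity up and sensitivity down; no cutoff escapes the trade.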
SnNout: Sensitive tests rule OUT
A highly sensitive test, when negative, rules out disease. If it didn't find it, it's probably not there.
SpPin: Specific tests rule IN
A highly specific test, when positive, rules in disease. If it says you have it, you probably do.
Specificity spares the well.
But no test masters both perfectly—
This is the burden we must bear."
the world needed a test that could find the infected quickly.
But what if the rapid test missed too many?
In people WITH symptoms:
Sensitivity: 73% (missed 27% of cases)
In people WITHOUT symptoms:
Sensitivity: 55% (missed 45% of cases)
Nearly half of infected asymptomatic people were told they were clear.
Thanksgiving Dinners
Families tested negative in the morning, gathered indoors, unknowingly infected grandparents
Workplace Outbreaks
Workers tested negative, came to work, infected colleagues in the break room
Hospital Transmission
Patients tested negative, admitted to wards, infected vulnerable patients
and the family gathered,
and the grandfather embraced his grandchildren,
and by winter's end, he was gone."
But the patient asks a different question:
"I tested positive. What are my chances?"
Your patient tests positive for a rare disease (prevalence 1 in 1000).
Question: What is the probability they actually have the disease?
Most doctors say 95%. The real answer? About 2%.
Specificity tells how many well it will spare.
But only the likelihood ratio answers:
What does this result mean for THIS patient?"
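The "about 2%" answer comes from Bayes' theorem. A sketch of the arithmetic, assuming a near-perfect sensitivity and 95% specificity (assumed values, since the vignette does not state them):

```python
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value via Bayes' theorem."""
    true_pos = sensitivity * prevalence           # sick AND test-positive
    false_pos = (1 - specificity) * (1 - prevalence)  # healthy AND test-positive
    return true_pos / (true_pos + false_pos)

# Assumed: 99% sensitivity, 95% specificity, prevalence 1 in 1000
print(f"P(disease | positive) = {ppv(0.99, 0.95, 0.001):.1%}")  # about 2%
```

The false positives from the 999 healthy people swamp the single true positive; that is why the intuitive 95% answer is so far off.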
that found too much?
When does finding disease become causing harm?
Mammography could detect tumors too small to feel.
Women were told: "Annual mammograms save lives."
But what if some of those "cancers" would never have killed?
The woman is diagnosed, treated with surgery, radiation, chemotherapy—for a disease that would never have harmed her.
Independent UK Panel on Breast Cancer Screening. Lancet. 2012;380:1778-1786
[Figure: deaths prevented from breast cancer vs. women overdiagnosed (treated unnecessarily) vs. false alarms (anxiety, biopsies)]
Is this a good trade? The answer depends on values, not just numbers.
and called it disease,
and the woman was cut and burned and poisoned—
for a shadow that would never have darkened her days."
This is the problem of overdiagnosis.
But when you gather all the studies,
when you weigh their evidence together—
The truth becomes harder to hide.
More Precision
Combining studies gives narrower confidence intervals, reducing uncertainty
Detect Heterogeneity
Why do different studies give different answers? Setting? Population? Threshold?
Expose Publication Bias
Are negative studies being hidden? Funnel plots reveal asymmetry
Explore Thresholds
Build SROC curves to understand the sensitivity-specificity trade-off
They are correlated: when one goes up, the other tends to go down (the threshold effect).
The bivariate model accounts for this correlation, giving valid pooled estimates.
Reitsma JB et al. J Clin Epidemiol. 2005;58:982-990
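Across studies, the threshold effect shows up as a negative correlation between sensitivity and specificity on the logit scale, which is what the bivariate model exploits. A sketch with hypothetical study results (not real data), using increasingly strict thresholds:

```python
import math

def logit(p):
    """Log-odds transform used by the bivariate model."""
    return math.log(p / (1 - p))

# Hypothetical (sensitivity, specificity) pairs from five studies,
# ordered from lenient to strict thresholds -- illustrative only
studies = [(0.95, 0.70), (0.90, 0.78), (0.85, 0.85), (0.78, 0.90), (0.70, 0.95)]

xs = [logit(se) for se, sp in studies]
ys = [logit(sp) for se, sp in studies]

# Pearson correlation, computed by hand
mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
r = cov / math.sqrt(sum((x - mx) ** 2 for x in xs)
                    * sum((y - my) ** 2 for y in ys))
print(f"Correlation of logit(sens) vs logit(spec): {r:.2f}")  # about -0.98
```

Naively averaging sensitivities and specificities separately ignores this correlation; the bivariate model pools both at once.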
ROC Space
The curve shows the trade-off
Higher = better test
Diagonal line = useless test (random guessing)
The curve = summary of all studies' performance
begin to reveal the truth.
The SROC curve is the path of evidence—
showing what the test can truly do."
One study says sensitivity is 95%.
Another says 60%.
Which truth do you believe?
High heterogeneity means the studies are measuring different things—or the test performs differently in different settings.
Threshold Differences
Different cutoffs for "positive" result (e.g., different HbA1c thresholds for diabetes)
Population Differences
Disease severity, age, comorbidities differ between studies
Setting Differences
Primary care vs. specialist clinic vs. emergency room
Quality Differences
Risk of bias, verification bias, spectrum bias
Low I² (≈25%): Studies agree
Moderate I² (≈50%): Some disagreement
High I² (≈75%): Major disagreement
You cannot average apples and oranges. You must explain why studies differ before pooling them.
do not silence the dissent.
Ask: Why do they see differently?
The disagreement itself teaches."
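Disagreement between studies is usually quantified with the I² statistic of Higgins & Thompson (2002, cited in the sources): the share of variability beyond what chance alone would produce. A minimal sketch, with a hypothetical Cochran's Q:

```python
def i_squared(q, num_studies):
    """I^2 statistic: percentage of total variability across studies
    that is due to heterogeneity rather than chance (Higgins & Thompson)."""
    df = num_studies - 1                  # degrees of freedom of Q
    return max(0.0, (q - df) / q) * 100   # floored at 0%

# Hypothetical example: Cochran's Q = 40 across 11 studies (10 df)
print(f"I^2 = {i_squared(40, 11):.0f}%")  # 75% -> major disagreement
```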
Sensitivity & Specificity
How well the test performs on sick vs. healthy people
Likelihood Ratios (LR+, LR-)
How much a result changes the probability of disease
Diagnostic Odds Ratio (DOR)
Single measure of test discrimination (DOR = LR+ / LR-)
Area Under the SROC Curve (AUC)
Overall test performance across all thresholds (0.5 = useless, 1.0 = perfect)
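The likelihood ratios and DOR follow directly from sensitivity and specificity. A sketch with illustrative values (assumed, not from any study):

```python
def summary_measures(sens, spec):
    """Likelihood ratios and diagnostic odds ratio from sens/spec."""
    lr_pos = sens / (1 - spec)   # how much a positive result raises the odds
    lr_neg = (1 - sens) / spec   # how much a negative result lowers the odds
    dor = lr_pos / lr_neg        # single summary of discrimination
    return lr_pos, lr_neg, dor

# Illustrative test: 90% sensitivity, 95% specificity
lr_pos, lr_neg, dor = summary_measures(0.90, 0.95)
print(f"LR+ = {lr_pos:.1f}, LR- = {lr_neg:.2f}, DOR = {dor:.0f}")
```

Here LR+ = 18 and LR- ≈ 0.11, giving a DOR of about 171: a strongly discriminating test.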
- Bivariate meta-analysis for DTA reviews (open-access tools available)
- Rutter & Gatsonis 2001 - HSROC model
- Cochrane Handbook Ch. 10 - DTA methods
Was there a valid reference standard?
Gold standard test applied to all patients?
Were interpreters blinded?
Test readers unaware of diagnosis, and vice versa?
Was the spectrum appropriate?
Patients similar to your clinical population?
Was the threshold pre-specified?
Or was it chosen to maximize results?
armed with the SROC and the measure of agreement,
you can see through the lie of the test—
and judge its truth for yourself."
When a machine claims to see what no other machine can see,
and no one asks: "Show me the proof"?
FDA found:
• Results varied by 146% between runs on the same sample
• Edison machines failed 87% of proficiency tests
• Zero peer-reviewed validation studies published
• Patients received HIV-positive results for samples that were negative
Sources: FDA Warning Letter 2016; Carreyrou J. Bad Blood. 2018; CMS Inspection Reports.
What Do You Choose?
- Trust the marketing: harm patients, face lawsuits
- Demand validation data: protect your patients, avoid scandal
A $9 billion valuation became a criminal fraud conviction.
Every hospital that demanded validation data before signing
was protected from the lie.
Every hospital that trusted the marketing
became complicit in harming patients.
The absence of evidence is not a marketing problem.
It is a patient safety emergency.
who pays the price?
The test result comes in 15 minutes.
But what if the result is 15 minutes of false confidence?
Real-world performance (Cochrane 2022):
• Symptomatic individuals: 73% sensitivity (missed 27%)
• Asymptomatic individuals: 58% sensitivity (missed 42%)
• Early infection (days 0-3): ~50% sensitivity
Nearly half of infected asymptomatic people were told they were "clear."
Source: Dinnes J et al. Cochrane Database Syst Rev. 2022;7:CD013705
What Do You Choose?
- Trust the negative result: outbreak spreads, three hospitalizations, school closure
- Treat the negative with caution: teacher isolates, outbreak prevented
A negative result does not mean "not infected." It means: "not detected."
The difference between these two phrases
is measured in lives.
With sensitivity near 50%, a negative result is almost meaningless.
SnNout only works when sensitivity is HIGH.
Know your test's limits before trusting its verdict.
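Why SnNout fails at low sensitivity: compute the post-test probability of disease after a negative result. The 99% specificity and 20% pre-test probability below are assumptions chosen for illustration; the two sensitivities come from the text:

```python
def p_disease_given_negative(sens, spec, prevalence):
    """Post-test probability of disease after a NEGATIVE result."""
    fn = (1 - sens) * prevalence   # infected but test-negative
    tn = spec * (1 - prevalence)   # healthy and test-negative
    return fn / (fn + tn)

# Assumed: 99% specificity, 20% pre-test probability (e.g. a close contact)
for sens in (0.95, 0.50):
    p = p_disease_given_negative(sens, 0.99, 0.20)
    print(f"sensitivity {sens:.0%}: P(infected | negative) = {p:.1%}")
```

At 95% sensitivity a negative drops the probability to about 1%; at 50% sensitivity, roughly one in nine "negatives" is still infected.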
still cause harm?
What if the cancer it finds
would never have hurt you?
Sensitivity: ~85% | Specificity: ~90%
For 1,000 women screened annually for 10 years:
• 1 death prevented from breast cancer
• 5 women overtreated for cancers that would never have harmed them
• 100-500 false alarms leading to biopsies, anxiety, repeat imaging
Overdiagnosis rate: 19-30% of screen-detected cancers
Source: Independent UK Panel on Breast Cancer Screening. Lancet. 2012;380:1778-1786
What Do You Choose?
- Treat every screen-detected lesion: the tumor was indolent (DCIS) and would never have harmed her
- Shared decision-making: she understands benefits AND harms; autonomy preserved
A test can be accurate and still cause harm.
When overdiagnosis exceeds lives saved,
we must ask: Is finding always helping?
can exceed the benefit from true positives.
Always weigh benefits against harms.
Screening is not always saving.
is worse than missing it?
What if the treatment causes more suffering
than the disease ever would?
At the conventional cutoff (4.0 ng/mL):
• Sensitivity for high-grade cancer: 21%
• Detects many indolent cancers that would never harm
Lower cutoff to 2.5 ng/mL:
• Sensitivity rises to: 40%
• But overdiagnosis doubles
Treatment consequences:
• 20-30% of men experience incontinence after prostatectomy
• 30-70% experience erectile dysfunction
Source: US Preventive Services Task Force. JAMA. 2018;319(18):1901-1913
What Threshold Do You Choose?
- Lower cutoff (2.5 ng/mL): thousands of unnecessary biopsies and treatments
- Standard cutoff (4.0 ng/mL): fewer unnecessary treatments, but most missed cancers are indolent
- No screening: no overtreatment harm, but some preventable deaths
Every threshold trades sensitivity for specificity,
detection for overdiagnosis.
The choice is not medical. It is ethical.
It depends on what harms you are willing to accept.
It is a values problem.
Before choosing a cutoff, ask:
What is worse: missing disease or overtreating the healthy?
Different truths.
How can identical numbers
mean opposite things?
Sensitivity: ~80% | Specificity: ~95%
In a high-prevalence setting (TB prevalence 10%):
• Positive Predictive Value: ~64%
• A positive test usually means TB
In a low-prevalence setting (TB prevalence 0.1%):
• Positive Predictive Value: ~2%
• A positive test is usually a false positive
Source: Pai M et al. Lancet Infect Dis. 2014;14(8):765-773
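Recomputing the PPV from the stated sensitivity (80%) and specificity (95%) shows how prevalence alone drives the answer:

```python
def ppv(sens, spec, prev):
    """Positive predictive value via Bayes' theorem."""
    tp = sens * prev               # diseased AND test-positive
    fp = (1 - spec) * (1 - prev)   # healthy AND test-positive
    return tp / (tp + fp)

# Same test (80% sensitivity, 95% specificity), two populations
for setting, prev in [("high-prevalence (10%)", 0.10),
                      ("low-prevalence (0.1%)", 0.001)]:
    print(f"{setting}: PPV = {ppv(0.80, 0.95, prev):.0%}")
```

Identical test, identical numbers; the population turns a mostly-true positive into a mostly-false one.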
What Do You Conclude?
- Dismiss the positive as a false alarm: patient infects family; diagnosis delayed for months
- Confirm with chest X-ray and sputum testing: treat early if confirmed
PPV and NPV depend on prevalence: they are properties of the population, not of the test alone.
The same result means different things
in different people.
A positive test in a high-risk patient usually means disease.
The same positive in a low-risk patient means probably nothing.
Context is everything.
Theranos: Demand Validation
No peer-reviewed data = no trust, regardless of marketing claims
COVID Rapid Tests: Know Sensitivity Limits
"Not detected" is not the same as "not infected"
Mammography: Weigh Benefits vs. Harms
Finding is not always helping; overdiagnosis causes real harm
PSA: The Threshold is a Values Choice
Every cutoff trades sensitivity for specificity; there is no "right" answer
TB Test: Context Determines Meaning
The same result means different things in different populations
Key Sources Cited in This Course
- Carreyrou J. Bad Blood: Secrets and Lies in a Silicon Valley Startup. Knopf, 2018.
- Dinnes J, et al. Rapid, point-of-care antigen tests for diagnosis of SARS-CoV-2 infection. Cochrane Database Syst Rev. 2022;7:CD013705.
- Independent UK Panel on Breast Cancer Screening. The benefits and harms of breast cancer screening. Lancet. 2012;380:1778-1786.
- Reitsma JB, et al. Bivariate analysis of sensitivity and specificity produces informative summary measures in diagnostic reviews. J Clin Epidemiol. 2005;58:982-990.
- Rutter CM, Gatsonis CA. A hierarchical regression approach to meta-analysis of diagnostic test accuracy evaluations. Stat Med. 2001;20:2865-2884.
- Deeks JJ, et al. The performance of tests of publication bias in systematic reviews of diagnostic test accuracy. J Clin Epidemiol. 2005;58:882-893.
- Macaskill P, et al. Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy. Chapter 10. 2023.
- Higgins JPT, Thompson SG. Quantifying heterogeneity in a meta-analysis. Stat Med. 2002;21:1539-1558.
- US Food and Drug Administration. Warning Letter to Theranos Inc. 2016.
- US Preventive Services Task Force. Screening for Prostate Cancer. JAMA. 2018;319(18):1901-1913.
- Pai M, et al. Tuberculosis. Lancet Infect Dis. 2014;14(8):765-773.
the two virtues of a test,
the cruel trade-off of the threshold,
and the art of pooling evidence.
When the next test lies to you—
you will know how to see through it."
When the Test Lies — Now You Know.