DTA > The Fraud
Have you not heard the tale of the woman
who promised to change the world with a drop of blood,
who raised billions on a test that never worked?
Palo Alto, 2003
STANFORD UNIVERSITY
A nineteen-year-old dropped out with a vision: hundreds of blood tests from a single drop.
Investors believed. Walgreens believed. The Pentagon believed.
They valued her company at $9 billion.
But the tests gave wrong results. Patients were told they had HIV when they didn't. Patients were told their blood was normal when they were dying.
Carreyrou J. Bad Blood. 2018
The Decision Tree of Deception
What Theranos Did vs. What Should Happen
New Diagnostic Test
↓
SHOULD DO
Validate Against Gold Standard
↓
Publish TP/FP/FN/TN
↓
FDA Approval
THERANOS DID
Skip Validation
↓
Hide Failures
↓
Harm Patients
"And the test lied,
and the lie was dressed in certainty,
and no one asked for the 2x2 table."
This is why we study Diagnostic Test Accuracy.
When a test speaks,
there are only four possible truths.
Two are blessings. Two are curses.
The Tree of Outcomes
Every Test Result Has a Reality Behind It
Patient Tested
↓
What is the TRUTH?
Has Disease
D+
↓
Test + → TP
Test - → FN
No Disease
D-
↓
Test + → FP
Test - → TN
The Sacred 2x2 Table
HIV Rapid Test Example (Real Data)
| | HIV+ | HIV- | Total |
|---|---|---|---|
| Test + | 98 | 3 | 101 |
| Test - | 2 | 895 | 897 |
| Total | 100 | 898 | 998 |
FROM THIS TABLE COMES ALL TRUTH
Sensitivity = 98/100 = 98%
Specificity = 895/898 = 99.7%
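The table is small enough to check by hand, but a few lines of Python make the two formulas concrete. This is a minimal sketch: the four cells come straight from the HIV rapid test table above, and nothing else is assumed.

```python
# The four cells of the 2x2 table above.
tp, fn = 98, 2      # among the 100 truly HIV+
fp, tn = 3, 895     # among the 898 truly HIV-

sensitivity = tp / (tp + fn)    # 98/100
specificity = tn / (tn + fp)    # 895/898

print(f"Sensitivity = {sensitivity:.1%}")   # 98.0%
print(f"Specificity = {specificity:.1%}")   # 99.7%
```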
"Two outcomes save. Two outcomes harm.
TP, TN: the test spoke true.
FP, FN: the test lied.
Know them by name, for they determine fate."
Have you not heard of the blood that was tested,
found clean,
and given to thousands—
while death swam within it?
The Blood Supply Crisis, 1985
UNITED STATES
When HIV testing began, doctors celebrated: they could now screen the blood supply.
But the test had a window period—weeks after infection when the virus was present but undetectable.
Blood was tested. Blood was "negative." Blood was transfused.
8,000-12,000 Americans were infected through transfusions before better tests closed the window.
CDC. MMWR. 1987;36(49):833-840
The Window Period Decision Tree
Why False Negatives Are Deadly
Person Recently Infected
↓
Time Since Infection?
< 2 weeks
Test NEGATIVE (virus present!)
↓
Blood Donated → others infected
> 4 weeks
Test POSITIVE (correctly detected)
↓
Blood Discarded → supply safe
Sensitivity Changes Over Time
Day 1-7: 0% (eclipse period)
Day 14: ~50% (seroconversion)
Day 21: ~95% (most detected)
Day 45+: 99.9% (window closed)
THE LESSON
Sensitivity is not fixed. It depends on when you test.
A "99% sensitive" test may be 0% sensitive in early infection.
"And the test said 'clean,'
for the virus had not yet shown its face.
And the blood was shared,
and the infection spread to the innocent."
A test has two virtues and two vices.
Sensitivity: Can it find the sick?
Specificity: Can it spare the healthy?
Sensitivity: The Hunter
THE FORMULA
Sensitivity = TP / (TP + FN)
"Of all the sick, how many did we catch?"
Worked Example: COVID PCR Test
Given: 200 infected patients tested
TP = 196 (correctly positive), FN = 4 (missed)
Sensitivity = 196 / (196 + 4) = 196/200 = 98%
Interpretation: Test catches 98 of every 100 infected people
Specificity: The Guardian
THE FORMULA
Specificity = TN / (TN + FP)
"Of all the healthy, how many did we spare?"
Worked Example: Same COVID PCR Test
Given: 1000 uninfected people tested
TN = 999 (correctly negative), FP = 1 (false alarm)
Specificity = 999 / (999 + 1) = 999/1000 = 99.9%
Interpretation: Test correctly clears 999 of every 1000 healthy people
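Both worked examples reduce to the same two one-liners. A minimal sketch; the helper names are ours, not from any library, and the counts are the ones in the examples above.

```python
# The two DTA formulas as plain functions.
def sensitivity(tp, fn):
    return tp / (tp + fn)   # of all the sick, how many caught?

def specificity(tn, fp):
    return tn / (tn + fp)   # of all the healthy, how many spared?

print(sensitivity(196, 4))   # COVID PCR example: 0.98
print(specificity(999, 1))   # same test: 0.999
```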
The Memory Rules
When to Use Which Test
What Do You Need?
RULE OUT disease
Use HIGH SENSITIVITY
↓
SnNout: a Sensitive test, when Negative, rules OUT
RULE IN disease
Use HIGH SPECIFICITY
↓
SpPin: a Specific test, when Positive, rules IN
"Sensitivity catches the sick.
Specificity spares the well.
But no test masters both perfectly—
this is the burden we bear."
Have you not seen the physician
who saw 99% accurate
and believed a positive result meant 99% certainty?
This is the deadliest error in medicine.
The Base Rate Fallacy
THE PUZZLE
A disease affects 1 in 1000 people.
A test is 99% sensitive and 99% specific.
A patient tests positive.
What is the probability they have the disease?
Most doctors say ~99%. The real answer is about 9%.
The Math Revealed
Testing 100,000 People (Prevalence 1/1000)
Step 1: 100 have disease, 99,900 healthy
Step 2: Of 100 sick: 99 test positive (TP), 1 negative (FN)
Step 3: Of 99,900 healthy: 999 test positive (FP), 98,901 negative (TN)
Step 4: Total positives = 99 + 999 = 1,098
PPV = TP / All Positives = 99 / 1,098 = 9%
91% of positive results are FALSE POSITIVES!
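The four steps above can be replayed with integer arithmetic, using only the numbers already given (prevalence 1/1000, sensitivity 99%, specificity 99%):

```python
# Base-rate arithmetic for 100,000 people, all integers.
n = 100_000
sick = n // 1000                 # prevalence 1/1000 -> 100 people
healthy = n - sick               # 99,900 people
tp = sick * 99 // 100            # 99% sensitivity -> 99 true positives
fp = healthy * 1 // 100          # 1% false-positive rate -> 999 false positives

ppv = tp / (tp + fp)             # 99 / 1,098
print(f"PPV = {ppv:.1%}")        # 9.0%
```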
The Decision Tree of Prevalence
Same Test, Different Settings
Test: 99% Sens, 99% Spec
↓
Where Is Testing Done?
General Population
Prevalence 0.1%
PPV = 9% (91% false positives!)
High-Risk Clinic
Prevalence 10%
PPV = 92% (8% false positives)
Confirmatory Test
Prevalence 50%
PPV = 99% (1% false positives)
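The three settings in the tree differ only in prevalence, so one small function covers them all. A sketch; the function name is ours, and the sensitivity/specificity defaults are the 99%/99% test from the puzzle.

```python
# PPV as a function of prevalence for a fixed test.
def ppv(prevalence, sens=0.99, spec=0.99):
    tp = prevalence * sens               # positives among the sick
    fp = (1 - prevalence) * (1 - spec)   # positives among the healthy
    return tp / (tp + fp)

for setting, prev in [("General population", 0.001),
                      ("High-risk clinic", 0.10),
                      ("Confirmatory test", 0.50)]:
    print(f"{setting}: PPV = {ppv(prev):.0%}")   # 9%, 92%, 99%
```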
"And the doctor said '99% accurate,'
and the patient heard '99% certain,'
and both were deceived—
for they forgot to ask: How rare is this disease?"
Have you not heard of the test for men
that found cancers that would never kill,
and led to treatments that destroyed lives?
The PSA Screening Tragedy
UNITED STATES, 1990s-2010s
PSA (Prostate-Specific Antigen) could detect prostate cancer early.
Doctors screened millions of men. Cancers were found. Prostates were removed.
But many of these "cancers" would never have caused symptoms. The surgery caused impotence and incontinence in men who would have died of old age, not cancer.
Moyer VA. Ann Intern Med. 2012;157:120-134
The Numbers of Harm
1
Life saved from prostate cancer per 1000 screened
30-40
Men made impotent or incontinent per 1000 screened
100+
False positives (biopsies, anxiety) per 1000 screened
THE REVERSAL
In 2012, the US Preventive Services Task Force recommended against
routine PSA screening. The test was finding too much that didn't need finding.
The Screening Decision Tree
The Unintended Consequences of Screening
1000 Men Screened
↓
~120 Positive PSA
↓
~30 Biopsies Show Cancer
↓
~25 Would Never Have Harmed
~5 Truly Aggressive
~880 Negative PSA
↓
Reassured (but ~3 aggressive cancers missed)
"And the test found the shadow,
and the surgeon cut,
and the man lived—impotent, incontinent—
from a cancer that would never have woken."
Sensitivity describes the test.
Specificity describes the test.
But the patient asks:
"I tested positive. What are MY chances?"
Likelihood Ratios
POSITIVE LIKELIHOOD RATIO
LR+ = Sensitivity / (1 - Specificity)
How much more likely is a + result in sick vs healthy?
NEGATIVE LIKELIHOOD RATIO
LR- = (1 - Sensitivity) / Specificity
How much more likely is a - result in sick vs healthy?
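Plugging in the COVID PCR worked-example values from earlier (sensitivity 98%, specificity 99.9%) shows how the two formulas behave. A minimal sketch with only those numbers assumed:

```python
# Likelihood ratios from the two formulas above.
sens, spec = 0.98, 0.999

lr_pos = sens / (1 - spec)     # 0.98 / 0.001, approximately 980
lr_neg = (1 - sens) / spec     # 0.02 / 0.999, approximately 0.02

print(lr_pos, lr_neg)
```

An LR+ near 1000 and an LR- near 0.02 make this, by the interpretation table later in this section, a strong rule-in and a strong rule-out test.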
The Fagan Nomogram
From Pre-Test to Post-Test Probability
[Nomogram: three vertical scales. Left: pre-test probability (1% to 99%). Center: likelihood ratio (0.01 to 100). Right: post-test probability (1% to 99%).]
Draw a line from the pre-test probability through the LR to find the post-test probability.
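Numerically, the nomogram is just odds arithmetic: convert the pre-test probability to odds, multiply by the likelihood ratio, convert back. A sketch (the function name is ours):

```python
# What the Fagan nomogram does: pre-test odds x LR = post-test odds.
def post_test_probability(pre_test_prob, lr):
    pre_odds = pre_test_prob / (1 - pre_test_prob)
    post_odds = pre_odds * lr
    return post_odds / (1 + post_odds)

# The base-rate puzzle again: prevalence 0.1%, LR+ = 0.99/0.01 = 99.
print(f"{post_test_probability(0.001, 99):.1%}")   # 9.0%
```

Same answer as the 100,000-person walk-through: the nomogram and the 2x2 table are two views of one calculation.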
Interpreting Likelihood Ratios
How Powerful Is This Test?
What Is the LR+?
LR+ > 10: strong rule-in
LR+ 5-10: moderate
LR+ 2-5: weak
LR+ 1-2: useless
What Is the LR-?
LR- < 0.1: strong rule-out
LR- 0.1-0.2: moderate
LR- 0.2-0.5: weak
LR- 0.5-1: useless
"Sensitivity tells of the sick.
Specificity tells of the well.
But the likelihood ratio answers:
What does this result mean for THIS patient?"
Have you not seen the child with fever in the village,
the rapid test that said negative,
and the Plasmodium that kept multiplying?
The Malaria RDT Problem
SUB-SAHARAN AFRICA
Malaria kills 600,000 people yearly, mostly children under 5.
Rapid Diagnostic Tests were meant to guide treatment in remote areas without microscopes or laboratories.
But when parasitemia is low, the RDT misses cases. And when P. falciparum deletes the HRP2 gene, the RDT sees nothing at all.
WHO. Malaria RDT Performance. 2022
The Clinical Decision Tree
Child with Fever in Malaria-Endemic Area
Febrile Child
↓
Perform RDT
↓
RDT Positive
↓
Treat for Malaria
RDT Negative
↓
Clinical Suspicion?
High
Treat Anyway or Microscopy
Low
Look for Other Cause
Sensitivity Varies by Parasitemia
95%: High parasitemia (>200/μL)
75%: Low parasitemia (100-200/μL)
50%: Very low parasitemia (<100/μL)
THE CLINICAL LESSON
A negative RDT does not rule out malaria in endemic areas.
Clinical judgment must override the test when suspicion is high.
"And the test said 'negative,'
and the child was sent home,
and the parasites multiplied in the dark,
and by morning the child could not wake."
In the year of pestilence,
the world needed a test that was fast.
But fast is not the same as accurate.
The Cochrane Verdict
COVID-19 Rapid Antigen Tests (155 Studies Pooled)
| Population | Sensitivity | Missed Cases |
|---|---|---|
| Symptomatic | 73% | 27% missed |
| Asymptomatic | 55% | 45% missed |
| First 7 days of symptoms | 80% | 20% missed |
Dinnes J et al. Cochrane Database Syst Rev. 2022;7:CD013705
The False Security Decision Tree
Thanksgiving 2020: What Happened
Family Member Tests Negative
↓
Is This Person Truly Negative?
True Negative: safe to gather
FALSE Negative: infectious! (the test misses 45% of asymptomatic infections)
↓
Gathers with Family → grandparents infected
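How worried should a family be about a single negative test? The answer depends on prevalence as well as sensitivity. A sketch of the arithmetic: the 55% asymptomatic sensitivity comes from the Cochrane table above, while the 5% prevalence and 99% specificity are illustrative assumptions of ours, not figures from the review.

```python
# P(infectious | negative test) via Bayes, all inputs explicit.
def p_infected_given_negative(prev, sens, spec):
    false_neg = prev * (1 - sens)   # infected but missed
    true_neg = (1 - prev) * spec    # healthy and cleared
    return false_neg / (false_neg + true_neg)

# Assumed: 5% community prevalence, 55% sensitivity, 99% specificity.
risk = p_infected_given_negative(prev=0.05, sens=0.55, spec=0.99)
print(f"{risk:.1%}")   # about 2.3%
```

Small per-person, but across a large holiday gathering a ~2% chance per "negative" guest adds up quickly.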
"And the test said 'negative,'
and the family embraced,
and by winter's end,
the grandfather was buried."
Have you not heard of the screening
that found cancers that would never kill,
and led to treatments that caused more harm than the disease?
The Overdiagnosis Problem
3-4
Lives saved per 10,000 screened
~15
Overdiagnosed (treated unnecessarily)
~500
False alarms (anxiety, biopsies)
THE QUESTION
To save 3-4 lives, ~15 women receive surgery, radiation, and chemotherapy
for cancers that would never have harmed them.
Is this trade-off worth it?
The Screening Decision Tree
10,000 Women Screened Over 10 Years
10,000 Women
↓
~1,000 Recalled (abnormal mammogram)
↓
~500 False Alarm (anxiety only)
~500 Biopsy (~50 cancers found)
~9,000 Cleared (continue screening)
Of ~50 Cancers Found
~35 Would Kill (treatment saves 3-4)
~15 Would Never Kill (overdiagnosed)
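The trade-off in the tree can be tallied directly. All counts are the approximate figures from the tree itself; the variable names and the "per life saved" ratio are our framing, and we take the upper end (4) of the 3-4 lives-saved range.

```python
# Tallying the screening tree per 10,000 women over 10 years.
screened = 10_000
recalled = 1_000
false_alarms = 500
cancers_found = 50
overdiagnosed = 15          # would never have caused harm
lives_saved = 4             # upper end of the 3-4 range

print(f"Cleared outright: {screened - recalled}")
print(f"False alarms per life saved: {false_alarms / lives_saved:.0f}")
print(f"Overdiagnosed per life saved: {overdiagnosed / lives_saved:.1f}")
```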
"And the test found the shadow,
and called it cancer,
and the woman was cut and burned—
for a shadow that would never have darkened her days."
One study may deceive.
One study may flatter.
But when you gather all the evidence—
the truth becomes harder to hide.
Why DTA Meta-Analysis Is Different
THE PROBLEM
Sensitivity and specificity are correlated.
When one goes up, the other tends to go down.
You cannot pool them separately like treatment effects. You need the bivariate model.
The SROC Curve
Reading ROC Space
Top-Left Corner: perfect test
↓ (curve shows trade-off)
Diagonal Line: useless test (chance)
WHAT THE SROC SHOWS
Each dot = one study's sensitivity & specificity
The curve = summary of all studies
Closer to top-left = better test
"One study may deceive.
Many studies, weighed together,
trace the path of truth—
the SROC curve that reveals what the test can truly do."
But what if the studies disagree?
One says sensitivity is 95%.
Another says 60%.
Which truth do you believe?
Sources of Heterogeneity
Why Studies Disagree
Same Test, Different Results?
Threshold: different cutoffs
Population: severity, age
Setting: primary vs specialist care
Quality: bias, blinding
Measuring Disagreement: I²
I² < 25%: Low (studies agree)
I² 25-75%: Moderate (some variation)
I² > 75%: High (major disagreement)
THE WARNING
When I² > 75%, the pooled estimate may be meaningless.
Explain the disagreement before averaging.
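I² is computed from Cochran's Q statistic: the share of variability beyond what chance alone would produce. A sketch of the standard formula; the Q and degrees-of-freedom values below are illustrative numbers, not from any named meta-analysis.

```python
# I-squared from Cochran's Q: (Q - df) / Q, floored at zero.
def i_squared(q, df):
    return max(0.0, (q - df) / q)

# Illustrative: 10 studies (df = 9) with Q = 40.
print(i_squared(q=40.0, df=9))   # 0.775 -> high heterogeneity
print(i_squared(q=5.0, df=9))    # 0.0 -> less variation than chance
```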
"When the studies disagree,
do not silence the dissent.
Ask: Why do they see differently?
The disagreement itself teaches."
Your DTA Toolkit
The essential measures and when to use them
The Checklist
✓
Was there a valid reference standard?
Gold standard applied to ALL patients?
✓
Were interpreters blinded?
Test readers unaware of diagnosis?
✓
Was the spectrum appropriate?
Patients similar to your population?
✓
Was the threshold pre-specified?
Or chosen to maximize results?
When Results Don't Match Suspicion
The Clinical Override Decision Tree
Test Negative, High Suspicion
↓
What Is the LR-?
LR- < 0.1
Strong rule-out: accept the negative
LR- 0.1-0.5
Consider a repeat test or a different test
LR- > 0.5
Trust clinical judgment: the test is weak
"Armed with sensitivity, specificity, likelihood,
armed with the SROC and the measure of agreement,
you can see through the lie of the test—
and judge its truth for yourself."
References
Key Sources
- Carreyrou J. Bad Blood. Knopf, 2018.
- CDC. MMWR. 1987;36(49):833-840. [HIV blood supply]
- Dinnes J et al. Cochrane Database Syst Rev. 2022;7:CD013705. [COVID RAT]
- Moyer VA. Ann Intern Med. 2012;157:120-134. [PSA screening]
- UK Panel. Lancet. 2012;380:1778-1786. [Mammography]
- WHO. Malaria RDT Performance. 2022.
- Reitsma JB et al. J Clin Epidemiol. 2005;58:982-990. [Bivariate model]
- Deeks JJ et al. J Clin Epidemiol. 2005;58:882-893. [Publication bias]
- Macaskill P et al. Cochrane Handbook Ch. 10. 2023.
A test is 99% sensitive and 99% specific. Disease prevalence is 1/1000. A patient tests positive. What is the probability they have the disease?
99%
90%
About 9%
50%
Why did the blood supply become contaminated with HIV despite testing?
The tests had low specificity
The tests had a window period with low sensitivity in early infection
The tests were not performed correctly
The tests were too expensive
What does "SnNout" mean?
A highly Sensitive test, when Negative, rules OUT disease
A highly Specific test, when Negative, rules OUT disease
Sensitivity should be used for screening
Specificity should be above 90%
✔
Course Complete
"Now you know the four outcomes,
the two virtues of a test,
the fallacy of the base rate,
and the art of pooling evidence.
When the next test lies to you—
you will know."