When Tests Lie: A Course on Diagnostic Test Accuracy (Enhanced)

Have you not heard of the woman
who promised to change the world with a single drop of blood,
who raised billions on a test that never worked?

Palo Alto, 2003

STANFORD UNIVERSITY

A 19-year-old dropped out with a vision: hundreds of tests from a single drop of blood.

Investors believed. Walgreens believed. The Pentagon believed.

They valued her company at $9 billion.

But the tests gave wrong results. Patients were told they had HIV when they did not. Patients were told they were fine when they were dying.

Carreyrou J. Bad Blood. 2018

The Decision Tree of Deception

What Theranos Did vs. What Should Happen

New Diagnostic Test

↓

SHOULD DO

Validate Against Gold Standard

↓

Publish TP/FP/FN/TN

↓

FDA Approval

THERANOS DID

Skip Validation

↓

Hide Failures

↓

Harm Patients

"And the test spoke lies,
lies dressed up as certainty.
The sick were told their blood was normal,
and no one asked for the 2x2 table."

This is why we study diagnostic test accuracy.

When a test speaks,
there are only four possible truths.

Only two are blessings. Two are curses.

The Outcome Tree

Every Test Result Has a Reality Behind It

Patient Tested

↓

What Is the Truth?

Has Disease

D+

↓

TP: Test +

FN: Test -

No Disease

D-

↓

FP: Test +

TN: Test -

The Sacred 2x2 Table

HIV Rapid Test Example (Real Data)

	HIV+	HIV-	Total
Test +	98	3	101
Test -	2	895	897
Total	100	898	998

From this table, all truths flow

Sensitivity = 98/100 = 98%
Specificity = 895/898 = 99.7%
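
The two figures above can be recomputed directly from the four cells of the table. The following is a minimal Python sketch; the function and variable names are illustrative and not part of the course.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Proportion of diseased people the test flags: TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Proportion of healthy people the test clears: TN / (TN + FP)."""
    return tn / (tn + fp)


# Cells of the HIV rapid test table above.
tp, fp = 98, 3
fn, tn = 2, 895

print(f"Sensitivity = {sensitivity(tp, fn):.1%}")
print(f"Specificity = {specificity(tn, fp):.1%}")
```

Each measure uses only one column of the table, which is why neither depends on how common the disease is.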

"Two outcomes save. Two outcomes harm.
TP, TN: the test spoke true.
FP, FN: the test told lies.
Know them by name, for they determine fate."

Have you not heard of the blood
that was tested and found clean,
then given to thousands
while death swam within it?

The Blood Supply Crisis of 1985

UNITED STATES

When HIV testing began, doctors celebrated: they could now screen the blood supply.

But the test had a window period, when the virus was present but undetectable.

The blood tested "negative." The blood was transfused.

8,000-12,000 Americans were infected through transfusion before better tests closed the window.

CDC. MMWR. 1987;36(49):833-840

The Window Period Decision Tree

Why False Negatives Are Deadly

Person Recently Infected

↓

Time Since Infection?

< 2 weeks

Test NEGATIVE: Virus present!

↓

Blood Donated: Others infected

> 4 weeks

Test POSITIVE: Correctly detected

↓

Blood Discarded: Supply safe

Sensitivity Changes Over Time

0%

Day 1-7
Eclipse period

~50%

Day 14
Seroconversion

~95%

Day 21
Most detected

99.9%

Day 45+
Window closed

THE LESSON

Sensitivity is not fixed. It depends on when you test. A "99% sensitive" test may be 0% sensitive in early infection.

"And the test said 'clean,'
for the virus had not yet shown its face.
And the blood was shared,
and the infection spread."

A test has two virtues and two vices.

Sensitivity: Can it find the sick?

Specificity: Can it spare the healthy?

Sensitivity: The Hunter

THE FORMULA

Sensitivity = TP / (TP + FN)

"Of all the sick, how many did we catch?"

Worked Example: COVID PCR Test

Given: 200 infected patients tested

TP = 196 (correctly positive), FN = 4 (missed)

Sensitivity = 196 / (196 + 4) = 196/200 = 98%

Interpretation: Test catches 98 of every 100 infected people

Specificity: The Guardian

THE FORMULA

Specificity = TN / (TN + FP)

"Of all the healthy, how many did we spare?"

Worked Example: Same COVID PCR Test

Given: 1000 uninfected people tested

TN = 999 (correctly negative), FP = 1 (false alarm)

Specificity = 999 / (999 + 1) = 999/1000 = 99.9%

Interpretation: Test correctly clears 999 of every 1000 healthy people

The Memory Rule

When to Use Which Test

What Do You Need?

RULE OUT disease

Use HIGH SENSITIVITY

↓

SnNout: Sensitive Negative = OUT

RULE IN disease

Use HIGH SPECIFICITY

↓

SpPin: Specific Positive = IN

"Sensitivity catches the sick.
Specificity spares the well.
But no test masters both perfectly,
and that is the burden we must bear."

Have you not met the doctor
who saw 99% accurate
and believed a positive result meant 99% certainty?

This is the deadliest error in medicine.

The Base Rate Fallacy

THE PUZZLE

A disease affects 1 in 1000 people.
The test is 99% sensitive and 99% specific.
A patient tests positive.

What is the probability that they have the disease?

Most doctors say ~99%. The real answer is about 9%.

The Math Revealed

Testing 100,000 People (Prevalence 1/1000)

Step 1: 100 have disease, 99,900 healthy

Step 2: Of 100 sick: 99 test positive (TP), 1 negative (FN)

Step 3: Of 99,900 healthy: 999 test positive (FP), 98,901 negative (TN)

Step 4: Total positives = 99 + 999 = 1,098

PPV = TP / All Positives = 99 / 1,098 = 9%

91% of positive results are false positives!
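
The four steps above can be replayed as whole-person counts. This is a small sketch, assuming the same 100,000 people and rounding each group to whole individuals; the function name `natural_frequencies` is made up for illustration.

```python
def natural_frequencies(population: int, prevalence: float,
                        sens: float, spec: float) -> dict:
    """Split a population into TP, FN, FP, TN counts (whole people)."""
    sick = round(population * prevalence)
    healthy = population - sick
    tp = round(sick * sens)        # sick people who test positive
    fn = sick - tp                 # sick people the test misses
    tn = round(healthy * spec)     # healthy people who test negative
    fp = healthy - tn              # healthy people who test positive
    return {"TP": tp, "FN": fn, "FP": fp, "TN": tn}


counts = natural_frequencies(100_000, 1 / 1000, sens=0.99, spec=0.99)
ppv = counts["TP"] / (counts["TP"] + counts["FP"])

print(counts)
print(f"PPV = {ppv:.1%}")
```

Counting people rather than multiplying probabilities makes it visible that the 999 false positives come from the very large healthy group, not from a poor test.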

The Prevalence Decision Tree

Same Test, Different Settings

Test: 99% Sens, 99% Spec

↓

Where Is Testing Done?

General Population
Prevalence 0.1%

PPV = 9%: 91% false positives!

High-Risk Clinic
Prevalence 10%

PPV = 92%: 8% false positives

Confirmatory Test
Prevalence 50%

PPV = 99%: 1% false positives

"And the doctor said '99% accurate,'
and the patient heard '99% certain,'
and both were deceived,
for they forgot to ask: 'How rare is this disease?'"
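
The three settings in the prevalence tree above follow from Bayes' theorem applied to one fixed test. The sketch below is illustrative only; the function name `ppv` and the dictionary of settings are not part of the course material.

```python
def ppv(prevalence: float, sens: float, spec: float) -> float:
    """Positive predictive value from Bayes' theorem."""
    true_pos = sens * prevalence                 # P(test+ and diseased)
    false_pos = (1 - spec) * (1 - prevalence)    # P(test+ and healthy)
    return true_pos / (true_pos + false_pos)


settings = {
    "General population": 0.001,
    "High-risk clinic": 0.10,
    "Confirmatory test": 0.50,
}

for name, prev in settings.items():
    value = ppv(prev, sens=0.99, spec=0.99)
    print(f"{name:20s} prevalence {prev:6.1%}  PPV {value:.0%}")
```

Only the prevalence changes between rows, yet the meaning of a positive result moves from about 9% to 99%.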

Have you not heard of the test for men
that found cancers that would never kill,
and led to treatments that destroyed lives?

The PSA Screening Tragedy

UNITED STATES, 1990s-2010s

PSA (Prostate-Specific Antigen) could detect prostate cancer early.

Doctors screened millions of men. Cancers were found. Prostates were removed.

But many of these "cancers" would never have caused symptoms. Surgery caused impotence and incontinence in men who would have died of old age, not cancer.

Moyer VA. Ann Intern Med. 2012;157:120-134

The Numbers of Harm

1

Life saved from
prostate cancer
per 1000 screened

30-40

Men made impotent
or incontinent
per 1000 screened

100+

False positives
(biopsies, anxiety)
per 1000 screened

THE REVERSAL

In 2012, the US Preventive Services Task Force recommended against routine PSA screening. The test found too much that did not need finding.

The Screening Decision Tree

Unintended Consequences of Screening

1000 Men Screened

↓

~120 Positive PSA

↓

~30 Biopsies Show Cancer

↓

~25 Would Never
Have Harmed

~5 Truly
Aggressive

~880 Negative PSA

↓

Reassured (but ~3 aggressive cancers are missed)

"And the test found a shadow,
and the surgeon cut it out.
The man lived on, impotent and incontinent,
spared from a cancer that would never have awakened."
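
Reading the approximate counts of the PSA screening tree above as a 2x2 table for aggressive cancer gives a rough sense of how the test performs for the disease that matters. This is a back-of-envelope sketch that uses only the rounded figures from that tree; the variable names are illustrative.

```python
# Approximate counts from the PSA screening tree (per 1000 men screened).
positive_psa = 120
cancers_on_biopsy = 30
indolent_found = 25      # cancers that would never have harmed
aggressive_found = 5     # aggressive cancers among PSA positives
aggressive_missed = 3    # aggressive cancers among PSA negatives

# Sensitivity for aggressive cancer: found / (found + missed)
sens_aggressive = aggressive_found / (aggressive_found + aggressive_missed)

# Chance that a positive PSA reflects an aggressive cancer
ppv_aggressive = aggressive_found / positive_psa

# Share of biopsy-detected cancers that are overdiagnosed
overdiagnosed_share = indolent_found / cancers_on_biopsy

print(f"Sensitivity for aggressive cancer: {sens_aggressive:.1%}")
print(f"PPV for aggressive cancer: {ppv_aggressive:.1%}")
print(f"Overdiagnosed share of detected cancers: {overdiagnosed_share:.1%}")
```

Because the inputs are rounded slide figures, the outputs are orders of magnitude rather than estimates to quote.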

Sensitivity describes the test.
Specificity describes the test.

But the patient asks:
"I tested positive. What are MY chances?"

Likelihood Ratios

POSITIVE LIKELIHOOD RATIO

LR+ = Sensitivity / (1 - Specificity)

How much more likely is a + result in sick vs healthy?

NEGATIVE LIKELIHOOD RATIO

LR- = (1 - Sensitivity) / Specificity

How much more likely is a - result in sick vs healthy?
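
Both ratios follow directly from sensitivity and specificity. As a hedged illustration, the sketch below applies the two formulas to the HIV rapid test table from earlier in the course; the function name is an assumption, and the code does not guard against a specificity of exactly 1 (which would make LR+ infinite).

```python
def likelihood_ratios(sens: float, spec: float) -> tuple[float, float]:
    """Return (LR+, LR-) for the given sensitivity and specificity."""
    lr_pos = sens / (1 - spec)      # positive result: sick vs healthy
    lr_neg = (1 - sens) / spec      # negative result: sick vs healthy
    return lr_pos, lr_neg


# HIV rapid test: sensitivity 98/100, specificity 895/898.
lr_pos, lr_neg = likelihood_ratios(sens=98 / 100, spec=895 / 898)

print(f"LR+ = {lr_pos:.0f}")
print(f"LR- = {lr_neg:.3f}")
```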

The Fagan Nomogram

From Pre-Test to Post-Test Probability

[Figure: Fagan nomogram with three vertical scales: Pre-Test Probability (99% to 1%), Likelihood Ratio (100 to 0.01), and Post-Test Probability (99% to 1%)]

Draw a line from pre-test through LR to find post-test probability
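
The line drawn on the nomogram is a graphical shortcut for a three-step calculation: convert the pre-test probability to odds, multiply by the likelihood ratio, and convert back. A minimal sketch, assuming a pre-test probability strictly between 0 and 1 (the function name is illustrative):

```python
def post_test_probability(pre_test_prob: float, lr: float) -> float:
    """Pre-test probability -> odds -> multiply by LR -> post-test probability."""
    pre_odds = pre_test_prob / (1 - pre_test_prob)
    post_odds = pre_odds * lr
    return post_odds / (1 + post_odds)


# Base-rate puzzle: prevalence 1 in 1000, test 99% sensitive and 99% specific.
lr_positive = 0.99 / (1 - 0.99)    # LR+ of about 99

print(f"General population: {post_test_probability(0.001, lr_positive):.1%}")
print(f"High-risk clinic:   {post_test_probability(0.10, lr_positive):.1%}")
```

This gives the same answers as the PPV calculation in the base rate section, which is expected: the nomogram is Bayes' theorem in odds form.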

Interpreting Likelihood Ratios

How Powerful Is This Test?

What Is the LR+?

LR+ > 10: Strong rule-in

LR+ 5-10: Moderate

LR+ 2-5: Weak

LR+ 1-2: Useless

What Is the LR-?

LR- < 0.1: Strong rule-out

LR- 0.1-0.2: Moderate

LR- 0.2-0.5: Weak

LR- 0.5-1: Useless

"Sensitivity speaks of the sick.
Specificity speaks of the well.
But the likelihood ratio answers:
What does this result mean for this patient?"
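
The interpretation bands for LR+ and LR- listed above can be written as two small lookup functions. This is a sketch only: the slide's bands share their edge values, so which label a boundary such as exactly 5 or exactly 0.2 receives is an assumption made here, and the function names are illustrative.

```python
def interpret_lr_positive(lr: float) -> str:
    """Label an LR+ using the bands from the interpretation slide."""
    if lr > 10:
        return "Strong rule-in"
    if lr >= 5:
        return "Moderate"
    if lr >= 2:
        return "Weak"
    return "Useless"


def interpret_lr_negative(lr: float) -> str:
    """Label an LR- using the bands from the interpretation slide."""
    if lr < 0.1:
        return "Strong rule-out"
    if lr < 0.2:
        return "Moderate"
    if lr < 0.5:
        return "Weak"
    return "Useless"


# A test that is 99% sensitive and 99% specific.
print(interpret_lr_positive(0.99 / 0.01))
print(interpret_lr_negative(0.01 / 0.99))
```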

Have you not seen the child with fever in the village,
whose rapid test said negative
while the Plasmodium kept multiplying?

The Malaria RDT Problem

SUB-SAHARAN AFRICA

Malaria kills 600,000 people yearly, mostly children under 5.

Rapid Diagnostic Tests were meant to guide treatment in remote areas without microscopes or laboratories.

But when parasitemia is low, RDTs miss cases. And when P. falciparum has deleted its HRP2 gene, an HRP2-based RDT sees nothing at all.

WHO. Malaria RDT Performance. 2022

The Clinical Decision Tree

Child with Fever in Malaria-Endemic Area

Febrile Child

↓

Perform RDT

↓

RDT Positive

↓

Treat for Malaria

RDT Negative

↓

Clinical Suspicion?

High

Treat Anyway
or Microscopy

Low

Look for
Other Cause

Sensitivity Varies by Parasitemia

95%

High parasitemia
(>200/μL)

75%

Low parasitemia
(100-200/μL)

50%

Very low
(<100/μL)
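
Because sensitivity differs by parasite density, the overall sensitivity a clinic observes depends on its mix of cases. The sketch below weights the three figures above by two case mixes; both mixes are hypothetical numbers invented for illustration and do not come from the WHO report.

```python
# Sensitivity by parasite density, as listed above.
sensitivity_by_stratum = {"high": 0.95, "low": 0.75, "very_low": 0.50}


def overall_sensitivity(case_mix: dict[str, float]) -> float:
    """Weighted average of stratum sensitivities; shares must sum to 1."""
    if abs(sum(case_mix.values()) - 1.0) > 1e-9:
        raise ValueError("case-mix shares must sum to 1")
    return sum(share * sensitivity_by_stratum[stratum]
               for stratum, share in case_mix.items())


# HYPOTHETICAL case mixes (assumptions, not source data).
hospital = {"high": 0.80, "low": 0.15, "very_low": 0.05}
community = {"high": 0.30, "low": 0.30, "very_low": 0.40}

print(f"Hospital:  {overall_sensitivity(hospital):.1%}")
print(f"Community: {overall_sensitivity(community):.1%}")
```

The same device can look far better in a setting dominated by heavy infections than in one where low-density infections are common.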

The Clinical Lesson

A negative RDT does not rule out malaria in endemic areas. Clinical judgment must override the test when suspicion is high.

"And the test said 'negative,'
and the child was sent home,
and the parasites multiplied in the dark,
and by morning the child would not wake."

In the year of the plague,
the world needed tests that were fast.

But fast is not always accurate.

The Cochrane Verdict

COVID-19 Rapid Antigen Tests (155 Studies Pooled)

Population	Sensitivity	Missed Cases
Symptomatic	73%	27% missed
Asymptomatic	55%	45% missed
First 7 days of symptoms	80%	20% missed

Dinnes J et al. Cochrane Database Syst Rev. 2022;7:CD013705
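
The table reports the share of infections a rapid test misses. What a negative result means for one person also depends on how likely infection was before testing. The sketch below applies Bayes' theorem; the specificity value and the pre-test probabilities are assumptions chosen for illustration and are not figures from this slide.

```python
def prob_infected_given_negative(prevalence: float, sens: float,
                                 spec: float) -> float:
    """P(infected | negative test), i.e. 1 - NPV, from Bayes' theorem."""
    false_neg = (1 - sens) * prevalence      # infected and test negative
    true_neg = spec * (1 - prevalence)       # uninfected and test negative
    return false_neg / (false_neg + true_neg)


SENS_ASYMPTOMATIC = 0.55   # from the table above
SPEC_ASSUMED = 0.99        # ASSUMPTION: not stated on the slide

# HYPOTHETICAL pre-test probabilities for an asymptomatic person.
for prevalence in (0.01, 0.05, 0.20):
    risk = prob_infected_given_negative(prevalence, SENS_ASYMPTOMATIC,
                                        SPEC_ASSUMED)
    print(f"pre-test {prevalence:.0%}: infected despite negative {risk:.1%}")
```

With a sensitivity this low, a negative result lowers the probability of infection only modestly when the pre-test probability is high, for example after a known exposure.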

The False Security Decision Tree

Thanksgiving 2020: What Happened

Family Member Tests Negative

↓

Is This Person Truly Negative?

If not infected

True Negative: Safe to gather

If infected but asymptomatic: 45% of such infections are missed

FALSE Negative: Infectious!

↓

Joins the Family: Grandparents infected

"And the test said 'negative,'
and the family gathered and embraced,
and by winter's end
the grandfather lay in the ground."

Have you not heard of the screening
that found cancers that would never kill,
and led to treatments that caused more harm than the disease?

The Overdiagnosis Problem

3-4

Lives saved
per 10,000 screened

~15

Overdiagnosed
(treated unnecessarily)

~500

False alarms
(anxiety, biopsies)

THE QUESTION

To save 3-4 lives, up to 15 women undergo surgery, radiation, and chemotherapy for cancers that would never have harmed them.

Is this trade-off worth it?
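
One way to weigh the trade-off is to express each harm per life saved. The sketch below uses only the approximate per-10,000 figures from this slide; because lives saved is given as a range of 3-4, every result is a range too. Variable names are illustrative.

```python
# Approximate figures from this slide (per 10,000 women screened).
screened = 10_000
lives_saved_low, lives_saved_high = 3, 4
overdiagnosed = 15
false_alarms = 500

# Overdiagnosed women per life saved
overdx_per_life = (overdiagnosed / lives_saved_high,
                   overdiagnosed / lives_saved_low)

# False alarms per life saved
alarms_per_life = (false_alarms / lives_saved_high,
                   false_alarms / lives_saved_low)

# Women who must be screened to save one life
screened_per_life = (screened / lives_saved_high,
                     screened / lives_saved_low)

print(f"Overdiagnosed per life saved: {overdx_per_life[0]:.1f} to {overdx_per_life[1]:.1f}")
print(f"False alarms per life saved:  {alarms_per_life[0]:.0f} to {alarms_per_life[1]:.0f}")
print(f"Screened per life saved:      {screened_per_life[0]:.0f} to {screened_per_life[1]:.0f}")
```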

The Screening Decision Tree

10,000 Women Screened Over 10 Years

10,000 Women

↓

~1,000 Recalled: Abnormal mammogram

↓

~500 False Alarm: Anxiety only

~500 Biopsy: ~50 cancers found

~9,000 Cleared: Continue screening

Of ~50 Cancers Found

~35 Would Kill: Treatment saves 3-4

~15 Would Never Kill: Overdiagnosed

"And the test found a shadow,
and called it cancer,
and the woman was cut and burned
for a shadow that would never have darkened her days."

One study may deceive.
One study may flatter.

But when the evidence is pooled,
the truth becomes harder to hide.

Why DTA Meta-Analysis Is Different

THE PROBLEM

Sensitivity and specificity are correlated. When one goes up, the other tends to go down.

You cannot pool them separately, as you would treatment effects. You need a bivariate model.
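
The trade-off is easiest to see by moving a positivity threshold across two overlapping distributions. The sketch below is not a bivariate model; it only illustrates why studies using different cutoffs report sensitivity and specificity that move in opposite directions. The two normal distributions are hypothetical and chosen purely for illustration.

```python
from statistics import NormalDist

# HYPOTHETICAL biomarker distributions (arbitrary units).
healthy = NormalDist(mu=0.0, sigma=1.0)
diseased = NormalDist(mu=2.0, sigma=1.0)


def accuracy_at(threshold: float) -> tuple[float, float]:
    """Sensitivity and specificity when values above the threshold are positive."""
    sens = 1 - diseased.cdf(threshold)   # diseased above the cutoff
    spec = healthy.cdf(threshold)        # healthy at or below the cutoff
    return sens, spec


for cutoff in (0.5, 1.0, 1.5):
    sens, spec = accuracy_at(cutoff)
    print(f"cutoff {cutoff:.1f}: sensitivity {sens:.1%}, specificity {spec:.1%}")
```

Raising the cutoff buys specificity at the cost of sensitivity, so pooling each measure on its own would average over points that belong to different places on the same curve.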

The SROC Curve

Reading ROC Space

Top-Left Corner: Perfect Test

↓ (curve shows trade-off)

Diagonal Line: Useless Test (Chance)

What the SROC Shows

Each dot = one study's sensitivity & specificity
Curve = summary of all studies
Closer to top-left = better test

"One study may deceive.
Many studies, weighed together,
trace the path of truth:
the SROC curve, showing what the test can truly do."

But what if the studies disagree?

One says sensitivity is 95%.
Another says 60%.

Which truth do you believe?

Sources of Heterogeneity

Why Studies Disagree

Same Test, Different Results?

Threshold: Different cutoffs

Population: Severity, age

Setting: Primary vs specialist

Quality: Bias, blinding

Measuring Disagreement: I²

I² < 25%

Low
Studies agree

I² 25-75%

Moderate
Some variation

I² > 75%

High
Major disagreement
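
I² is conventionally derived from Cochran's Q as the share of total variation beyond what chance would explain: I² = max(0, (Q - df) / Q) × 100. The sketch below computes it with inverse-variance weights; the five study estimates and their variances are hypothetical values invented for illustration, and the function name is an assumption.

```python
def i_squared(estimates: list[float], variances: list[float]) -> float:
    """I² (percent) from Cochran's Q using inverse-variance weights."""
    weights = [1 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, estimates))
    df = len(estimates) - 1
    if q <= 0:
        return 0.0          # identical estimates: no heterogeneity
    return max(0.0, (q - df) / q) * 100


# HYPOTHETICAL logit-sensitivity estimates and variances from five studies.
logit_sens = [2.9, 0.4, 1.8, 2.2, 0.9]
variances = [0.10, 0.08, 0.12, 0.09, 0.11]

print(f"I² = {i_squared(logit_sens, variances):.0f}%")
```

Working on the logit scale is a common choice for proportions, but it is an assumption of this sketch rather than something the slide specifies.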

THE WARNING

When I² > 75%, the pooled estimate may be meaningless. Explain the disagreement before averaging.

"When the studies disagree,
do not silence the dissent.
Ask: Why do they see differently?
The disagreement itself is the lesson."

The DTA Toolkit

Essential Measures and When to Use Them

The Checklist

✓

Was there a valid reference standard?

Gold standard applied to ALL patients?

✓

Were the interpreters blinded?

Test readers unaware of diagnosis?

✓

Was the spectrum appropriate?

Patients similar to your population?

✓

Was the threshold pre-specified?

Or chosen to maximize results?

When Results Don't Match Suspicion

The Clinical Override Decision Tree

Test Negative, High Suspicion

↓

What Is the LR-?

LR- < 0.1

Strong rule-out: Accept negative

LR- 0.1-0.5

Consider repeat test, or a different test

LR- > 0.5

Trust clinical judgment: Test is weak

"Armed with sensitivity, specificity, likelihood,
the SROC curve, and the degree of agreement,
you can see through a test's lies
and judge what is true."

References

Key Sources

Carreyrou J. Bad Blood. Knopf, 2018.
CDC. MMWR. 1987;36(49):833-840. [HIV blood supply]
Dinnes J et al. Cochrane Database Syst Rev. 2022;7:CD013705. [COVID RAT]
Moyer VA. Ann Intern Med. 2012;157:120-134. [PSA screening]
UK Panel. Lancet. 2012;380:1778-1786. [Mammography]
WHO. Malaria RDT Performance. 2022.
Reitsma JB et al. J Clin Epidemiol. 2005;58:982-990. [Bivariate model]
Deeks JJ et al. J Clin Epidemiol. 2005;58:882-893. [Publication bias]
Macaskill P et al. Cochrane Handbook Ch. 10. 2023.

A test is 99% sensitive and 99% specific. Disease prevalence is 1 in 1000. A patient tests positive. What is the probability that they have the disease?

99%

90%

About 9%

50%

Why was the blood supply contaminated with HIV despite testing?

The tests had low specificity

There was a window period in early infection when the tests had low sensitivity

The tests were not performed properly

The tests were too expensive

What does "SnNout" mean?

A highly Sensitive test, when Negative, rules OUT disease

A highly Specific test, when Negative, rules OUT disease

Sensitivity should be used for screening

Specificity should be above 90%

✔

Course Complete

"Now you know the four outcomes,
the two virtues of a test,
the base rate fallacy,
and the art of pooling evidence.

When the next test lies to you,
you will know."