When Tests Lie: The Ultimate DTA Course (V3)

Have you heard the story of the woman
who promised to change the world with a single drop of blood,
who built a $9 billion company on a test that never worked?

Palo Alto, 2003

STANFORD UNIVERSITY

A 19-year-old dropped out with a vision: hundreds of tests from a single drop of blood.

Investors believed. Walgreens believed. The Pentagon believed.

They valued her company at $9 billion.

But the tests gave wrong results. Patients were told they had HIV when they did not. Patients were harmed.

Carreyrou J. Bad Blood. 2018

The Decision Tree of Deception

What Theranos Did vs. What Should Happen

New Diagnostic Test

↓

SHOULD DO

Validate Against Gold Standard

↓

Publish TP/FP/FN/TN

↓

FDA Approval

THERANOS DID

Skip Validation

↓

Hide Failures

↓

Harm Patients

"And the test lied.
The lie was dressed in certainty,
and no one asked for the 2×2 table."

This is why we study diagnostic test accuracy.

When a test speaks,
there are only four possible truths.

Two are blessings. Two are curses.

The Outcome Tree

Every Test Result Has a Reality Behind It

Patient Tested

↓

What Is the Truth?

Has Disease

D+

↓

TP: Test +

FN: Test -

No Disease

D-

↓

FP: Test +

TN: Test -

The Sacred 2×2 Table

HIV Rapid Test Example (Real Data)

	HIV+	HIV-	Total
Test +	98	3	101
Test -	2	895	897
Total	100	898	998

From this table, every truth flows

Sensitivity = 98/100 = 98%
Specificity = 895/898 = 99.7%
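The two formulas can be checked against the table with a minimal Python sketch. The function names are illustrative. The counts are the four cells of the HIV table above.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Share of diseased people the test catches: TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Share of disease-free people the test clears: TN / (TN + FP)."""
    return tn / (tn + fp)


# Cells of the HIV rapid test 2x2 table
tp, fp, fn, tn = 98, 3, 2, 895

print(f"Sensitivity = {sensitivity(tp, fn):.1%}")  # Sensitivity = 98.0%
print(f"Specificity = {specificity(tn, fp):.1%}")  # Specificity = 99.7%
```

The two functions stay separate to mirror the table. Sensitivity uses only the diseased column and specificity only the disease-free column, so neither depends on how common the disease is.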

"Two outcomes save. Two outcomes harm.
TP, TN: the test told the truth.
FP, FN: the test lied.
Know them by name, for they determine fate."

Have you not heard of the blood
that was tested and found clean,
and given to thousands,
while death swam within it?

The 1985 Blood Supply Crisis

UNITED STATES

When HIV testing began, doctors celebrated: they could now screen the blood supply.

But the test had a window period: the virus was present but undetectable.

The blood tested "negative." The blood was transfused.

8,000-12,000 Americans were infected through transfusion before better tests arrived.

CDC. MMWR. 1987;36(49):833-840

The Window Period Decision Tree

Why False Negatives Are Deadly

Person Recently Infected

↓

Time Since Infection?

< 2 weeks

Test NEGATIVE: Virus present!

↓

Blood Donated: Others infected

> 4 weeks

Test POSITIVE: Correctly detected

↓

Blood Discarded: Supply safe

How Sensitivity Changes Over Time

Day 1-7 (eclipse period): 0%
Day 14 (seroconversion): ~50%
Day 21 (most detected): ~95%
Day 45+ (window closed): 99.9%

THE LESSON

Sensitivity is not fixed. It depends on when you test. A "99% sensitive" test may be 0% sensitive in early infection.

"And the test said 'clean,'
because the virus had not yet shown itself.
And the blood was shared,
and the infection spread."

Have you not heard of the pill given to mothers
to protect their pregnancies,
that planted cancer in their daughters
twenty years before it bloomed?

The DES Tragedy, 1938-1971

UNITED STATES & EUROPE

Diethylstilbestrol (DES) was given to millions of pregnant women to prevent miscarriage.

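Its use rested on assumption rather than evidence: it seemed reasonable, so doctors assumed it worked. When a randomized trial in 1953 found that it did not prevent miscarriage, prescribing continued anyway.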

Decades later, their daughters developed a rare cancer: clear cell adenocarcinoma of the vagina. A cancer so rare it was a diagnostic signal in itself.

5-10 million women were exposed before the harm was revealed.

Herbst AL et al. N Engl J Med. 1971;284:878-881

The Validation Decision Tree

What Should Have Happened

New Medical Intervention

↓

Was It Properly Tested?

YES

Randomized Trial

↓

Long-term Follow-up

↓

Know True Effects: Benefits and harms

NO (DES)

Assumption Only

↓

Widespread Use

↓

Hidden Harm: Discovered too late

The Diagnostic Signal

When Rarity Becomes Evidence

Clear cell adenocarcinoma of the vagina is so rare in young women that 7 cases in one hospital triggered an investigation.

The cluster itself was a diagnostic test.
Predictive value of this cancer for DES exposure: very high
If a young woman had this cancer, she had very likely been exposed.

1 in 1,000: Risk of clear cell cancer in DES daughters

5-10M: Women exposed worldwide

"And the mothers took the pill,
and the daughters grew up in its shadow,
and twenty years later the cancer bloomed:
a diagnosis that indicted a generation of medicine."

A test has two virtues and two vices.

Sensitivity: Can it find the sick?

Specificity: Can it spare the healthy?

Sensitivity: The Hunter

THE FORMULA

Sensitivity = TP / (TP + FN)

"Of all the sick, how many did we catch?"

Worked Example: COVID PCR Test

Given: 200 infected patients tested

TP = 196 (correctly positive), FN = 4 (missed)

Sensitivity = 196 / (196 + 4) = 196/200 = 98%

Interpretation: Test catches 98 of every 100 infected people

Specificity: The Guardian

THE FORMULA

Specificity = TN / (TN + FP)

"Of all the healthy, how many did we spare?"

Worked Example: Same COVID PCR Test

Given: 1000 uninfected people tested

TN = 999 (correctly negative), FP = 1 (false alarm)

Specificity = 999 / (999 + 1) = 999/1000 = 99.9%

Interpretation: Test correctly clears 999 of every 1000 healthy people

Memory Rules

When to Use Which Test

What Do You Need?

RULE OUT disease

Use HIGH SENSITIVITY

↓

SnNout: Sensitive test, Negative result = rule OUT

RULE IN disease

Use HIGH SPECIFICITY

↓

SpPin: Specific test, Positive result = rule IN

"Sensitivity catches the disease.
Specificity spares the well.
But no test masters both perfectly:
this is the burden we must bear."

Have you never met the doctor
who saw "99% accurate"
and believed a positive result meant 99% certainty?

This is one of the deadliest errors in medicine.

The Base Rate Fallacy

THE PUZZLE

A disease affects 1 in 1000 people.
The test has 99% sensitivity and 99% specificity.
A patient tests positive.

What is the probability that the patient has the disease?

Most doctors say ~99%. The real answer is about 9%.

The Math Revealed

Testing 100,000 People (Prevalence 1/1000)

Step 1: 100 have disease, 99,900 healthy

Step 2: Of 100 sick: 99 test positive (TP), 1 negative (FN)

Step 3: Of 99,900 healthy: 999 test positive (FP), 98,901 negative (TN)

Step 4: Total positives = 99 + 999 = 1,098

PPV = TP / All Positives = 99 / 1,098 = 9%

91% of positive results are false positives!
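The four steps can be reproduced as expected counts (natural frequencies). This is a minimal Python sketch. `natural_frequencies` is an illustrative helper, not a library function, and the inputs are the puzzle's numbers.

```python
def natural_frequencies(population: int, prevalence: float,
                        sens: float, spec: float) -> dict:
    """Expected count in each cell of the 2x2 table for a tested population."""
    sick = population * prevalence
    healthy = population - sick
    tp = sick * sens              # sick and test positive
    fn = sick - tp                # sick but missed
    fp = healthy * (1 - spec)     # healthy but flagged
    tn = healthy - fp             # healthy and cleared
    return {"TP": tp, "FN": fn, "FP": fp, "TN": tn}


counts = natural_frequencies(100_000, 1 / 1000, sens=0.99, spec=0.99)
ppv = counts["TP"] / (counts["TP"] + counts["FP"])

print({cell: round(n) for cell, n in counts.items()})
# {'TP': 99, 'FN': 1, 'FP': 999, 'TN': 98901}
print(f"PPV = {ppv:.0%}")  # PPV = 9%
```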

Base Rate Calculator

See How Prevalence Changes PPV

Prevalence: 0.1%

Sensitivity: 99%

Specificity: 99%

Positive Predictive Value (PPV): 9%

91% of positives are false alarms.

The Prevalence Decision Tree

Same Test, Different Settings

Test: 99% Sens, 99% Spec

↓

Where Is Testing Done?

General Pop
0.1%

PPV = 9%: 91% false +

High-Risk
10%

PPV = 92%: 8% false +

Confirmatory
50%

PPV = 99%: 1% false +
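The tree above follows from Bayes' theorem. The sketch below is a minimal Python version that assumes the same 99%/99% test and the three prevalences shown. `ppv` is an illustrative helper.

```python
def ppv(prevalence: float, sens: float, spec: float) -> float:
    """Positive predictive value via Bayes' theorem."""
    true_pos = sens * prevalence
    false_pos = (1 - spec) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


settings = {"General population": 0.001, "High-risk": 0.10, "Confirmatory": 0.50}

for name, prev in settings.items():
    value = ppv(prev, sens=0.99, spec=0.99)
    print(f"{name:<20} prevalence {prev:>5.1%}  PPV {value:.0%}  false + {1 - value:.0%}")
```

The test itself never changes between the three rows. Only the prevalence does, and that alone moves the PPV from 9% to 99%.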

"And the doctor said, '99% accurate.'
The patient heard, '99% certain.'
And both were deceived,
because they forgot to ask, 'How rare is this disease?'"

Have you not heard of the machine
that could find TB in two hours,
that was called revolutionary,
but missed drug-resistant strains?

The GeneXpert Story in South Africa

CAPE TOWN, 2010

For 100 years, diagnosing TB meant growing bacteria for weeks. Then GeneXpert arrived, with results in 2 hours.

South Africa deployed it nationwide. The WHO endorsed it.

But in patients with low bacterial loads (often HIV co-infected), sensitivity dropped to 67%. One in three cases was missed.

And in detecting rifampicin resistance, it missed about 5% of resistant cases. Those patients received the wrong treatment, and resistant TB spread.

Steingart KR et al. Cochrane Database Syst Rev. 2014;1:CD009593

TB Diagnosis Decision Tree

When GeneXpert Is Not Enough

Suspected TB Patient

↓

GeneXpert Test

↓

Positive

↓

Rifampicin?

Sensitive: Standard Tx

Resistant: MDR-TB Tx

Negative

↓

HIV+ or High Suspicion?

Yes: Culture needed

No: Likely negative

Sensitivity by Patient Type

Smear-positive (high bacterial load): 98%
Smear-negative (low bacterial load): 67%
HIV co-infected (immune suppressed): 61%

THE LESSON

A test's sensitivity in clinical trials may not match its sensitivity in your patients. Know your population.

"And the machine said, 'negative,'
and the doctor trusted the machine,
and the patient, sick with TB,
went home coughing resistance from his lungs."

Have you not heard of the test for men
that found cancers that would never kill,
and led to treatments that destroyed lives?

The PSA Testing Tragedy

UNITED STATES, 1990s-2010s

PSA (Prostate-Specific Antigen) could detect prostate cancer early.

Doctors tested millions of men. Cancers were found. Prostates were removed.

But many of these "cancers" would never have caused symptoms. Surgery left impotence and incontinence in men who would have died of old age, not cancer.

Moyer VA. Ann Intern Med. 2012;157:120-134

The Numbers of Harm (per 1,000 screened)

1: Life saved from prostate cancer

30-40: Men made impotent or incontinent

100+: False positives (biopsies, anxiety)

THE REVERSAL

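In 2012, the US Preventive Services Task Force recommended against routine PSA screening. The test found too much that never needed to be found. In 2018, the Task Force revised this to an individual decision for men aged 55-69.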

Patient Decision Aid: PSA Screening

If 1,000 men aged 55-69 are screened over 13 years:

Deaths from prostate cancer prevented: 1-2 men

Men who will have a false positive requiring biopsy: 100-120 men

Men diagnosed with a cancer that would never have harmed them: 20-50 men

Men left impotent or incontinent from treatment: 30-40 men

Are these trade-offs acceptable to you?

"And the test found a shadow,
and the surgeon cut it out.
The man lived, impotent and incontinent,
from a cancer that would never have woken."

Have you not heard of the man with chest pain
whose first troponin was normal,
who was sent home,
and who died before morning?

The Troponin Timing Problem

EMERGENCY DEPARTMENTS WORLDWIDE

Troponin is the standard for diagnosing heart attack. But it takes 3-6 hours to rise after myocardial injury.

A patient arrives one hour after chest pain begins. Troponin is tested: normal. "You're fine. Go home."

The heart was dying. The protein had not yet leaked out.

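In one large US study, about 2% of patients with acute MI were mistakenly sent home from the emergency department, and those sent home appeared to be at higher risk of death than those admitted.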

Pope JH et al. N Engl J Med. 2000;342:1163-1170

Serial Testing Decision Tree

The 2-Troponin Protocol

Chest Pain Patient

↓

First Troponin

↓

Elevated

↓

Treat as MI

Normal

↓

When Did Pain Start?

<6 hrs

Wait 3 hrs: Repeat troponin

>6 hrs

Low risk: Consider d/c

High-Sensitivity Troponin

Conventional troponin, sensitivity at 0 hrs: ~70%

hs-Troponin, sensitivity at 0 hrs: ~95%

hs-Troponin, serial testing at 3 hrs: 99%

THE TRADE-OFF

High-sensitivity troponin catches more heart attacks early. But it also has more false positives—elevated in kidney disease, heart failure, sepsis, and marathon runners.

"And the test said 'normal,'
for the heart had only begun to die.
And the patient, reassured,
and went home to finish dying."

Sensitivity describes the test.
So does specificity.

But then the patient asks:
"I tested positive. What are MY chances?"

Likelihood Ratios

POSITIVE LIKELIHOOD RATIO

LR+ = Sensitivity / (1 - Specificity)

How much more likely is a + result in sick vs healthy?

NEGATIVE LIKELIHOOD RATIO

LR- = (1 - Sensitivity) / Specificity

How likely is a - result in the sick, relative to the healthy? (The lower, the better.)
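Both ratios follow directly from sensitivity and specificity. The sketch below is a minimal Python version that uses the 99%/99% test from the base-rate puzzle. `likelihood_ratios` is an illustrative name.

```python
def likelihood_ratios(sens: float, spec: float) -> tuple[float, float]:
    """Return (LR+, LR-) for a test with the given sensitivity and specificity."""
    lr_pos = sens / (1 - spec)          # P(test + | sick) / P(test + | healthy)
    lr_neg = (1 - sens) / spec          # P(test - | sick) / P(test - | healthy)
    return lr_pos, lr_neg


lr_pos, lr_neg = likelihood_ratios(0.99, 0.99)
print(f"LR+ = {lr_pos:.0f}, LR- = {lr_neg:.3f}")  # LR+ = 99, LR- = 0.010
```

Neither ratio depends on prevalence. Prevalence enters only when the ratio is applied to a specific patient's pre-test probability.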

The Fagan Nomogram

From Pre-Test to Post-Test Probability

[Figure: three vertical scales, pre-test probability (1% to 99%), likelihood ratio (0.01 to 100) and post-test probability (1% to 99%).]

Draw a line from the pre-test probability through the LR to find the post-test probability.
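The nomogram is a graphical shortcut for Bayes' theorem in odds form: post-test odds = pre-test odds × LR. The sketch below is a minimal Python version of the same calculation, and the function name is illustrative. With a pre-test probability of 0.1% and LR+ = 99, it reproduces the 9% from the base-rate puzzle.

```python
def post_test_probability(pre_test: float, lr: float) -> float:
    """Probability -> odds, multiply by the likelihood ratio, odds -> probability.

    This is the calculation the Fagan nomogram performs graphically.
    """
    pre_odds = pre_test / (1 - pre_test)
    post_odds = pre_odds * lr
    return post_odds / (1 + post_odds)


# Base-rate puzzle: pre-test probability 0.1%, LR+ = 99
print(f"{post_test_probability(0.001, 99):.1%}")  # 9.0%
# Same positive result, but pre-test probability 20%
print(f"{post_test_probability(0.20, 99):.1%}")   # 96.1%
```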

Interpreting Likelihood Ratios

How Strong Is This Test?

LR+ Value?

LR+ > 10: Strong rule-in

5-10: Moderate

2-5: Weak

1-2: Useless

LR- Value?

< 0.1: Strong rule-out

0.1-0.2: Moderate

0.2-0.5: Weak

0.5-1: Useless

"Sensitivity speaks of the sick.
Specificity speaks of the well.
But the likelihood ratio answers:
What does this result mean for this patient?"

Have you not seen the feverish child in the village,
whose rapid test said negative,
while the Plasmodium kept multiplying?

The Malaria RDT Problem

SUB-SAHARAN AFRICA

Malaria kills 600,000 people yearly, mostly children under 5.

Rapid Diagnostic Tests were meant to guide treatment in remote areas without microscopes or laboratories.

But when parasitemia is low, the RDT misses cases. And when P. falciparum carries HRP2 gene deletions, an HRP2-based RDT sees nothing at all.

WHO. Malaria RDT Performance. 2022

The Clinical Decision Tree

Child with Fever in Malaria-Endemic Area

Febrile Child

↓

Perform RDT

↓

RDT Positive

↓

Treat for Malaria

RDT Negative

↓

Clinical Suspicion?

High

Treat Anyway, or Microscopy

Low

Look for Other Cause

Sensitivity Varies by Parasitemia

High parasitemia (>200/μL): 95%
Low parasitemia (100-200/μL): 75%
Very low parasitemia (<100/μL): 50%

The Clinical Lesson

A negative RDT does not rule out malaria in endemic areas. Clinical judgment must override the test when suspicion is high.

"And the test said 'negative,'
and the child was sent home,
and in the dark the parasites multiplied,
and by morning the child did not wake."

In the year the plague raged,
the world needed tests that were fast.

But fast is not always accurate.

The Cochrane Verdict

COVID-19 Rapid Antigen Tests (155 Studies)

Population	Sensitivity	Missed
Symptomatic	73%	27%
Asymptomatic	55%	45%
First 7 days	80%	20%

Dinnes J et al. Cochrane Database Syst Rev. 2022;7:CD013705

The False Security Decision Tree

Thanksgiving 2020: What Happened

Family Member Infected, No Symptoms

↓

Does the Rapid Test Catch It?

55% detected (sensitivity if asymptomatic)

Test Positive: Stays home

45% missed

FALSE Negative: Infectious!

↓

Gathers with Family: Grandparents infected

"And the test said 'negative,'
and they gathered, and embraced,
and by winter's end
grandfather was in the ground."

Have you not heard of the screening
that found cancers that would never kill,
and led to treatments that caused more harm than the disease?

The Overdiagnosis Problem

Per 10,000 women screened:

3-4: Lives saved

~15: Overdiagnosed (treated unnecessarily)

~500: False alarms (anxiety, biopsies)

THE QUESTION

To save 3-4 lives, up to 15 women receive surgery, radiation and chemotherapy for cancers that would never have harmed them.

Is this trade-off worth it?

Patient Decision Aid: Mammography

If 10,000 women aged 50-69 are screened for 10 years:

Deaths from breast cancer prevented: 3-4 women

Women called back for false alarms: ~500 women

Unnecessary biopsies: ~200 women

Women treated for cancers that would never have harmed them: ~15 women

Is screening right for you?

The Screening Cascade Decision Tree

10,000 Women Screened Over 10+ Years

10,000 Women

↓

~1,000 Recalled: Abnormal

↓

~500 False Alarm

~500 Biopsy: ~50 cancers

~9,000 Cleared

Of ~50 Cancers Found

~35 Would Kill: 3-4 saved

~15 Would Never Kill: Overdiagnosed

"And the test found a shadow,
and called it cancer,
and the woman was cut and burned
for a shadow that would never have darkened her days."

Have you not heard of the scan
that finds plaques in the brain
but cannot tell you
whether the mind will fade?

The Amyloid Paradox

ALZHEIMER'S RESEARCH, 2010s-2020s

PET scans can now detect amyloid plaques—the hallmark of Alzheimer's.

But 30% of cognitively normal elderly have amyloid plaques. They may never develop dementia.

And 10-20% of people with dementia have no amyloid.

The test finds plaques, but plaques are not the disease. We are testing for a surrogate, not for the outcome that matters: dementia.

Jack CR et al. Lancet Neurol. 2018;17:760-773

Surrogate vs. Outcome Decision Tree

What Are You Actually Testing For?

Diagnostic Test

↓

What Does It Detect?

Outcome itself

Direct Diagnosis: e.g., cancer biopsy

↓

High clinical value

Surrogate marker

Indirect Signal: e.g., amyloid for dementia

↓

Validated link?

Yes: Use cautiously

No: Limited value

"And the scan found plaques,
and the doctor named it Alzheimer's,
and the patient lived in fear
of a forgetting that might never come."

Not all studies are created equal.

Some are biased.
Some are poorly designed.
Some cannot be trusted.

How do we separate the wheat from the chaff?

QUADAS-2: The Quality Checklist

Four Domains of Risk of Bias

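1. Patient Selection

Was a consecutive or random sample of patients enrolled? Was a case-control design avoided?

2. Index Test

Were the index test results interpreted without knowledge of the reference standard? If a threshold was used, was it pre-specified?

3. Reference Standard

Is the reference standard likely to classify the target condition correctly? Was it interpreted without knowledge of the index test results?

4. Flow and Timing

Was there an appropriate interval between the index test and the reference standard? Did all patients receive the same reference standard?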

QUADAS-2 Decision Tree

Should You Trust This Study?

DTA Study

↓

Check All 4 Domains

All Low Risk

High Quality: Trust results

Some Unclear

Moderate: Use with caution

Any High Risk

Low Quality: Results may be biased

Common Biases in DTA Studies

Verification Bias

Only positive tests get the reference standard → inflates sensitivity

Spectrum Bias

Study population differs from clinical reality → results don't generalize

Incorporation Bias

Index test is part of reference standard → artificially high accuracy

Review Bias

Index test interpreted knowing reference result → inflates both metrics
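The distortion from verification bias can be shown with a hypothetical study in Python. The "true" counts and the 10% verification rate for negatives are invented for illustration. The code computes accuracy the naive way, using only the patients who received the reference standard.

```python
# Hypothetical study: the true 2x2 table if everyone had the reference standard
TP, FN, FP, TN = 80, 20, 90, 810

# Partial verification: every positive is verified, only 10% of negatives are
VERIFY_NEG = 0.10
tp_v, fp_v = TP, FP                              # all positives verified
fn_v, tn_v = FN * VERIFY_NEG, TN * VERIFY_NEG    # most negatives never checked

true_sens, true_spec = TP / (TP + FN), TN / (TN + FP)
naive_sens = tp_v / (tp_v + fn_v)
naive_spec = tn_v / (tn_v + fp_v)

print(f"Sensitivity: true {true_sens:.0%}, verified-only {naive_sens:.0%}")  # 80% vs 98%
print(f"Specificity: true {true_spec:.0%}, verified-only {naive_spec:.0%}")  # 90% vs 47%
```

In this pattern, specificity is pushed down as well. Most of the negatives that would have been counted as true negatives drop out of the analysis.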

"Before you trust the numbers,
ask: How were they gathered?
A biased study speaks with confidence,
but its confidence is a lie."

One study may deceive.
Another may merely seem more convincing.

But gather the evidence,
the truth becomes harder to hide.

Why DTA Meta-Analysis Is Different

THE PROBLEM

Sensitivity and specificity are correlated across studies. When one goes up, the other tends to go down.

You can't pool them separately as you would a treatment effect. You need a bivariate model.
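One common reason for the correlation is the threshold. Studies that apply different cutoffs to the same measurement trade sensitivity against specificity. The Python sketch below assumes a hypothetical biomarker that is normally distributed in healthy patients (mean 0) and diseased patients (mean 2). It illustrates the trade-off only and does not fit the bivariate model itself.

```python
from statistics import NormalDist

# Hypothetical continuous biomarker
healthy = NormalDist(mu=0, sigma=1)
diseased = NormalDist(mu=2, sigma=1)

# A result is "positive" when the marker exceeds the cutoff
results = []
for cutoff in (0.5, 1.0, 1.5):
    sens = 1 - diseased.cdf(cutoff)   # diseased patients above the cutoff
    spec = healthy.cdf(cutoff)        # healthy patients below the cutoff
    results.append((cutoff, sens, spec))
    print(f"cutoff {cutoff}: sensitivity {sens:.0%}, specificity {spec:.0%}")
```

Each cutoff is one point on the same underlying curve. Pooling sensitivity and specificity separately across such studies would ignore that link.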

The SROC Curve

Summary Receiver Operating Characteristic

[Figure: individual studies and the summary estimate, plotted as sensitivity (y-axis) against 1 - specificity, the false positive rate (x-axis).]

Reading the SROC

What Does the Curve Tell You?

SROC Curve Position

↓

Top-Left Corner

Excellent Test: High sens + spec

Near Diagonal

Useless Test: No better than chance

Points Scattered

High Heterogeneity: Investigate sources

"One study can deceive.
Many studies, weighed together,
trace the path of truth:
the SROC curve, showing what the test can really do."

But what if the studies disagree?

One says sensitivity is 95%.
Another says 60%.

Which truth do you believe?

Sources of Heterogeneity

Why Study Results Disagree

Same Test, Different Results?

Threshold: Different cutoffs

Population: Severity, age

Setting: Primary vs specialist

Quality: Bias, blinding

Measuring Disagreement: I²

I² < 25%: Low (studies agree)

I² 25-75%: Moderate (some variation)

I² > 75%: High (major disagreement)

THE WARNING

When I² > 75%, the pooled estimate may be meaningless. Explain the disagreement before averaging.
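I² is usually derived from Cochran's Q. The Python sketch below uses four hypothetical studies and computes I² on logit sensitivity alone, with the usual approximate variance 1/TP + 1/FN. This univariate calculation ignores the correlation with specificity that the bivariate model accounts for, so treat it as a rough gauge.

```python
import math


def logit(p: float) -> float:
    return math.log(p / (1 - p))


def i_squared(estimates: list[float], variances: list[float]) -> float:
    """Higgins' I² from Cochran's Q using inverse-variance weights."""
    weights = [1 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, estimates))
    df = len(estimates) - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q)   # share of variation beyond chance


# Hypothetical studies of one test: (true positives, false negatives)
studies = [(95, 5), (60, 40), (88, 12), (70, 30)]

# Logit sensitivity per study and its approximate variance
est = [logit(tp / (tp + fn)) for tp, fn in studies]
var = [1 / tp + 1 / fn for tp, fn in studies]

print(f"I² = {i_squared(est, var):.0%}")
```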

"When studies disagree,
do not silence the dissent.
Ask: Why do they see differently?
The disagreement itself teaches."

The DTA Toolkit

Essential Checks and When to Use Them

The Checklist

✓ Was there a valid reference standard?

Gold standard applied to ALL patients?

✓ Were the interpreters blinded?

Test readers unaware of diagnosis?

✓ Was the spectrum appropriate?

Patients similar to your population?

✓ Was the threshold pre-specified?

Or was it chosen to maximize results?

When Results Don't Match Suspicion

The Clinical Override Decision Tree

Test Negative, High Suspicion

↓

What Is the LR-?

LR- < 0.1

Strong rule-out: Accept negative

LR- 0.1-0.5

Repeat test: Or use a different test

LR- > 0.5

Trust judgment: Test is weak
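The odds-form calculation shows why a weak negative should not override high suspicion. The Python sketch below assumes a hypothetical pre-test probability of 60% and three illustrative LR- values, one from each branch of the tree.

```python
def post_test_probability(pre_test: float, lr: float) -> float:
    """Bayes' theorem in odds form: post-test odds = pre-test odds x LR."""
    odds = pre_test / (1 - pre_test) * lr
    return odds / (1 + odds)


PRE_TEST = 0.60  # hypothetical: high clinical suspicion

for lr_neg in (0.05, 0.3, 0.7):
    p = post_test_probability(PRE_TEST, lr_neg)
    print(f"LR- {lr_neg}: probability of disease after a negative test = {p:.0%}")
```

With LR- = 0.05, the probability falls to about 7%. With LR- = 0.7, it stays near 51%, which is why the last branch trusts clinical judgment.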

Sequential Testing Decision Tree

When One Test Isn't Enough

Initial Screening Test

↓

Positive

↓

Confirmatory Test: High specificity

↓

Positive: Diagnose

Negative: False alarm

Negative

↓

Likely negative: If high sens screen
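When both tests must be positive, the combined accuracy can be approximated in a few lines. The Python sketch below assumes the two tests err independently given disease status, which is often optimistic in practice. The accuracy values are hypothetical.

```python
def serial_positive(sens1: float, spec1: float,
                    sens2: float, spec2: float) -> tuple[float, float]:
    """Accuracy when disease is diagnosed only if BOTH tests are positive.

    Assumes conditional independence of the two tests.
    """
    sens = sens1 * sens2                    # must be caught by both tests
    spec = 1 - (1 - spec1) * (1 - spec2)    # a false alarm must fool both tests
    return sens, spec


# Hypothetical: sensitive screen followed by a specific confirmatory test
sens, spec = serial_positive(0.99, 0.90, 0.95, 0.99)
print(f"Combined sensitivity {sens:.1%}, specificity {spec:.1%}")
```

Requiring two positives raises specificity at the cost of sensitivity. That is why the first test in the sequence should be the sensitive one.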

"Armed with sensitivity, specificity, likelihood ratios,
the SROC and the measure of disagreement,
you can see through a test's lies
and judge whether it tells the truth."

Have you not heard of the patient
who received the wrong blood,
not because the test was wrong,
but because no one performed it?

The Test That Was Never Performed

HOSPITALS WORLDWIDE

ABO blood typing is nearly 100% accurate when performed.

Yet transfusion reactions still kill, not through test failure but through human failure:

• Wrong blood drawn from wrong patient
• Labels switched in the lab
• Bedside check skipped in emergency

In the UK, 1 in 13,000 transfusions goes to the wrong patient. The test worked. The system failed.

Bolton-Maggs PHB. Transfus Med. 2016;26:303-311

Test vs. System Decision Tree

Where Can Things Go Wrong?

Diagnostic Process

↓

Error Source?

Test itself

Analytical Error: Sens/Spec issue

↓

Better test needed

Pre-analytical

Wrong sample: ID error

↓

System fix needed

Post-analytical

Wrong action: Reporting error

↓

Process fix needed

"The perfect test means nothing
if the wrong blood is drawn,
the wrong label applied,
the wrong bag hung."

DTA studies measure test accuracy. They do not measure system accuracy.

Have you not seen the algorithm
that learned from biased data,
and spread that bias
to every patient it touched?

The AI Diagnostic Revolution

STANFORD & BEYOND, 2017-PRESENT

Deep learning algorithms now match dermatologists at detecting skin cancer.

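But the training data came predominantly from light skin. On dark skin, performance dropped significantly.

The algorithm learned not only patterns but biases.

And when such tools were deployed without external validation, they performed worse than expected: the training population didn't match the clinical population.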

Esteva A et al. Nature. 2017;542:115-118; Adamson AS. JAMA Dermatol. 2018

AI Validation Decision Tree

Is This AI Ready for Clinical Use?

AI Diagnostic Tool

↓

Validation Type?

Internal only

High Risk: Overfitting likely

↓

Not ready

External validation

Better: But check population

↓

Does It Match Your Patients?

Yes: Consider use

No: Caution

Prospective RCT

Gold Standard: Patient outcomes

AI Calibration: The Hidden Problem

DISCRIMINATION VS. CALIBRATION

Discrimination (AUC/ROC): Can the AI rank patients by risk?

Calibration: When the AI says "80% risk," do 80% actually have disease?

Many AI tools have good AUC but poor calibration. This is the base rate fallacy in algorithmic form.

AUC: Can it rank? (usually reported)

CAL (calibration): Is the probability accurate? (often ignored)
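Ten hypothetical predictions are enough to show the gap between ranking and calibration. In the Python sketch below, AUC is the share of diseased/healthy pairs ranked correctly, with ties counting half. Calibration is checked only "in the large", by comparing the mean predicted risk with the observed event rate.

```python
def auc(scores_pos: list[float], scores_neg: list[float]) -> float:
    """Probability a random diseased patient scores above a random healthy one."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5   # ties count half
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical model output: predicted risks and true outcomes (1 = disease)
predicted = [0.9, 0.8, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]
outcome = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

pos = [p for p, y in zip(predicted, outcome) if y == 1]
neg = [p for p, y in zip(predicted, outcome) if y == 0]
print(f"AUC = {auc(pos, neg):.2f}")

# Calibration-in-the-large: average predicted risk vs observed event rate
mean_pred = sum(predicted) / len(predicted)
observed = sum(outcome) / len(outcome)
print(f"Mean predicted risk {mean_pred:.0%}, observed rate {observed:.0%}")
```

The model ranks almost perfectly, with an AUC of about 0.97. Yet it predicts an average risk of 53% when only 20% of patients have the disease.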

"And the algorithm learned from the data,
and the data were biased,
and the bias spread into every prediction.
No one asked who was missing from the training set."

The patient asks, "Is my test positive?"

But what they mean is:
"Do I have the disease?"

How do we bridge this gap?

Communication Scripts

SCRIPT 1: EXPLAINING A POSITIVE RESULT

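"Your test came back positive. But I want to explain what that means."

"This test is good at finding people who have the condition, but it also gives false alarms."

"Based on your risk factors, there is about a [X]% chance this is a true positive."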

"We'll do a confirmatory test to be certain before any treatment."

Communication Scripts

SCRIPT 2: EXPLAINING A NEGATIVE RESULT (HIGH SUSPICION)

"Your test came back negative, but I'm still concerned."

"This test can miss cases, especially early in the illness."

"Given your symptoms, I'd like to repeat the test in a few days or try a different test."

"A negative test doesn't always mean you're clear: your symptoms matter too."

Communication Decision Tree

How to Explain Test Results

Test Result

↓

Positive

↓

PPV?

>90%: "Very likely true"

<90%: "Need to confirm"

Negative

↓

NPV?

>95%: "Very reassuring"

<95%: "Still watch symptoms"
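Whether a result falls above or below the PPV and NPV cut-offs in the tree depends on prevalence, not just on the test. The Python sketch below uses a hypothetical test that is 90% sensitive and 95% specific. `predictive_values` is an illustrative helper.

```python
def predictive_values(prevalence: float, sens: float, spec: float) -> tuple[float, float]:
    """Return (PPV, NPV) for a test used where the disease has this prevalence."""
    tp = sens * prevalence
    fn = (1 - sens) * prevalence
    fp = (1 - spec) * (1 - prevalence)
    tn = spec * (1 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)


# Hypothetical test used in two settings
for prev in (0.02, 0.40):
    ppv, npv = predictive_values(prev, sens=0.90, spec=0.95)
    print(f"prevalence {prev:.0%}: PPV {ppv:.0%}, NPV {npv:.1%}")
```

At 2% prevalence, the same test gives a PPV of about 27% ("need to confirm") but an NPV of about 99.8% ("very reassuring"). At 40% prevalence, the PPV rises to about 92% and the NPV falls to about 93%.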

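Questions to Ask Your Doctor

1. "How accurate is this test?"

Ask for the sensitivity and specificity in plain language.

2. "What happens if the result is wrong?"

Understand the consequences of false positives and false negatives.

3. "What happens next?"

Will there be a confirmatory test? Repeat test? Treatment?

4. "What if I don't get tested at all?"

Understand the pros and cons of testing versus not testing.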

"The test speaks in the language of numbers.
The patient listens in fear and hope.
The healer's task is translation:
bridging the gap between statistics and the soul."

A test may be accurate.
But is it worth it?

What does it cost—in money,
in anxiety, in harm?

The Test-Treatment Threshold

When Is Testing Worthwhile?

Pre-Test Probability

↓

Very Low

Below Test Threshold: Don't test, reassure

Intermediate

Testing Zone: Test will change management

Very High

Above Treat Threshold: Don't test, treat

THE PRINCIPLE

Test only when the result will change what you do. If you'd treat regardless, or not treat regardless, why test?

GRADE Evidence Quality

Grading DTA Evidence

⊕⊕⊕⊕

HIGH

Multiple high-quality studies, consistent results, directly applicable

⊕⊕⊕○

MODERATE

Some limitations in study quality, consistency, or applicability

⊕⊕○○

LOW

Serious limitations—may need to downgrade recommendations

⊕○○○

VERY LOW

Very serious limitations—evidence uncertain

Cost-Consequence Analysis

Example: Universal vs. Targeted Screening

Cost per case detected (universal)

$50,000

Cost per case detected (high-risk only)

$5,000

Cases missed by targeted approach

~10%

False positives avoided by targeted

~90%

Which approach fits your population?

"A test is not just accurate or inaccurate.
It has costs—in money, in worry, in harm.
The wise clinician weighs them all,
and tests only when the test will help the patient."

The SROC curve shows where the test performs.

But how certain are we?
How much will it vary in practice?

Confidence vs. Prediction Regions

Two Types of Uncertainty

95% confidence region (summary estimate)

95% prediction region (future studies)

What Each Region Tells You

CI

Confidence Region (smaller ellipse)

Where we are 95% confident the true average sensitivity/specificity lies. Reflects uncertainty about the summary estimate.

PI

Prediction Region (larger ellipse)

Where we expect 95% of future studies to fall. Accounts for heterogeneity between studies.

CLINICAL IMPLICATION

If the prediction region is large, the test may perform very differently in your setting than the average suggests. Wide prediction = high heterogeneity = investigate sources.
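On a single dimension, the difference between the two regions comes down to one extra variance term. The Python sketch below assumes a hypothetical pooled logit sensitivity of 1.73 (about 85%), a standard error of 0.15 and a between-study variance tau² of 0.5. It uses a normal approximation on the logit scale. The bivariate model draws joint ellipses for sensitivity and specificity, and prediction intervals are often computed with a t distribution instead.

```python
import math


def expit(x: float) -> float:
    """Inverse of the logit: log-odds -> proportion."""
    return 1 / (1 + math.exp(-x))


def ci_and_pi(mu: float, se: float, tau2: float, z: float = 1.96):
    """Approximate 95% confidence and prediction intervals, back-transformed.

    CI: uncertainty in the average only (se).
    PI: adds between-study variance tau2, so it is never narrower than the CI.
    """
    ci = (expit(mu - z * se), expit(mu + z * se))
    half_width = z * math.sqrt(se ** 2 + tau2)
    pi = (expit(mu - half_width), expit(mu + half_width))
    return ci, pi


ci, pi = ci_and_pi(mu=1.73, se=0.15, tau2=0.5)
print(f"95% CI {ci[0]:.0%}-{ci[1]:.0%}; 95% PI {pi[0]:.0%}-{pi[1]:.0%}")
```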

Bivariate Model Interpretation

Reading Meta-Analysis Results

Summary Sens/Spec

↓

Check Regions

CI narrow, PI narrow

Consistent: Trust the average

CI narrow, PI wide

Heterogeneous: The average may not apply

CI wide

Uncertain: More studies needed

"The confidence region tells you: How certain are we?
The prediction region tells you: How much will it vary?
Both questions matter,
because the test you use tomorrow may not perform at the average."

References

Key Sources

Carreyrou J. Bad Blood. Knopf, 2018. [Theranos]
CDC. MMWR. 1987;36(49):833-840. [HIV blood supply]
Herbst AL et al. N Engl J Med. 1971;284:878-881. [DES]
Moyer VA. Ann Intern Med. 2012;157:120-134. [PSA]
Pope JH et al. N Engl J Med. 2000;342:1163-1170. [Troponin]
Steingart KR et al. Cochrane Database Syst Rev. 2014;1:CD009593. [GeneXpert]
Dinnes J et al. Cochrane Database Syst Rev. 2022;7:CD013705. [COVID RAT]
Independent UK Panel on Breast Cancer Screening. Lancet. 2012;380:1778-1786. [Mammography]
Jack CR et al. Lancet Neurol. 2018;17:760-773. [Amyloid]
WHO. Malaria RDT Performance. 2022. [Malaria RDT]
Reitsma JB et al. J Clin Epidemiol. 2005;58:982-990. [Bivariate]
Whiting PF et al. Ann Intern Med. 2011;155:529-536. [QUADAS-2]
Bolton-Maggs PHB. Transfus Med. 2016;26:303-311. [Transfusion errors]
Esteva A et al. Nature. 2017;542:115-118. [AI dermatology]
Adamson AS. JAMA Dermatol. 2018. [AI bias]

A test has 99% sensitivity and 99% specificity. Disease prevalence is 1/1000. A patient tests positive. What is the probability that they have the disease?

99%

90%

About 9%

50%

What does "SnNout" mean?

A highly Sensitive test, when Negative, rules OUT disease

A highly Specific test, when Negative, rules OUT disease

Sensitivity should be used for screening

Specificity should be above 90%

Why was the blood supply contaminated with HIV despite testing?

The tests had low specificity

Tests had a window period with zero sensitivity in early infection

The tests were not performed correctly

The tests cost too much

Which QUADAS-2 domain assesses whether the test was interpreted without knowledge of the diagnosis?

Patient Selection

Index Test

Reference Standard

Flow and Timing

✔

Course Complete

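"Now you know the four outcomes,
the two virtues of a test,
the fallacy of the base rate,
the art of gathering evidence,
and the biases that hide the truth.

When the next test is placed before you,
you will know."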