When Tests Lie: The Ultimate DTA Course (V3)

====================== Module 1: The Fraud (Theranos) ====================

Have you not heard the story of the woman
who promised to change the world with a single drop of blood,
who raised billions on a test that never worked?

Palo Alto, 2003

STANFORD UNIVERSITY

A nineteen-year-old dropped out with a vision: hundreds of blood tests from a single drop of blood.

Investors believed. Walgreens believed. The Pentagon believed.

They valued her company at $9 billion.

But the tests gave wrong results. Patients were told they had HIV when they did not. Others were told they were healthy while they were dying.

Carreyrou J. Bad Blood. 2018

The Deception Decision Tree

What Theranos Did vs. What Should Happen

New Diagnostic Test

↓

SHOULD DO

Validate Against Gold Standard

↓

Publish TP/FP/FN/TN

↓

FDA Approval

THERANOS DID

Skip Validation

↓

Hide Failures

↓

Harm Patients

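"The patient was told their blood was normal, and the test lied,
and the lie was told with certainty,
and no one asked for the 2×2 table."

This is why we study diagnostic test accuracy.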

====================== Module 2: The Four Outcomes ====================

When a test speaks,
there are only four possible truths.

Two are blessings. Two are curses.

The Outcome Tree

Every Test Result Has a Reality Behind It

Patient Tested

↓

What Is the Truth?

Has Disease

D+

↓

TP: Test +

FN: Test -

No Disease

D-

↓

FP: Test +

TN: Test -

The Sacred 2×2 Table

HIV Rapid Test Example (Real Data)

	HIV+	HIV-	Total
Test +	98	3	101
Test -	2	895	897
Total	100	898	998

From This Table, All Truths Derive

Sensitivity = 98/100 = 98%
Specificity = 895/898 = 99.7%
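The two figures above can be reproduced directly from the four cells of the table. The following is a minimal sketch; the function name `accuracy_from_2x2` is an illustrative choice, not something defined by the course.

```python
def accuracy_from_2x2(tp, fp, fn, tn):
    """Return (sensitivity, specificity) from the four cells of a 2x2 table."""
    sensitivity = tp / (tp + fn)  # of all diseased, the fraction that test positive
    specificity = tn / (tn + fp)  # of all non-diseased, the fraction that test negative
    return sensitivity, specificity


# Cells taken from the HIV rapid test table above
sens, spec = accuracy_from_2x2(tp=98, fp=3, fn=2, tn=895)
print(f"Sensitivity = {sens:.1%}")  # Sensitivity = 98.0%
print(f"Specificity = {spec:.1%}")  # Specificity = 99.7%
```

Note that each measure uses only one column of the table: sensitivity reads the HIV+ column, specificity reads the HIV- column.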

"Two outcomes save. Two outcomes harm.
TP, TN: the test spoke true.
FP, FN: the test lied.
Know them by name, for they determine fate."

====================== Module 3: The HIV Window Period ====================

Have you not heard of the blood that was tested,
found clean,
and given to thousands
while death swam within it?

The Blood Supply Crisis, 1985

UNITED STATES

When HIV testing began, doctors celebrated: they could now screen the blood supply.

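But the test had a window period: for weeks after infection, the virus was present but undetectable.

Blood was tested. Blood was "negative." Blood was transfused.

8,000-12,000 Americans were infected through transfusion before better tests closed the window.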

CDC. MMWR. 1987;36(49):833-840

The Window Period Decision Tree

Why False Negatives Are Deadly

Person Recently Infected

↓

Time Since Infection?

< 2 weeks

Test NEGATIVE: Virus present!

↓

Blood Donated: Others infected

> 4 weeks

Test POSITIVE: Correctly detected

↓

Blood Discarded: Supply safe

Sensitivity Changes Over Time

Day 1-7
Eclipse period

~50%

Day 14
Seroconversion

~95%

Day 21
Most detected

99.9%

Day 45+
Window closed

THE LESSON

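Sensitivity is not fixed. It depends on when you test. A "99% sensitive" test may be 0% sensitive in early infection.

"The test said 'clean,'
because the virus had not yet shown its face.
The blood was shared,
and the infection spread to the innocent."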

====================== Module 4: The DES Tragedy ====================

Have you not heard of the pill given to mothers
to protect their pregnancies,
that planted cancer in their daughters
twenty years before it bloomed?

The DES Tragedy, 1938-1971

UNITED STATES & EUROPE

Diethylstilbestrol (DES) was given to millions of pregnant women to prevent miscarriage.

No proper clinical trial was ever conducted. Doctors assumed it worked because it seemed reasonable.

Decades later, their daughters developed a rare cancer: clear cell adenocarcinoma of the vagina. A cancer so rare it was a diagnostic signal in itself.

5-10 million women were exposed to its harms.

Herbst AL et al. N Engl J Med. 1971;284:878-881

The Validation Decision Tree

What Should Have Happened

New Medical Intervention

↓

Was It Properly Tested?

YES

Randomized Trial

↓

Long-term Follow-up

↓

Know True Effects: Benefits and harms

NO (DES)

Assumption Only

↓

Widespread Use

↓

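Hidden Harm: Discovered too late

The Diagnostic Signal

When Rarity Becomes Evidence

Clear cell adenocarcinoma of the vagina is so rare in young women that 7 cases in one hospital triggered an investigation.

The cluster itself acted as a diagnostic signal:
Sensitivity to DES exposure: nearly 100%
If you had this cancer at that age, you had almost certainly been exposed.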

1:1000

Risk of clear cell
cancer in DES daughters

5-10M

Women exposed
worldwide

"The mothers took the pill in hope,
the daughters grew up in its shadow,
and twenty years later the cancer bloomed,
a diagnosis that indicted a generation of medicine."

==================== Module 5: Sensitivity and Specificity ====================

A test has two virtues and two vices.

Sensitivity: Can it find the sick?

Specificity: Can it spare the healthy?

Sensitivity: The Hunter

THE FORMULA

Sensitivity = TP / (TP + FN)

"Of all the sick, how many did we catch?"

Worked Example: COVID PCR Test

Given: 200 infected patients tested

TP = 196 (correctly positive), FN = 4 (missed)

Sensitivity = 196 / (196 + 4) = 196/200 = 98%

Interpretation: Test catches 98 of every 100 infected people

Specificity: The Guardian

THE FORMULA

Specificity = TN / (TN + FP)

"Of all the healthy, how many did we spare?"

Worked Example: Same COVID PCR Test

Given: 1000 uninfected people tested

TN = 999 (correctly negative), FP = 1 (false alarm)

Specificity = 999 / (999 + 1) = 999/1000 = 99.9%

Interpretation: Test correctly clears 999 of every 1000 healthy people

The Memory Rule

When to Use Which Test

What Do You Need?

RULE OUT disease

Use HIGH SENSITIVITY

↓

SnNout: Sensitive Negative = OUT

RULE IN disease

Use HIGH SPECIFICITY

↓

SpPin: Specific Positive = IN

"Sensitivity catches the sick.
Specificity spares the well.
But no test masters both perfectly,
and that is the burden we bear."

====================== Module 6: The Base Rate Fallacy ====================

Have you not seen the doctor
who saw "99% accurate"
and believed a positive result meant 99% certainty?

This is the deadliest error in medicine.

The Base Rate Fallacy

THE PUZZLE

A disease affects 1 in 1000 people.
A test is 99% sensitive and 99% specific.
A patient tests positive.

What is the probability that they have the disease?

Most doctors say ~99%. The true answer is about 9%.

The Math Revealed

Testing 100,000 People (Prevalence 1/1000)

Step 1: 100 have disease, 99,900 healthy

Step 2: Of 100 sick: 99 test positive (TP), 1 negative (FN)

Step 3: Of 99,900 healthy: 999 test positive (FP), 98,901 negative (TN)

Step 4: Total positives = 99 + 999 = 1,098

PPV = TP / All Positives = 99 / 1,098 = 9%

91% of positive results are false positives!
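The four steps above translate line for line into a short calculation. This is a sketch using natural frequencies (whole-population counts); the function name `ppv_by_counts` is illustrative only.

```python
def ppv_by_counts(population, prevalence, sensitivity, specificity):
    """Walk through the natural-frequency steps and return (TP, FP, PPV)."""
    diseased = population * prevalence   # Step 1: how many are sick
    healthy = population - diseased      #         and how many are healthy
    tp = diseased * sensitivity          # Step 2: sick people who test positive
    fp = healthy * (1 - specificity)     # Step 3: healthy people who test positive
    ppv = tp / (tp + fp)                 # Step 4: true positives among all positives
    return tp, fp, ppv


tp, fp, ppv = ppv_by_counts(100_000, 1 / 1000, 0.99, 0.99)
print(f"TP = {tp:.0f}, FP = {fp:.0f}, PPV = {ppv:.1%}")
# TP = 99, FP = 999, PPV = 9.0%
```

The false positives outnumber the true positives roughly ten to one simply because the healthy group is about a thousand times larger than the sick group.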

The Prevalence Decision Tree

Same Test, Different Settings

Test: 99% Sens, 99% Spec

↓

Where Is Testing Done?

General Pop
0.1%

PPV = 9% (91% false +)

High-Risk
10%

PPV = 92% (8% false +)

Confirmatory
50%

PPV = 99% (1% false +)
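The three PPV values in this tree follow from Bayes' theorem applied at each prevalence. A minimal sketch, assuming only the 99% sensitivity and 99% specificity stated above (the function name `ppv` is illustrative):

```python
def ppv(prevalence, sensitivity, specificity):
    """Positive predictive value via Bayes' theorem."""
    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * (1 - specificity)
    return true_pos / (true_pos + false_pos)


settings = [("General population", 0.001), ("High-risk", 0.10), ("Confirmatory", 0.50)]
for name, prevalence in settings:
    value = ppv(prevalence, sensitivity=0.99, specificity=0.99)
    print(f"{name:<20} prevalence {prevalence:>5.1%}  PPV {value:.0%}")
# General population   prevalence  0.1%  PPV 9%
# High-risk            prevalence 10.0%  PPV 92%
# Confirmatory         prevalence 50.0%  PPV 99%
```

The test never changes across the three rows. Only the setting does.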

"The doctor said '99% accurate,'
the patient heard '99% certain,'
and both were deceived,
for they forgot to ask: how rare is this disease?"

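====================== Module 7: The GeneXpert Story ====================

Have you not heard of the machine called revolutionary,
that could find TB in two hours,
but missed
drug-resistant strains?

The GeneXpert Story, South Africa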

CAPE TOWN, 2010

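For a century, TB diagnosis meant culturing bacteria for weeks. Then came GeneXpert: results in 2 hours.

South Africa deployed it nationwide. The WHO endorsed it.

But in patients with low bacterial loads, often HIV co-infected, sensitivity dropped to 67%. One in three cases was missed.

For detecting rifampicin resistance, it missed 5% of resistant cases. Those patients received the wrong treatment. Drug-resistant TB spread.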

Steingart KR et al. Cochrane Database Syst Rev. 2014;1:CD009593

TB Diagnosis Decision Tree

When GeneXpert Is Not Enough

Suspected TB Patient

↓

GeneXpert Test

↓

Positive

↓

Rifampicin?

Sensitive: Standard Tx

Resistant: MDR-TB Tx

Negative

↓

HIV+ or High Suspicion?

Yes: Culture needed

No: Likely negative

Sensitivity by Patient Type

98%

Smear-positive
(high bacterial load)

67%

Smear-negative
(low bacterial load)

61%

HIV co-infected
(immune suppressed)

THE LESSON

A test's sensitivity in clinical trials may not match its sensitivity in your patients. Know your population.
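One way to see why the population matters: the sensitivity you observe in a clinic is a weighted average of the subgroup sensitivities, weighted by the local case mix. The sketch below uses the smear-positive and smear-negative figures quoted above; the two case mixes are hypothetical values chosen purely for illustration.

```python
def overall_sensitivity(case_mix, sensitivity_by_group):
    """Weighted average of subgroup sensitivities; case_mix proportions must sum to 1."""
    if abs(sum(case_mix.values()) - 1.0) > 1e-9:
        raise ValueError("case mix proportions must sum to 1")
    return sum(case_mix[group] * sensitivity_by_group[group] for group in case_mix)


# Subgroup sensitivities quoted in this module
sens = {"smear_positive": 0.98, "smear_negative": 0.67}

# Hypothetical case mixes (assumptions, not data from the source)
referral_hospital = {"smear_positive": 0.80, "smear_negative": 0.20}
hiv_clinic = {"smear_positive": 0.40, "smear_negative": 0.60}

print(f"Referral hospital: {overall_sensitivity(referral_hospital, sens):.1%}")  # 91.8%
print(f"HIV clinic:        {overall_sensitivity(hiv_clinic, sens):.1%}")         # 79.4%
```

Same machine, same cartridges, and yet the clinic with more low-bacterial-load patients would miss noticeably more cases.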

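"The machine said 'negative,'
the doctor trusted the machine,
the patient went home with tuberculosis,
and coughed resistance into the world."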

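==================== Module 8: The PSA Controversy ====================

Have you not heard of the test for men
that found cancers that would never kill,
and led to treatments that destroyed lives?

The PSA Screening Tragedy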

UNITED STATES, 1990s-2010s

PSA (Prostate-Specific Antigen) could detect prostate cancer early.

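Doctors screened millions of men. Cancers were found. Prostates were removed.

But many of these "cancers" would never have caused symptoms. Surgery caused impotence and incontinence in men who would have died of old age, not cancer.

Moyer VA. Ann Intern Med. 2012;157:120-134

The Numbers of Harm

Lives saved from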
prostate cancer
per 1000 screened

30-40

Men made impotent
or incontinent
per 1000 screened

100+

False positives
(biopsies, anxiety)
per 1000 screened

THE REVERSAL

In 2012, the US Preventive Services Task Force recommended against routine PSA screening. The test found too much that did not need finding.

Patient Decision Aid: PSA Screening

If 1,000 Men Aged 55-69 Are Screened for 13 Years

Deaths from prostate cancer prevented

1-2 men

Men who will have false positive requiring biopsy

100-120 men

Men diagnosed with a cancer that would never have harmed them

20-50 men

Men left impotent or incontinent from treatment

30-40 men

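Is this trade-off acceptable to you?

"The test found a shadow,
and the surgeon cut,
and the man lived on, impotent and incontinent,
with a cancer that would never have woken."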

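====================== Module 9: Troponin and the Heart Attack ====================

Have you not heard of the man with chest pain
whose first troponin was normal,
who was sent home
and died before morning?

The Troponin Timing Problem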

EMERGENCY DEPARTMENTS WORLDWIDE

Troponin is the gold standard for diagnosing a heart attack. But it takes 3-6 hours to rise after myocardial injury.

A patient arrives one hour after chest pain begins. Troponin is tested: normal. "You're fine. Go home."

The heart is dying. The protein has not yet leaked.

Studies show 2-5% of MI patients sent home from ED die within 30 days.

Pope JH et al. N Engl J Med. 2000;342:1163-1170

Serial Testing Decision Tree

The Two-Troponin Protocol

Chest Pain Patient

↓

First Troponin

↓

Elevated

↓

Treat as MI

Normal

↓

When Did Pain Start?

<6 hrs

Wait 3 hrs: Repeat troponin

>6 hrs

Low risk: Consider d/c

High-Sensitivity Troponin

~70%

Conventional troponin
sensitivity at 0 hrs

~95%

hs-Troponin
sensitivity at 0 hrs

99%

hs-Troponin
at 3 hrs serial

THE TRADE-OFF

High-sensitivity troponin catches more heart attacks early. But it also has more false positives—elevated in kidney disease, heart failure, sepsis, and marathon runners.

"The test said 'normal,'
because the heart had only begun to die.
The patient was reassured,
and went home to finish dying."

==================== Module 10: Likelihood Ratios ====================

Sensitivity describes the test.
Specificity describes the test.

But the patient asks:
"I tested positive. What are MY chances?"

Likelihood Ratios

POSITIVE LIKELIHOOD RATIO

LR+ = Sensitivity / (1 - Specificity)

How much more likely is a + result in sick vs healthy?

NEGATIVE LIKELIHOOD RATIO

LR- = (1 - Sensitivity) / Specificity

How much more likely is a - result in sick vs healthy?
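Both ratios can be computed directly from sensitivity and specificity. A small sketch, applied here to the 99% / 99% test from the base rate module (the function name `likelihood_ratios` is illustrative):

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a test with the given sensitivity and specificity."""
    lr_pos = sensitivity / (1 - specificity)   # positive result: sick vs healthy
    lr_neg = (1 - sensitivity) / specificity   # negative result: sick vs healthy
    return lr_pos, lr_neg


lr_pos, lr_neg = likelihood_ratios(sensitivity=0.99, specificity=0.99)
print(f"LR+ = {lr_pos:.0f}")   # LR+ = 99
print(f"LR- = {lr_neg:.3f}")   # LR- = 0.010
```

A positive result is about 99 times more likely in a sick person than in a healthy one; a negative result is about 100 times less likely.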

The Fagan Nomogram

From Pre-Test to Post-Test Probability

[Figure: Fagan nomogram with three axes, in order: Pre-Test Probability, Likelihood Ratio, Post-Test Probability]

Draw a line from pre-test through LR to find post-test probability
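The line drawn across the nomogram is a graphical shortcut for three arithmetic steps: convert probability to odds, multiply by the likelihood ratio, convert back. A minimal sketch, with illustrative inputs (a 20% pre-test probability and LRs of 10 and 0.1):

```python
def post_test_probability(pre_test_probability, likelihood_ratio):
    """Bayes' theorem in odds form; requires 0 < pre_test_probability < 1."""
    pre_odds = pre_test_probability / (1 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


print(f"{post_test_probability(0.20, 10):.0%}")    # 71%
print(f"{post_test_probability(0.20, 0.1):.1%}")   # 2.4%
```

Starting from the same 20%, a strong positive result lifts the probability to about 71%, while a strong negative result drops it to about 2.4%.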

Interpreting Likelihood Ratios

How Powerful Is This Test?

LR+ Value?

LR+ > 10: Strong rule-in

5-10: Moderate

2-5: Weak

1-2: Useless

LR- Value?

< 0.1: Strong rule-out

0.1-0.2: Moderate

0.2-0.5: Weak

0.5-1: Useless

"Sensitivity tells us about the sick.
Specificity tells us about the healthy.
But the likelihood ratio answers:
What does this result mean for this patient?"

====================== Module 11: Malaria RDTs ====================

Have you not seen the feverish child in the village
whose rapid test said negative
while the Plasmodium kept multiplying?

The Malaria RDT Problem

SUB-SAHARAN AFRICA

Malaria kills 600,000 people yearly, mostly children under 5.

Rapid Diagnostic Tests were meant to guide treatment in remote areas without microscopes or laboratories.

But when parasitemia is low, RDTs miss cases. And when P. falciparum has deleted its HRP2 gene, the RDT sees nothing at all.

WHO. Malaria RDT Performance. 2022

The Clinical Decision Tree

Child with Fever in Malaria-Endemic Area

Febrile Child

↓

Perform RDT

↓

RDT Positive

↓

Treat Malaria

RDT Negative

↓

Clinical Suspicion?

High

Treat Anyway or Microscopy

Low

Look for Other Cause

Sensitivity Varies by Parasitemia

95%

High parasitemia
(>200/μL)

75%

Low parasitemia
(100-200/μL)

50%

Very low
(<100/μL)

THE CLINICAL LESSON

A negative RDT does not rule out malaria in endemic areas. Clinical judgment must override the test when suspicion is high.

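"The test said 'negative,'
the child was sent home,
the parasites multiplied within. Night fell,
and by morning the child would not wake."

==================== Module 12: COVID-19 Rapid Tests ====================

In the year of the plague,
the world needed a test that was fast.

But fast is not the same as accurate.

The Cochrane Verdict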

COVID-19 Rapid Antigen Tests (155 Studies)

Population	Sensitivity	Missed
Symptomatic	73%	27%
Asymptomatic	55%	45%
First 7 days	80%	20%

Dinnes J et al. Cochrane Database Syst Rev. 2022;7:CD013705
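The "Missed" column is the share of infected people who test negative. The chance that one particular negative result is wrong is a different quantity, and it also depends on how common infection is among the people being tested. The sketch below uses the 55% asymptomatic sensitivity from the table; the specificity value and the two prevalence values are assumptions for illustration, since the table does not give them.

```python
def prob_infected_given_negative(prevalence, sensitivity, specificity):
    """Probability of infection despite a negative result (1 - NPV)."""
    false_neg = prevalence * (1 - sensitivity)   # infected and missed
    true_neg = (1 - prevalence) * specificity    # uninfected and correctly negative
    return false_neg / (false_neg + true_neg)


ASSUMED_SPECIFICITY = 0.99  # assumption: not reported in the table above

for prevalence in (0.01, 0.10):  # hypothetical pre-test probabilities
    risk = prob_infected_given_negative(prevalence, 0.55, ASSUMED_SPECIFICITY)
    print(f"Prevalence {prevalence:.0%}: {risk:.1%} of negatives are infected")
```

Under these assumptions the residual risk after a negative test is far below 45%, but it rises roughly tenfold when infection is ten times more common, which is the situation during a surge.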

The False Security Decision Tree

Thanksgiving 2020: What Happened

Family Member Tests Negative

↓

Truly Negative?

Not infected

True Negative: Safe to gather

Infected (45% of asymptomatic infections are missed)

FALSE Negative: Infectious!

↓

Gathers with Family: Grandparents infected

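"The test said 'negative,'
the family embraced,
and by winter's end
the grandfather was buried."

==================== Module 13: Mammography and Overdiagnosis ====================

Have you not heard of the screening
that found cancers that would never kill,
and led to treatments that caused more harm than the disease?

The Overdiagnosis Problem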

3-4

Lives saved
per 10,000 screened

~15

Overdiagnosed
(treated unnecessarily)

~500

False alarms
(anxiety, biopsies)

THE QUESTION

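To save 3-4 lives, about 15 women undergo surgery, radiation, and chemotherapy for cancers that would never have harmed them.

Is this trade-off worth it?

Patient Decision Aid: Mammography

If 10,000 Women Aged 50-69 Are Screened for 10 Years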

Deaths from breast cancer prevented

3-4 women

Women called back for false alarms

~500 women

Unnecessary biopsies

~200 women

Women treated for a cancer that would never have harmed them

~15 women

Is screening right for you?

The Screening Cascade Decision Tree

10,000 Women Screened Over 10 Years

10,000 Women

↓

~1,000 Recalled: Abnormal

↓

~500 False
Alarm

~500 Biopsy
~50 cancer

~9,000 Cleared

Of ~50 Cancers Found

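~35 Would Kill: 3-4 saved

~15 Would Never Kill: Overdiagnosed

"The test found a shadow
and called it cancer,
and the woman was cut and burned
for a shadow that would never have darkened her days."

====================== Module 14: Alzheimer's Amyloid ====================

Have you not heard of the scan
that finds plaques in the brain
but cannot tell you
whether the mind will fade?

The Amyloid Paradox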

ALZHEIMER'S RESEARCH, 2010s-2020s

PET scans can now detect amyloid plaques—the hallmark of Alzheimer's.

But 30% of cognitively normal elderly have amyloid plaques. They may never develop dementia.

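And 10-20% of people with dementia have no amyloid.

The test finds the plaque. But the plaque is not the disease. We are testing a surrogate, not the outcome.

Jack CR et al. Lancet Neurol. 2018;17:760-773

Surrogate vs. Outcome Decision Tree

What Are We Really Testing?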

Diagnostic Test

↓

What Does It Detect?

Outcome itself

Direct Diagnosis: e.g., biopsy for cancer

↓

High clinical value

Surrogate marker

Indirect Signal: e.g., amyloid for dementia

↓

Validated link?

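Yes: Use cautiously

No: Limited value

"The scan found plaques,
the doctor named it Alzheimer's,
and the patient lived in terror
of a forgetting that might never come."

==================== Module 15: QUADAS-2 Quality ====================

Not all studies are created equal.

Some are biased.
Some are poorly designed.
Some should not be trusted.

How do we separate the wheat from the chaff?

QUADAS-2: The Quality Checklist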

Four Domains of Risk of Bias

Patient Selection

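Was a consecutive or random sample of patients enrolled? Was a case-control design avoided?

Index Test

Was the index test interpreted without knowledge of the reference standard? Was the threshold pre-specified?

Reference Standard

Is the reference standard likely to classify the condition correctly? Was it interpreted blind to the index test?

Flow and Timing

Was there an appropriate interval between the tests? Did all patients receive the same reference standard?

QUADAS-2 Decision Tree

Should You Trust This Study?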

DTA Study

↓

Check All 4 Domains

All Low Risk

High Quality: Trust results

Some Unclear

Moderate: Use with caution

Any High Risk

Low Quality: Results may be biased

Common Biases in DTA Studies

Verification Bias

Only positive tests get the reference standard → inflates sensitivity

Spectrum Bias

Study population differs from clinical reality → results do not generalize

Incorporation Bias

Index test is part of reference standard → artificially high accuracy

Review Bias

Index test interpreted knowing reference result → inflates both metrics

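"Before you trust the numbers,
ask: How were they gathered?
A biased study speaks with confidence,
but its confidence is a lie."

==================== Module 16: Meta-Analysis and SROC ====================

One study may deceive.
One study may flatter.

But when you gather all the evidence,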
the truth becomes harder to hide.

Why DTA Meta-Analysis Is Different

THE PROBLEM

Sensitivity and specificity are correlated. When one goes up, the other tends to go down.

You cannot pool them separately, as you would a treatment effect. You need a bivariate model.

The SROC Curve

Summary Receiver Operating Characteristic

[Figure: SROC plot of Sensitivity against 1 - Specificity (False Positive Rate), showing individual studies and the summary estimate]

Reading the SROC

What Does the Curve Tell You?

SROC Curve Position

↓

Top-Left Corner

Excellent Test: High sens + spec

Near Diagonal

Useless Test: No better than chance

Points Scattered

High Heterogeneity: Investigate sources

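"One study may deceive.
Many studies, weighed together,
trace the path of truth:
the SROC curve that reveals what the test really does."

==================== Module 17: Heterogeneity ====================

But what if the studies disagree?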

One says sensitivity is 95%.
Another says 60%.

Which truth do you believe?

Sources of Heterogeneity

Why Studies Disagree

Same Test, Different Results?

Threshold: Different cutoffs

Population: Severity, age

Setting: Primary vs specialist

Quality: Bias, blinding

Measuring Disagreement: I²

I² < 25%

Low
Studies agree

I² 25-75%

Moderate
Some variation

I² > 75%

High
Major disagreement

THE WARNING

When I² > 75%, the pooled estimate may be meaningless. Explain the disagreement before averaging.
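For readers who want to see where an I² value comes from, here is a rough sketch based on Cochran's Q, with I² = max(0, (Q - df) / Q). It works on logit-transformed sensitivity only, so it is a univariate simplification that ignores the sensitivity-specificity correlation a bivariate model accounts for. The four study counts are hypothetical, invented for illustration.

```python
import math


def i_squared(effects, variances):
    """Return (Cochran's Q, I²) for study estimates with known variances."""
    weights = [1 / v for v in variances]  # inverse-variance weights
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return q, i2


# Hypothetical studies as (true positives, false negatives)
studies = [(95, 5), (60, 40), (80, 20), (88, 12)]
logit_sens = [math.log(tp / fn) for tp, fn in studies]
var_logit = [1 / tp + 1 / fn for tp, fn in studies]  # approximate variance of the logit

q, i2 = i_squared(logit_sens, var_logit)
print(f"Q = {q:.1f}, I² = {i2:.0%}")
```

With sensitivities ranging from 60% to 95%, this hypothetical set lands well above the 75% line, which is the situation the warning describes.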

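"When studies disagree,
do not silence the dissent.
Ask: Why do they see differently?
The disagreement itself is telling."

==================== Module 18: The Toolkit ====================

Your DTA Toolkit

Essential Measures and When to Use Them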

The Checklist

✓

Was there a valid reference standard?

Gold standard applied to ALL patients?

✓

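Were the interpreters blinded?

Test readers unaware of diagnosis?

✓

Was the spectrum appropriate?

Patients similar to your population?

✓

Was the threshold pre-specified?

Or chosen to maximize results?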

When Results Don't Match Suspicion

The Clinical Override Decision Tree

Test Negative, High Suspicion

↓

What Is the LR-?

LR- < 0.1

Strong rule-out: Accept negative

LR- 0.1-0.5

Repeat test: Or different test

LR- > 0.5

Trust judgment: Test is weak

Sequential Testing Decision Tree

When One Test Isn't Enough

Initial Screening Test

↓

Positive

↓

Confirmatory Test: High specificity

↓

Positive: Diagnose

Negative: False alarm

Negative

↓

Likely negative: If high sens screen
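The benefit of confirmation can be shown by chaining two odds updates: the post-test probability after the screening test becomes the pre-test probability for the confirmatory test. The sketch below reuses the 99% / 99% screen at a prevalence of 1 in 1000 from the base rate module. The confirmatory test's sensitivity (95%) and specificity (99.9%) are hypothetical, and the chaining assumes the two tests err independently, which may not hold in practice.

```python
def update(probability, likelihood_ratio):
    """One Bayesian update in odds form; requires 0 < probability < 1."""
    odds = probability / (1 - probability)
    post_odds = odds * likelihood_ratio
    return post_odds / (1 + post_odds)


prevalence = 0.001
lr_pos_screen = 0.99 / (1 - 0.99)     # screening test: 99% sens, 99% spec
lr_pos_confirm = 0.95 / (1 - 0.999)   # hypothetical confirmatory test

after_screen = update(prevalence, lr_pos_screen)
after_confirm = update(after_screen, lr_pos_confirm)  # assumes independent errors

print(f"After positive screen:       {after_screen:.1%}")
print(f"After positive confirmation: {after_confirm:.1%}")
```

A positive screen alone leaves the diagnosis unlikely; a second, highly specific positive makes it very likely.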

"Armed with sensitivity, specificity, likelihood,
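armed with the SROC and measures of consistency,
you can see through a test's lies
and judge its truth for yourself."

==================== Module 19: Transfusion Errors ====================

Have you not heard of the transfusion patient
who received the wrong blood,
not because the test was wrong,
but because no one performed it?

The Test That Was Never Done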

HOSPITALS WORLDWIDE

ABO blood typing is nearly 100% accurate when performed.

Yet transfusion reactions still kill, not because the test fails, but because of human failure:

• Wrong blood drawn from wrong patient
• Labels switched in the laboratory
• Bedside check skipped in emergency

In the UK, 1 in 13,000 transfusions is given to the wrong patient. The test worked. The system failed.

Bolton-Maggs PHB. Transfus Med. 2016;26:303-311

Test vs. System Decision Tree

Where Can Things Go Wrong?

Diagnostic Process

↓

Error Source?

Test itself

Analytical Error: Sens/Spec issue

↓

Better test needed

Pre-analytical

Wrong sample: ID error

↓

System fix needed

Post-analytical

Wrong action: Reporting error

↓

Process fix needed

"The perfect test means nothing
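if the wrong blood is drawn,
the wrong label is applied,
the wrong bag is hung."

DTA studies measure test accuracy. They do not measure system accuracy.

==================== Module 20: The AI Diagnostic Revolution ====================

Have you not seen the algorithm
that learned from biased data
and spread its bias
to every patient it touched?

The AI Diagnostic Revolution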

STANFORD & BEYOND, 2017-PRESENT

Deep learning algorithms now match dermatologists at detecting skin cancer.

But the training data was predominantly light skin. On dark skin, performance dropped significantly.

The algorithm learned the patterns, but it also learned the biases.

Deployed without external validation, it performed worse than expected, because the training population did not match the clinical population.

Esteva A et al. Nature. 2017;542:115-118; Adamson AS. JAMA Dermatol. 2018

AI Validation Decision Tree

Is This AI Ready for the Clinic?

AI Diagnostic Tool

↓

Validation Type?

Internal only

High Risk: Overfitting likely

↓

Not ready

External validation

Better: But check population

↓

Matches Your Patients?

Yes: Consider use

No: Caution

Prospective RCT

Gold Standard: Patient outcomes

AI Calibration: The Hidden Problem

DISCRIMINATION VS. CALIBRATION

Discrimination (AUC/ROC): Can the AI rank patients by risk?

Calibration: When the AI says "80% risk," do 80% actually have disease?

Many AI tools have good AUC but poor calibration. This is the base rate fallacy in algorithmic form.

AUC

Can it rank?
(usually reported)

CAL

Is probability accurate?
(often ignored)
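A small experiment makes the distinction concrete. AUC depends only on how predictions are ranked, so inflating every predicted risk by the same increasing function leaves AUC untouched while ruining calibration. The predictions and outcomes below are hypothetical, and the calibration check is the simplest one available (mean predicted risk versus observed event rate), not a full calibration curve.

```python
def auc(scores, labels):
    """Chance that a random diseased case outranks a random healthy one (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def calibration_gap(scores, labels):
    """Mean predicted risk minus observed event rate; 0 means calibrated on average."""
    return sum(scores) / len(scores) - sum(labels) / len(labels)


# Hypothetical outcomes and predicted risks (illustrative only)
labels = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]
calibrated = [0.05, 0.10, 0.10, 0.15, 0.20, 0.30, 0.40, 0.45, 0.60, 0.65]
inflated = [s ** 0.5 for s in calibrated]  # same ranking, systematically higher risks

for name, scores in [("calibrated", calibrated), ("inflated", inflated)]:
    print(f"{name:<10} AUC = {auc(scores, labels):.2f}, gap = {calibration_gap(scores, labels):+.2f}")
```

Both sets of predictions rank patients identically, so a report that quotes only AUC cannot tell them apart.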

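"The algorithm learned from the data,
and the data was biased,
and the bias spread to every prediction,
and no one asked: who was missing from the training set?"

====================== Module 21: Patient Communication ====================

The patient asks: "Is my test positive?"

But what they mean is:
"Do I have this disease?"

How do you bridge this gap?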

Communication Scripts

SCRIPT 1: EXPLAINING A POSITIVE RESULT

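"Your test came back positive. But I want to explain what that means."

"This test is good at finding people who have the disease, but it also gives false alarms."

"Given your risk factors, there is about a [X]% chance that this is a true positive."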

"We'll do a confirmatory test to be certain before any treatment."

Communication Scripts

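SCRIPT 2: EXPLAINING A NEGATIVE RESULT (HIGH SUSPICION)

"Your test came back negative, but I'm still concerned."

"This test can miss cases, especially in early disease."

"Given your symptoms, I would like to repeat the test in a few days, or try a different test."

"A negative test doesn't always mean you're clear. Your symptoms matter too."

Communication Decision Tree

How to Explain a Test Result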

Test Result

↓

Positive

↓

PPV?

>90%: "Very likely true"

<90%: "Need to confirm"

Negative

↓

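NPV?

>95%: "Very reassuring"

<95%: "Still watch symptoms"

Questions to Ask Your Doctor

"How accurate is this test?"

Ask about sensitivity and specificity in plain language

"What if the result is wrong?"

Understand the consequences of false positives and false negatives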

"What happens next?"

Will there be a confirmatory test? Repeat test? Treatment?

"What if I don't get tested at all?"

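Understand the trade-offs of testing versus not testing

"The test speaks in numbers.
The patient hears fear and hope.
The healer's task is to translate,
to bridge the gap between the statistics and what they mean."

==================== Module 22: Cost-Effectiveness and GRADE ====================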

A test may be accurate.
But is it worth it?

What does it cost—in money,
in anxiety, in harm?

The Test-Treatment Threshold

When Is Testing Worthwhile?

Pre-Test Probability

↓

Very Low

Below Test Threshold: Don't test, reassure

Intermediate

Testing Zone: Test will change management

Very High

Above Treat Threshold: Don't test, treat

THE PRINCIPLE

Test only when the result will change what you do. If you'd treat regardless, or not treat regardless—why test?

GRADE Evidence Quality

Grading DTA Evidence

⊕⊕⊕⊕

HIGH

Multiple high-quality studies, consistent results, directly applicable

⊕⊕⊕○

MODERATE

Some limitations in study quality, consistency, or applicability

⊕⊕○○

LOW

Serious limitations—may need to downgrade recommendations

⊕○○○

VERY LOW

Very serious limitations—evidence uncertain

Cost-Consequence Analysis

Example: Universal vs. Targeted Screening

Cost per case detected (universal)

$50,000

Cost per case detected (high-risk only)

$5,000

Cases missed by targeted approach

~10%

False positives avoided by targeted

~90%

Which approach suits your population?

"A test is not just accurate or inaccurate.
It has costs—in money, in worry, in harm.
The wise clinician weighs them all,
and tests only when testing serves the patient."

====================== Module 23: Advanced SROC ====================

The SROC curve shows where the test performs.

But how certain are we?
How much will it vary in practice?

Confidence vs. Prediction Regions

Two Types of Uncertainty

95% CI (summary estimate)

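95% prediction (future studies)

What Each Region Tells You

Confidence Region (smaller ellipse)

Where we are 95% confident the true average sensitivity/specificity lies. Reflects uncertainty in the summary estimate.

Prediction Region (larger ellipse)

Where we expect 95% of future studies to fall. Accounts for between-study heterogeneity.

CLINICAL IMPLICATION

If the prediction region is large, the test may perform very differently in your setting than the average suggests. Wide prediction = high heterogeneity = investigate sources.

Bivariate Model Interpretation

Reading Meta-Analysis Results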

Summary Sens/Spec

↓

Check Regions

CI narrow, PI narrow

Consistent: Trust the average

CI narrow, PI wide

Heterogeneous: Average may not apply

CI wide

Uncertain: More studies needed

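"The confidence region tells you: how sure are we?
The prediction region tells you: how much does it vary?
Both questions matter,
for the test you use tomorrow may not be the average one."

==================== Module 24: Quiz and References ====================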

References

Key Sources

Carreyrou J. Bad Blood. Knopf, 2018. [Theranos]
CDC. MMWR. 1987;36(49):833-840. [HIV blood supply]
Herbst AL et al. N Engl J Med. 1971;284:878-881. [DES]
Moyer VA. Ann Intern Med. 2012;157:120-134. [PSA]
Pope JH et al. N Engl J Med. 2000;342:1163-1170. [Troponin]
Steingart KR et al. Cochrane Database Syst Rev. 2014;1:CD009593. [GeneXpert]
Dinnes J et al. Cochrane Database Syst Rev. 2022;7:CD013705. [COVID RAT]
UK Panel. Lancet. 2012;380:1778-1786. [Mammography]
Jack CR et al. Lancet Neurol. 2018;17:760-773. [Amyloid]
WHO. Malaria RDT Performance. 2022. [Malaria RDT]
Reitsma JB et al. J Clin Epidemiol. 2005;58:982-990. [Bivariate]
Whiting PF et al. Ann Intern Med. 2011;155:529-536. [QUADAS-2]
Bolton-Maggs PHB. Transfus Med. 2016;26:303-311. [Transfusion]

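A test is 99% sensitive and 99% specific. Disease prevalence is 1/1000. A patient tests positive. What is the probability that the patient has the disease?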

99%

90%

About 9%

50%

What does "SnNout" mean?

A highly Sensitive test, when Negative, rules OUT disease

A highly Specific test, when Negative, rules OUT disease

Sensitivity should be used for screening

Specificity should be above 90%

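Why was the blood supply contaminated with HIV despite testing?

The tests had low specificity

Tests had a window period with zero sensitivity in early infection

The tests were not performed correctly

The tests were too expensive

Which QUADAS-2 domain assesses whether the test was interpreted without knowledge of the diagnosis?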

Patient Selection

Index Test

Reference Standard

Flow and Timing

✔

Course Complete

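"Now you know the four outcomes,
the two virtues of a test,
the base rate fallacy,
the art of pooling evidence,
and the biases that hide the truth.

When the next test lies to you,
you will know."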
