who promised to change the world with a single drop of blood,
who raised billions on a test that never worked?
No more needles. No more vials. No more waiting.
Investors believed. Walgreens believed. The Pentagon believed.
They valued her company at $9 billion.
The test was wrong. The baby was healthy.
But how many women, receiving the same news, made different decisions?
and the lie is certain,
no one questions the numbers."
This is why we study diagnostic test accuracy.
There are only four possible truths.
Two are blessings. Two are curses.
Every Test Result Has a Reality Behind It
Test: Positive
Test: Positive
Test: Negative
Test: Negative
True Positive (TP)
Sick person correctly identified.
The test told the truth.
False Positive (FP)
Healthy person wrongly alarmed.
The test lied.
False Negative (FN)
Sick person wrongly reassured.
The deadliest lie.
True Negative (TN)
Healthy person correctly cleared.
The test told the truth.
The 2x2 Confusion Matrix
| | Disease Present | Disease Absent |
|---|---|---|
| Test Positive | TP (True Positive) | FP (False Positive) |
| Test Negative | FN (False Negative) | TN (True Negative) |
Know them by name.
TP, TN: the test told the truth.
FP, FN: the test lied."
Sensitivity asks: Can it find the sick?
Specificity asks: Can it spare the healthy?
High sensitivity = few false negatives = few missed cases.
High specificity = few false positives = few false alarms.
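As a minimal sketch, both measures can be computed directly from the four cells of the 2x2 table. The counts below are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Share of sick people the test correctly flags: TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Share of healthy people the test correctly clears: TN / (TN + FP)."""
    return tn / (tn + fp)


# Hypothetical counts: 100 sick people and 900 healthy people.
tp, fn = 90, 10
tn, fp = 855, 45

print(f"Sensitivity: {sensitivity(tp, fn):.2f}")  # 0.90
print(f"Specificity: {specificity(tn, fp):.2f}")  # 0.95
```

Note that each measure uses only one column of the table: sensitivity looks only at the sick, specificity only at the healthy.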
Lower the threshold to catch more sick people? You'll alarm more healthy people.
Raise the threshold to spare healthy people? You'll miss more sick people.
This is the threshold effect, the seesaw of diagnosis.
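The seesaw can be sketched with a small, made-up dataset. The biomarker values and the three cutoffs below are hypothetical; the only assumption is that higher values suggest disease and that any value at or above the cutoff counts as a positive test.

```python
# Hypothetical biomarker values; higher values suggest disease.
sick = [4.1, 5.3, 6.0, 6.8, 7.5, 8.2]
healthy = [2.0, 2.9, 3.5, 4.4, 5.0, 5.9]


def classify_at(threshold: float) -> tuple[float, float]:
    """Return (sensitivity, specificity) when values >= threshold are positive."""
    tp = sum(value >= threshold for value in sick)
    tn = sum(value < threshold for value in healthy)
    return tp / len(sick), tn / len(healthy)


for threshold in (3.0, 5.0, 7.0):
    sens, spec = classify_at(threshold)
    print(f"threshold {threshold}: sensitivity {sens:.2f}, specificity {spec:.2f}")
```

With these invented values, the lowest cutoff catches every sick person but clears only a third of the healthy, and the highest cutoff does the reverse.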
SnNout: Sensitive tests rule OUT
A highly sensitive test, when negative, rules out disease. If it didn't find it, it's probably not there.
SpPin: Specific tests rule IN
A highly specific test, when positive, rules in disease. If it says you have it, you probably do.
SpPin: Specific Positive rules IN
Specificity spares the well.
But no test masters both perfectly;
this is the burden we must bear."
The world needed a test to find the infected fast.
But what if the rapid test missed too many?
In people WITH symptoms:
Sensitivity: 73% (missed 27% of cases)
In people WITHOUT symptoms:
Sensitivity: 55% (missed 45% of cases)
Nearly half of asymptomatic infected people were told they were clear.
Thanksgiving Dinners
Families tested negative in the morning, gathered indoors, unknowingly infected grandparents
Workplace Outbreaks
Workers tested negative, came to work, infected colleagues in the break room
Hospital Transmission
Patients tested negative, admitted to wards, infected vulnerable patients
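The family gathered together,
the grandfather hugged his grandchildren,
and by winter's end he was gone."
But the patient asks a different question: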
"I tested positive. What are my chances?"
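Your patient tests positive for a rare disease (prevalence 1 in 1,000).
Question: What is the probability that they actually have the disease?
Most doctors say 95%. The real answer? About 2%.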
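The 2% figure can be checked by counting people rather than juggling probabilities. The slide does not state the test's accuracy, so this sketch assumes the classic version of the puzzle: a test that catches every true case and gives a false positive in 5% of healthy people. Both values are assumptions.

```python
population = 100_000
prevalence = 1 / 1000
sensitivity = 1.00          # assumption: the test catches every true case
false_positive_rate = 0.05  # assumption: 5% of healthy people test positive

sick = population * prevalence
healthy = population - sick
true_positives = sick * sensitivity
false_positives = healthy * false_positive_rate

# Among everyone who tests positive, what share is actually sick?
ppv = true_positives / (true_positives + false_positives)
print(f"{true_positives:.0f} true positives, {false_positives:.0f} false positives")
print(f"Probability of disease given a positive test: {ppv:.1%}")
```

Under these assumptions about 100 sick people are buried among nearly 5,000 false alarms, which is why the answer lands near 2% rather than 95%.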
Specificity tells how many well it will spare.
But only the likelihood ratio answers:
What does this result mean for this patient?"
that found too much?
When does finding disease become causing harm?
Mammography could detect tumors too small to feel.
Women were told: "Annual mammograms save lives."
But what if some of those "cancers" would never have killed?
The woman was diagnosed and underwent surgery, radiation, and chemotherapy, all for a disease that would never have harmed her.
Independent UK Panel on Breast Cancer Screening. Lancet. 2012;380:1778-1786
from breast cancer
(treated unnecessarily)
(anxiety, biopsies)
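Is this a good trade? The answer depends on values, not just numbers.
and called it disease,
the woman was cut, burned, and poisoned
for a shadow that would never have darkened her days."
This is the problem of overdiagnosis.
But when you gather all the studies,
when you weigh their evidence,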
The truth becomes harder to hide.
More Precision
Combining studies gives narrower confidence intervals, reducing uncertainty
Detect Heterogeneity
Why do different studies give different answers? Setting? Population? Threshold?
Expose Publication Bias
Are negative studies being hidden? Funnel plots reveal asymmetry
Explore Thresholds
Build SROC curves to understand the sensitivity-specificity trade-off
They are correlated: when one goes up, the other tends to go down (threshold effect).
The bivariate model accounts for this correlation, giving valid summary estimates.
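The correlation itself is easy to see on the logit (log-odds) scale, which is the scale the bivariate model works on. The sketch below is only a descriptive check on hypothetical study results. It is not the bivariate model, which is a random-effects model fitted jointly to both logits and also accounts for each study's sampling error.

```python
import math

# Hypothetical (sensitivity, specificity) pairs from five studies of one test,
# each using a different positivity threshold.
studies = [(0.95, 0.70), (0.90, 0.80), (0.82, 0.88), (0.75, 0.93), (0.60, 0.97)]


def logit(p: float) -> float:
    """Log-odds transform of a proportion strictly between 0 and 1."""
    return math.log(p / (1 - p))


def pearson(xs: list[float], ys: list[float]) -> float:
    """Plain Pearson correlation coefficient."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


logit_sens = [logit(sens) for sens, _ in studies]
logit_spec = [logit(spec) for _, spec in studies]
r = pearson(logit_sens, logit_spec)
print(f"Correlation of logit(sensitivity) and logit(specificity): {r:.2f}")
```

A strongly negative value, as these invented studies produce, is the signature of a threshold effect. Pooling sensitivity and specificity separately would ignore it.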
Reitsma JB et al. J Clin Epidemiol. 2005;58:982-990
ROC Space
The curve shows the trade-off
Higher = better test
Diagonal line = useless test (random guessing)
The curve = summary of performance across all studies
begins to reveal the truth.
The SROC curve is the path of evidence,
showing what the test can truly do."
One study reports a sensitivity of 95%.
Another says 60%.
Which truth do you believe?
High heterogeneity means the studies are measuring different things, or the test performs differently in different settings.
Threshold Differences
Different cutoffs for a "positive" result (e.g., different HbA1c thresholds for diabetes)
Population Differences
Disease severity, age, comorbidities differ between studies
Setting Differences
Primary care vs. specialist clinic vs. emergency room
Quality Differences
Risk of bias, verification bias, spectrum bias
Studies agree
Some disagreement
Major disagreement
You cannot average apples and oranges. You must explain why studies differ before pooling them.
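One common way to put a number on disagreement is the I² statistic of Higgins and Thompson, computed from Cochran's Q. The sketch below applies it to logit sensitivities from hypothetical study counts, with inverse-variance weights. Treat it as a rough descriptive aid only: a univariate I² ignores the threshold-driven link between sensitivity and specificity.

```python
import math

# Hypothetical (true positives, false negatives) from four studies.
studies = [(95, 5), (60, 40), (80, 20), (88, 12)]

effects, weights = [], []
for tp, fn in studies:
    effects.append(math.log(tp / fn))      # logit of sensitivity
    weights.append(1 / (1 / tp + 1 / fn))  # inverse of its approximate variance

# Inverse-variance weighted mean, then Cochran's Q around that mean.
pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
df = len(studies) - 1

# I^2: share of total variation attributed to between-study differences.
i_squared = max(0.0, (q - df) / q) if q > 0 else 0.0
print(f"Q = {q:.1f} on {df} df, I^2 = {i_squared:.0%}")
```

A high I² here would be a prompt to look for the causes listed above (threshold, population, setting, quality), not a licence to pool anyway.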
Do not silence the dissent.
Ask: Why do they see differently?
The disagreement itself tells a story."
Sensitivity & Specificity
How well the test performs on sick vs. healthy people
Likelihood Ratios (LR+, LR-)
How much a result changes the probability of disease
Diagnostic Odds Ratio (DOR)
Single measure of test discrimination (DOR = LR+ / LR-)
Area Under the SROC Curve (AUC)
Overall test performance across all thresholds (0.5 = useless, 1.0 = perfect)
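The likelihood ratios and the DOR follow directly from sensitivity and specificity, and a likelihood ratio converts a pre-test probability into a post-test probability through odds. The sketch below uses a hypothetical test (90% sensitivity, 95% specificity) and a hypothetical 10% pre-test probability.

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Return (LR+, LR-) for a test with the given accuracy."""
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    return lr_pos, lr_neg


def post_test_probability(pre_test: float, lr: float) -> float:
    """Update a probability with a likelihood ratio by working in odds."""
    pre_odds = pre_test / (1 - pre_test)
    post_odds = pre_odds * lr
    return post_odds / (1 + post_odds)


lr_pos, lr_neg = likelihood_ratios(0.90, 0.95)  # hypothetical test
dor = lr_pos / lr_neg

print(f"LR+ = {lr_pos:.1f}, LR- = {lr_neg:.2f}, DOR = {dor:.0f}")
print(f"After a positive: {post_test_probability(0.10, lr_pos):.0%}")
print(f"After a negative: {post_test_probability(0.10, lr_neg):.1%}")
```

For this hypothetical test LR+ is 18 and the DOR is 171. A positive result moves a 10% pre-test probability to about 67%, and a negative result moves it to about 1%.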
bivariate meta-analysis
DTA reviews
Open-access tools
Rutter & Gatsonis 2001 - HSROC model
Cochrane Handbook Ch. 10 - DTA methods
Was there a valid reference standard?
Gold standard test applied to all patients?
Were the interpreters blinded?
Test readers unaware of diagnosis, and vice versa?
Was the spectrum appropriate?
Patients similar to your clinical population?
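Was the threshold pre-specified?
Or was it chosen to maximize results?
Armed with the SROC curve and measures of consistency,
you can see through a test's lies
and judge its truth for yourself."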
When a machine claims to see what no other machine can see,
and no one asks: "Show me the evidence"?
FDA found:
• Results varied by 146% between runs on the same sample
• Edison machines failed 87% of proficiency tests
• Zero published peer-reviewed validation studies
• Patients whose samples were negative received HIV-positive results
Sources: FDA Warning Letter 2016; Carreyrou J. Bad Blood. 2018; CMS Inspection Reports.
What do you choose?
Face lawsuits
Harm patients
Protect your patients
Avoid Scandal
A $9 billion valuation became a criminal fraud conviction.
Every hospital that demanded validation data before signing
was protected from the lies.
Every hospital that trusted the marketing
became complicit in harming patients.
Missing evidence is not a marketing problem.
It is a patient safety emergency.
Who pays the price?
The test result comes in 15 minutes.
But what if the result is 15 minutes of false confidence?
Real-world performance (Cochrane 2022):
• Symptomatic individuals: 73% sensitivity (missed 27%)
• Asymptomatic individuals: 55% sensitivity (missed 45%)
• Early infection (days 0-3): ~50% sensitivity
Nearly half of asymptomatic infected people were told they were "clear."
Source: Dinnes J et al. Cochrane Database Syst Rev. 2022;7:CD013705
What do you choose?
School closure
Three hospitalizations
Teacher isolates
Outbreak prevented
It means: "not detected."
The difference between these two phrases
is measured in lives.
is almost meaningless.
SnNout only works when sensitivity is HIGH.
Know your test's limits before trusting its verdict.
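How much a negative result should reassure depends on sensitivity. The sketch below computes the chance that someone is still infected after testing negative. The 20% pre-test probability and the 99% specificity are assumptions for illustration, not figures from the cited review.

```python
def probability_infected_after_negative(
    sensitivity: float, specificity: float, pre_test: float
) -> float:
    """Chance a person is infected despite a negative result (1 - NPV)."""
    false_negatives = (1 - sensitivity) * pre_test
    true_negatives = specificity * (1 - pre_test)
    return false_negatives / (false_negatives + true_negatives)


pre_test = 0.20     # hypothetical: 1 in 5 people tested is infected
specificity = 0.99  # assumption for illustration only

for sensitivity in (0.55, 0.73, 0.98):
    remaining = probability_infected_after_negative(sensitivity, specificity, pre_test)
    print(f"sensitivity {sensitivity:.0%}: {remaining:.1%} still infected after a negative")
```

Under these assumptions a negative from a 55%-sensitive test still leaves roughly a 10% chance of infection, while a 98%-sensitive test leaves about 0.5%.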
still cause harm?
What if the cancer they found
would never have hurt you?
Sensitivity: ~85% | Specificity: ~90%
Screening 1,000 women annually for 10 years:
• 1 death prevented from breast cancer
• 5 women overtreated for cancers that would never have harmed them
• 100-500 false alarms leading to biopsies, anxiety, repeat imaging
Overdiagnosis rate: 19-30% of screen-detected cancers
Source: Independent UK Panel on Breast Cancer Screening. Lancet. 2012;380:1778-1786
What do you choose?
Tumor was indolent (DCIS)
Would never have harmed her
Understands benefits and harms
Autonomy preserved
A test can be accurate and still cause harm.
When overdiagnosis exceeds lives saved,
we must ask: Is finding always helping?
can outweigh the benefit of true positives.
Always weigh benefits against harms.
Screening is not always saving.
is worse than missing it?
What if the treatment causes more suffering
than the disease ever would?
• Sensitivity for high-grade cancer: 21%
• Detects many indolent cancers that would never harm
Lower cutoff to 2.5 ng/mL:
• Sensitivity rises to: 40%
• But overdiagnosis doubles
Treatment consequences:
• 20-30% of men experience incontinence after prostatectomy
• 30-70% experience erectile dysfunction
Source: US Preventive Services Task Force. JAMA. 2018;319(18):1901-1913
What threshold do you choose?
Thousands of unnecessary
biopsies and treatments
But most missed are indolent
Fewer unnecessary treatments
Some preventable deaths
No overtreatment harm
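Every threshold trades sensitivity for specificity,
detection for overdiagnosis.
The choice is not medical. It is ethical.
It depends on what harm you are willing to accept.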
It is a values problem.
Before choosing a cutoff, ask:
What is worse: missing disease or overtreating the healthy?
Different truths.
How can identical numbers
mean opposite things?
Sensitivity: ~80% | Specificity: ~95%
In high-prevalence setting (TB prevalence 10%):
• Positive Predictive Value: about 64%
• A positive test more likely than not means TB
In low-prevalence setting (TB prevalence 0.1%):
• Positive Predictive Value: about 1.6%
• A positive test is almost always a false positive
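The effect of prevalence can be reproduced with Bayes' theorem. The sketch below takes the rounded sensitivity (80%) and specificity (95%) quoted above as exact, which is an assumption. PPV at low prevalence is very sensitive to the specificity used, so treat the output as indicative only.

```python
def predictive_values(
    sensitivity: float, specificity: float, prevalence: float
) -> tuple[float, float]:
    """Return (PPV, NPV) from test accuracy and disease prevalence."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Same test, two populations: only the prevalence changes.
for prevalence in (0.10, 0.001):
    ppv, npv = predictive_values(0.80, 0.95, prevalence)
    print(f"prevalence {prevalence:.1%}: PPV {ppv:.1%}, NPV {npv:.2%}")
```

The function never changes the test's sensitivity or specificity between the two runs, which is the point: the same accuracy yields very different predictive values.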
Source: Pai M et al. Lancet Infect Dis. 2014;14(8):765-773
What is your conclusion?
Patient infects family
Diagnosis delayed for months
Chest X-ray, sputum
Treat early if confirmed
PPV and NPV are properties of the population.
The same result means different things
in different people.
A positive test in a high-risk patient means disease.
The same positive in a low-risk patient means probably nothing.
Context is everything.
Theranos: Demand Validation
No peer-reviewed data = no trust, regardless of marketing claims
COVID Rapid Tests: Know Sensitivity Limits
"Not detected" is not the same as "not infected"
Mammography: Weigh Benefits vs. Harms
Finding is not always helping; overdiagnosis causes real harm
PSA: The Threshold is a Values Choice
Every cutoff trades sensitivity for specificity; there is no "right" answer
TB Test: Context Determines Meaning
The same result means different things in different populations
Key sources cited in this course
- Carreyrou J. Bad Blood: Secrets and Lies in a Silicon Valley Startup. Knopf, 2018.
- Dinnes J, et al. Rapid, point-of-care antigen tests for diagnosis of SARS-CoV-2 infection. Cochrane Database Syst Rev. 2022;7:CD013705.
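- Independent UK Panel on Breast Cancer Screening. The benefits and harms of breast cancer screening: an independent review. Lancet. 2012;380:1778-1786.
- Reitsma JB, et al. Bivariate analysis of sensitivity and specificity produces informative summary measures in diagnostic reviews. J Clin Epidemiol. 2005;58:982-990.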
- Rutter CM, Gatsonis CA. A hierarchical regression approach to meta-analysis of diagnostic test accuracy evaluations. Stat Med. 2001;20:2865-2884.
- Deeks JJ, et al. The performance of tests of publication bias in systematic reviews of diagnostic test accuracy. J Clin Epidemiol. 2005;58:882-893.
- Macaskill P, et al. Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy. Chapter 10. 2023.
- Higgins JPT, Thompson SG. Quantifying heterogeneity in a meta-analysis. Stat Med. 2002;21:1539-1558.
- US Food and Drug Administration. Warning Letter to Theranos Inc. 2016.
- US Preventive Services Task Force. Screening for Prostate Cancer. JAMA. 2018;319(18):1901-1913.
- Pai M, et al. Tuberculosis. Lancet Infect Dis. 2014;14(8):765-773.
The two virtues of a test,
the cruel trade-off of thresholds,
and the art of pooling evidence.
When the next test lies to you,
you will know how to see through it."
When tests lie: now you know.