Not every signal is real.
Module 0: Opening
🎯 Learning Objectives
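- Define meta-analysis and explain its role in evidence synthesis
- Identify when studies should not be pooled
- Describe the hierarchy of evidence and where systematic reviews sit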
- Recognize that meta-analysis can mislead when done poorly
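- Review the seven principles of this course
This course exists because
medicine has been wrong.
Not once. Not rarely. Repeatedly. In ways that killed patients who trusted that the evidence was reliable.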
What is Meta-Analysis?
A statistical method for combining the results of multiple independent studies that address the same question.
*When well conducted. Quality of conduct matters more than study design alone — as GRADE recognizes.
Why Pool Studies?
Increase Statistical Power
Individual studies may be too small to detect effects.
Improve Precision
Narrower confidence intervals around effect estimates.
Resolve Disagreement
When studies conflict, pooling can clarify the signal.
Explore Heterogeneity
Identify why effects differ across populations or settings.
But meta-analysis can also
MISLEAD
When done poorly, it amplifies bias rather than truth.
When NOT to Pool
Studies measure fundamentally different things (apples and oranges)
Extreme heterogeneity that cannot be explained
One study dominates all others (megastudy problem)
Studies are at high risk of bias that cannot be adjusted for
Pooling is a privilege, not a right.
The decision to combine must be defended.
The Hierarchy of Evidence
Systematic Reviews & Meta-Analyses of RCTs
Randomized Controlled Trials
Cohort Studies
Case-Control Studies
Case Series / Expert Opinion
Position in the hierarchy depends on methodological quality, not on study type alone
This course teaches through
evidence reversals.
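Each module opens with a story of how medicine got it wrong. Then we learn the methods that could have prevented the harm.
The Seven Principles
These phrases will return throughout your journey:
1. "Not every signal is real."
2. "Methods protect patients from our confidence."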
3. "What was hidden in plain sight?"
4. "A number without provenance is not a number."
5. "Heterogeneity is a message, not noise."
6. "Absence of evidence is not evidence of absence."
7. "Certainty must be earned, not assumed."
Module 0 Quiz
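1. Why should studies sometimes not be pooled in a meta-analysis?
2. Where do systematic reviews of RCTs sit in the hierarchy of evidence?
Begin the journey.
Module 1: The Question
Not every signal is real.
This is not a story about error.
This is a story about certainty.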
🎯 Learning Objectives
- Formulate a focused PICO question for a systematic review
- Distinguish surrogate outcomes from patient-important outcomes
- Explain why biological plausibility alone is insufficient evidence
- Describe the CAST trial and its impact on evidence-based medicine
- Apply the principle: "Not every signal is real"
~9,000
excess deaths per year
From a treatment everyone believed worked.
This is a story about how we believed, and how we were wrong.
The Observation
Patients with frequent PVCs after MI had 2-5x higher mortality.
A massive clinical need. A clear target.
The Response
Antiarrhythmic drugs were developed, FDA approved,
and prescribed to ~200,000 patients per year.
There are no villains in this story.
Everyone acted on the best available evidence.
The Logic Everyone Believed
PVCs after MI predict sudden cardiac death
Antiarrhythmic drugs suppress PVCs
Suppressing PVCs should prevent sudden death
Antiarrhythmics save lives in post-MI patients
The chain was logical. The conclusion felt inevitable.
CAST: The Cardiac Arrhythmia Suppression Trial
Finally, someone asked: "Does suppressing PVCs actually save lives?"
Results: April 1989
The Data and Safety Monitoring Board stopped the trial early.
| Outcome | Drug (n=755) | Placebo (n=743) |
|---|---|---|
| Arrhythmic deaths | 33 | 9 |
| All cardiac deaths | 43 | 16 |
| Total deaths | 56 | 22 |
| Death rate | 7.4% | 3.0% |
Drugs that suppressed the arrhythmia perfectly increased mortality by 150%.
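The headline figure can be checked from the total-death row of the table. A minimal Python sketch, assuming a simple Wald interval on the log scale; the function name `risk_ratio` is illustrative and is not something the CAST investigators reported:

```python
import math

def risk_ratio(events_trt, n_trt, events_ctl, n_ctl, z=1.96):
    """Risk ratio with a Wald confidence interval computed on the log scale."""
    risk_trt = events_trt / n_trt
    risk_ctl = events_ctl / n_ctl
    rr = risk_trt / risk_ctl
    # Standard error of log(RR) for two independent binomial arms
    se_log_rr = math.sqrt(1 / events_trt - 1 / n_trt + 1 / events_ctl - 1 / n_ctl)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper

# Total deaths from the CAST table: 56/755 on drug vs 22/743 on placebo
rr, lower, upper = risk_ratio(56, 755, 22, 743)
print(f"RR = {rr:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

On these counts the ratio comes out at about 2.5, which is the 150% increase stated above, and the interval stays above 1.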
The Human Cost
Before CAST, ~200,000 Americans per year received these drugs.
~9,000
excess deaths per year - possibly more
Vietnam War: ~6,000 US deaths/year • These drugs: ~9,000+ deaths/year
For every number, a name we will never know.
Look again.
The Logic, Revisited
PVCs after MI predict sudden cardiac death
Antiarrhythmic drugs suppress PVCs
Suppressing PVCs should prevent sudden death
Antiarrhythmics save lives in post-MI patients
The assumption that suppressing the marker would fix the outcome had never been tested.
What Went Wrong: The Surrogate Trap
PVCs were a marker of damaged tissue, not the cause of death
The drugs had proarrhythmic effects - triggering deadlier rhythms
The surrogate improved but the outcome worsened: a dissociated surrogate
The surrogate did not lie. We asked the wrong question.
The PICO Framework
Every answerable clinical question has four components:
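Investigation Exercise: The Evidence Before CAST
You are a cardiologist in 1988. A patient has survived an MI but has frequent PVCs. The observational literature is clear...
| Study | PVC Population | Mortality Risk |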
|---|---|---|
| Lown (1977) | High-grade PVCs | 2.4x higher |
| Bigger (1984) | >10 PVCs/hour | 3.1x higher |
| Mukharji (1984) | Complex PVCs | 4.8x higher |
The signal is clear. The mechanism is plausible. Would you prescribe an antiarrhythmic?
Before: Observational Logic
PVCs → Higher mortality
Drugs suppress PVCs
∴ Drugs should reduce mortality
After: CAST RCT (1989)
Death rate on drug: 7.4%
Death rate on placebo: 3.0%
RR = 2.5 (150% increase in deaths)
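The surrogate improved. The patients died. This is why we ask: "What is the outcome that matters?"
Lessons for Evidence Synthesis
Biological plausibility is not evidence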
A logical mechanism doesn't guarantee the expected effect.
Surrogate endpoints can mislead
Improving a biomarker doesn't prove improvement in outcomes.
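Randomized trials provide the strongest causal evidence
Observational data alone can rarely prove causation
Consensus is not evidence
200,000 prescriptions, FDA approval, and guidelines were all wrong.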
This is why we do meta-analysis: to see past apparent truths.
What if the question you ask decides who lives and who dies?
REAL DATA
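In 1989, cardiologists knew that encainide and flecainide achieved PVC suppression. The surrogate endpoint looked perfect: the drugs suppressed PVCs by 80%+ compared with placebo. But CAST randomized 1,498 patients, and the trial was stopped early: 56 deaths in the drug group vs 22 in placebo. Mortality increased 2.5-fold. An estimated ~9,000 excess American deaths per year were attributed to these drugs.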
What appears certain may be wrong.
What everyone believes may be false.
Methods exist so that patients do not pay the price for our confidence.
This is why you are here.
Module 1 Quiz
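1. What was the fundamental error in the antiarrhythmic logic?
2. In PICO, what does the "O" stand for, and why does it matter?
Not every signal is real.
Methods protect patients from our confidence.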
What was hidden in plain sight?
This is a story about
observational evidence.
Module 2: The Protocol
🎯 Learning Objectives
- Explain why protocol pre-registration prevents bias
- Identify key elements of a PROSPERO registration
- Distinguish healthy user bias from true treatment effects
- Describe why observational studies overestimated HRT benefits
- Apply the principle: "Methods protect patients from our confidence"
30+
observational studies
All showing hormone replacement therapy protected postmenopausal women from heart disease.
The evidence seemed overwhelming. The conclusion seemed certain.
The Nurses' Health Study
122,000 nurses followed for decades. HRT users had 40-50% lower cardiovascular mortality.
Landmark study. Impeccable methodology. Wrong conclusion.
The Hidden Biases
Healthy User Bias: Women who chose HRT were healthier, wealthier, better educated
Compliance Bias: Women who took HRT consistently also took better care of themselves
Prescriber Bias: Doctors gave HRT to healthier women with fewer risk factors
The treatment was not protecting them. They were already protected.
WHI: The Women's Health Initiative
The largest randomized trial of HRT ever conducted.
Results: July 2002
Trial stopped early after 5.2 years. Harm exceeded benefits.
| Outcome | Hazard Ratio | Direction |
|---|---|---|
| Coronary heart disease | 1.29 | HARM |
| Stroke | 1.41 | HARM |
| Breast cancer | 1.26 | HARM |
| Pulmonary embolism | 2.13 | HARM |
The Lesson
PRE-SPECIFY
A protocol written before the search begins prevents fishing, prevents bias, prevents hindsight distortion.
What if the treatment works, but only for some?
REAL DATA
WHI showed HRT increased cardiovascular events overall. But later analyses revealed a critical pattern: women who started HRT within 10 years of menopause had REDUCED cardiovascular risk. Women starting 20+ years after menopause had INCREASED risk. The overall null/harm result hid a timing effect.
PROSPERO Registration
Register Before You Search
PROSPERO: International prospective register of systematic reviews
Lock In Your Decisions
PICO, search strategy, outcomes, analysis plan - all pre-specified
Document Amendments
Changes are allowed, but they must be transparent and justified
Prevent Duplication
Check whether your review already exists before you start
Module 2 Quiz
1. Why did the Nurses' Health Study show a benefit of HRT when WHI did not?
2. What is the primary purpose of PROSPERO registration?
Pre-specification is not
It is protection.
Against our own tendency to find what we expect.
Methods protect patients from our confidence.
What was hidden in plain sight?
Module 3: The Search
What was hidden in plain sight?
This is a story about
what they didn't publish.
🎯 Learning Objectives
- Develop a comprehensive search strategy using PRESS guidelines
- Search multiple databases including grey literature sources
- Identify trial registries and regulatory databases (ClinicalTrials.gov, FDA)
- Explain how the rosiglitazone case exposed hidden cardiovascular harms
- Apply the principle: "What was hidden in plain sight?"
$3.2B
annual sales at peak
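Avandia (rosiglitazone) was one of the world's best-selling diabetes drugs.
The published trials looked reassuring. The unpublished evidence told a different story.
The Published Evidence (Before 2007)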
Published trials showed rosiglitazone effectively lowered HbA1c. Cardiovascular outcomes were rarely reported.
The surrogate looked good. But what about actual cardiovascular events?
Nissen's Discovery: May 2007
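Dr. Steven Nissen obtained unpublished trial data from GSK's own website.
A legal settlement had required GSK to post clinical trial results online. Nissen and Wolski analyzed 42 trials, many of which had never been published in a journal.
The data were technically public.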
No one had systematically searched for it.
荟萃分析结果
| Outcome | Odds Ratio | 95% CI |
|---|---|---|
| Myocardial Infarction | 1.43 | 1.03 - 1.98 |
| CV Death | 1.64 | 0.98 - 2.74 |
Published in NEJM. The FDA called an emergency advisory committee meeting.
The FDA Advisory Committee: July 2007
The committee was divided. Some wanted the drug withdrawn. Some called the meta-analysis flawed.
But the signal could not be ignored.
The Aftermath
Black box warning added for heart failure risk (2007)
Severe restrictions on prescribing in the US (2010)
Withdrawn entirely from the European market (2010)
FDA now requires cardiovascular outcome trials for all diabetes drugs
What a Comprehensive Search Requires
PRESS Checklist
Peer Review of Electronic Search Strategies
Translation of the Research Question
Does the search reflect the PICO elements?
Boolean and Proximity Operators
Are AND, OR, and NOT used correctly?
Subject Headings
Are the MeSH/Emtree terms appropriate and exploded?
Text Words
Synonyms, spelling variants, truncation?
PRESS Checklist (continued)
Spelling, Syntax, Line Numbers
Are there errors that would cause the search to fail?
Limits and Filters
Are the date, language, and study design limits appropriate?
Peer-reviewed searches substantially improve retrieval of key studies.
PRESS guideline: McGowan et al., 2016
Database Translation
The same search must be adapted for each database:
"diabetes mellitus, type 2"[MeSH] OR "type 2 diabetes"[tiab]
'non insulin dependent diabetes mellitus'/exp OR 'type 2 diabetes':ti,ab
Subject headings, field tags, and operators differ between databases.
What happens when you search, but find nothing?
REAL DATA
Governments stockpiled $9 billion of oseltamivir (Tamiflu) for pandemic influenza. The Cochrane Collaboration tried to review the evidence. Of 77 clinical trials, full reports existed for only 20. Roche declined to share the data for 5 years. When the BMJ and Cochrane finally obtained over 160,000 pages of clinical study reports, they found: Tamiflu reduced symptoms by less than 1 day, with no evidence it prevented hospitalizations or complications.
If Nissen had searched only PubMed,
the signal would have remained hidden.
Comprehensive search is survival.
What was hidden in plain sight?
Module 3 Quiz
1. What type of evidence source revealed the rosiglitazone cardiovascular signal?
2. What does PRESS stand for?
What was hidden in plain sight?
Module 4: Screening
A number without provenance is not a number.
This is a story about
what they chose to report.
🎯 Learning Objectives
- Apply PRISMA flow diagram to document study selection
- Implement dual-reviewer screening with conflict resolution
- Identify selective outcome reporting and data manipulation
- Calculate inter-rater reliability (Cohen's kappa)
- Apply the principle: "A number without provenance is not a number"
88,000
heart attacks attributed to Vioxx
A blockbuster drug. A hidden signal. A preventable catastrophe.
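Between 1999 and 2004, millions of people took this painkiller. Some never came home.
The Rise of Vioxx
Rofecoxib (Vioxx) was a COX-2 selective NSAID. It was marketed as safer for the stomach than traditional painkillers.
The VIGOR Trial (2000)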
Vioxx Gastrointestinal Outcomes Research
What VIGOR Published
| GI Outcome | Vioxx | Naproxen |
|---|---|---|
| Confirmed GI events | 2.1 per 100 pt-yrs | 4.5 per 100 pt-yrs |
| Reduction | 54% fewer GI events | |
Headline: Vioxx is safer for the stomach!
This is what doctors were told. This is what patients believed.
What VIGOR Buried
| CV Outcome | Vioxx | Naproxen |
|---|---|---|
| Myocardial Infarction | 20 events | 4 events |
| Relative Risk | 5x higher in Vioxx group | |
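Selective Reporting
Data cutoff manipulation: 3 additional heart attacks occurred after the cutoff used in publication
Spin: The CV signal was explained as naproxen being cardioprotective (without evidence)
Outcome switching: CV events were pre-specified but de-emphasized
Internal knowledge: Merck emails showed the company knew about the signal
The APPROVe Trial (2004)
A colorectal polyp prevention trial, stopped early for safety.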
Four years after VIGOR showed a 5x risk. Four years too late.
Have you considered what happens when a signal is hidden in the noise?
REAL DATA
Vioxx (rofecoxib) launched in 1999. By 2004, estimates suggest 88,000-140,000 excess heart attacks and 30,000-40,000 deaths. Merck's own VIGOR trial showed 5x cardiovascular risk in 2000, but it was dismissed as a "naproxen cardioprotective effect."
The PRISMA Flow Diagram
Every step of screening must be documented and transparent.
Dual Screening: Why Two Reviewers?
Reduces Selection Bias
One reviewer might unconsciously favor certain studies
Catches Errors
Fatigue, misreading, and mistakes are inevitable
Forces Explicit Criteria
Disagreements reveal ambiguity in inclusion rules
Typical agreement: κ = 0.6-0.8
Disagreements resolved by discussion or third reviewer
Calibration: The Pilot Phase
Before screening thousands of records, reviewers should calibrate on a sample of 50-100 records.
Screen the same set independently
Compare decisions and discuss disagreements
Refine inclusion criteria until κ > 0.7
Document the calibration process and any rule changes
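The κ threshold in the calibration steps can be computed from the two reviewers' decisions on the pilot sample. A minimal sketch of Cohen's kappa in plain Python; the ten decisions below are hypothetical, and `cohens_kappa` is an illustrative name rather than part of any screening tool:

```python
def cohens_kappa(decisions_a, decisions_b):
    """Cohen's kappa for two reviewers' decisions on the same records."""
    if len(decisions_a) != len(decisions_b) or not decisions_a:
        raise ValueError("Both reviewers must rate the same non-empty set of records")
    n = len(decisions_a)
    labels = set(decisions_a) | set(decisions_b)
    # Observed agreement: share of records where both reviewers decided the same
    observed = sum(a == b for a, b in zip(decisions_a, decisions_b)) / n
    # Chance agreement: product of each reviewer's marginal rate, summed over labels
    expected = sum(
        (decisions_a.count(label) / n) * (decisions_b.count(label) / n)
        for label in labels
    )
    if expected == 1.0:
        return 1.0  # both reviewers used a single identical label throughout
    return (observed - expected) / (1 - expected)

# Hypothetical pilot sample: the reviewers disagree on one record out of ten
reviewer_a = ["include"] * 4 + ["exclude"] * 6
reviewer_b = ["include"] * 3 + ["exclude"] * 7
kappa = cohens_kappa(reviewer_a, reviewer_b)
print(f"kappa = {kappa:.2f}")
```

Raw agreement here is 90%, yet kappa is lower because much of that agreement is expected by chance when most records are excluded.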
PRISMA 2020 Updates
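PRISMA 2020 substantially revised the checklist, expanding the reporting of synthesis methods, certainty assessment, and protocol registration.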
If Vioxx's cardiovascular data had been screened by independent reviewers,
if all pre-specified outcomes had been required to be reported,
88,000 heart attacks might have been prevented.
A number without provenance is not a number.
Module 4 Quiz
1. In the VIGOR trial, what was the relative risk of MI in the Vioxx group compared with naproxen?
2. Why is dual screening (two independent reviewers) important?
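A number without provenance is not a number.
Module 5: Extraction
A number without provenance is not a number.
This is a story about
numbers that never existed.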
🎯 Learning Objectives
- Design a standardized data extraction form with provenance fields
- Calculate effect sizes from various reported statistics (OR, RR, HR, SMD)
- Implement dual-extraction with discrepancy resolution
- Recognize red flags for data fabrication and misconduct
- Explain how the DECREASE fraud affected clinical guidelines
~10,000
possible excess deaths in Europe
From guidelines based on fabricated clinical trial data.
The DECREASE trials shaped perioperative care worldwide. The data were invented.
Don Poldermans: A Star Researcher
Professor at Erasmus Medical Center, Rotterdam. Author of over 500 papers. Lead author of ESC guidelines on perioperative cardiac care.
A seemingly unimpeachable source. Until someone looked at the data.
The DECREASE Trials: The Claims
| Trial | Finding | Impact |
|---|---|---|
| DECREASE-I (1999) | 90% reduction in cardiac death | Changed guidelines |
| DECREASE-IV (2009) | Beta-blockers safe in low-risk | Expanded recommendations |
Effect sizes were implausibly large.
90% reduction? Almost nothing in medicine works that well.
The Investigation: 2011
Erasmus MC investigated after whistleblower complaints
Fabricated patient data: Patients who didn't exist or weren't enrolled
No informed consent: Many "participants" never consented
Poldermans dismissed: From Erasmus MC in 2011
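A Cascade of Harm
When DECREASE was removed from the meta-analysis...
The POISE trial (2008) had already shown harm. It was dismissed because it conflicted with DECREASE.
Why Wasn't It Caught?
Trust in authority: Poldermans was the guideline author reviewing his own evidence
No data verification: No one requested the individual patient data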
Publication prestige: Published in top journals, assumed valid
Implausible effects accepted: 90% reductions should raise suspicion
Data Extraction: Defense Against Fraud
Dual Extraction
Two extractors independently - catches transcription errors and forces scrutiny
Record Provenance
Table, page, paragraph - every number traceable to source
Verify Against Registry
ClinicalTrials.gov results vs. the publication: discrepancies are a red flag
Request IPD
Individual patient data reveals what aggregate summaries hide
Effect Size Calculation
During extraction, you can calculate effect sizes from the reported data:
Odds Ratio, Risk Ratio, Risk Difference from 2x2 tables
Mean Difference, Standardized Mean Difference from means and standard deviations
Always extract from the most reliable source.
Prefer: ITT results > per-protocol > subgroups
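The 2x2-table calculations mentioned above can be sketched in a few lines of Python. The counts are hypothetical, the function name is illustrative, and the sketch deliberately does not handle zero cells (a real extraction would need a continuity correction or another method for those):

```python
def two_by_two_effects(events_trt, n_trt, events_ctl, n_ctl):
    """Odds ratio, risk ratio, and risk difference from a 2x2 table."""
    risk_trt = events_trt / n_trt
    risk_ctl = events_ctl / n_ctl
    # Odds = events / non-events within each arm
    odds_trt = events_trt / (n_trt - events_trt)
    odds_ctl = events_ctl / (n_ctl - events_ctl)
    return {
        "odds_ratio": odds_trt / odds_ctl,
        "risk_ratio": risk_trt / risk_ctl,
        "risk_difference": risk_trt - risk_ctl,
    }

# Hypothetical trial: 15/100 events on treatment vs 30/100 on control
effects = two_by_two_effects(15, 100, 30, 100)
for name, value in effects.items():
    print(f"{name}: {value:.3f}")
```

The same table yields three different numbers; recording which one was extracted, and from where, is part of the provenance trail.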
Red Flags During Extraction
Implausible effect sizes: 80-90% reductions should prompt scrutiny
Baseline imbalances: Groups that are matched "too perfectly"
Round numbers: "Exactly 50" or "exactly 100" patients per arm
Registry discrepancies: Published N vs. registered N
Effect Size Conversions
Studies report results in different metrics. To pool them, you often need to convert:
| From | To | Formula |
|---|---|---|
| SMD (d) | log-OR | log-OR = d × π / √3 |
| log-OR | SMD (d) | d = log-OR × √3 / π |
| Correlation (r) | Fisher z | z = 0.5 × ln((1+r)/(1−r)) |
| OR | RR | RR = OR / (1 − P₀ + P₀ × OR) |
| OR | NNT | NNT = 1 / (P₀ − OR×P₀ / (1−P₀+OR×P₀)) |
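P₀ = baseline risk in the control group. These formulas rest on approximating assumptions; see Borenstein et al. (Chapter 7) for the exact derivations.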
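The conversion table translates directly into code. A minimal sketch, with function names chosen for illustration and input values that are hypothetical; it applies the formulas exactly as tabulated and inherits their approximating assumptions:

```python
import math

def smd_to_log_or(d):
    """log-OR = d * pi / sqrt(3)."""
    return d * math.pi / math.sqrt(3)

def log_or_to_smd(log_or):
    """d = log-OR * sqrt(3) / pi."""
    return log_or * math.sqrt(3) / math.pi

def fisher_z(r):
    """Fisher z = 0.5 * ln((1 + r) / (1 - r))."""
    return 0.5 * math.log((1 + r) / (1 - r))

def or_to_rr(odds_ratio, p0):
    """RR = OR / (1 - P0 + P0 * OR), with P0 the control-group risk."""
    return odds_ratio / (1 - p0 + p0 * odds_ratio)

def or_to_nnt(odds_ratio, p0):
    """NNT = 1 / (P0 - treated risk); positive when OR < 1 (benefit)."""
    treated_risk = odds_ratio * p0 / (1 - p0 + odds_ratio * p0)
    return 1 / (p0 - treated_risk)

# Hypothetical inputs: d = 0.5, OR = 0.5, control-group risk P0 = 0.20
print(f"log-OR from d=0.5: {smd_to_log_or(0.5):.3f}")
print(f"RR from OR=0.5 at P0=0.20: {or_to_rr(0.5, 0.20):.3f}")
print(f"NNT from OR=0.5 at P0=0.20: {or_to_nnt(0.5, 0.20):.2f}")
```

Note how the OR-to-RR and OR-to-NNT results depend on P₀: the same odds ratio implies different risk ratios and NNTs at different baseline risks, so the assumed P₀ should be recorded alongside the converted value.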
Time-to-Event (Survival) Data
Many trials report time-to-event outcomes using hazard ratios (HR). Pooling HRs in meta-analysis requires special handling:
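The log(HR) + SE Approach
Extract log(HR) and its SE from the trial. If the SE is not reported, derive it from the CI: SE = (ln(upper) − ln(lower)) / (2 × 1.96). Pool using the standard inverse-variance method.
When the HR Is Not Reported
Methods exist to reconstruct IPD from Kaplan-Meier curves (Guyot et al., 2012) or to estimate the HR from p-values and event counts (Parmar et al., 1998). Always prefer a directly reported adjusted hazard ratio when one is available.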
HR < 1 favors treatment; HR > 1 favors control. Do not convert HRs to ORs or RRs—they measure fundamentally different quantities.
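The CI-to-SE step for hazard ratios is short enough to show in full. A minimal sketch, assuming a 95% interval that is symmetric on the log scale; the hazard ratio and limits below are hypothetical:

```python
import math

def log_hr_and_se(hr, ci_lower, ci_upper, z=1.96):
    """Return log(HR) and its standard error derived from a confidence interval."""
    log_hr = math.log(hr)
    # The interval spans 2 * z standard errors on the log scale
    se = (math.log(ci_upper) - math.log(ci_lower)) / (2 * z)
    return log_hr, se

# Hypothetical report: HR 0.80 (95% CI 0.65 to 0.98)
log_hr, se = log_hr_and_se(0.80, 0.65, 0.98)
print(f"log(HR) = {log_hr:.3f}, SE = {se:.3f}")

# Sanity check: on the log scale the point estimate should sit near the CI midpoint
midpoint = (math.log(0.65) + math.log(0.98)) / 2
print(f"midpoint of log limits = {midpoint:.3f}")
```

If the point estimate is far from the midpoint of the log limits, the interval may not be a 95% Wald interval (or a number was mistyped), which is worth flagging in the extraction form.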
What if the data you extracted were never real?
REAL DATA
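Joachim Boldt was among the most prolific researchers in anesthesia fluid management. More than 180 of his publications were retracted, one of the largest retraction cases in medical history. His fabricated data suggested that hydroxyethyl starch (HES) was safe. Meta-analyses that included his studies concluded that HES was harmless. When Boldt's studies were removed, the pooled effect reversed: HES increased kidney injury by 59% (RR 1.59, 95% CI 1.26-2.00) and mortality by ~9% (RR 1.09). Thousands of patients are estimated to have received a harmful fluid based on fabricated evidence.
Every number in a meta-analysis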
must trace back to a verifiable source.
A number without provenance is not a number.
Fraudulent data can kill as surely as fraudulent drugs.
Module 5 Quiz
1. What happened when the DECREASE trial data were removed from the beta-blocker meta-analysis?
2. Why should dual extraction be standard practice?
A number without provenance is not a number.
Module 6: Bias
Methods protect patients from our confidence.
This is a story about
the bias we cannot see.
🎯 Learning Objectives
- Apply Risk of Bias 2.0 (RoB 2) to randomized trials
- Apply ROBINS-I to non-randomized studies
- Assess all five RoB 2 domains (randomization, deviations, missing data, measurement, selection)
- Distinguish confounding by indication from true treatment effects
- Explain how BART revealed hidden harms of aprotinin
20+
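years on the market
Aprotinin was the gold standard for reducing surgical bleeding.
Then someone ran a randomized controlled trial. It was not.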
The Hidden Bias: Confounding by Indication
Sicker patients got aprotinin: Surgeons used it in complex, high-risk cases
Survivors bias: Dead patients can't report complications
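Publication bias: Negative studies went unpublished
Observational studies could not separate the drug's effect from the patients' baseline risk.
BART: The Randomized Truth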
Blood Conservation Using Antifibrinolytics in a Randomized Trial
| Outcome | Aprotinin | Alternatives |
|---|---|---|
| 30-day mortality | 6.0% | 3.9% |
| Relative Risk | 1.53 (53% increased death) | |
Investigation: Assessing Bias
You are reviewing the observational studies. Apply risk-of-bias thinking:
| Question | Observational | BART (RCT) |
|---|---|---|
| Random allocation? | ❌ Surgeon choice | ✓ Yes |
| Baseline comparable? | ❌ Sicker got drug | ✓ Balanced |
| Blinding? | ❌ Open label | ✓ Double-blind |
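Confounding by indication: Surgeons gave aprotinin to the sickest patients. Observational studies credited the drug with survival when they were in fact measuring survivor bias.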
Risk of Bias 2.0: The Five Domains
Randomization Process
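Deviations from Intended Interventions
Missing Outcome Data
Measurement of the Outcome
Selection of the Reported Result
ROBINS-I: For Non-Randomized Studies
When RCTs are unavailable, use ROBINS-I (Risk Of Bias In Non-randomized Studies of Interventions)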
Confounding
Baseline differences between groups
Selection of Participants
Exclusions related to intervention
Classification of Interventions
Misclassification of exposure status
Deviations from Intended Interventions
Co-interventions, contamination
Missing Data
Differential loss to follow-up
Measurement of Outcomes
Ascertainment bias
Selection of Reported Result
Selective reporting
Ratings: Low / Moderate / Serious / Critical / No information
What happens when 64 studies agree, and all of them are wrong?
REAL DATA
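Aprotinin was used in cardiac surgery to reduce bleeding for 20 years. 64 small randomized trials suggested it was safe and effective. Meta-analyses confirmed the benefit. Then the BART trial (2008) randomized 2,331 patients: aprotinin vs. tranexamic acid vs. aminocaproic acid. Result: aprotinin increased mortality by 53% (RR 1.53, 95% CI 1.06-2.22). The trial was stopped early for harm. Bayer withdrew aprotinin from the market within months.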
Sixty-four small trials measured bleeding, not death.
One adequately powered trial revealed 53% increased mortality.
Quantity of evidence is no substitute for quality and power.
Module 6 Quiz
1. Why did 64 small trials miss aprotinin's harm?
Methods protect patients from our confidence.
Module 7: Synthesis
Heterogeneity is a message, not noise.
The Magnesium Controversy: 1991-1995
When pooling leads us astray.
🎯 Learning Objectives
- Calculate pooled effect sizes using fixed-effect and random-effects models
- Choose between DerSimonian-Laird and HKSJ estimators appropriately
- Interpret forest plots including weights, confidence intervals, and diamonds
- Explain why small-study effects can mislead meta-analyses
- Apply the principle: "Heterogeneity is a message, not noise"
The Year: 1991
"You stand at the crossroads of hope and evidence..."
Heart disease kills more people worldwide than any other cause. In 1991, a new hope emerges: Could something as simple and cheap as intravenous magnesium save lives after myocardial infarction?
The biological rationale was sound:
Magnesium stabilizes cardiac membranes, prevents arrhythmias, and vasodilates coronary arteries.
LIMIT-2: The Landmark Trial
Leicester Intravenous Magnesium Intervention Trial, 1992
A cheap, safe intervention that could save 250,000 lives per year globally.
The medical community was stunned.
The Meta-Analysis: 1993
Researchers pooled seven randomized trials of IV magnesium in MI:
| Trial | Year | N | Odds Ratio |
|---|---|---|---|
| Morton 1984 | 1984 | 40 | 0.10 |
| Rasmussen 1986 | 1986 | 273 | 0.35 |
| Smith 1986 | 1986 | 400 | 0.48 |
| Abraham 1987 | 1987 | 94 | 0.87 |
| Shechter 1990 | 1990 | 103 | 0.27 |
| Ceremuzynski 1989 | 1989 | 48 | 0.22 |
| LIMIT-2 | 1992 | 2,316 | 0.74 |
Investigation Exercise: The Meta-Analyst's Dilemma
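You are a Cochrane reviewer in 1993. You have been asked to synthesize the evidence on magnesium for myocardial infarction. Data from seven trials are in front of you.
Do you see the pattern in this forest plot?
But wait... do you notice anything about the trial sizes?
Warning Signs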
What should have given us pause?
Small sample sizes: Six of seven trials had <500 patients
Extreme effects: OR of 0.10 (90% reduction) is implausible for any drug
All positive: Where are the negative trials? The file drawer problem...
Funnel asymmetry: Small trials showed much larger effects than larger ones
The Funnel Plot Test
Before pooling, we must check for publication bias. Let's examine the funnel plot.
⚠️ Asymmetric Funnel
Small trials cluster on the left (showing benefit). Where are the small negative trials?
Egger's test p = 0.04 — statistically significant asymmetry.
The Year: 1995, ISIS-4 Reports
"And then the truth came out..."
The Fourth International Study of Infarct Survival (ISIS-4) enrolled 58,050 patients across 1,086 hospitals in 31 countries.
Before and After: The Full Picture
Watch what happens when we add the mega-trial to our forest plot...
BEFORE ISIS-4
7 small trials (N = 3,274)
OR = 0.44
Strong benefit signal
AFTER ISIS-4
8 trials (N = 61,324)
OR = 1.02
No effect
Why Did Small Trials Mislead?
Publication Bias
Small negative trials were never published—they sat in file drawers
Small-Study Effects
Smaller trials tend to show larger effects due to methodological weaknesses
Random High Bias
By chance, some small trials reach extreme results, and those are the ones that get published
Random-Effects Amplification
Random-effects models give more weight to small trials, amplifying bias
Fixed vs. Random Effects
Which model should you choose?
Fixed effect: Assumes one true effect. Weights studies by inverse variance (precision). Large trials dominate.
Magnesium result: OR = 0.96 (p = 0.52)
Random effects: Assumes a distribution of effects. Gives relatively more weight to small trials. Wider confidence intervals.
Magnesium result: OR = 0.59 (p = 0.01)
⚠️ Model choice determined the conclusion!
Random effects do not fix bias.
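The contrast between the two models can be reproduced on toy data. A minimal sketch of inverse-variance fixed-effect pooling and DerSimonian-Laird random-effects pooling; the four log odds ratios and standard errors are hypothetical (three small "positive" trials and one large null trial), not the actual magnesium data:

```python
import math

def pool_fixed(effects, ses):
    """Inverse-variance fixed-effect pooled estimate and its standard error."""
    weights = [1 / se**2 for se in ses]
    estimate = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    return estimate, math.sqrt(1 / sum(weights))

def dersimonian_laird_tau2(effects, ses):
    """DerSimonian-Laird moment estimate of the between-study variance."""
    weights = [1 / se**2 for se in ses]
    fixed, _ = pool_fixed(effects, ses)
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    c = sum(weights) - sum(w**2 for w in weights) / sum(weights)
    return max(0.0, (q - df) / c)

def pool_random(effects, ses):
    """Random-effects pooled estimate: tau2 is added to every study's variance."""
    tau2 = dersimonian_laird_tau2(effects, ses)
    weights = [1 / (se**2 + tau2) for se in ses]
    estimate = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    return estimate, math.sqrt(1 / sum(weights)), tau2

# Hypothetical data on the log-OR scale: three small trials, one mega-trial
log_ors = [-1.2, -0.9, -1.0, 0.02]
ses = [0.5, 0.5, 0.5, 0.03]

fixed_est, fixed_se = pool_fixed(log_ors, ses)
random_est, random_se, tau2 = pool_random(log_ors, ses)
print(f"Fixed-effect OR:   {math.exp(fixed_est):.2f}")
print(f"Random-effects OR: {math.exp(random_est):.2f} (tau2 = {tau2:.2f})")
```

With these inputs the fixed-effect estimate stays close to the large trial, while the random-effects estimate is pulled toward the small trials because adding τ² flattens the weights. If the small trials are the biased ones, that shift amplifies the bias.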
Lessons from Magnesium
1. Check for publication bias before trusting a pooled estimate. Funnel plots and Egger's test are your tools.
2. Be wary of small-study effects. If only small trials show benefit, wait for a large, well-conducted trial.
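3. Model choice matters. Random effects can amplify biased evidence. Consider both models and understand the implications.
4. One large trial can overturn many small ones. This is why mega-trials like ISIS-4 are so valuable.
Special Study Designs in Meta-Analysis
Not all RCTs use the standard parallel-group design. Two common alternatives need special handling when pooling: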
Cluster-Randomized Trials
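Groups (hospitals, schools) are randomized rather than individuals. The design effect = 1 + (m−1) × ICC, where m is the average cluster size, reduces the effective sample size. Divide N by the design effect before pooling, or use the adjusted SE from the trial. Ignoring clustering artificially narrows the CI.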
Crossover Trials
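Each patient receives both treatments. The paired design reduces variance, but you need the within-patient correlation (or the SE from a paired analysis) to pool correctly. Using the parallel-group SE is conservative; using the wrong N double-counts patients.
See the Cochrane Handbook v6.4, Chapter 23, for detailed formulas and worked examples.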
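The design-effect adjustment for cluster-randomized trials is simple arithmetic. A minimal sketch, with hypothetical values for the cluster size, ICC, and sample size; inflating an unadjusted SE by the square root of the design effect is shown as an approximate alternative for trials that ignored clustering:

```python
import math

def design_effect(mean_cluster_size, icc):
    """Design effect = 1 + (m - 1) * ICC."""
    return 1 + (mean_cluster_size - 1) * icc

def effective_sample_size(n, mean_cluster_size, icc):
    """Sample size after dividing by the design effect."""
    return n / design_effect(mean_cluster_size, icc)

def inflate_se(se, mean_cluster_size, icc):
    """Approximate cluster-adjusted SE from an unadjusted one."""
    return se * math.sqrt(design_effect(mean_cluster_size, icc))

# Hypothetical trial: 400 patients in clusters of 20, ICC = 0.05
deff = design_effect(20, 0.05)
n_eff = effective_sample_size(400, 20, 0.05)
print(f"Design effect = {deff:.2f}, effective N = {n_eff:.0f}")
print(f"SE 0.10 becomes {inflate_se(0.10, 20, 0.05):.3f}")
```

Even a small ICC roughly halves the effective sample size here, which is why an unadjusted cluster trial receives too much weight in a pooled analysis.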
What if the way you combine studies decides whether a treatment saves lives or is useless?
REAL DATA
Early surfactant for preterm infants was supported by 6 small trials showing reduced mortality (RR 0.84). A fixed-effect meta-analysis confirmed benefit (p=0.04). But a random-effects model showed no significance (p=0.12): the confidence interval crossed 1.0. Later, SUPPORT (2010) and VON (2012), two large pragmatic trials with ~2,000 neonates combined, found no benefit of early versus late surfactant. Clinical practice had already changed on the basis of small trials and the wrong model.
Module 7 Quiz
1. Why did the magnesium meta-analysis show a benefit that ISIS-4 did not find?
2. What warning sign should have alerted reviewers to potential bias?
3. When publication bias is suspected, which model may amplify the bias?
Small trials can show false signals.
Large trials anchor the truth.
Heterogeneity is a message, not noise.
Module 8: Heterogeneity
Heterogeneity is a message, not noise.
ACCORD: 2008
When the average hides the truth.
🎯 Learning Objectives
- Calculate and interpret I², τ², and prediction intervals
- Apply ICEMAN criteria to assess subgroup credibility
- Distinguish between clinical, methodological, and statistical heterogeneity
- Conduct and interpret leave-one-out sensitivity analyses
- Explain how ACCORD revealed differential effects across subgroups
The Year: 2008
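"You are about to witness one of the most shocking trial terminations in history..."
For decades, the diabetes community had one guiding principle: lower blood sugar is better. The landmark DCCT (1993) and UKPDS (1998) showed that intensive glucose control reduced microvascular complications: blindness, kidney failure, nerve damage.
The logical extrapolation: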
If controlling glucose prevents complications, shouldn't intensive control prevent cardiovascular disease too?
ACCORD: Action to Control Cardiovascular Risk in Diabetes
The definitive test of intensive glucose control
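All patients had type 2 diabetes with high cardiovascular risk: established cardiovascular disease or multiple risk factors. The trial was designed to run for 5.6 years.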
February 6, 2008
The Data and Safety Monitoring Board convenes an emergency meeting.
After 3.5 years, they make an unprecedented decision:
Stop the trial.
The Shocking Results
| Outcome | Intensive | Standard | HR (95% CI) |
|---|---|---|---|
| Primary CV endpoint | 352 events | 371 events | 0.90 (0.78–1.04) |
| All-cause mortality | 257 deaths | 203 deaths | 1.22 (1.01–1.46) |
| Severe hypoglycemia | 10.5% | 3.5% | 3.0× higher |
Investigation Exercise: The Clinician's Dilemma
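You are an endocrinologist treating 500 patients with diabetes. The ACCORD results have just been published. What do you advise patients who have been striving for an HbA1c <6%?
Is intensive control harmful for everyone? Or only for some?
Subgroup analyses reveal: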
| Subgroup | Intensive HR | Interpretation |
|---|---|---|
| No prior CVD | 1.00 (0.76–1.32) | No effect |
| Prior CVD | 1.45 (1.15–1.84) | Significant harm |
| Baseline HbA1c <8% | 1.02 (0.75–1.40) | No effect |
| Baseline HbA1c ≥8% | 1.29 (1.03–1.60) | Harm |
The average effect masked critical heterogeneity!
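For patients with established CVD or poor baseline control, intensive treatment was harmful.
Understanding Heterogeneity: I² and Beyond
When studies (or subgroups) show different effects, we must quantify that variation.
I² = 0–25%: Low heterogeneity. Effects are consistent across studies.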
I² = 25–50%: Moderate. Look for sources of variation.
I² = 50–75%: Substantial. Consider whether pooling is appropriate.
I² = 75–100%: Considerable. A single pooled estimate may mislead.
But I² alone does not tell you what to do; it signals that you need to investigate further.
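I² is derived from Cochran's Q, so it can be computed from nothing more than the study effects and their standard errors. A minimal sketch with four hypothetical studies; the function names are illustrative:

```python
def cochran_q(effects, ses):
    """Cochran's Q: weighted squared deviations from the fixed-effect mean."""
    weights = [1 / se**2 for se in ses]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    return sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))

def i_squared(q, n_studies):
    """I² = max(0, (Q - df) / Q), expressed as a percentage."""
    df = n_studies - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100

# Hypothetical effects (e.g. log risk ratios) and standard errors
effects = [0.1, 0.3, -0.2, 0.5]
ses = [0.1, 0.15, 0.1, 0.2]

q = cochran_q(effects, ses)
i2 = i_squared(q, len(effects))
print(f"Q = {q:.2f} on {len(effects) - 1} df, I² = {i2:.0f}%")
```

These inputs land in the "considerable" band above. Because I² is a proportion, the same value can arise from very precise studies that differ slightly or imprecise studies that differ a lot.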
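Tau² (τ²): The Between-Study Variance
While I² tells you the proportion of variance due to heterogeneity, τ² tells you how large that variation is.
I²: "What share of the total variance is due to true differences between studies?"
Scale: 0% to 100%
τ²: "How much do the true effects vary between studies?"
Scale: squared units of the effect measure (τ itself is on the same scale as the effect)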
Use τ² to calculate prediction intervals
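The prediction interval shows the range of effects you would expect in a new study, and it is usually much wider than the confidence interval.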
The Prediction Interval: What ACCORD Really Tells Us
Consider a meta-analysis of intensive glucose control across multiple trials...
Confidence Interval
HR 1.10 (0.95–1.27)
"Our best estimate of the average effect"
Prediction Interval
HR 1.10 (0.70–1.73)
"The range of effects in a new setting"
The prediction interval spans both benefit and harm!
In some settings, intensive control might help. In others, it could kill.
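The step from confidence interval to prediction interval can be sketched as follows. The pooled HR and its CI are taken from the illustration above; the number of studies (10, giving a t critical value of 2.306 on 8 degrees of freedom) and τ² = 0.033 are assumptions chosen for illustration, since the slide does not state them:

```python
import math

def prediction_interval(log_effect, se, tau2, t_crit):
    """Approximate 95% prediction interval: estimate ± t * sqrt(tau2 + se²)."""
    half_width = t_crit * math.sqrt(tau2 + se**2)
    return math.exp(log_effect - half_width), math.exp(log_effect + half_width)

# Pooled estimate from the slide: HR 1.10 (95% CI 0.95 to 1.27)
log_hr = math.log(1.10)
se = (math.log(1.27) - math.log(0.95)) / (2 * 1.96)

# Assumed for illustration: k = 10 studies (t with k - 2 = 8 df) and tau2 = 0.033
lower, upper = prediction_interval(log_hr, se, tau2=0.033, t_crit=2.306)
print(f"Prediction interval: {lower:.2f} to {upper:.2f}")
```

Under those assumptions the interval is close to the 0.70 to 1.73 shown above. The confidence interval narrows as studies accumulate; the prediction interval does not shrink below what τ² allows.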
When Is a Subgroup Effect Credible?
Subgroup Credibility Criteria (adapted from ICEMAN, Schandelmaier 2020 & Sun 2012)
Was the subgroup analysis pre-specified?
Post hoc subgroups are prone to data dredging
Is there a plausible biological rationale?
The mechanism should be clear and independent of the data
Is the effect consistent across related outcomes?
If harm appears for death, does similar harm appear for MI and stroke?
Is there independent replication?
Has the subgroup effect been confirmed in other studies?
ICEMAN Applied to ACCORD
| Criterion | Assessment | Score |
|---|---|---|
| Pre-specified? | Yes: prior CVD was in the protocol | ✓ |
| Biological rationale? | Yes: hypoglycemia more dangerous with CVD | ✓ |
| Consistent outcomes? | Yes: CV mortality and all-cause mortality aligned | ✓ |
| Independent replication? | Partially: ADVANCE, VADT showed similar patterns | ~ |
ICEMAN Rating: High Credibility
The differential harm in high-risk patients appears genuine.
Clinical Implications
For patients without CVD: Moderate glucose control (HbA1c ~7%) remains the goal. Intensive control may reduce microvascular complications.
For patients with established CVD: Avoid intensive targets. Hypoglycemia is dangerous for damaged hearts.
For elderly patients: Relaxed targets. Quality of life matters. Tight control causes falls, confusion, and excess mortality.
"One size fits all" treatment is not patient-centered medicine.
Meta-Regression: Explaining Heterogeneity
When heterogeneity is high, meta-regression can identify study-level covariates that explain variation.
Does the effect size vary systematically with study characteristics?
Caution
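Meta-regression needs ≥10 studies per covariate. With fewer studies, it is exploratory only. Ecological fallacy: study-level associations may not apply to individuals.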
Example: In ACCORD, meta-regression might test if treatment effect varies by baseline HbA1c, showing harm concentrated in patients with very high levels.
What number saves lives? Who decides?
REAL DATA
For decades, the goal was: treat blood pressure to <140 mmHg systolic. Then came SPRINT (2015): 9,361 high-risk patients randomized to intensive (<120) vs standard (<140) targets. Intensive treatment reduced CV events by 25% and death by 27%. Trial stopped early for benefit. Guidelines changed worldwide.
Module 8 Quiz
1. Why was the ACCORD trial stopped early?
2. What does a prediction interval tell us that a confidence interval doesn't?
3. According to ICEMAN, which factor is MOST important for subgroup credibility?
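When studies disagree,
listen to the disagreement.
Heterogeneity is a message, not noise.
Absence of evidence is not evidence of absence.
Module 9: The Hidden Studies
Absence of evidence is not evidence of absence.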
Reboxetine: 2010
The 74% that never saw the light.
🎯 Learning Objectives
- Interpret funnel plots for asymmetry detection
- Apply Egger's test and other statistical tests for publication bias
- Implement the trim-and-fill method for bias adjustment
- Critically appraise the limitations of publication bias tests
- Apply the principle: "Absence of evidence is not evidence of absence"
The Year: 1997
"A new hope for depression patients who cannot tolerate SSRIs..."
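Reboxetine (Edronax) was a new antidepressant: a selective norepinephrine reuptake inhibitor (NRI). Unlike SSRIs, it targeted a different neurotransmitter system. For patients who had failed or could not tolerate fluoxetine or sertraline, it offered a new mechanism.
The Published Evidence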
What doctors could find in medical journals:
| Comparison | Published Trials | Published Result |
|---|---|---|
| Reboxetine vs Placebo | 3 trials (n=507) | Significantly better (SMD = 0.56) |
| Reboxetine vs SSRIs | 4 trials (n=628) | Equivalent or better |
The published literature told a clear story:
Reboxetine works. Patients benefit. Prescribe with confidence.
But what about the trials you cannot see?
In 2010, German researchers at IQWiG made a request to the European Medicines Agency...
They demanded access to all trial data, published and unpublished.
What they found changed everything.
The Full Picture
Eyding et al., BMJ 2010
| Comparison | Published Only | ALL DATA |
|---|---|---|
| Reboxetine vs Placebo | SMD 0.56 (benefit) | SMD 0.10 (no benefit) |
| Patients in analysis | 507 (14%) | 2,731 (100%) |
| Reboxetine vs SSRIs | Equivalent | Inferior (harm RR 1.23) |
| Patients in analysis | 628 (26%) | 2,411 (100%) |
Investigation Exercise: The File Drawer
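You are a systematic reviewer in 2008. You search PubMed, Embase, and the Cochrane Library for all reboxetine trials. You find 7 published trials showing benefit.
Can you trust this evidence?
⚠️ The funnel is completely asymmetric!
All published studies cluster on one side. Where are the null and negative trials?
The Publication Bias Toolkit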
Funnel Plot
Plot effect size vs. standard error. A symmetric funnel suggests no bias; asymmetry raises alarms.
Egger's Regression Test
Regress effect/SE on 1/SE. A non-zero intercept (P < 0.10) suggests small-study effects. Note: inflated false-positive rate with binary outcomes; use Peters' test instead.
Peters' Test
For binary outcomes, regresses log OR on inverse of total sample size. Less prone to false positives.
Trim-and-Fill
Imputes "missing" studies to make the funnel symmetric, then recalculates the pooled effect.
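Of the tools above, Egger's regression is the easiest to sketch by hand: regress effect/SE on 1/SE and inspect the intercept. The data below are hypothetical and constructed so that the answer is known in advance; the sketch returns only the coefficients, and a real test would also need the intercept's standard error and a t distribution for the p-value:

```python
def egger_regression(effects, ses):
    """Ordinary least squares of (effect / SE) on (1 / SE).

    Returns (intercept, slope). An intercept far from zero suggests
    small-study effects; the slope estimates the underlying effect.
    """
    x = [1 / se for se in ses]
    y = [effect / se for effect, se in zip(effects, ses)]
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope

ses = [0.1, 0.2, 0.3, 0.4, 0.5]

# Hypothetical symmetric funnel: every study estimates the same effect
symmetric = [0.2 for _ in ses]
# Hypothetical asymmetric funnel: the effect grows with the standard error
asymmetric = [0.2 + 1.0 * se for se in ses]

sym_intercept, sym_slope = egger_regression(symmetric, ses)
asym_intercept, asym_slope = egger_regression(asymmetric, ses)
print(f"Symmetric:  intercept = {sym_intercept:.2f}, slope = {sym_slope:.2f}")
print(f"Asymmetric: intercept = {asym_intercept:.2f}, slope = {asym_slope:.2f}")
```

In the asymmetric case the intercept picks up the dependence of effect on SE while the slope still recovers the underlying effect of 0.2, which is the logic behind reading the intercept as a small-study signal.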
Interactive: Trim-and-Fill Analysis
Let's apply trim-and-fill to the reboxetine data and see what the adjusted estimate is...
Published Only
7 trials
SMD = 0.56
Significant benefit
Trim-and-Fill
7 + 5 imputed = 12 trials
SMD = 0.23
Reduced, still nominally significant
But even trim-and-fill underestimated the problem!
The true effect from all the data was SMD = 0.10 (essentially null).
Trim-and-fill is conservative—it doesn't fully correct for selective publication.
The Best Defense: Trial Registries
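Methods for detecting publication bias are imperfect. The real solution is prospective registration.
When searching for trials, always check the registries. Compare the number of registered trials with the number published. The gap is your warning sign.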
Since 2005, ICMJE requires trial registration as a condition of publication.
The AllTrials Campaign
"All trials registered. All results reported."
The reboxetine scandal, along with similar cases for other drugs, catalyzed a global movement:
2013: EMA Clinical Data Policy
European Medicines Agency commits to publishing clinical study reports
2016: FDA Amendments Act enforcement
Mandatory results reporting on ClinicalTrials.gov within 12 months
AllTrials Coalition
Over 90,000 supporters, 700+ organizations demanding transparency
The Reboxetine Aftermath
Germany's IQWiG recommended against reboxetine for depression
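The UK's NICE downgraded it to "not recommended"
The FDA rejected reboxetine in 2001 (it had access to the unpublished data)
For more than a decade, patients received a drug that was no better than placebo.
Because only the positive trials were published.
What if the published conclusion is the opposite of the actual data?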
REAL DATA
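GlaxoSmithKline's Study 329 tested paroxetine for adolescent depression. The published paper (2001) concluded that paroxetine was "generally well tolerated and effective." The actual data: paroxetine failed on all 8 pre-specified outcomes. When re-analyzed (RIAT 2015), suicidal/self-harm events numbered 23 in the paroxetine group vs 5 in the placebo group. The published paper had redefined outcomes post hoc to produce significance. In 2015, the RIAT (Restoring Invisible and Abandoned Trials) re-analysis, using the original clinical study reports, concluded: paroxetine was neither safe nor effective for adolescents.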
Module 9 Quiz
1. What percentage of the reboxetine trial data was hidden from the published literature?
2. Why can trim-and-fill underestimate the correction needed?
3. What is the best prospective defense against publication bias?
What you cannot see
may be more important than what you can.
Absence of evidence is not evidence of absence.
Certainty must be earned, not assumed.
Module 10: Certainty
Certainty must be earned, not assumed.
Early Surfactant: 2012
When high-quality evidence emerges.
🎯 Learning Objectives
- Apply the full GRADE framework to assess evidence
- Evaluate all five downgrade factors (RoB, inconsistency, indirectness, imprecision, publication bias)
- Identify when to upgrade for large effect, dose-response, or confounding
- Construct Summary of Findings tables with absolute effect estimates
- Apply the principle: "Certainty must be earned, not assumed"
The Year: 1990s
"A revolution in neonatal care..."
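Respiratory distress syndrome (RDS) was the leading cause of death in preterm infants. The development of exogenous surfactant, a substance that prevents alveolar collapse, was one of the great advances in neonatal medicine.
The question became: when should we give surfactant?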
Prophylactically (to all high-risk infants) or selectively (only after RDS develops)?
The Original Cochrane Review (2003)
Multiple RCTs conducted before the era of routine CPAP
| Outcome | Prophylactic vs Selective | Certainty |
|---|---|---|
| Neonatal mortality | RR 0.73 (favors prophylactic) | High |
| BPD or death | RR 0.84 (favors prophylactic) | High |
But the world of neonatal care was changing...
A new technology emerged: Continuous Positive Airway Pressure (CPAP)
Non-invasive support that could help preterm lungs without intubation.
Did the old evidence still apply?
The 2012 Cochrane Update
New trials conducted in the CPAP era
| Outcome | Old Trials | New Trials |
|---|---|---|
| BPD or death | RR 0.84 (favors prophylactic) | RR 1.12 (favors selective) |
| Need for mechanical ventilation | Lower with prophylactic | Higher with prophylactic! |
Investigation: Why Did Evidence Evolve?
You are a neonatologist. A colleague asks: "How can randomized trials contradict each other?"
Was the original evidence wrong?
Indirectness Changed
Old trials: No CPAP available. New trials: CPAP standard of care.
The Comparator Improved
Selective surfactant + CPAP is better than prophylactic intubation.
Context Matters
Evidence from one era may not apply to another.
This is why GRADE assesses Indirectness!
High-quality evidence can become inapplicable when context changes.
The GRADE Framework
Grading of Recommendations, Assessment, Development and Evaluations
GRADE answers the question: how confident are we in this estimate?
⊕⊕⊕⊕ HIGH: Very confident. True effect is close to the estimate.
⊕⊕⊕◯ MODERATE: Moderately confident. True effect likely close, but may differ substantially.
⊕⊕◯◯ LOW: Limited confidence. True effect may differ substantially.
⊕◯◯◯ VERY LOW: Very little confidence. True effect likely substantially different.
GRADE: Factors That Downgrade Certainty
Evidence from randomized controlled trials starts at HIGH. It can be downgraded for:
Risk of Bias
Flawed randomization, lack of blinding, incomplete follow-up, selective reporting
Inconsistency
Unexplained heterogeneity across studies (large I², non-overlapping CIs)
Indirectness
Differences in population, intervention, comparator, or outcome from the question of interest
Imprecision
Wide confidence intervals, small sample size, few events
GRADE: The Fifth Factor
Publication Bias
Asymmetric funnel plot, missing registered trials, sponsor influence
Each factor can downgrade by one or two levels
High → Moderate → Low → Very Low
Example: A meta-analysis of RCTs (starting at HIGH) with high risk of bias (↓1) and serious indirectness (↓1) would be rated LOW.
Interactive: Apply GRADE to Surfactant
Let's assess the certainty of the prophylactic surfactant evidence using the old trials versus the new trials.
OLD TRIALS (Pre-CPAP)
Starting: HIGH (RCTs)
Risk of Bias: Low (−0)
Inconsistency: None (−0)
Indirectness: Serious (−1)
Different standard of care today
Final: ⊕⊕⊕◯ MODERATE
NEW TRIALS (CPAP Era)
Starting: HIGH (RCTs)
Risk of Bias: Low (−0)
Inconsistency: None (−0)
Indirectness: None (−0)
Matches current practice
Final: ⊕⊕⊕⊕ HIGH
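The downgrading arithmetic above can be sketched in a few lines of Python. This is only an illustration of the bookkeeping, not an official GRADE tool: the function name `grade_certainty` and the factor keys are hypothetical, and real GRADE judgments are qualitative decisions that the arithmetic merely records.

```python
# Minimal sketch of GRADE level arithmetic (names are illustrative assumptions).
LEVELS = ["VERY LOW", "LOW", "MODERATE", "HIGH"]

def grade_certainty(design, downgrades=None, upgrades=None):
    """Return a GRADE certainty label.

    design: "rct" starts at HIGH; anything else (observational) starts at LOW.
    downgrades / upgrades: dicts mapping a factor name to 0, 1, or 2 levels.
    """
    downgrades = downgrades or {}
    upgrades = upgrades or {}
    for name, levels in list(downgrades.items()) + list(upgrades.items()):
        if levels not in (0, 1, 2):
            raise ValueError(f"{name}: each factor moves certainty by 0, 1, or 2 levels")
    start = LEVELS.index("HIGH") if design == "rct" else LEVELS.index("LOW")
    score = start - sum(downgrades.values()) + sum(upgrades.values())
    # Certainty cannot go below VERY LOW or above HIGH.
    return LEVELS[max(0, min(score, len(LEVELS) - 1))]

# The surfactant example from the slide above.
old_trials = grade_certainty("rct", {"risk_of_bias": 0, "inconsistency": 0, "indirectness": 1})
new_trials = grade_certainty("rct", {"risk_of_bias": 0, "inconsistency": 0, "indirectness": 0})
print(old_trials, new_trials)  # MODERATE HIGH
```

Keeping each factor as a named entry makes the reasoning behind a rating auditable, which is the point of a Summary of Findings table.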
GRADE: Factors That Upgrade Certainty
Observational evidence starts at LOW. It can be upgraded for:
Large Magnitude of Effect
RR >2 or <0.5, with no plausible confounding
Dose-Response Gradient
Higher exposure = larger effect in a consistent pattern
Residual Confounding
All plausible confounders would reduce the effect (strengthens causal inference)
Communicating Certainty
GRADE requires transparent language about confidence:
HIGH: "Prophylactic surfactant reduces mortality..."
MODERATE: "Prophylactic surfactant probably reduces mortality..."
LOW: "Prophylactic surfactant may reduce mortality..."
VERY LOW: "We are uncertain whether prophylactic surfactant reduces mortality..."
This language ensures that clinicians understand the strength of the evidence.
Can too much of a lifesaver become a killer?
REAL DATA
1940s-50s: High oxygen concentrations saved premature babies from respiratory failure. Then came an epidemic of blindness: retrolental fibroplasia (now called ROP). Doctors reduced oxygen dramatically. Blindness dropped. But then: increased deaths and brain damage from hypoxia. The optimal oxygen level took decades of trials to find. Recent SUPPORT/BOOST II trials finally defined the therapeutic window: SpO2 91-95%.
Module 10 Quiz
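1. Why did the surfactant recommendation reverse between 2003 and 2012?
2. Which of the following is NOT a GRADE downgrading factor?
3. What language should be used for low-certainty evidence?
Numbers are not enough.
You must communicate your certainty.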
Certainty must be earned, not assumed.
Methods protect patients from our misplaced trust.
Module 11: Living Reviews
Methods protect patients from our misplaced trust.
COVID-19 Hydroxychloroquine: 2020
When urgency meets evidence.
Module 11: Living Reviews
🎯 Learning Objectives
- Apply trial sequential analysis to determine when evidence is sufficient
- Design and maintain a living systematic review
- Establish update triggers and futility/harm boundaries
- Manage multiplicity and alpha-spending in sequential analyses
- Explain how rapid evidence synthesis evolved during COVID-19
March 2020: A World in Crisis
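"The virus spread faster than our understanding..."
COVID-19 had killed thousands. Intensive care units were overflowing. No vaccine, no treatment. Then a glimmer of hope: hydroxychloroquine (HCQ), an old malaria drug, showed antiviral activity in lab studies.
The Rush to Adopt
Within weeks of the Gautret study: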
March 28: FDA issues Emergency Use Authorization for HCQ
April 4: India bans HCQ export (hoarding fears)
Global: Shortages affect lupus and rheumatoid arthritis patients
Millions received HCQ based on a 36-patient observational study
What could go wrong?
Investigation: The Gautret Study
You are an EBM expert asked to evaluate the French HCQ study. Examine the design...
| Issue | Impact |
|---|---|
| Non-randomized | Selection bias—who got HCQ? |
| 6 patients excluded | 3 went to ICU, 1 died, 1 withdrew, 1 had nausea |
| Surrogate outcome | Viral load, not clinical outcomes |
| Controls from different hospitals | Different care, different testing |
| No blinding | Expectation bias in lab testing |
This study would be rated at high risk of bias on RoB 2.0
GRADE certainty: VERY LOW. Yet it changed global policy.
Why Observational COVID Studies Misled
Immortal Time Bias
Patients must survive long enough to receive treatment. Survivors are compared to non-survivors.
Confounding by Indication
Sicker patients may get different treatments. Healthier patients received HCQ early.
Healthy User Effect
Patients who seek treatment tend to be healthier overall.
Outcome Reporting
Studies with positive results are published faster.
June 2020: The RCTs Report
Large, rigorous trials completed at remarkable speed
| Trial | N | Result |
|---|---|---|
| RECOVERY (UK) | 4,716 | No benefit on mortality (RR 1.09) |
| WHO SOLIDARITY | 954 | No benefit (RR 1.19) |
| ORCHID (US) | 479 | Stopped for futility |
Timeline: Observational vs. RCT Evidence
March-May 2020
Observational: ~20 studies
Suggest benefit
Pooled OR ~0.65
June-July 2020
RCTs: RECOVERY, SOLIDARITY
Show no benefit/harm
Pooled RR ~1.10
From "promising" to "ineffective" in 3 months
This is why we need randomization, and living reviews to track evolving evidence.
Living Systematic Reviews
New methods for fast-moving evidence:
Continuous Surveillance
Search the literature weekly, or even daily, for new evidence
Cumulative Meta-Analysis
Update pooled estimates as each new trial reports
Trial Sequential Analysis (TSA)
Determine when sufficient information has accumulated to conclude
Transparent Versioning
Track every change, maintain full audit trail
Trial Sequential Analysis (TSA)
When have we learned enough?
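TSA applies stopping boundaries to a meta-analysis, analogous to interim analyses in a single trial. It accounts for the required information size (RIS) needed to detect or exclude a clinically meaningful effect.
For HCQ in COVID-19, TSA showed that the futility boundary had been crossed by June 2020.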
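As a rough illustration of the required information size, the sketch below uses the conventional two-group sample-size formula for a binary outcome and inflates it for between-trial heterogeneity by dividing by (1 − diversity). The function name, the default error rates, and the example inputs (25% control-group risk, 20% relative risk reduction) are assumptions chosen for illustration. It does not compute the monitoring or futility boundaries, which need an alpha-spending function.

```python
from math import ceil
from statistics import NormalDist

def required_information_size(p_control, rrr, alpha=0.05, power=0.80, diversity=0.0):
    """Approximate total number of participants (both arms) a meta-analysis needs.

    p_control: assumed event risk in the control group
    rrr:       relative risk reduction considered clinically meaningful
    diversity: assumed between-trial diversity (D^2), 0 <= diversity < 1
    """
    if not 0 <= diversity < 1:
        raise ValueError("diversity must be in [0, 1)")
    p_experimental = p_control * (1 - rrr)
    p_bar = (p_control + p_experimental) / 2
    delta = p_control - p_experimental
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    n_fixed = 4 * (z_alpha + z_beta) ** 2 * p_bar * (1 - p_bar) / delta ** 2
    # Heterogeneity adjustment: more diversity means more information is needed.
    return ceil(n_fixed / (1 - diversity))

ris = required_information_size(p_control=0.25, rrr=0.20)
accrued = 4716 + 954 + 479  # participants in the three trials tabulated earlier
print(f"RIS = {ris}, accrued = {accrued}, fraction = {accrued / ris:.2f}")
```

If the accrued fraction is well below 1, a "significant" pooled result is more likely to be a random high, which is why TSA widens the boundaries early on.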
Lessons from the HCQ Saga
1. Observational studies can mislead spectacularly when bias is rampant. Even many studies pointing in the same direction can be wrong.
2. RCTs can be conducted quickly when the will exists. RECOVERY enrolled 5,000+ patients in weeks.
3. Living reviews are essential for evolving topics. Fixed-point-in-time reviews become obsolete instantly.
4. Political pressure doesn't change biology. Even under pressure, rigorous methods protect patients.
What if the prevention was the cause?
REAL DATA
For decades, pediatric guidelines recommended: avoid peanuts in infancy to prevent allergy. Meanwhile, peanut allergy rates tripled from 1997 to 2008. Then came LEAP (2015): 640 high-risk infants randomized to early peanut introduction vs. avoidance. Result: Early introduction reduced peanut allergy by 81% (1.9% vs 13.7%). The prevention strategy had fueled the epidemic.
Module 11 Quiz
1. What were the main flaws of the Gautret hydroxychloroquine study?
2. What does Trial Sequential Analysis help determine?
3. Why did observational COVID-19 studies show a benefit of HCQ when randomized controlled trials did not?
Speed cannot replace rigor.
But rigor can be fast.
Living reviews balance both.
Not every signal is real.
Module 12: Advanced Methods
Not every signal is real.
Advanced Methods
Beyond pairwise meta-analysis.
Module 12: Advanced Methods
🎯 Learning Objectives
- Interpret network meta-analysis geometry and SUCRA rankings
- Apply bivariate models for diagnostic test accuracy meta-analysis
- Conduct dose-response meta-analysis with flexible splines
- Understand when individual patient data (IPD) meta-analysis is needed
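- Recognize the assumptions and limitations of each advanced method
When Pairwise Is Not Enough
"Sometimes the question is more complex than A versus B..."
The methods you have learned form the foundation. But clinical reality often demands more: Which of 10 antidepressants is best? What's the optimal dose of statin? Does this test accurately diagnose early cancer?
This module introduces four advanced methods, each answering a different kind of complex question.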
Network Meta-Analysis (NMA)
When you have many treatments but few head-to-head trials
NMA combines direct evidence (A vs B) with indirect evidence (A vs C, B vs C → inferred A vs B) to compare multiple treatments simultaneously.
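The inferred A vs B comparison described above can be sketched with the adjusted indirect comparison (often attributed to Bucher): on the log scale the indirect effect is the difference of the two direct effects against the common comparator C, and the variances add. The function name and the input numbers below are hypothetical, and the sketch assumes transitivity holds.

```python
from math import exp, log, sqrt

def indirect_comparison(log_rr_ac, se_ac, log_rr_bc, se_bc, z=1.96):
    """Indirect A-vs-B estimate from A-vs-C and B-vs-C (log relative risks)."""
    log_rr_ab = log_rr_ac - log_rr_bc
    # Uncertainty from both direct comparisons accumulates.
    se_ab = sqrt(se_ac ** 2 + se_bc ** 2)
    return {
        "rr": exp(log_rr_ab),
        "ci_low": exp(log_rr_ab - z * se_ab),
        "ci_high": exp(log_rr_ab + z * se_ab),
        "se": se_ab,
    }

# Hypothetical inputs: A vs placebo RR 0.70, B vs placebo RR 0.85.
result = indirect_comparison(log(0.70), 0.10, log(0.85), 0.12)
print({k: round(v, 3) for k, v in result.items()})
```

Because the variances add, an indirect estimate is always less precise than either direct comparison it is built from, which is one reason direct evidence carries more weight in the network.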
NMA Example: Antidepressants
The landmark Cipriani 2018 NMA compared 21 antidepressants using 522 trials.
The Challenge
21 drugs, but not every pair tested head-to-head
Many vs. placebo, few vs. each other
The Solution
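NMA combines direct and indirect evidence across the whole network
Ranks all 21 drugs on efficacy and acceptability
Result: some drugs rank higher on efficacy, others on acceptability
No single drug is universally "best"; interpret rankings in light of credible intervals, transitivity, and clinical trade-offs.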
NMA: Critical Assumptions
Transitivity
Effect modifiers should be similarly distributed across comparisons; otherwise indirect comparisons may be biased
Consistency
Direct and indirect evidence agree (testable)
Connected Network
All treatments linked through at least one common comparator
When assumptions fail, NMA can mislead
Always assess transitivity and test for inconsistency.
Dose-Response Meta-Analysis
Finding the optimal dose
Uses the Greenland-Longnecker method with restricted cubic splines to model non-linear relationships between dose and effect.
Non-linear patterns
J-shaped (alcohol & mortality), U-shaped (vitamin D), threshold (aspirin)
Clinical relevance
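Find the dose with the best benefit-harm balance, not just "more is better"
Individual Patient Data (IPD)
The gold standard for subgroup analysis
Instead of published summary data, obtain raw patient-level data from the trialists. This enables precise subgroup analyses, time-to-event modeling, and standardized definitions.
The Early Breast Cancer Trialists' Collaborative Group pioneered IPD meta-analysis in the 1980s.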
Diagnostic Test Accuracy (DTA)
When the "intervention" is a test
DTA meta-analysis synthesizes sensitivity (true positive rate) and specificity (true negative rate), two correlated outcomes requiring bivariate models.
Bivariate/HSROC Model
Accounts for the correlation between sensitivity and specificity
SROC Curve
Summary ROC curve with 95% confidence and prediction regions
QUADAS-2
Quality Assessment of Diagnostic Accuracy Studies
Choosing the Right Method
| Question | Method |
|---|---|
| Does A beat B? | Pairwise MA |
| Which of many treatments is best? | Network MA (NMA) |
| What is the optimal dose? | Dose-Response MA |
| Who benefits most? (subgroups) | IPD MA |
| How accurate is this test? | DTA MA |
| How does the effect change over time? | Survival/Time-to-Event MA |
The method must match the question. Never force a question into the wrong method.
Three large trials. Three different answers. What do you believe?
REAL DATA
CORTICUS (2008): 499 patients. Hydrocortisone in septic shock. No mortality benefit. ADRENAL (2018): 3,658 patients. Hydrocortisone. No mortality benefit. APROCCHSS (2018): 1,241 patients. Hydrocortisone + fludrocortisone. Mortality reduced (43% vs 49.1%, p=0.03). Same class of intervention. Different protocols. Different results.
Module 12 Quiz
1. What is the main advantage of network meta-analysis over pairwise analysis?
2. Why does DTA meta-analysis require bivariate models?
3. What does the "consistency" assumption in NMA require?
Course Ecosystem
This course covers the complete systematic review workflow. For deeper dives, explore the companion courses:
Bivariate/HSROC, SROC curves, QUADAS-2
RoB 2, ROBINS-I/E, domain-level assessment
Full SoF tables, GRADE-CERQual
One-stage/two-stage, mixed-effects models
Copas, PET-PEESE, p-curve, selection models
AMSTAR 2, ROBIS, overlap correction
CHARMS, PROBAST, c-statistic pooling
TSA, update triggers, abbreviated methods
Module 12 Complete
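"The method must match the question. Advanced methods answer advanced questions, but the fundamentals never change."
You have mastered the core workflow. The next ten modules explore the frontier: Bayesian inference, network meta-analysis, individual patient data, dose-response modeling, robustness and fragility, equity, AI-assisted synthesis, qualitative evidence, multivariate methods, and reproducibility.
Not every signal is real.
Module 13: The Bayesian Turn
Not every signal is real.
Module 13: The Bayesian Turn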
🎯 Learning Objectives
- Explain the difference between frequentist and Bayesian inference
- Interpret prior distributions, likelihoods, and posterior distributions
- Distinguish credible intervals from confidence intervals
- Understand when Bayesian meta-analysis offers advantages
- Recognize how prior choice affects conclusions
In 2005, a trial began
that would never truly end.
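The STAMPEDE trial in prostate cancer used a multi-arm, multi-stage (MAMS) platform design. Arms could be added or dropped as evidence accumulated. Although its statistics are frequentist, the adaptive philosophy embodies the Bayesian spirit: update decisions as data accumulate.
The Frequentist Worldview
In frequentist statistics, probability means long-run frequency. A 95% CI does not mean "there is a 95% probability that the true effect lies inside." It means: if we repeated the study indefinitely, 95% of the intervals would contain the truth.
The Bayesian Worldview
In Bayesian statistics, probability represents degree of belief. We start with a prior (what we believe before the data), update it with the likelihood (what the data tell us), and obtain the posterior (updated belief).
Posterior ∝ Prior × Likelihood
Bayes' theorem: P(θ|data) ∝ P(data|θ) × P(θ)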
Credible Intervals
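A 95% credible interval can be read directly as a probability statement (there is a 95% probability that the effect lies within it), conditional on the specified model and prior.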
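A minimal sketch of the prior-to-posterior update, assuming a normal prior on a log odds ratio and a normal likelihood summarized by an estimate and its standard error (the conjugate normal-normal case, which needs no MCMC). The function names and the example numbers are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist

def normal_posterior(prior_mean, prior_sd, estimate, se):
    """Combine a normal prior with a normal likelihood; return (mean, sd)."""
    prior_precision = 1 / prior_sd ** 2
    data_precision = 1 / se ** 2
    post_var = 1 / (prior_precision + data_precision)
    # Posterior mean is the precision-weighted average of prior and data.
    post_mean = post_var * (prior_precision * prior_mean + data_precision * estimate)
    return post_mean, sqrt(post_var)

def credible_interval(mean, sd, level=0.95):
    """Central credible interval of a normal posterior."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return mean - z * sd, mean + z * sd

# Weakly informative prior Normal(0, 1) on the log-OR; hypothetical data: log-OR -0.40, SE 0.15.
mean, sd = normal_posterior(0.0, 1.0, -0.40, 0.15)
low, high = credible_interval(mean, sd)
prob_benefit = NormalDist(mean, sd).cdf(0.0)  # P(log-OR < 0 | data, prior)
print(f"posterior mean {mean:.3f}, 95% CrI ({low:.3f}, {high:.3f}), P(benefit) = {prob_benefit:.3f}")
```

The last line is the kind of statement a confidence interval cannot make: a direct probability that the effect is beneficial, given the model and prior.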
Choosing Priors
Non-informative (Vague)
Normal(0, 10000) or uniform. Lets the data dominate. Mimics frequentist results.
Weakly Informative
Normal(0, 1) for log-OR. Regularizes extreme estimates while remaining flexible.
Informative
Based on previous evidence. Powerful but controversial. Must be pre-specified.
Half-Cauchy for τ
Recommended for heterogeneity. Half-Cauchy(0, 0.5) allows large τ but concentrates near zero.
MCMC Sampling
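Most Bayesian models cannot be solved analytically. We use Markov Chain Monte Carlo (MCMC) to draw samples from the posterior. Tools: JAGS, Stan, brms (R), PyMC (Python). Check convergence before interpreting results (for example, tail-ESS > 400).
Bayesian Model Averaging
Instead of choosing between fixed-effect and random-effects models, Bayesian model averaging (BMA) weights each model by its posterior probability. This accounts for model uncertainty in the final estimate.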
Bayes Factors
BF₁₀ > 10 = strong evidence for H₁. BF₁₀ < 1/10 = strong evidence for H₀.
Interactive: Posterior Visualizer
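Adjust the prior strength to see how it affects the posterior. Watch how more data overwhelm the prior.
The STAMPEDE Story
STAMPEDE launched in 2005 with 5 research arms comparing treatments for advanced prostate cancer. By 2016 it had added abiraterone, which reduced mortality by 37% (HR 0.63, 95% CI 0.52–0.76).
The platform design embodies Bayesian adaptive thinking: interim analyses guide arm selection, new arms can enter as treatments emerge, and ineffective arms drop out early, sparing patients from ineffective treatments.
STAMPEDE enrolled more than 10,000 patients at more than 100 centers and fundamentally changed prostate cancer care. A Bayesian mindset lets evidence accumulate and inform decisions in real time.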
Decision Tree: When to Go Bayesian?
Remember Module 1?
CAST Through a Bayesian Lens
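If CAST were analyzed in a Bayesian framework with an informative prior from basic science (antiarrhythmic drugs suppress PVCs), the posterior would still shift strongly toward harm. With enough data, even a strong prior yields to the likelihood. The lesson: Bayesian methods do not protect against bad priors, but they make assumptions transparent.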
Module 13 Quiz
Q1. What does a 95% Bayesian credible interval mean?
Q2. What is the recommended prior for between-study heterogeneity (τ)?
Module 13 Complete
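"The Bayesian turn is not about mathematics. It is about honesty: making our assumptions visible."
Not every signal is real.
Module 14: The Network
Methods protect patients from our misplaced trust.
Module 14: The Network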
🎯 Learning Objectives
- Explain why pairwise comparisons are insufficient when many treatments exist
- Interpret network geometry (nodes, edges, thickness)
- Understand transitivity, consistency, and indirect evidence
- Interpret SUCRA rankings and league tables
- Recognize when NMA assumptions are violated
A clinician faces a patient
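with depression. Which drug?
There are 21 commonly used antidepressants. Most head-to-head trials compare only 2 or 3. Cipriani et al. (2018, Lancet) connected 522 trials and 116,477 patients into a single network.
The Logic of Network Meta-Analysis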
Direct Evidence
Trials directly comparing A vs B give the most reliable estimate.
Indirect Evidence
If A vs C and B vs C exist, we can infer A vs B. This rests on the "transitivity" assumption.
Mixed Evidence
NMA combines both, weighted by precision, to rank all treatments simultaneously.
Interactive: Network Graph
Each node is a treatment. Edge thickness represents the number of studies comparing those two treatments.
Transitivity & Consistency
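Transitivity: Indirect estimates (via a common comparator) should approximate direct estimates. This requires effect modifiers to be similarly distributed across comparisons.
Consistency: Statistical tests compare direct and indirect evidence. Global (design-by-treatment interaction) and local (node-splitting) tests help identify inconsistent loops.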
SUCRA & P-scores
Caution: Ranking is seductive but misleading when differences between treatments are small or uncertain. Always report credible/confidence intervals alongside ranks.
Component NMA
When interventions are complex (e.g., behavioral + pharmacological), component NMA decomposes multi-component treatments to estimate the individual contribution of each component. Uses additive models: effect(A+B) = effect(A) + effect(B) + interaction.
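The Cipriani Network
The 2018 Lancet analysis found all 21 antidepressants more effective than placebo. Amitriptyline, mirtazapine, and venlafaxine ranked highest for efficacy. Agomelatine, fluoxetine, and escitalopram ranked highest for acceptability (fewest dropouts).
No single drug "won" on every outcome. The network revealed trade-offs that pairwise analyses could not show.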
Decision Tree: Is NMA Appropriate?
Module 14 Quiz
Q1. What assumption must hold for indirect evidence to be valid in NMA?
Module 14 Complete
"A network sees what pairwise comparisons cannot: the whole picture of treatment choice."
Not every signal is real.
Module 15: The Individual
What was hidden in plain sight?
Module 15: The Individual
🎯 Learning Objectives
- Explain why aggregate data can mask treatment–covariate interactions
- Distinguish one-stage from two-stage IPD models
- Recognize ecological bias in aggregate meta-analysis
- Understand the practical challenges of IPD collection
- Interpret treatment–covariate interaction plots
For decades, breast cancer trials
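published summaries. Not patients.
The Early Breast Cancer Trialists' Collaborative Group (EBCTCG) collected individual records on more than 100,000 women across hundreds of trials. Their IPD meta-analysis showed that the benefit of tamoxifen depends heavily on estrogen receptor status, something invisible in the aggregate data.
What the Summaries Hid
Every published tamoxifen trial reported overall results. Across hundreds of studies, tamoxifen appeared to offer a modest benefit. But "modest benefit" was an average that concealed a deeper truth.
The Hidden Subgroup Split
The overall pooled effect (mixing patients who respond with patients who do not) was a statistical fiction. The "modest" average understated the benefit for one group while implying a benefit for the other that did not exist.
Aggregate vs. Individual Patient Data
IPD allows: (1) consistent outcome definitions, (2) subgroup analyses by patient characteristics, (3) time-to-event modeling, and (4) checks for ecological bias. It is the gold standard for exploring treatment effect modification.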
One-Stage vs Two-Stage IPD
Two-Stage
Analyze each study separately, then combine estimates (like standard MA). Simple but loses information.
One-Stage
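Fit a single mixed-effects model to all patient data simultaneously. More powerful for interactions and rare events.
Key: Both approaches must account for clustering by study. Never pool IPD as if it came from one large trial; doing so invites confounding (Simpson's paradox).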
Ecological Bias
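A meta-regression using study-level mean age might show older patients benefit more. But this could be ecological bias: study-level associations need not reflect patient-level truth. Only IPD can separate within-study from between-study effects.
When the Whole Contradicts Its Parts
Simpson's paradox: a trend that appears in aggregated data reverses when the data are grouped by a confounding variable.
The Paradox in Practice
A mega-trial analysis found Treatment X beneficial overall. But within every study, it was harmful. How? Differences in baseline risk between studies created an illusion: lower-risk populations happened to contribute most of the treated patients, which inflated the apparent overall benefit.
Cates (2002, BMJ) showed that pooling studies without accounting for clustering can reverse the apparent direction of effect.
This is why one-stage IPD models include study as a clustering variable, preventing between-study confounding from masquerading as a treatment effect.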
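The reversal can be reproduced with a small numerical sketch. The two studies below are invented for illustration (they are not from Cates 2002): within each study the treatment raises risk, yet lumping all patients together makes it look strongly protective, because most treated patients come from the low-risk study. A Mantel-Haenszel estimate, which stratifies by study, recovers the within-study direction.

```python
# Invented data: (events, participants) per arm in each study.
studies = {
    "low-risk study": {"treated": (54, 900), "control": (5, 100)},
    "high-risk study": {"treated": (32, 100), "control": (270, 900)},
}

def risk_ratio(treated, control):
    (et, nt), (ec, nc) = treated, control
    return (et / nt) / (ec / nc)

def naive_pooled_rr(data):
    """Ignore study membership: add up all patients as if from one big trial."""
    et = sum(s["treated"][0] for s in data.values())
    nt = sum(s["treated"][1] for s in data.values())
    ec = sum(s["control"][0] for s in data.values())
    nc = sum(s["control"][1] for s in data.values())
    return risk_ratio((et, nt), (ec, nc))

def mantel_haenszel_rr(data):
    """Stratified pooled risk ratio: comparisons are made within each study."""
    num = den = 0.0
    for s in data.values():
        (et, nt), (ec, nc) = s["treated"], s["control"]
        total = nt + nc
        num += et * nc / total
        den += ec * nt / total
    return num / den

for name, s in studies.items():
    print(name, round(risk_ratio(s["treated"], s["control"]), 3))
print("naive pooled RR:", round(naive_pooled_rr(studies), 3))
print("Mantel-Haenszel RR:", round(mantel_haenszel_rr(studies), 3))
```

Standard meta-analysis avoids this trap by pooling study-level estimates; the danger arises mainly when IPD from several trials are stacked and analyzed without a study term.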
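The EBCTCG Legacy
For 40 years, EBCTCG's IPD meta-analyses have defined breast cancer treatment. Their 2005 analysis of tamoxifen versus no treatment showed a clear benefit in ER-positive tumors (RR 0.59) but none in ER-negative tumors (RR 0.97).
Without IPD, the effects in the two groups would have been pooled into a single overall estimate, diluting the benefit and potentially obscuring how much ER-positive patients gain.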
Decision Tree: When Is IPD Worth Pursuing?
Can you obtain IPD from >80% of trials?
Is ecological bias a concern?
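EBCTCG collected data from hundreds of trials over 40 years. Most IPD meta-analyses involve 5-20 trials. The decision depends on the question, not on ambition.
The Pattern Repeats
Remember Module 3? HRT appeared beneficial in observational studies but harmful in randomized controlled trials. The same aggregate masking occurred: the overall benefit hid subgroup harm.
IPD analyses from the Women's Health Initiative later showed that timing mattered: women who started HRT within 10 years of menopause had different outcomes from those who started later. The "timing hypothesis" was invisible in the published aggregate summaries.
The lesson recurs: aggregate data can mask critical treatment-covariate interactions. Whether it is ER status in breast cancer or the timing of HRT, individual-level data reveal what summaries hide.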
Module 15 Quiz
Q1. What is the main advantage of IPD over aggregate-data meta-analysis?
Module 15 Complete
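"Behind every pooled estimate are individuals whose stories the aggregate cannot tell."
Heterogeneity is a message, not noise.
Module 16: The Dose
Heterogeneity is a message, not noise.
Module 16: The Dose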
🎯 Learning Objectives
- Explain why simple pairwise comparisons miss dose–response relationships
- Distinguish linear, quadratic, and spline dose–response models
- Interpret restricted cubic splines with knots
- Identify threshold effects and J/U-shaped curves
- Understand model comparison with AIC/BIC
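For decades, moderate drinking
seemed to protect the heart.
The "J-shaped curve" showed higher cardiovascular mortality in non-drinkers than in moderate drinkers. But Stockwell et al. (2016) demonstrated that the J-curve was an artifact of misclassifying former drinkers (who had quit because of illness) as "abstainers."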
A Scientific Consensus Built on Sand
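By 2010, more than 100 observational studies had confirmed the J-curve. Medical textbooks taught it. Cardiologists cited it. Wine industry lobbyists funded conferences around it.
The evidence seemed overwhelming. But what if the control group, the "abstainers," was contaminated?
The Sick Quitters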
A Hidden Confounder
The Problem
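People who stop drinking often do so because they are already ill: liver disease, drug interactions, a cancer diagnosis. These "former drinkers" were classified as "abstainers" in most studies.
The Effect: The reference group (abstainers) appeared less healthy, not because abstaining is harmful, but because sick people had joined it.
When Stockwell et al. (2016, J Stud Alcohol Drugs) removed former drinkers and applied appropriate study-quality corrections, the J-curve disappeared. The protective effect was an illusion.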
Dose–Response Meta-Analysis
Standard meta-analysis asks: "Does treatment X work?" Dose–response meta-analysis asks: "At what dose does treatment X work best?" It models the relationship between dose level and outcome across multiple studies.
Restricted Cubic Splines
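RCS place knots at pre-specified dose points and fit smooth polynomials between them. Typically 3-5 knots are placed at quantiles of the dose distribution. The curve is constrained to be linear beyond the boundary knots. A test for non-linearity compares the spline model with a simpler linear model.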
Model Comparison
AIC/BIC compare linear versus spline fits. Lower = better. Also test for departure from linearity (p-value for the spline terms).
Interactive: Dose–Response Builder
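Compare linear, quadratic, and spline fits. Watch how the model shape changes under different assumptions.
The Alcohol J-Curve Debunked
Stockwell's 2016 re-analysis found that the protective effect of moderate drinking vanished once former drinkers were correctly excluded from the "abstainer" reference group. The J-curve was driven by sick-quitter bias.
Dose-response meta-analysis exposed the truth: the shape of the curve depends heavily on how you define "zero dose." A flawed reference category created a spurious benefit.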
When Curves Shape Policy
The phantom J-curve influenced alcohol guidelines worldwide:
NHS Guidance (until 2016)
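"Moderate drinking protects the heart" appeared in official guidance. After Stockwell's correction, the UK revised the limit for all drinkers to 14 units per week (previously 21 units for men). No amount was declared "safe."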
Dietary Guidelines Advisory Committee
Cited J-curve research in 2015. The 2020 committee recommended lowering the limit for men to 1 drink/day, acknowledging reference-group bias.
Australian Guidelines
Safe drinking limits were delayed by industry-funded J-curve research promoting “cardioprotective” moderate intake.
Decision Tree: Is Dose-Response Analysis Appropriate?
Does the relationship appear non-linear?
Standard pairwise meta-analysis (no dose-response possible with only two levels)
Module 16 Quiz
Q1. What makes restricted cubic splines useful in dose–response meta-analysis?
Module 16 Complete
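"The dose makes the poison. The shape of the curve reveals whether the poison is real."
Absence of evidence is not evidence of absence.
Module 17: Fragility
Absence of evidence is not evidence of absence.
Module 17: Fragility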
🎯 Learning Objectives
- Calculate and interpret the fragility index
- Use GOSH plots to identify influential studies and subset effects
- Interpret contour-enhanced funnel plots
- Apply Copas selection models and PET-PEESE to publication bias
- Understand how sensitivity analyses strengthen meta-analytic conclusions
Governments stockpiled billions
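based on evidence they could not see.
After H1N1 influenza, governments spent billions of dollars stockpiling oseltamivir (Tamiflu). The Cochrane team (Jefferson et al. 2014) fought for years to obtain the unpublished data. When they finally did, the evidence for preventing complications vanished.
The Fragility Index
The fragility index asks: "How many patients would need to change outcome to flip a statistically significant result to non-significant?" It iteratively adds events to the group with fewer events (converting non-events to events) until p > 0.05.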
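The iterative procedure described above can be sketched as follows. This is a minimal pure-Python illustration, assuming a two-sided Fisher exact test as the significance test; the function names are hypothetical, and the loop stops at the first p-value at or above the threshold.

```python
from math import comb

def fisher_exact_p(e1, n1, e2, n2):
    """Two-sided Fisher exact p-value for e1/n1 events versus e2/n2 events."""
    total_events = e1 + e2
    denom = comb(n1 + n2, total_events)

    def prob(x):
        # Hypergeometric probability that arm 1 holds x of the total events.
        return comb(n1, x) * comb(n2, total_events - x) / denom

    observed = prob(e1)
    lo, hi = max(0, total_events - n2), min(n1, total_events)
    # Sum the probabilities of all tables at least as unlikely as the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= observed * (1 + 1e-9))
    return min(1.0, p)

def fragility_index(e1, n1, e2, n2, alpha=0.05):
    """Number of non-event-to-event flips in the arm with fewer events needed to lose significance."""
    if fisher_exact_p(e1, n1, e2, n2) >= alpha:
        return 0  # not significant to begin with
    events, sizes = [e1, e2], [n1, n2]
    arm = 0 if e1 <= e2 else 1  # the arm with fewer events, fixed at the start
    flips = 0
    while fisher_exact_p(events[0], n1, events[1], n2) < alpha and events[arm] < sizes[arm]:
        events[arm] += 1
        flips += 1
    return flips

# Hypothetical trial: 5/100 events on treatment versus 20/100 on control.
print(fragility_index(5, 100, 20, 100))
```

Different exact or asymptotic tests can give slightly different indices for the same table, so the test used should be reported alongside the fragility index.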
Interactive: Fragility Calculator
Enter a 2×2 table to calculate the fragility index. Watch events shift until significance flips.
GOSH Plots
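Graphical Display of Study Heterogeneity (GOSH) fits the meta-analysis model to all possible subsets of studies. Each point plots the pooled effect of one subset against its I². Clusters suggest distinct subgroups; an outlying cloud suggests that a single study is driving heterogeneity.
For k studies there are 2^k − 1 subsets. For k > 15, random sampling is used.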
Contour-Enhanced Funnel Plots
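Standard funnel plots show effect size vs standard error. Contour-enhanced versions add shaded regions for p < 0.01, p < 0.05, and p < 0.10. If the missing studies fall in the non-significant regions, publication bias is plausible. If they fall in the significant regions, other causes (such as study quality) may explain the asymmetry.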
Copas Selection & PET-PEESE
Copas Selection Model
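Models the probability that a study is published as a function of its SE and effect size. Jointly estimates the true effect and the selection mechanism.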
PET-PEESE
Precision-Effect Test (PET): regress effects on SE. If intercept = 0, no true effect. PEESE uses SE² for better performance when a true effect exists.
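The two regressions can be sketched with a small weighted least-squares helper. This is an illustrative pure-Python version using inverse-variance weights (1/SE²); the function names and the toy data are assumptions. It returns only the two intercepts; the usual conditional rule (report PEESE only if the PET intercept differs significantly from zero) also needs the intercept's standard error, which is not computed here.

```python
def wls_intercept(x, y, se):
    """Intercept of the weighted least-squares line y = a + b*x with weights 1/se^2."""
    w = [1 / s ** 2 for s in se]
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar

def pet_peese(effects, ses):
    """Return (PET intercept, PEESE intercept): estimated effects at SE = 0."""
    pet = wls_intercept(ses, effects, ses)                      # effect ~ SE
    peese = wls_intercept([s ** 2 for s in ses], effects, ses)  # effect ~ SE^2
    return pet, peese

# Toy data: smaller (higher-SE) studies report larger effects.
ses = [0.05, 0.10, 0.20, 0.30, 0.40]
effects = [0.22, 0.30, 0.48, 0.61, 0.85]
pet, peese = pet_peese(effects, ses)
print(f"PET intercept {pet:.3f}, PEESE intercept {peese:.3f}")
```

Both intercepts estimate what a hypothetical study with no sampling error would report, which is why they are read as bias-adjusted effects.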
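The Oseltamivir Saga
The original Roche-funded meta-analysis (Kaiser 2003) showed that oseltamivir reduced influenza complications by 67%. But 8 of the 10 trials had never been published. After Cochrane obtained the clinical study reports, the benefit for complications fell to a non-significant 11%.
The fragility was not merely statistical but informational. The evidence base itself was missing most of its data.
Decision Tree: Interpreting Your Fragility Result
Highly fragile. A handful of different events would overturn the conclusion. Interpret with extreme caution.
Moderately fragile. Sensitive to small perturbations. Are there unpublished trials that could change this?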
Relatively robust. But remember: fragility is only one dimension. Publication bias can undermine even robust results.
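Walsh et al. (2014, J Clin Epidemiol) found that among 399 randomized controlled trials published in top journals, the median fragility index was only 8. More than 25% had an FI ≤ 3. Landmark trials that shape clinical practice often hang by a statistical thread.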
Beyond the Index: Structural Fragility
The oseltamivir saga reveals three types of fragility, and the fragility index captures only the first.
Statistical Fragility (FI)
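How many events flip the p-value? This is what the fragility index measures. It quantifies sensitivity to individual patient outcomes.
Informational Fragility
How much of the evidence is hidden? Eight of Roche's ten oseltamivir trials were unpublished. The evidence base was structurally incomplete.
Analytical Fragility
How many researcher degrees of freedom could change the conclusion? Different outcome definitions, analysis populations, or statistical methods.
Callback to Module 9 (paroxetine): A re-analysis using different outcome definitions completely overturned the conclusion. That is analytical fragility; the FI was never computed because the endpoints themselves were in dispute. A complete robustness assessment examines all three dimensions.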
Module 17 Quiz
Q1. A trial has 200 patients per arm, with 12 events in the treatment arm and 25 in the control arm (p=0.03). The fragility index is 3. What does this mean?
Module 17 Complete
"A number that survives every attempt to break it is a number worth trusting."
Not every signal is real.
Module 18: Equity
Certainty must be earned, not assumed.
Module 18: Equity
🎯 Learning Objectives
- Identify how trial exclusion criteria create evidence gaps
- Apply the PROGRESS-Plus framework to assess the equity of evidence
- Use PRISMA-Equity reporting guidelines
- Understand transportability: when trial findings fail in practice
- Design equity-sensitive search and synthesis strategies
SPRINT proved tight blood pressure control
saves lives. But whose lives?
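The landmark SPRINT trial excluded patients with diabetes, prior stroke, and heart failure. More than 75% of US adults with hypertension would not have been eligible. The evidence is strong, but its applicability is narrow.
The Trial That Excluded Most Patients
SPRINT enrolled 9,361 patients and showed that intensive blood pressure control (target <120 mmHg) reduced cardiovascular events by 25% (HR 0.75, 95% CI 0.64–0.89). But the eligibility criteria tell a different story.
Who was excluded:
- Diabetes: 35% of US adults with hypertension
- Prior stroke: 8% of the hypertensive population
- Symptomatic heart failure: 6% of hypertensive adults
- Expected survival <3 years: the frailest patients
- Nursing home residents: excluded entirely
- GFR <20 mL/min: advanced kidney disease
Result: more than 75% of US adults with hypertension were ineligible. The evidence is strong. But for whom?
Where the Evidence Comes From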
78%
of cardiovascular mega-trial participants came from high-income countries (2000–2020).
6%
from sub-Saharan Africa — where cardiovascular disease is rising fastest.
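Polypill trials: 4 of 5 trials were conducted in populations with a mean BMI <25. The US mean BMI is 30. Drug metabolism, comorbidity patterns, access to healthcare, and genetic variation all differ between populations. Efficacy in one population does not guarantee effectiveness in another.
Reference: multinational trials and PROGRESS-Plus gaps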
PROGRESS-Plus Framework
Plus: Age, disability, sexual orientation, other vulnerable groups.
PRISMA-Equity & Transportability
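PRISMA-Equity extends PRISMA to require reporting of how a review addresses equity: population characteristics, subgroup analyses by disadvantage, and an assessment of applicability to underserved populations.
Transportability: The effect in a trial is not the same as the effect in the real world. Methods exist to re-weight trial data to match the distribution of a target population.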
From Trial to Real World: Transportability
Transportability: can results from trial population X be applied to target population Y? This is not a philosophical question; formal methods exist.
Inverse Probability of Participation Weighting (IPPW)
Re-weights trial participants so they resemble the target population on key covariates.
Generalizability Index
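Quantifies how closely the trial sample resembles the target population on observed characteristics.
Stuart et al. (2015, Stat Med): When SPRINT results were re-weighted to match the US hypertensive population, the estimated benefit was attenuated: HR 0.82 (versus 0.75 in the trial). The treatment still works. But when the population changes, so does the magnitude.
SPRINT and the Missing Majority
SPRINT was a well-designed trial of 9,361 patients. Its finding (HR 0.75 for intensive versus standard blood pressure control) changed guidelines worldwide. But subsequent analyses showed that the benefit was strongest in subgroups most like the trial population and uncertain for the excluded groups.
Equity in evidence synthesis means asking not only "Does it work?" but also "For whom does it work?"
Decision Tree: Equity Assessment for Your Review
ROOT: Does your review's evidence come from populations similar to your target population?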
YES → Good. But check: Are subgroups (age, sex, ethnicity, SES) reported separately?
- Yes: Use subgroup effects for population-specific recommendations
- No: Flag as limitation — equity gap in reporting
NO → Does PROGRESS-Plus analysis reveal differential effects?
- Yes: Population-specific recommendations needed. Consider transportability re-weighting.
- No: Cautious generalization with explicit equity statement in discussion
Callback: The HRT Lesson Revisited
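Remember Module 3? The HRT story showed that healthy-user bias can make a harmful treatment appear beneficial. SPRINT may have the opposite problem: a "healthy volunteer" effect may make an effective treatment appear more effective than it would be in the real world.
Every meta-analysis should ask: Who was included? Who was excluded? Does it matter?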
Module 18 Quiz
Q1. What does the PROGRESS-Plus framework help reviewers assess?
Module 18 Complete
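"Evidence that excludes the vulnerable cannot claim to serve them."
Not every signal is real.
Module 19: The Machine
A number without provenance is not a number.
Module 19: The Machine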
🎯 Learning Objectives
- Describe how AI/ML is used in systematic review screening
- Explain active learning and human-in-the-loop workflows
- Assess automation validation: recall, workload savings, and risk
- Recognize the limitations and biases of algorithmic screening
- Apply a framework for responsible AI use in evidence synthesis
When COVID-19 hit,
papers arrived faster than humans could read.
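By 2021 there were more than 300,000 COVID-19 papers. Cochrane used machine-learning classifiers to triage studies for rapid reviews, reducing screening workload by up to 70% while maintaining >95% recall.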
The Flood
By April 2020, 4,000 COVID preprints appeared every week.
PubMed indexed 500 new COVID articles per day.
Cochrane's screening queue hit 10,000 unreviewed titles.
A pair of reviewers screens ~200 titles per day.
At 500 new articles/day, they fell further behind with every hour.
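The living review was dying before it could live.
The First Attempts
The idea was not new. Cohen et al. (2006, JAMIA) first showed that machine learning could cut screening workload by 50% with less than a 5% loss in recall.
But simulation is not reality. COVID would be the first real test at scale.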
AI in Systematic Reviews
Screening Prioritization
Active learning ranks citations by relevance. Reviewers screen the most likely relevant first.
Data Extraction Assistance
NLP extracts PICO elements, outcomes, and results. Human verification is always required.
Risk of Bias Assessment
ML classifiers predict RoB domains. Experimental—human judgment remains gold standard.
Validating Automation
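The fundamental tension: Automation saves time but introduces new sources of error. Always report the tool, version, training data, and stopping criteria.
To know whether the machine missed relevant studies, you need a human to screen everything.
But if humans screen everything, why use the machine?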
The solution: prospective holdout validation.
- Random 10% sample screened by both human and machine
- Compare: did the machine miss anything the humans found?
- If recall drops below 95%, retrain and expand human screening
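The holdout check above amounts to computing recall on the doubly screened sample and comparing it with the 95% threshold. A minimal sketch, with hypothetical function names and made-up record IDs:

```python
def holdout_recall(human_included, machine_included):
    """Share of the human-included records that the machine also flagged as relevant."""
    human, machine = set(human_included), set(machine_included)
    if not human:
        raise ValueError("no human-included records in the holdout; recall is undefined")
    return len(human & machine) / len(human)

def needs_retraining(recall, threshold=0.95):
    """True when recall falls below the pre-specified threshold."""
    return recall < threshold

# Made-up holdout: humans included 20 records, the machine found 18 of them.
human = {f"rec{i}" for i in range(20)}
machine = {f"rec{i}" for i in range(18)} | {"rec99"}
recall = holdout_recall(human, machine)
print(f"recall = {recall:.2f}, retrain = {needs_retraining(recall)}")
```

With few relevant records in the holdout the recall estimate is noisy, so a single missed study can swing it sharply; reporting the counts alongside the percentage keeps that visible.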
Trust, but verify. The machine earns its role; it does not inherit it.
Cochrane's COVID Response
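Cochrane built the COVID-19 Study Register using machine-learning classifiers trained on millions of records. The system achieved 99% sensitivity while cutting manual screening from weeks to days.
But the machine is a tool, not a replacement. Every included study was still verified by human reviewers. The lesson: AI augments reviewers rather than replacing them.
The Study That Was Almost Missed
In June 2020, the RECOVERY trial announced its dexamethasone results, the first treatment proven to reduce COVID mortality (28-day mortality: 22.9% vs 25.7%, RR 0.83).
The preprint appeared on medRxiv with a non-standard title. Similar situations recurred throughout the pandemic: ML classifiers trained on existing terminology ranked unfamiliar framings low.
In several living reviews, human reviewers scanning flagged titles recognized key drug names and escalated studies that the classifier had deprioritized.
Without those humans, landmark treatment results might have waited weeks to enter living reviews.
Machines read faster. Humans read deeper. Neither is enough alone.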
Decision Tree: When Should You Use AI?
Active learning prioritization. Dual-screen random 10% holdout. Stop when 3 consecutive batches yield 0 relevant studies.
Report: classifier type, training data, recall on holdout, stopping rule.
For <5,000 titles, dual human screening remains gold standard. AI adds complexity without proportionate benefit.
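If yes → AI is especially valuable. Continuous classifier retraining on new evidence. But: never let the machine make the final inclusion decision.
The Pattern Repeats
Remember Module 6? Poldermans' fabricated DECREASE data guided perioperative beta-blocker guidelines for a decade.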
AI can now detect statistical anomalies automatically:
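- GRIM test: Is the reported mean arithmetically possible given the sample size and integer-valued data?
- SPRITE: Can the reported summary statistics be reconstructed from plausible individual-level data?
- Statcheck: Do reported p-values match the test statistics?
These tools have found anomalies in hundreds of published papers, faster than any human auditor.
But machines flag. Humans judge. The decision to retract remains a deeply human one.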
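The GRIM idea from the list above is simple enough to sketch: if the underlying data are integers (for example, Likert responses), the sum must be a whole number, so only certain means are possible for a given sample size. The function below is an illustrative version with a hypothetical name; it assumes the reported mean was rounded with ordinary rounding, and it becomes uninformative when the sample size is large relative to the reported precision.

```python
def grim_consistent(reported_mean, n, decimals=2):
    """True if some integer total divided by n rounds to the reported mean."""
    target = round(reported_mean, decimals)
    centre = round(reported_mean * n)
    # Search every integer total whose mean could round to the reported value.
    span = int(n * 0.5 * 10 ** -decimals) + 1
    return any(
        round(total / n, decimals) == target
        for total in range(max(0, centre - span), centre + span + 1)
    )

print(grim_consistent(5.18, 28))  # True: 145 / 28 = 5.1786, which rounds to 5.18
print(grim_consistent(5.19, 28))  # False: no integer total over 28 gives 5.19
```

A failed check is a flag for follow-up, not proof of misconduct: typos, unreported exclusions, or non-integer data can all produce the same pattern.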
Module 19 Quiz
Q1. What is the minimum acceptable recall for AI-assisted screening in a systematic review?
Module 19 Complete
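"Machines read faster. Humans read deeper. Together, they read the truth."
Not every signal is real.
Module 20: The Qualitative
Methods protect patients from our misplaced trust.
Module 20: The Qualitative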
🎯 Learning Objectives
- Explain why some questions require qualitative evidence synthesis
- Describe meta-ethnography (Noblit & Hare) and thematic synthesis
- Apply the CERQual framework to assess confidence in qualitative findings
- Understand mixed-methods synthesis approaches
- Recognize when qualitative evidence changes practice
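The World Health Organization asked a question
that no randomized controlled trial could answer.
Why do women around the world experience disrespect and abuse during childbirth? Bohren et al. (2015) synthesized 65 qualitative studies from 34 countries into a framework of seven domains of mistreatment.
Questions Beyond Randomization
In 2014, the WHO convened a panel to address a global crisis: women were being physically abused, verbally humiliated, and left without care during childbirth. This was not a rare event; reports came from 34 countries.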
They needed to understand WHY. What drives disrespect and abuse in maternity care?
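No RCT could answer this. You cannot randomize women to abusive versus respectful care. You cannot blind the midwives. You cannot measure "dignity" on a Likert scale. The evidence had to be qualitative.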
Meta-Ethnography
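Developed by Noblit & Hare (1988), meta-ethnography translates concepts across studies rather than pooling numbers. It generates new interpretive frameworks (third-order constructs) from first-order (participant quotes) and second-order (author interpretations) data.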
What Bohren Found: A Taxonomy of Mistreatment
Hitting, pinching, slapping during labor
Inappropriate touching, non-consensual procedures
Shouting, threats, judgmental comments
Based on HIV status, ethnicity, age, poverty
Neglect, lack of informed consent
Poor communication, dismissiveness
Overcrowding, understaffing, lack of supplies
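65 studies. 34 countries. The same patterns recurred across languages, cultures, and systems. This is not anecdote. This is synthesized evidence.
CERQual: Confidence in Qualitative Evidence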
CERQual assesses confidence in qualitative review findings across four components:
Methodological Limitations
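Quality of the contributing studies.
Coherence
How well the data support the finding.
Adequacy
Richness of the data (not just the number of studies).
Relevance
Applicability to the context of the review question.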
When Qualitative Evidence Changes Practice
Bohren's synthesis informed the WHO's 2018 Recommendations on Intrapartum Care for a Positive Childbirth Experience. Specific changes grounded in qualitative evidence:
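These recommendations, grounded in qualitative evidence, now guide maternity care in 194 WHO member states. No forest plot could have produced them. No I² statistic could have revealed them.
Bohren's Framework of Mistreatment
The 2015 qualitative synthesis identified seven domains: physical abuse, sexual abuse, verbal abuse, stigma and discrimination, failure to meet professional standards of care, poor rapport between women and providers, and health system conditions and constraints. The framework informed the WHO recommendations on intrapartum care (2018).
No p-value can capture the experience of being slapped during labor. Qualitative synthesis gives voice to what numbers cannot.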
Decision Tree: When Is Qualitative Synthesis Appropriate?
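ROOT: Is your research question about experiences, perceptions, barriers, or facilitators?
YES → Is your question about how or why, not just "whether"?
- Yes: Qualitative evidence synthesis (meta-ethnography, thematic synthesis, or framework synthesis)
- No: Consider mixed methods: quantitative for effects + qualitative for mechanisms
NO → Is your question about effectiveness/efficacy?
- Yes: Quantitative meta-analysis
- But: Supplement with a qualitative review of implementation barriers (CERQual assessment)
Key insight: The strongest systematic reviews answer two questions: Does it work? (quantitative) and Why does it work or fail? (qualitative)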
Module 20 Quiz
Q1. What distinguishes meta-ethnography from quantitative meta-analysis?
Module 20 Complete
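"Not everything that counts can be counted. Not everything that can be counted counts."
Heterogeneity is a message, not noise.
Module 21: The Multivariate
Heterogeneity is a message, not noise.
Module 21: The Multivariate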
🎯 Learning Objectives
- Recognize when outcomes within a study are correlated
- Explain multivariate random-effects models
- Apply robust variance estimation (RVE) for dependent effect sizes
- Understand three-level models for nested data
- Choose between multivariate approaches based on data structure
Cardiovascular trials report
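mortality, MI, stroke, and more.
These outcomes are correlated within patients. A patient who dies cannot go on to reach an MI endpoint. Standard meta-analysis treats each outcome independently, ignoring the dependence and potentially double-counting evidence.
The Assumption Nobody Questions
Open any standard meta-analysis textbook. The models assume each study contributes one independent effect size. But reality is different.
A single cardiovascular trial reports mortality, myocardial infarction, stroke, and revascularization. A psychotherapy study reports depression, anxiety, and quality of life at 3, 6, and 12 months.
Most analysts either: (a) treat every reported effect as independent (overstating precision by up to a factor of √4 = 2 when each study contributes four highly correlated outcomes), or (b) pick one outcome and discard the rest. Both approaches are wrong.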
The Dependence Problem
In standard pairwise meta-analysis, each study contributes one effect size. But many studies report multiple outcomes, subgroups, timepoints, or arms, creating dependent effect sizes. Ignoring this inflates precision and distorts inference.
Robust Variance Estimation
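RVE (Hedges, Tipton & Johnson, 2010) uses a sandwich-type estimator to provide valid standard errors regardless of the true correlation among dependent effects. There is no need to know or estimate the within-study correlations. It works best with ≥20 studies.
Small-sample correction: Tipton and Pustejovsky (2015) developed a small-sample correction for RVE (CR2) with Satterthwaite degrees of freedom, for use when the number of clusters is small.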
What Dependence Does to Your Confidence Intervals
If 4 outcomes from the same study have a within-study correlation of ρ = 0.5:
Treating as independent
CI width = X
Accounting for dependence
CI width = 1.58X
Your confidence interval should be 58% wider. Every meta-analysis that ignores this publishes falsely precise results.
RVE (Hedges, Tipton & Johnson, 2010): Uses a “sandwich” variance estimator that produces correct standard errors without needing to know the exact within-study correlation.
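The 1.58 figure follows from a standard result, sketched here under the simplifying assumption that the m effects from one study share a common variance and a common pairwise correlation ρ. The variance of their average is (σ²/m)·[1 + (m − 1)ρ], so the standard error, and with it the CI width, is inflated by √(1 + (m − 1)ρ) relative to the independence assumption. The function name is hypothetical.

```python
from math import sqrt

def ci_inflation(m, rho):
    """Factor by which the CI widens when m equicorrelated effects are not independent."""
    if m < 1 or not 0 <= rho <= 1:
        raise ValueError("need m >= 1 and 0 <= rho <= 1")
    return sqrt(1 + (m - 1) * rho)

print(round(ci_inflation(4, 0.5), 2))  # 1.58, i.e. 58% wider
print(round(ci_inflation(4, 1.0), 2))  # 2.0: four perfectly correlated outcomes carry one outcome's information
```

In practice ρ is rarely reported, which is why methods that stay valid without knowing it, such as RVE, are attractive.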
Three-Level Models: Making Structure Explicit
Level 1: Sampling Variance
Measurement error within each effect size estimate.
Level 2: Within-Study Variance
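Outcomes and timepoints vary within a single study.
Level 3: Between-Study Variance
Studies differ from one another in populations, settings, and methods.
Example: In a meta-analysis of psychotherapy for depression (k=50 studies, 180 effect sizes), 35% of the variance was within studies (different outcomes) and 65% was between studies (different therapies, populations). This decomposition reveals how much heterogeneity lies within vs between studies.
Three-Level Models: Formal Framework
When effects are nested (e.g., multiple outcomes within a study, or studies within a research group), a three-level model partitions variance into: (1) sampling variance (level 1), (2) within-study variance (level 2), and (3) between-study variance (level 3). This borrows strength across levels while preserving correct inference.
The Cardiovascular Challenge
A meta-analysis of statins might include 30 trials, each reporting mortality, myocardial infarction, stroke, and revascularization. That is 120 effect sizes from 30 clusters. Treating them as 120 independent estimates overstates precision by a factor that depends on the within-study correlation.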
RVE or multivariate models handle this correctly—producing wider, honest confidence intervals.
Decision Tree: Which Approach for Dependent Effect Sizes?
ROOT: Does your meta-analysis have multiple effect sizes per study?
YES → Do you know (or can you estimate) the within-study correlations?
- Yes: Multivariate random-effects model (most efficient)
- No: RVE with small-sample correction (robust to unknown correlations)
NO → Standard univariate random-effects model
Sub-question: Do your multiple effects come from different outcomes, timepoints, or subgroups?
- Different outcomes → Three-level model or RVE with clustering
- Different timepoints → Network of timepoints with temporal correlation
- Different subgroups → Consider if subgroups are meaningful or should be averaged
Module 21 Quiz
Q1. What problem does Robust Variance Estimation (RVE) solve?
Module 21 Complete
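"When outcomes are entangled, pretending they are independent is merely a convenient lie."
A number without provenance is not a number.
Module 22: The Proof
A number without provenance is not a number.
Module 22: The Proof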
🎯 Learning Objectives
- Understand how computational errors propagate through policy
- Define reproducibility and distinguish it from replicability
- Apply evidence hashing and proof-carrying numbers
- Use reproducibility checklists for meta-analysis
- Recognize the role of pre-registration and open data
A graduate student opened a spreadsheet
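and discovered that the age of austerity was built on an error.
In 2010, Reinhart and Rogoff claimed that countries with debt-to-GDP ratios above 90% experienced negative growth. This influenced austerity policies across Europe. In 2013, Thomas Herndon found an error in their Excel spreadsheet that excluded 5 countries from the average. The corrected result: modest positive growth, not collapse.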
Reproducibility vs Replicability
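Reproducibility is the minimum standard. If others cannot reproduce your pooled estimate from the data you report, the analysis cannot be verified. A meta-analysis should share its extracted data, analysis scripts, software versions, and random seeds.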
Proof-Carrying Numbers
Every number in a meta-analysis should carry its provenance: where it came from, how it was transformed, and the code that generated it. Evidence hashing creates a cryptographic fingerprint of inputs so any change (accidental or deliberate) is detectable.
Input Hash
A SHA-256 hash of the extracted data. If a single cell changes, the hash changes. Provenance chain: data → code → result → hash.
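A minimal sketch of an input hash, using Python's standard hashlib and json modules. The function name evidence_hash, the row layout, and the trial values are hypothetical and exist only to show the mechanism; a real pipeline would hash the actual extraction file.

```python
import hashlib
import json


def evidence_hash(rows):
    """Return a SHA-256 fingerprint of extracted data rows."""
    # Canonical serialization: sorted keys and fixed separators, so the
    # same data always produces the same bytes and therefore the same hash.
    canonical = json.dumps(rows, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Made-up extraction table, for illustration only.
extracted = [
    {"study": "Trial A", "events_tx": 12, "n_tx": 100, "events_ctl": 20, "n_ctl": 100},
    {"study": "Trial B", "events_tx": 30, "n_tx": 250, "events_ctl": 41, "n_ctl": 248},
]
original = evidence_hash(extracted)

extracted[1]["events_ctl"] = 40  # a single-cell change
print(evidence_hash(extracted) != original)  # True: the change is detectable
```

Recording the hash next to each reported result lets a reader confirm that the numbers were produced from exactly the data that was shared.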
Interactive: Reproducibility Checklist
Check each item to assess the reproducibility of a meta-analysis. How does your review score?
The Excel Error That Changed Economies
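Reinhart and Rogoff's "Growth in a Time of Debt" was cited in congressional testimony, European Commission reports, and IMF policy briefs. The Excel error (the AVERAGE formula covered rows 30-44 instead of rows 30-49) meant that five countries (Australia, Austria, Belgium, Canada, and Denmark) were missing entirely.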
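The corrected average moved from -0.1% to +2.2%. Austerity policies affected millions of people. Reproducibility is not academic perfectionism; it is a safeguard against disaster.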
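A scripted analysis can refuse to run when rows go missing, which a spreadsheet range cannot do. The sketch below is one possible guard, not a reconstruction of the original spreadsheet; mean_growth is a hypothetical helper and the growth figures are invented placeholders, not the real data.

```python
def mean_growth(growth_by_country, expected_countries):
    """Average growth, refusing to run if any expected country is missing."""
    missing = sorted(set(expected_countries) - set(growth_by_country))
    if missing:
        raise ValueError(f"missing from average: {', '.join(missing)}")
    return sum(growth_by_country.values()) / len(growth_by_country)


# Placeholder values for illustration only.
expected = ["Country A", "Country B", "Country C"]
data = {"Country A": 1.0, "Country B": 2.0}  # Country C silently dropped

try:
    mean_growth(data, expected)
except ValueError as err:
    print(err)  # missing from average: Country C
```

The design choice is to state the expected inputs explicitly, so that an accidental omission becomes an error rather than a quietly different answer.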
Remember Module 5?
DECREASE Through the Lens of Reproducibility
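Don Poldermans's DECREASE trials were retracted because of data fabrication. If proof-carrying numbers had existed (hashed inputs, provenance chains, verified computations), the fabrication would have been detectable before the evidence entered meta-analyses and changed surgical guidelines.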
Module 22 Quiz
Q1. What was the Reinhart-Rogoff error?
Module 22 Complete
"A number without provenance is not a number. An analysis without reproducibility is not evidence."
Certainty must be earned, not assumed.
Module 23: Your First Meta-Sprint
🎯 Learning Objectives
- Understand the 40-day systematic review workflow
- Map the Seven Principles to real practice phases
- Recognize Definition-of-Done (DoD) gates as quality checkpoints
- Appreciate why structure prevents the failures you've studied
- Graduate ready to conduct (not just understand) meta-analysis
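You have learned the stories.
Now you must walk the path.
Every evidence reversal you studied happened because teams knew the methods but did not follow them systematically.
The META-SPRINT Framework
A structured 40-day workflow with 5 phase gates. Each gate is a Definition-of-Done (DoD) checkpoint that blocks you from moving forward until quality is assured.
Why 40 days? Long enough for rigor, short enough to prevent scope creep. The rosiglitazone cardiac signal stayed hidden for years because no deadline forced transparency.
The Five Phase Gates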
DoD-A: Protocol Lock (Days 1-3)
PICOS defined, timepoint rules set, model choices pre-specified. No moving target.
DoD-B: Search Lock (Days 6-10)
All databases searched, grey literature checked, PRESS validated. No hidden studies.
DoD-C: Extraction Lock (Days 10-28)
Dual extraction, provenance linked, RoB assessed. No fabricated numbers.
The Five Phase Gates (continued)
DoD-D: Analysis Lock (Days 21-33)
Forest plots generated, sensitivity analyses run, heterogeneity explored. No cherry-picking.
DoD-E: Submission Lock (Days 33-40)
GRADE certainty rated, clinical summary written, manuscript finalized. No overconfidence.
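Day 34 Freeze: No new studies can be added after Day 34. This prevents the "weaponized scope creep" that plagued the BMP spinal surgery meta-analyses, where industry kept "finding" favorable studies.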
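The gate schedule and the freeze rule can be expressed as a small lookup. This is a sketch, not the META-SPRINT app's actual logic: the names Gate, GATES, FREEZE_DAY, active_gates, and can_add_study are hypothetical, the day ranges are copied from the gates listed above, and treating Day 34 itself as still open is an assumption about what "after Day 34" means.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gate:
    name: str
    start_day: int
    end_day: int


# Day ranges as listed in the five phase gates.
GATES = [
    Gate("DoD-A: Protocol Lock", 1, 3),
    Gate("DoD-B: Search Lock", 6, 10),
    Gate("DoD-C: Extraction Lock", 10, 28),
    Gate("DoD-D: Analysis Lock", 21, 33),
    Gate("DoD-E: Submission Lock", 33, 40),
]
FREEZE_DAY = 34


def active_gates(day):
    """Names of the gates whose day range includes `day`."""
    return [g.name for g in GATES if g.start_day <= day <= g.end_day]


def can_add_study(day):
    """New studies are allowed up to and including the freeze day (assumed)."""
    return day <= FREEZE_DAY


print(active_gates(25))   # extraction and analysis overlap on this day
print(can_add_study(35))  # False
```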
The Seven Principles in Practice
Every principle you learned maps to a specific phase gate:
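The Red-Team Principle
Your own team tries to break your…
Every day, two rotating team members spend 12 minutes acting as adversaries and checking data quality. This is how Boldt's fraud was caught: not through friendly review, but through skeptical checks that spotted impossible recruitment rates.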
CondGO: When Things Go Wrong
What happens when you discover a critical problem mid-sprint?
CondGO = Conditional Go
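A bounded rescue protocol. You have exactly 72 hours to fix the problem using only permitted actions. If you cannot fix it, you must stop the review.
📖 The Avandia Lesson: GSK saw a cardiovascular signal in 2000, but there was no mandatory deadline. They "watched and waited" for 7 years. Tens of thousands of people were harmed. CondGO exists because "we'll deal with it eventually" kills people.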
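The 72-hour rule is simple enough to encode directly. In this sketch the function name condgo_status and the status labels "GO", "CONDITIONAL GO", and "STOP" are hypothetical; only the 72-hour window and the stop-if-unfixed rule come from the text above.

```python
from datetime import datetime, timedelta

CONDGO_WINDOW = timedelta(hours=72)


def condgo_status(opened_at, now, resolved_at=None):
    """Classify a critical problem against the 72-hour CondGO window."""
    deadline = opened_at + CONDGO_WINDOW
    if resolved_at is not None and resolved_at <= deadline:
        return "GO"  # fixed inside the window
    if resolved_at is None and now <= deadline:
        return "CONDITIONAL GO"  # still inside the window, not yet fixed
    return "STOP"  # window expired without a timely fix


opened = datetime(2024, 1, 1, 9, 0)  # arbitrary example timestamp
print(condgo_status(opened, opened + timedelta(hours=80)))  # STOP
```

Making the deadline a computed value rather than a judgment call is the point: there is no code path for "we'll deal with it eventually."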
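You began this course with stories.
You end it ready to practice.
The META-SPRINT workflow builds everything you have learned into a 40-day system that prevents the failures you have studied.
When you are ready to conduct a real systematic review, open the META-SPRINT app. The stories you learned here will guide you, appearing as reminders at every step.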
What does it look like when every principle is followed?
REAL DATA
The Cholesterol Treatment Trialists' (CTT) Collaboration is the gold standard of meta-analysis. They obtained individual patient data from more than 170,000 participants across 26 statin trials. Pre-specified protocol. IPD from all major trials. Standardized outcomes. Result: statins reduce major vascular events by 21% per mmol/L LDL reduction (RR 0.79, 95% CI 0.77-0.81), regardless of baseline risk. This finding, replicated across 5 meta-analyses over 15 years, has prevented an estimated millions of heart attacks and strokes worldwide.
Capstone Quiz
1. What is the purpose of the Day 34 "hard freeze" in META-SPRINT?
2. The CondGO protocol gives teams how long to fix critical problems?
3. Red-team adversarial QA caught Joachim Boldt's fraud by noticing:
The stories you learned are not history.
They are warnings that protect your future work.
When you conduct your first meta-analysis,
remember CAST before you trust a signal,
remember Poldermans before you skip provenance,
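remember reboxetine before you ignore the funnel.
You are ready now. Follow the structure. Go with humility. Follow the seven principles.
Not every signal is real.
Module 24: Final Examination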
Certainty must be earned, not assumed.
Final Examination
Final Exam: Part 1 of 2
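Test your mastery of meta-analysis principles. Each question addresses a core concept from the course.
Q1. A researcher wants to study "the effect of exercise on health." What is the main problem with this research question?
Q2. A funnel plot shows clear asymmetry, with studies missing from the lower-left region. What does this suggest?
Q3. A meta-analysis reports I² = 85% and τ² = 0.42. What is the most appropriate interpretation?
Q4. In GRADE, what is the starting certainty for evidence from randomized controlled trials?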
Q5. In RoB 2.0, which domain assesses whether outcome assessors knew the treatment allocation?
Final Exam: Part 1 of 2 (continued)
Q6. The CAST trial showed that antiarrhythmic drugs increased mortality despite suppressing arrhythmias. This is an example of:
Q7. When should a random-effects model be preferred over a fixed-effect model?
Q8. According to ICEMAN criteria, which makes a subgroup analysis MORE credible?
Q9. What assumption must be checked in network meta-analysis to ensure valid indirect comparisons?
Q10. In trial sequential analysis (TSA), what does crossing the futility boundary indicate?
Part 1 Complete — continue to Part 2 (Advanced Modules)
Final Exam: Part 2 of 2 (Advanced)
Questions 11–20 cover Modules 13–22 (Bayesian, NMA, IPD, Dose-Response, Fragility, Equity, AI, Qualitative, Multivariate, Reproducibility).
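Q11. In a Bayesian meta-analysis, what happens when you use a vague prior with many studies?
Q12. In Cipriani's antidepressant NMA, why was no single drug declared the "winner"?
Q13. Why should you never pool IPD as if it were one large trial?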
Q14. What caused the alcohol "J-curve" to disappear in Stockwell's reanalysis?
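Q15. In the oseltamivir saga, what did Cochrane discover when it accessed unpublished clinical study reports?
Q16. What percentage of US patients with hypertension would not have been eligible for the SPRINT trial?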
Q17. Why is AI considered an "augmenter" rather than a "replacer" in systematic reviews?
Q18. What does the "adequacy" component of CERQual assess?
Q19. A meta-analysis includes 30 statin trials, each reporting 4 correlated outcomes (120 effect sizes). Which approach is correct?
Q20. In the Reinhart-Rogoff error, what was the corrected average growth rate for high-debt countries?
Passing Score: 15/20 across both parts
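Return to the relevant modules to review any questions you missed. Each question tests a core concept.
Not every signal is real.
Methods protect patients from our confidence.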
Congratulations
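You have completed the Evidence Reversals: Meta-Analysis course.
May your syntheses be guided by truth, your pooling by wisdom,
and your conclusions drawn with humility.
The Seven Principles:
"Not every signal is real."
"Methods protect patients from our confidence."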
"What was hidden in plain sight?"
"A number without provenance is not a number."
"Heterogeneity is a message, not noise."
"Absence of evidence is not evidence of absence."
"Certainty must be earned, not assumed."
"Guide us to the straight path..."