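The Evidence Covenant: Writing the Meta-Analysis Paper

Have you not heard of the hidden trials,
the buried data,
the published papers that told only half the truth?

The Tamiflu Scandal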

COCHRANE COLLABORATION, 2009-2014

Governments stockpiled $9 billion worth of Tamiflu to fight pandemic flu.

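But when Cochrane researchers tried to verify the drug's benefits, they found that 60% of trial data had never been published.

After a five-year battle, they obtained the hidden data. The conclusion changed: Tamiflu shortened symptoms by less than a day and did not prevent complications.

$9 billion had been spent on evidence that was never fully disclosed.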

Jefferson T et al. Cochrane Database Syst Rev. 2014;4:CD008965

Why We Write Meta-Analyses

The Purpose of Synthesis

Individual Studies

↓

Problem?

Small samples: Low power

Conflicting results: Which to believe?

Publication bias: Missing negatives

↓

Meta-Analysis: Systematic synthesis

↓

More precise estimate + Bias detection

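The Tamiflu Transparency Campaign

2009-2014 | COCHRANE COLLABORATION vs. ROCHE

For years, governments worldwide stockpiled Tamiflu (oseltamivir) at a cost of billions of dollars, based on the manufacturer's claims that it prevented influenza complications.

When Cochrane reviewers requested the complete trial data to verify these claims, Roche refused for five years, citing confidentiality.

Under sustained pressure, the clinical study reports were finally released in 2014. The picture changed dramatically: Tamiflu shortened symptoms by less than a day, and there was no evidence that it prevented hospitalization or serious complications.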

THE LESSON

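A meta-analysis is only as good as the data it can access. When trials are hidden, ineffective treatments look effective, and billions of dollars are wasted.

"The company knew,
the regulators knew,
but the published papers did not say so:
billions were spent on half-truths."

This is why we write meta-analyses: to find the whole truth.

If you want your work to be trusted,
you must make a covenant with your readers.

That covenant has a name:
PRISMA.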

PRISMA 2020

Preferred Reporting Items for Systematic Reviews and Meta-Analyses

27

Checklist items

2020

Updated version

50K+

Citations

THE COVENANT

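PRISMA is not bureaucracy. It is a promise to your readers that you have done the work transparently and completely.

The Seven Sections

1

Title

Identify as systematic review, meta-analysis, or both

2

Abstract

Structured summary of the entire review

3

Introduction

Rationale and objectives, framed with PICO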

4

Methods

Protocol, search, selection, data, bias, synthesis

5

Results

Flow diagram, characteristics, risk of bias, synthesis results

6

Discussion

Summary, limitations, interpretation, implications

7

Other

Registration, funding, conflicts of interest

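The PRISMA Revolution

2009-PRESENT | TRANSFORMING SYSTEMATIC REVIEW REPORTING

Before PRISMA (2009), systematic review reporting was chaotic. Some reviews did not report their search strategy at all. Others omitted risk of bias assessment. Many failed to explain why studies were excluded. Readers couldn't judge quality; they had to trust blindly.

PRISMA's 27-item checklist changed everything. It requires authors to document every step: the full search strategy, selection criteria, extraction methods, and synthesis decisions.

Today, more than 10,000 journals endorse PRISMA. Transparency that was once exceptional has become the expected standard.

THE LESSON

A simple checklist changed an entire field. Transparent reporting went from exception to norm, proof that standards matter.

"PRISMA is a covenant between author and reader:
I will show you everything:
how I searched, what I found, what I excluded, why.
So that you can judge my work, and trust (or question) my conclusions."

Have you not seen the reviewers
who changed their outcomes after seeing the data,
who moved the goalposts until the results looked right?

Retracted Meta-Analyses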

MULTIPLE JOURNALS, 2010-2023

Researchers found that many retracted meta-analyses had no pre-registered protocol.

Without a protocol, reviewers could:
• Change inclusion criteria after seeing results
• Switch primary outcomes to show significance
• Add or remove studies to change the conclusion

The protocol is your pre-commitment device. It stops you from fooling yourself.

Defined outcome switching: PROSPERO registration prevents bias

Protocol Registration Decision Tree

Where to Register Your Protocol

New Systematic Review

↓

Type of Review?

Health/Medical

PROSPERO: prospero.york.ac.uk

Any Field

OSF: osf.io/registries

Cochrane

Cochrane Library: Integrated protocol

↓

Registration ID in Paper: Cite in Methods

What the Protocol Must Contain

Essential Protocol Elements

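1. Research question (PICO format)

2. Eligibility criteria (inclusion/exclusion)

3. Information sources and search strategy

4. Study selection process

5. Data extraction items

6. Risk of bias assessment tool

7. Primary and secondary outcomes

8. Synthesis methods (meta-analysis plan)

9. Subgroup and sensitivity analyses

"Write the protocol before you see the data.
Lock it in a public registry.
Then follow it, or explain why you deviated.
That is how you prove that you did."

The title is the first promise you make.

It must tell the reader:
what you studied, and how you studied it.

Anatomy of a Title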

PRISMA Title Requirements

Title Must Include

↓

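Population: Who was studied

Intervention: What was done

Outcome: What was measured

↓

+ "Systematic Review" or "Meta-Analysis"

Good vs. Bad Titles

❌ BAD TITLE

"A Review of Diabetes Treatment"

Problems: No population specified, no intervention, no outcome, doesn't say systematic review

✓ GOOD TITLE

"Efficacy of SGLT2 Inhibitors on Cardiovascular Mortality in Adults with Type 2 Diabetes: A Systematic Review and Meta-Analysis"

Population, intervention, outcome, and study type are all explicit

"The title is the first thing you say to your reader.
Make it complete. Make it honest.
Tell them exactly what they will find within."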

Most readers will only read your abstract.

If the abstract lies, or omits, or misleads—
most readers will never know.

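The Spin Problem

BOUTRON ET AL., 2010

Researchers analyzed 72 RCTs with non-significant primary outcomes.

They found that 40% of abstracts contained "spin": reporting that focused on secondary outcomes, subgroups, or within-group changes to make the results look more favorable than they were.

The abstract told a different story than the data.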

Boutron I et al. JAMA. 2010;303:2058-2064

Structured Abstract Elements

PRISMA Abstract Checklist

□ Background and objectives

□ Eligibility criteria

□ Information sources

□ Risk of bias assessment

□ Synthesis methods

□ Results (number of studies, number of participants, effect estimate with CI)

□ Limitations

□ Conclusions and implications

□ Registration number

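What happens when the abstract tells a different story than the paper itself?

REAL DATA

Pitkin et al. (1999, JAMA) examined abstracts in six major journals and found that 18-68% of abstracts contained data that were inconsistent with, or absent from, the full article. Deficiencies included numerical errors and conclusions not supported by the reported results.

Misleading Abstracts: Pitkin 1999

Your meta-analysis finds a non-significant primary outcome (RR 0.92, 95% CI 0.78-1.09). How do you write the abstract?

PATH A: Spin the Abstract

Emphasize a significant secondary outcome and use language like "trend toward benefit"

↓

Readers who see only the abstract form a misleadingly positive impression of the treatment

OUTCOME: Misleading clinical decisions

PATH B: Report Faithfully

State the non-significant primary outcome clearly and label secondary outcomes as exploratory

↓

Readers who need the details can read the full paper

OUTCOME: Evidence integrity preserved

THE REVELATION

Most readers will never read past the abstract. If the abstract misleads, the honesty of the full text cannot undo the damage.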
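One way to check whether "trend toward benefit" is defensible is to convert the reported interval back into a test statistic. The sketch below assumes the interval in the scenario is a standard Wald interval computed on the log scale; the function name `p_from_ratio_ci` is hypothetical and not part of any package.

```r
# Recover an approximate two-sided p-value from a ratio estimate and its CI,
# assuming a Wald interval on the log scale (an assumption, not a given).
p_from_ratio_ci <- function(estimate, lower, upper, level = 0.95) {
  z_crit <- qnorm(1 - (1 - level) / 2)               # about 1.96 for 95%
  se_log <- (log(upper) - log(lower)) / (2 * z_crit) # SE of the log ratio
  z <- log(estimate) / se_log
  p_value <- 2 * pnorm(-abs(z))
  c(se_log = se_log, z = z, p_value = p_value)
}

# Primary outcome from the scenario: RR 0.92, 95% CI 0.78-1.09
result <- p_from_ratio_ci(0.92, 0.78, 1.09)
print(round(result, 3))
```

Under these assumptions the p-value is roughly 0.33, which is a long way from any conventional threshold and gives little support to wording that implies benefit.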

"Do not spin. Do not hide.
If the primary outcome is null, say so.
The abstract must be a faithful mirror,
not a flattering portrait."

A vague question yields vague answers.

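Before you search, before you write,
you must know exactly what you are looking for.

The PICO Framework

Structuring the Research Question

Research Question

↓

P: Population

Who?

I: Intervention

What treatment?

C: Comparator

Vs. what?

O: Outcome

What measured?

PICO Example

Transforming a Vague Question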

Vague: "Does exercise help depression?"

PICO:
• P: Adults diagnosed with major depressive disorder
• I: Supervised aerobic exercise (≥3x/week for ≥8 weeks)
• C: Usual care or waitlist control
• O: Change in depression score (HAM-D or BDI)

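Now you know exactly what to search for.

"Define your question precisely.
Who are the patients? What is the treatment?
What is the comparator? What will you measure?
PICO is the map before the journey."

Have you not heard of the meta-analysis
that searched only one database,
missed half the evidence,
and reached the wrong conclusion?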

The Search Strategy Decision Tree

Where to Search

Comprehensive Search

↓

Minimum Databases

MEDLINE: PubMed

Embase: European focus

CENTRAL: Cochrane trials

↓

Plus Additional Sources

Trial registries: ClinicalTrials.gov

Grey literature: Theses, reports

Reference lists: Backward citation

Documenting the Search

WHAT TO REPORT

• Full search strategy for at least one database (appendix)
• Date of search for each database
• Any limits (language, date, publication type)
• Hand-searching (journals, conference proceedings)
• Contact with authors for unpublished data

The Reproducibility Test

Another researcher should be able to replicate your search exactly and retrieve the same number of records.

The Cochrane Search Strategy Discovery

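2003 | COCHRANE METHODOLOGY REVIEW

Cochrane researchers asked a simple question: What would happen if systematic reviewers only searched MEDLINE?

The answer was alarming. They would have missed 30% of included studies, including some that completely changed a meta-analysis's conclusions.

One striking example: an antidepressant meta-analysis showed benefit when based on MEDLINE alone, but no benefit when all sources were included. The missing studies were smaller negative trials indexed in specialized databases such as EMBASE and PsycINFO.

THE LESSON

A single-database search can systematically miss negative trials. The studies not indexed in MEDLINE may be exactly the ones that would change your conclusion.

"Search wide. Search deep.
Document every database, every date, every term.
The evidence you miss may be the evidence that matters most."

You must choose from thousands of records.

But choose by what rule?
And who will check your choices?

The PRISMA Flow Diagram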

PRISMA 2020 Flow Diagram

IDENTIFICATION

n = 3,847

Records from databases

↓

Duplicates removed (n = 892)

SCREENING

n = 2,955

Titles/abstracts screened

↓

Excluded (n = 2,680)

ELIGIBILITY

n = 275

Full-text assessed

↓

Excluded with reasons (n = 247)

INCLUDED

n = 28

Studies in synthesis
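The counts in a flow diagram should reconcile from box to box, and that is easy to check before submission. The following is a minimal sketch using the illustrative numbers above; `check_prisma_flow` is a hypothetical helper, not part of any PRISMA tooling.

```r
# Check that each stage of a PRISMA flow diagram follows arithmetically
# from the previous one. All argument names are illustrative.
check_prisma_flow <- function(identified, duplicates, excluded_screening,
                              excluded_fulltext, included) {
  screened <- identified - duplicates           # titles/abstracts screened
  fulltext <- screened - excluded_screening     # full texts assessed
  expected_included <- fulltext - excluded_fulltext
  list(screened = screened,
       fulltext = fulltext,
       expected_included = expected_included,
       consistent = (expected_included == included))
}

flow <- check_prisma_flow(identified = 3847, duplicates = 892,
                          excluded_screening = 2680,
                          excluded_fulltext = 247, included = 28)
print(flow)
```

If `consistent` is FALSE, some records have been lost or double-counted between stages, and the diagram needs to be reconciled against the screening log.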

Selection Process Decision Tree

Who Selects? How?

Study Selection

↓

Two independent reviewers: Gold standard

↓

Disagreement?

Consensus: Discussion

Third reviewer: Arbiter

One reviewer only: Acknowledge limitation

REPORT AGREEMENT

Calculate and report inter-rater agreement (kappa statistic). Low agreement suggests unclear criteria.
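Cohen's kappa corrects raw agreement for the agreement two screeners would reach by chance. A minimal base-R sketch, assuming two reviewers and include/exclude decisions; the decision vectors are invented for illustration and `cohen_kappa` is a hypothetical helper.

```r
# Cohen's kappa for two raters making categorical decisions on the same records.
cohen_kappa <- function(rater1, rater2) {
  stopifnot(length(rater1) == length(rater2))
  categories <- union(unique(rater1), unique(rater2))
  tab <- table(factor(rater1, levels = categories),
               factor(rater2, levels = categories))
  n <- sum(tab)
  p_observed <- sum(diag(tab)) / n                        # raw agreement
  p_expected <- sum(rowSums(tab) * colSums(tab)) / n^2    # chance agreement
  (p_observed - p_expected) / (1 - p_expected)
}

# Illustrative screening decisions for 10 records
reviewer_a <- c("include", "include", "exclude", "exclude", "exclude",
                "include", "exclude", "exclude", "exclude", "exclude")
reviewer_b <- c("include", "exclude", "exclude", "exclude", "exclude",
                "include", "exclude", "include", "exclude", "exclude")

kappa_value <- cohen_kappa(reviewer_a, reviewer_b)
print(round(kappa_value, 2))
```

Here the reviewers agree on 8 of 10 records, yet kappa is only about 0.52 because most records are excluded by both and much of that agreement is expected by chance.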

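Can a post-hoc subgroup analysis of a single trial reshape an entire field for a decade?

REAL DATA

The Women's Health Initiative (WHI, 2002) found that HRT increased cardiovascular risk overall. But post-hoc subgroup analysis suggested that women aged <60 or within 10 years of menopause might benefit, while older women were harmed. This "timing hypothesis" sparked years of debate and further research.

The HRT Timing Hypothesis: WHI 2002

Your meta-analysis of HRT shows overall harm, but an exploratory subgroup suggests benefit in younger women. How do you write this up?

PATH A: Overstate the Subgroup

Headline the age-subgroup result as the main finding

↓

Clinicians prescribe HRT on the basis of an exploratory, underpowered subgroup; the finding may not replicate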

OUTCOME: Premature clinical change

PATH B: Report Honestly

Present overall result as primary; label subgroup as exploratory and pre-specified or post-hoc

↓

Readers understand the hypothesis needs confirmation; future trials can be designed to test it

OUTCOME: Responsible hypothesis generation

THE REVELATION

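Subgroup analyses generate hypotheses, not conclusions. Always label them as exploratory and report the overall result first.

"Every exclusion must have a reason.
Every reason must be documented.
Two pairs of eyes are better than one:
what one misses, the other may catch."

Have you not seen the meta-analysis
that pooled good studies with bad,
and called the average truth?

The Danger of Ignoring Bias

The Antidepressant Scandal

Turner et al. (2008) obtained FDA data on 74 antidepressant trials.

In the published literature: 94% of trials appeared positive.

In the FDA database: only 51% were positive.

Published meta-analyses pooled selectively reported data. Effect sizes were inflated by 32%.

Turner EH et al. N Engl J Med. 2008;358:252-260

Risk of Bias Assessment Tools

Which Tool to Use?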

Study Design

↓

RCTs

RoB 2: Cochrane tool

Non-randomized

ROBINS-I: Interventions

DTA studies

QUADAS-2: Diagnostic

Observational

NOS: Newcastle-Ottawa

RoB 2 Domains

Cochrane Risk of Bias 2.0 for RCTs

D1 Randomization process

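D2 Deviations from intended interventions

D3 Missing outcome data

D4 Measurement of the outcome

D5 Selection of the reported result

JUDGMENT OPTIONS

Each domain: Low risk / Some concerns / High risk

"A meta-analysis of biased studies
yields a biased conclusion,
with a narrower confidence interval.
You have made the lie more precise."

From each study, you must extract the data.

Extract wrongly, and your entire analysis
is built on sand.

The Data Extraction Form

Essential Data Items

1 Study identifiers (author, year, country)

2 Study design and setting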

3 Participant characteristics (n, age, sex, severity)

4 Intervention details (dose, duration, delivery)

5 Comparator details

6 Outcome definitions and measurement

7 Results (means, SDs, events, sample sizes)

8 Follow-up duration and loss to follow-up

9 Funding source and conflicts of interest

Extraction Decision Tree

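Handling Missing Data

Data Not Reported

↓

What to Do?

First

Contact authors: Email for data

If no response

Calculate/impute: Document method

If impossible

Exclude from MA: Include in narrative

The Rosiglitazone Data Extraction Controversy

2007 | NEW ENGLAND JOURNAL OF MEDICINE

Nissen and Wolski's 2007 meta-analysis of rosiglitazone (Avandia) found a 43% increased risk of heart attack. The finding triggered an FDA warning and a worldwide collapse in prescriptions.

But later scrutiny revealed complications. Some effect estimates had been extracted from secondary publications rather than primary trial reports. Small differences in how events were counted (extracted from different sources) changed the results noticeably.

The meta-analysis was influential and largely correct, but the controversy underscored how small extraction decisions can have billion-dollar consequences. Merck's competing drug gained market share; GSK faced massive litigation.

THE LESSON

Always extract from primary sources. Document every choice. Small differences in extracted data can change regulatory decisions and market fortunes.

"Extract in duplicate. Check each number.
One digit wrong can change the conclusion.
The extraction form is your ledger:
keep it meticulous, keep it true."

The effect size is the heart of your meta-analysis.

Choose the wrong measure,
and your pooled estimate will be meaningless.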

Effect Measure Decision Tree

Choosing the Right Effect Size

Outcome Type

↓

Binary

RR, OR, RD: Events/no events

Continuous

Same scale?

Yes: MD (mean diff)

No: SMD (Hedges' g)

Time-to-event

HR: Hazard ratio

Common Effect Measures

RR

Risk Ratio
Multiplicative

OR

Odds Ratio
Case-control

MD

Mean Diff
Same units

SMD

Std Mean Diff
Different scales

THE PRINCIPLE

Effect measures must be comparable across studies. If studies used different scales, standardize.
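The measures above are simple functions of the extracted data. The sketch below computes them in base R from made-up numbers; the helper names `binary_effects` and `hedges_g` are hypothetical, and the Hedges' g correction uses the common approximation 1 - 3 / (4 * df - 1).

```r
# Binary outcome: risk ratio, odds ratio, and risk difference from a 2x2 table.
binary_effects <- function(events_tx, n_tx, events_ctrl, n_ctrl) {
  risk_tx <- events_tx / n_tx
  risk_ctrl <- events_ctrl / n_ctrl
  c(RR = risk_tx / risk_ctrl,
    OR = (risk_tx / (1 - risk_tx)) / (risk_ctrl / (1 - risk_ctrl)),
    RD = risk_tx - risk_ctrl)
}

# Continuous outcome on different scales: standardized mean difference
# with Hedges' small-sample correction.
hedges_g <- function(mean_tx, sd_tx, n_tx, mean_ctrl, sd_ctrl, n_ctrl) {
  df <- n_tx + n_ctrl - 2
  sd_pooled <- sqrt(((n_tx - 1) * sd_tx^2 + (n_ctrl - 1) * sd_ctrl^2) / df)
  d <- (mean_tx - mean_ctrl) / sd_pooled
  correction <- 1 - 3 / (4 * df - 1)
  d * correction
}

# Illustrative data: 15/100 events on treatment vs. 20/100 on control
binary_result <- binary_effects(15, 100, 20, 100)
# Illustrative depression scores: lower is better
g_result <- hedges_g(10, 4, 30, 12, 4, 30)

print(round(binary_result, 3))
print(round(g_result, 3))
```

Note how the same 2x2 table yields an RR of 0.75 but an OR of about 0.71; the two diverge further as the outcome becomes more common, which is one reason the choice of measure should be fixed in the protocol.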

Can a trial that transforms global practice still have serious limitations?

REAL DATA

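The RECOVERY trial (2020) showed that dexamethasone reduced 28-day mortality in hospitalized COVID-19 patients requiring oxygen: RR 0.83, 95% CI 0.75-0.93. However, the trial was open-label (unblinded), conducted mainly in UK hospitals, and the control group received usual care (which varied).

The RECOVERY Trial: 2020

Your meta-analysis includes RECOVERY as the dominant study. How do you handle the limitations of a landmark trial?

PATH A: Minimize Limitations

Downplay the open-label design and geographic concentration; focus on the significant mortality benefit

↓

Readers cannot judge generalizability to other settings; potential detection bias is obscured

OUTCOME: Incomplete evidence assessment

PATH B: Honest Limitations

Acknowledge the open-label design and geographic limits while stating the mortality benefit clearly

↓

Readers understand both the strength of the findings and the uncertainty that remains

OUTCOME: Trustworthy, balanced reporting

THE REVELATION

Even landmark trials have limitations. Acknowledging them does not weaken the findings; it builds reader trust and guides future research.

"Choose the measure that fits the data.
Risk ratios for common outcomes, odds ratios for rare ones.
Standardize when scales differ.
The wrong measure puts apples with oranges."

Have you not seen the forest plot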
where studies pointed in opposite directions,
yet the diamond declared a single truth?

Fixed vs. Random Effects

Which Model to Use?

Meta-Analysis Model

↓

Assumption about the true effect?

One true effect

Fixed Effect: All studies estimate the same θ

↓

Rarely appropriate: Very similar studies

Effects vary

Random Effects: Distribution of θᵢ

↓

Usually preferred: More conservative
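The difference between the two models shows up directly in the weights. The sketch below pools three invented log risk ratios with an inverse-variance fixed-effect model and with a random-effects model using the DerSimonian-Laird estimator of the between-study variance. DerSimonian-Laird is chosen here only because it has a closed form; it is not the REML estimator used in the metafor example later in this course, and both function names are hypothetical.

```r
# Fixed effect: every study estimates the same theta, weights are 1 / variance.
pool_fixed <- function(yi, vi) {
  w <- 1 / vi
  c(estimate = sum(w * yi) / sum(w), se = sqrt(1 / sum(w)))
}

# Random effects (DerSimonian-Laird): true effects vary with variance tau2,
# so weights become 1 / (variance + tau2).
pool_random_dl <- function(yi, vi) {
  w <- 1 / vi
  theta_fixed <- sum(w * yi) / sum(w)
  q <- sum(w * (yi - theta_fixed)^2)
  scaling <- sum(w) - sum(w^2) / sum(w)
  tau2 <- max(0, (q - (length(yi) - 1)) / scaling)
  w_re <- 1 / (vi + tau2)
  c(estimate = sum(w_re * yi) / sum(w_re),
    se = sqrt(1 / sum(w_re)),
    tau2 = tau2)
}

# Illustrative log risk ratios and sampling variances
yi <- c(-0.5, -0.1, 0.2)
vi <- c(0.04, 0.01, 0.04)

fixed <- pool_fixed(yi, vi)
random <- pool_random_dl(yi, vi)
print(round(fixed, 4))
print(round(random, 4))
```

With these inputs the random-effects standard error is about twice the fixed-effect one, which is what "more conservative" means in practice: the confidence interval widens to reflect disagreement between studies.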

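When Not to Pool

Do Not Meta-Analyze If...

✗ Studies are too heterogeneous (clinically or methodologically)

✗ Outcomes are defined differently

✗ Populations are fundamentally different

✗ Risk of bias is too high across studies

✗ Publication bias is severe

THE WISDOM

Sometimes the most honest conclusion is: "These studies should not be combined."

What happens when methodological criticism of a Cochrane review escalates into an organizational crisis?

REAL DATA

In 2018, Peter Gøtzsche and colleagues published a critique of the Cochrane HPV vaccine review, arguing that it had excluded key trials and used inappropriate inclusion criteria. The review included 26 studies involving more than 73,000 women. The controversy ultimately led to Gøtzsche's expulsion from Cochrane.

The Cochrane HPV Controversy: 2018

You receive a methodological critique of your published review arguing that you should have included additional studies. How do you respond?

PATH A: Dismiss the Criticism

Defend the original methods without engaging with the specific methodological points raised

↓

Public trust erodes; the dispute becomes personal rather than scientific; the evidence base is not improved

OUTCOME: Polarization and lost credibility

PATH B: Engage Transparently

Run a sensitivity analysis that includes the suggested studies; publish a transparent response showing whether the conclusions change

↓

The evidence is strengthened; methodological discourse advances the field; trust is maintained

OUTCOME: Science self-corrects publicly

THE REVELATION

Science advances when methodological criticism is used to strengthen the review and the field.

"Do not pool for the sake of pooling.
A meta-analysis of incompatible studies
is not synthesis but confusion.
Know when to say: these cannot be combined."

When studies disagree,
the disagreement itself is data.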

Do not hide it. Explain it.

Heterogeneity Measures

Q

Cochran's Q
Significance test

I²

Inconsistency
% variation

τ²

Tau-squared
Between-study var

PI

Prediction interval
Future studies
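All four statistics can be computed from the same study-level effects and variances. The sketch below is a base-R illustration under stated assumptions: tau² is estimated with the DerSimonian-Laird method, the prediction interval uses a t distribution with k - 2 degrees of freedom, and the five effect sizes are invented. `heterogeneity_stats` is a hypothetical helper.

```r
# Cochran's Q, I-squared, tau-squared (DerSimonian-Laird), and a 95%
# prediction interval for the effect in a new study.
heterogeneity_stats <- function(yi, vi) {
  k <- length(yi)
  w <- 1 / vi
  theta_fixed <- sum(w * yi) / sum(w)
  q <- sum(w * (yi - theta_fixed)^2)
  df <- k - 1
  p_q <- pchisq(q, df = df, lower.tail = FALSE)
  i2 <- max(0, (q - df) / q) * 100
  tau2 <- max(0, (q - df) / (sum(w) - sum(w^2) / sum(w)))
  w_re <- 1 / (vi + tau2)
  pooled <- sum(w_re * yi) / sum(w_re)
  se <- sqrt(1 / sum(w_re))
  list(Q = q, p_Q = p_q, I2 = i2, tau2 = tau2, pooled = pooled,
       ci = pooled + c(-1, 1) * qnorm(0.975) * se,
       pi = pooled + c(-1, 1) * qt(0.975, df = k - 2) * sqrt(tau2 + se^2))
}

# Illustrative log risk ratios and sampling variances for five studies
yi <- c(-0.6, -0.3, -0.2, 0.1, -0.4)
vi <- c(0.04, 0.02, 0.01, 0.03, 0.05)

het <- heterogeneity_stats(yi, vi)
print(lapply(het, round, 3))
```

The prediction interval is always at least as wide as the confidence interval: the CI describes uncertainty about the average effect, while the PI describes where the effect in a future setting might plausibly fall.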

Investigating Heterogeneity

When I² > 50%

High Heterogeneity

↓

Investigation Methods

Subgroup analysis: Pre-specified

Meta-regression: If ≥10 studies

Sensitivity analysis: Exclude outliers

↓

Report unexplained heterogeneity: Limitation
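One simple form of sensitivity analysis is leave-one-out: re-pool the data once per study, omitting that study, and see whether any single study drives the result. This is a minimal sketch with invented data; the pooling step uses a DerSimonian-Laird random-effects estimate as an assumption, and both helper names are hypothetical.

```r
# Random-effects pooled estimate (DerSimonian-Laird), returned as one number.
pool_dl <- function(yi, vi) {
  w <- 1 / vi
  theta_fixed <- sum(w * yi) / sum(w)
  q <- sum(w * (yi - theta_fixed)^2)
  tau2 <- max(0, (q - (length(yi) - 1)) / (sum(w) - sum(w^2) / sum(w)))
  w_re <- 1 / (vi + tau2)
  sum(w_re * yi) / sum(w_re)
}

# Re-estimate the pooled effect with each study omitted in turn.
leave_one_out <- function(yi, vi, labels) {
  estimates <- sapply(seq_along(yi), function(i) pool_dl(yi[-i], vi[-i]))
  data.frame(omitted = labels, estimate = estimates)
}

# Illustrative log risk ratios: Study D is an outlier
yi <- c(-0.20, -0.25, -0.15, -1.20)
vi <- c(0.02, 0.02, 0.02, 0.05)

loo <- leave_one_out(yi, vi, labels = c("Study A", "Study B", "Study C", "Study D"))
print(loo)
```

In this example the pooled estimate moves to -0.20 when Study D is omitted and is considerably more extreme otherwise, so the report should say that the overall result depends heavily on one study.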

What if a meta-analysis of small positive trials is overturned by a single mega-trial?

REAL DATA

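By the early 1990s, several small trials suggested that intravenous magnesium reduced mortality after acute myocardial infarction. A meta-analysis (Teo et al., 1991) pooled these and found a significant benefit: OR 0.44, 95% CI 0.27-0.71. Then ISIS-4 (1995), a mega-trial of 58,050 patients, found no benefit. Small-study effects and heterogeneity had been overlooked.

The Magnesium Controversy: 1991-1995

Your meta-analysis of small trials shows high heterogeneity (I² above 50%), yet the pooled estimate is significant. How do you present this?

PATH A: Bury the Heterogeneity

Report the significant pooled estimate prominently; mention I² only in passing

↓

Clinicians adopt the treatment; a future large trial may contradict the meta-analysis, undermining trust in the method

OUTCOME: Premature guideline changes

PATH B: Investigate Transparently

Report heterogeneity prominently; investigate its sources (study size, quality); note that small-study effects may inflate the estimate

↓

Readers understand the uncertainty; the recommendation calls for a definitive large trial before practice changes

OUTCOME: Evidence-appropriate caution

THE REVELATION

Heterogeneity is a warning sign, not a footnote. Small-study effects can produce falsely reassuring pooled estimates that a single large trial can overturn.

"I-squared is not just a number to report.
It is a question: why do these studies disagree?
Investigate. Explain. Or acknowledge ignorance."

Have you not heard of the file drawer,
where negative studies go to die,
leaving only the positive survivors
to tell a distorted story?

The Vioxx Disaster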

MERCK, 2004

Vioxx (rofecoxib) was a blockbuster painkiller earning $2.5 billion/year.

Internal company documents showed Merck knew of cardiovascular risks but suppressed unfavorable data and published only favorable analyses.

A meta-analysis using all available data revealed a 2-fold increased risk of heart attack.

Vioxx was withdrawn. It had caused an estimated 88,000-140,000 excess heart attacks.

Topol EJ. N Engl J Med. 2004;351:1707-1709

Detecting Publication Bias

Assessment Methods

Publication Bias Assessment

↓

Funnel plot: Visual inspection

Egger's test: Statistical asymmetry

Trim and fill: Impute missing

↓

Requires ≥10 studies: Low power otherwise
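Egger's test can be written as an ordinary regression: the standardized effect (effect divided by its standard error) is regressed on precision (one over the standard error), and an intercept far from zero suggests funnel-plot asymmetry. The sketch below uses base R's `lm` on ten invented studies in which smaller studies report larger effects; `egger_test` is a hypothetical helper, and this classic form of the test is only one of several variants.

```r
# Classic Egger regression: standardized effect ~ precision.
# The intercept estimates small-study asymmetry.
egger_test <- function(yi, sei) {
  standardized <- yi / sei
  precision <- 1 / sei
  fit <- lm(standardized ~ precision)
  coefs <- summary(fit)$coefficients
  c(intercept = coefs["(Intercept)", "Estimate"],
    p_value = coefs["(Intercept)", "Pr(>|t|)"])
}

# Illustrative data: effects become more negative as standard errors grow
sei <- seq(0.05, 0.50, by = 0.05)
noise <- c(0.01, -0.01, 0.02, -0.02, 0.01, -0.01, 0.02, -0.02, 0.01, -0.01)
yi <- -0.1 - 1.0 * sei + noise

egger <- egger_test(yi, sei)
print(round(egger, 4))
```

A significant intercept is a prompt to investigate, not proof of publication bias; heterogeneity and genuine differences between small and large studies can produce the same pattern.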

Preventing Bias: The AllTrials Campaign

ALLTRIALS.NET

"All trials registered. All results reported."

• Search trial registries (ClinicalTrials.gov, WHO ICTRP)
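• Contact companies for unpublished data
• Cite registration numbers in your review
• Report which registered trials are missing from your analysis

"The file drawer is not empty.
It holds the studies companies hid,
the results journals rejected.
Your task is to open that drawer, or to say that you could not."

The forest plot is the face of your meta-analysis.

It shows the reader everything:
each study, each weight, each confidence interval,
and the final pooled estimate.

Reading the Forest Plot

Elements of a Forest Plot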

Forest Plot

↓

Study names: Left column

Squares: Point estimates

Lines: 95% CI

Diamond: Pooled estimate

↓

Square size = study weight: Larger = more precise

Forest Plot Checklist

What to Include

□ Study identifiers (author, year)

□ Sample size per arm

□ Effect estimates with 95% CI

□ Weight (% contribution)

□ Line of no effect (RR=1 or MD=0)

□ Pooled estimate with 95% CI

□ Heterogeneity statistics (I², τ², Q)

□ Test for overall effect (Z, p-value)

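The Vioxx Figure That Changed Everything

2004 | APPROVe TRIAL AND MARKET WITHDRAWAL

For years, forest plots of the cardiovascular safety of Vioxx (rofecoxib) showed a reassuring pattern. Point estimates from early trials clustered around the line of no effect. The diamond suggested the drug was safe.

Then came the APPROVe trial. When its data were added to the forest plot, the picture changed dramatically. APPROVe's large square pulled the pooled diamond decisively toward harm. The visual was unmistakable.

That forest plot ended Vioxx. Merck voluntarily withdrew the drug. The litigation that followed cost the company $4.85 billion in settlements. Thousands of patients had suffered heart attacks while earlier small trials showed ambiguous results.

THE LESSON

A single well-conducted, adequately powered trial can shift an entire pooled estimate. A forest plot tells the story of how evidence accumulates and, sometimes, how it reverses course.

The Misleading Forest Plot

You are designing a forest plot for your meta-analysis. Axis scale and study order can change the visual impression. How do you proceed?

PATH A: Design for Impact

Use a compressed axis scale to make effect sizes look larger; order studies to build a visual narrative

↓

Readers form an exaggerated impression of the effect; the figure becomes a tool of persuasion rather than a display of data

OUTCOME: Visual distortion of the evidence

PATH B: Design for Clarity

Use standard axis scaling; order studies by year or alphabetically; include all standard elements (weights, CIs, I²)

↓

Readers can make their own judgment; the figure serves as a transparent data visualization

OUTCOME: Honest visual communication

"The forest plot hides nothing.
Every study visible. Every weight transparent.
Let the reader see what you see,
and judge for themselves."

A pooled estimate is not enough.

You must also tell the reader:
How confident should they be in this result?

GRADE Certainty Assessment

Rating the Evidence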

Start: RCTs = High, Obs = Low

↓

Reasons to Downgrade?

Risk of bias: -1 or -2

Inconsistency: -1 or -2

Indirectness: -1 or -2

Imprecision: -1 or -2

Pub. bias: -1 or -2
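The downgrading scheme above amounts to starting at a level set by study design and stepping down by the total number of downgrades, never below the lowest level. The sketch below records that bookkeeping only; real GRADE ratings are structured judgments, and the function name `grade_certainty` and its domain labels are hypothetical.

```r
# Bookkeeping for GRADE downgrades. This does not replace the judgment
# behind each downgrade; it only applies the arithmetic consistently.
grade_certainty <- function(design = c("RCT", "observational"), downgrades) {
  design <- match.arg(design)
  domains <- c("risk_of_bias", "inconsistency", "indirectness",
               "imprecision", "publication_bias")
  stopifnot(all(names(downgrades) %in% domains),
            all(downgrades %in% 0:2))
  certainty_levels <- c("VERY LOW", "LOW", "MODERATE", "HIGH")
  start <- if (design == "RCT") 4 else 2     # RCTs start HIGH, observational LOW
  final <- max(1, start - sum(downgrades))   # cannot drop below VERY LOW
  certainty_levels[final]
}

# RCT evidence downgraded once for risk of bias and once for inconsistency
rating <- grade_certainty("RCT", c(risk_of_bias = 1, inconsistency = 1))
print(rating)
```

Keeping the downgrade vector alongside the rating makes it easy to report, domain by domain, why the certainty ended up where it did.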

GRADE Certainty Levels

⊕⊕⊕⊕

HIGH
Very confident

⊕⊕⊕◯

MODERATE
Likely close

⊕⊕◯◯

LOW
May differ

⊕◯◯◯

VERY LOW
Uncertain

What happens when a GRADE assessment of "low certainty" collides with a public health emergency?

REAL DATA

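The 2023 Cochrane review of physical interventions to reduce the spread of respiratory viruses (Jefferson et al.) found that the evidence for mask wearing in community settings was low certainty according to GRADE, with wide confidence intervals. The review was widely reported as proving that "masks don't work," although the authors stated that the evidence was insufficient to draw firm conclusions in either direction.

The Cochrane Mask Review: 2023

Your systematic review of a politically sensitive topic receives a GRADE rating of "low certainty." How do you communicate this?

PATH A: Soften the Rating

Downplay the GRADE assessment to avoid political controversy; emphasize the point estimate rather than the certainty level

↓

The review loses methodological credibility; GRADE comes to be seen as optional rather than rigorous

OUTCOME: Compromised methodology

PATH B: Report Faithfully

Report the GRADE rating faithfully; explain clearly what "low certainty" means (and does not mean); distinguish absence of evidence from evidence of absence

↓

The public may still misunderstand, but the scientific record is accurate; future research directions become clear

OUTCOME: Methodological integrity

THE REVELATION

"Low certainty" does not mean "no effect." GRADE ratings must be reported faithfully and their meaning explained clearly, especially for politicized topics.

"The effect size is the what.
GRADE certainty is the how sure.
Report both, or the reader cannot judge
how much to trust your conclusion."

The discussion is where you interpret.

Not to spin. Not to overstate.
But to explain what your findings mean,
and what they do not mean.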

Discussion Structure

1

Summary of Findings

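Restate the main results and certainty ratings

2

Comparison with Existing Literature

How do your findings relate to prior reviews?

3

Strengths and Limitations

Of the review and of the included studies

4

Implications for Practice

What should clinicians/policymakers do?

5

Implications for Research

What studies are still needed?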

Common Mistakes in Discussion

What NOT to Do

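✗ Overstating conclusions beyond the data

✗ Ignoring limitations

✗ Treating statistical significance as clinical importance

✗ Failing to address heterogeneity

✗ Making causal claims from observational data

What if the most-cited methodological paper of all time warns that most research findings are false?

REAL DATA

John Ioannidis's 2005 paper in PLoS Medicine, "Why Most Published Research Findings Are False," has been cited more than 10,000 times. Using a mathematical model, he argued that the probability that a research finding is true depends on study power, bias, and the number of relationships tested. For many study designs, the post-study probability that a finding is true may be below 50%.

The Ioannidis Wake-Up Call: 2005

Your meta-analysis has a statistically significant result, but the included studies are small, heterogeneity is high, and many studies are at high risk of bias. How do you write the discussion?

PATH A: Oversell the Finding

Lead with the significant pooled estimate; minimize limitations; make strong practice recommendations

↓

The finding enters guidelines prematurely; when replication fails, meta-analysis as a method takes the blame

OUTCOME: Eroded trust in evidence synthesis

PATH B: Calibrated Interpretation

Discuss the result in the context of study quality, heterogeneity, and certainty; match the strength of recommendations to the strength of the evidence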

↓

Readers understand the degree of confidence warranted; future research priorities become clear

OUTCOME: Proportionate, trustworthy conclusions

THE REVELATION

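The discussion must calibrate enthusiasm to the quality of the evidence. Strong claims on weak evidence damage the credibility of the whole field.

"The discussion is not for advocacy.
It is for honest interpretation.
Say what the evidence shows.
Admit what it does not show."

Have you not heard of the Wakefield paper,
where conflicts of interest were hidden,
where data were fabricated,
and millions of children went unvaccinated?

The MMR-Autism Fraud

THE LANCET, 1998-2010

Andrew Wakefield published a study linking MMR vaccine to autism.

He did not disclose that he was paid £435,643 by lawyers seeking to sue vaccine manufacturers.

He did not disclose that he had filed a patent for a competing single-dose measles vaccine.

The study was eventually retracted. But the damage was done: vaccination rates plummeted, and measles outbreaks returned.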

Deer B. BMJ. 2011;342:c5347

Transparency Checklist

What to Declare

□ Protocol registration number

□ Funding sources (all)

□ Role of funders in the review

□ Conflicts of interest (all authors)

□ Data availability statement

□ Deviations from protocol (with reasons)

□ Author contributions

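The AllTrials Initiative

2013-PRESENT | A GRASSROOTS TRANSPARENCY MOVEMENT

In 2013, Ben Goldacre and colleagues from the Cochrane Collaboration launched AllTrials after confronting a troubling fact: approximately half of all clinical trials were never published. Most of the missing trials were those with negative or inconvenient results.

The campaign gathered more than 90,000 individual signatories and over 700 organizations, demanding that all past and future trials be registered and that their full methods and results be reported.

The impact was revolutionary. The EU now requires trial registration and results reporting. The FDA strengthened its own requirements. Journals began demanding prospective registration. What started as advocacy became global policy.

THE LESSON

A grassroots movement changed international regulation, proof that transparency advocates can reshape the evidence ecosystem.

The Hydroxychloroquine Preprint Cascade: 2020

It is early 2020. Your team has preliminary meta-analysis results on a COVID-19 treatment. The Gautret preprint (non-randomized, 42 patients) has gone viral. The Surgisphere scandal will soon reveal fabricated data in major journals. Do you rush to preprint or wait?

PATH A: Preprint for Speed

Post to medRxiv immediately to influence policy

↓

If the included studies contain fabricated data or flawed methods, your meta-analysis amplifies the error

OUTCOME: Accelerated misinformation

PATH B: Verify, Then Publish

Appraise study quality rigorously; undergo rapid peer review before release

↓

Publication is slower, but the analysis is reliable; when flawed studies are retracted, the conclusions still stand

OUTCOME: Durable, trustworthy evidence

"Transparency is not optional.
Declare your funding. Declare your conflicts.
The reader has a right to know
who paid for this work, and why."

Before You Submit

Final Checklist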

PRISMA 2020 Final Check

Have You...

□ Completed all 27 PRISMA checklist items?

□ Included the PRISMA flow diagram?

□ Provided full search strategy in appendix?

□ Listed excluded studies with reasons?

□ Reported risk of bias for each study?

□ Provided forest plot(s)?

□ Assessed publication bias (if ≥10 studies)?

□ Graded the certainty of evidence (GRADE)?

□ Declared all conflicts of interest?

□ Cited protocol registration?

Supplementary Materials

WHAT TO INCLUDE

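• Full search strategies for all databases
• List of excluded studies with reasons
• Data extraction forms (blank and completed)
• Risk of bias details for each study
• Additional forest plots (subgroups, sensitivity)
• Funnel plot and statistical tests
• GRADE evidence profile tables

How long can a fraudulent paper survive peer review, editorial scrutiny, and public challenge?

REAL DATA

Andrew Wakefield's 1998 Lancet paper linking MMR vaccine to autism took 12 years to be fully retracted (2010). In the meantime, journalist Brian Deer exposed financial conflicts, ethical violations, and data manipulation. Multiple large studies (including a Danish cohort of over 650,000 children) found no association, yet the original paper's influence persisted.

The Wakefield Retraction: 1998-2010

During peer review, a reviewer raises serious concerns about a study included in your meta-analysis, citing data inconsistencies. How do you respond?

PATH A: Deflect the Concern

Dismiss the reviewer's concern as overly cautious; retain the study without further investigation

↓

If the study is later retracted, your meta-analysis may need to be retracted too

OUTCOME: Contaminated evidence synthesis

PATH B: Investigate Thoroughly

Contact the study authors for the raw data; run a sensitivity analysis excluding the questionable study; disclose the issue transparently

↓

Your meta-analysis is robust to including or excluding the questionable study

OUTCOME: A resilient, self-correcting review

THE REVELATION

Peer review is your last line of defense before publication.

"You have gathered the evidence.
You have weighed it fairly.
You have written it clearly.

Now submit your work,
and let the truth be found, and found again."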

References

Key Sources

Page MJ et al. BMJ. 2021;372:n71. [PRISMA 2020]
Jefferson T et al. Cochrane Database Syst Rev. 2014;4:CD008965. [Tamiflu]
Turner EH et al. N Engl J Med. 2008;358:252-260. [Antidepressants]
Boutron I et al. JAMA. 2010;303:2058-2064. [Spin]
Topol EJ. N Engl J Med. 2004;351:1707-1709. [Vioxx]
Deer B. BMJ. 2011;342:c5347. [Wakefield]
Sterne JAC et al. BMJ. 2019;366:l4898. [RoB 2]
Higgins JPT et al. Cochrane Handbook. 2023.
Schünemann HJ et al. GRADE Handbook. 2013.
Ioannidis JPA. PLoS Med. 2005;2:e124. [Why most research is false]

What percentage of antidepressant trials appeared positive in published literature vs. FDA data?

Published 51%, FDA 94%

Published 94%, FDA 51%

Both about 75%

Published 80%, FDA 60%

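What is the purpose of registering a protocol before conducting a systematic review?

To get funding

To claim priority

To prevent outcome switching and data-driven decisions

To make the review publishable

When should studies not be pooled in a meta-analysis?

When there are fewer than 10 studies

When studies are too clinically or methodologically heterogeneous

When the effect is not statistically significant

When studies come from different countries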

✔

Course Complete

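"You now know the covenant of evidence:
Register before you search.
Search comprehensively. Select transparently.
Extract carefully. Assess bias honestly.
Pool wisely, or not at all.
Write so that truth may be found,
and found again by those who follow."

Without tools to carry them out, these methods mean nothing.

Which software will run your analysis
from protocol to forest plot?

Software Decision Tree

Choosing Your Tool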

Meta-Analysis Software

↓

Your Context?

Cochrane Review

RevMan: Free, official

Academic/Flexible

R metafor: Free, powerful

Institution License

Stata meta: Comprehensive

Point-and-Click

CMA: User-friendly

The Essential Toolkit

RevMan

Cochrane official
Free download

R + metafor

Most flexible
Reproducible code

GRADEpro

Certainty tables
SoF tables

Rayyan

Screening tool
AI-assisted

REPRODUCIBILITY

Code-based tools (R, Stata) create reproducible analyses. Share your code so that others can verify your work.

R metafor Example

BASIC META-ANALYSIS IN R

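    library(metafor)

    # mydata is assumed to be a data frame with one row per study and columns
    # events_tx, noevents_tx, events_ctrl, noevents_ctrl, author, year

    # Calculate log risk ratios (yi) and their sampling variances (vi)
    dat <- escalc(measure="RR", ai=events_tx, bi=noevents_tx,
                  ci=events_ctrl, di=noevents_ctrl, data=mydata)

    # Random-effects model, REML estimator for tau^2
    res <- rma(yi, vi, data=dat, method="REML")

    # Forest plot with "Author Year" study labels
    forest(res, slab=paste(dat$author, dat$year))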

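When a systematic review has dozens of authors, how do you coordinate the writing?

REAL DATA

The SPRINT trial (2015) listed 100 authors from dozens of institutions. The writing group included the steering committee, site investigators, and statisticians. Coordinating contributions, managing version control, and determining authorship credit required formal structure. The ICMJE criteria define authorship as requiring substantial contribution, drafting or revising, final approval, and accountability.

Team Writing Challenges: Large Collaborative Reviews

Your systematic review team has 12 members in 4 countries. How do you manage the writing process?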

PATH A: Informal Coordination

Pass drafts via email; resolve authorship at the end; no writing plan or version control

↓

Duplication of effort; authorship disputes at submission; inconsistent voice and formatting; lost contributions

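OUTCOME: Delays and conflict

PATH B: Structured Process

Assign section leads in advance; use a shared platform with version control; agree on ICMJE criteria and author order at the outset

↓

Clear responsibilities; consistent output; transparent contributions; authorship settled before results are known

OUTCOME: Efficient, fair collaboration

THE REVELATION

Agree on authorship criteria and writing responsibilities before the work begins. Disputes are hardest to resolve once the results are known.

"The tool does not do the analysis.
The analyst does.
But choose your tools wisely,
and share your code, so that the truth can be verified."

Not everyone will write a meta-analysis.

But every clinician, every policymaker, every patient
must know how to read them.

The HRT Reversal

WOMEN'S HEALTH INITIATIVE, 2002

For decades, observational studies suggested that hormone replacement therapy (HRT) protected women from heart disease.

Meta-analyses of these studies showed a 35-50% reduction in cardiovascular risk.

Then the WHI randomized trial revealed the truth: HRT increased heart attack risk by 29%.

The observational meta-analyses had pooled confounded data: healthier women chose HRT, not the other way around.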

Rossouw JE et al. JAMA. 2002;288:321-333

How to Read a Forest Plot

Consumer's Guide

Reading a Forest Plot

↓

Line of no effect: RR=1 or MD=0

Diamond position: Left=benefit, Right=harm

Diamond width: Narrow=precise

↓

Does diamond cross the line?

No: Statistically significant

Yes: Not significant

Red Flags When Reading

Warning Signs in Published Meta-Analyses

⚠ No protocol registration cited

⚠ Single database searched

⚠ No risk of bias assessment

⚠ High I² but no investigation

⚠ Asymmetric funnel plot ignored

⚠ Industry funding with no sensitivity analysis

⚠ Conclusions that go beyond the evidence

What GRADE Ratings Mean

For Clinicians and Patients

HIGH: We are very confident. Future research unlikely to change.

MODERATE: Probably close to truth. Future research may change estimate.

LOW: Uncertain. Future research likely to change substantially.

VERY LOW: Very uncertain. Any estimate is speculative.

"Read the forest, not just the diamond.
Look for the protocol. Check for bias.
Ask: Who funded this? What did they hide?
The informed reader is the guardian of truth."

What if you must compare treatments
that have never been tested head to head?

This is network meta-analysis.

When to Use Network MA

NMA Decision Tree

Multiple Treatments

↓

Direct comparisons available?

All pairs directly compared

Pairwise MA: Standard approach

Some indirect only

Network MA: Borrow strength

↓

Check transitivity assumption: Similar populations across comparisons

Network Geometry

Visualizing the Evidence

Nodes = Treatments (size = sample)
Edges = Direct comparisons (width = studies)
Dashed = Indirect evidence only

League Tables

Reading NMA Results

League tables show all pairwise comparisons from the network.

• Each cell: effect estimate + 95% CI
• Row vs. Column: Treatment A vs. Treatment B
• Green = Favors row treatment
• Red = Favors column treatment
• Rankings (SUCRA/P-score) help identify best options

CRITICAL ASSUMPTION

Transitivity: If A vs. B and B vs. C were compared in similar patients, we can estimate A vs. C indirectly.
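For a single chain of two comparisons, the indirect estimate can be written down directly: on the log scale the A vs. C effect is the sum of the A vs. B and B vs. C effects, and the variances add. This adjusted indirect comparison (often attributed to Bucher) is the simplest building block of a network; the sketch below uses invented risk ratios, and `indirect_comparison` is a hypothetical helper.

```r
# Adjusted indirect comparison of A vs. C through a common comparator B.
# Inputs are log risk ratios and their standard errors.
indirect_comparison <- function(log_rr_ab, se_ab, log_rr_bc, se_bc) {
  log_rr_ac <- log_rr_ab + log_rr_bc          # effects add on the log scale
  se_ac <- sqrt(se_ab^2 + se_bc^2)            # variances add, so precision drops
  ci <- exp(log_rr_ac + c(-1, 1) * qnorm(0.975) * se_ac)
  c(rr = exp(log_rr_ac), lower = ci[1], upper = ci[2], se_log = se_ac)
}

# Illustrative inputs: A vs. B RR 0.80 (SE 0.10), B vs. C RR 0.90 (SE 0.12)
indirect <- indirect_comparison(log(0.80), 0.10, log(0.90), 0.12)
print(round(indirect, 3))
```

The indirect standard error is always larger than either direct one, and the estimate is only meaningful if transitivity holds, so the populations in the two sets of trials need to be compared before the arithmetic is trusted.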

What happens to meta-analyses when a prolific author's entire body of work is retracted?

REAL DATA

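Joachim Boldt, a German anesthesiologist, had over 220 papers retracted for data fabrication (discovered 2010-2011). His work on colloid solutions had been included in multiple systematic reviews and meta-analyses. When the retractions came, every meta-analysis containing his work had to be re-evaluated. Some conclusions changed materially once his fabricated data were removed.

The Boldt Retraction Cascade: 2010-2011

You discover that a study included in your published meta-analysis has been retracted for data fabrication. What do you do?

PATH A: Hope No One Notices

Ignore the retraction; the meta-analysis is already published and the retracted study was small

↓

Others cite your meta-analysis; fabricated data propagate through secondary citation; patient-care decisions rest on contaminated evidence

OUTCOME: Cascading harm from inaction

PATH B: Self-Correct Publicly

Publish a correction or an updated analysis excluding the retracted study; notify the journal; state whether the conclusions change

↓

The scientific record is corrected; readers can see the updated analysis; your reputation for integrity grows

OUTCOME: Scientific self-correction

THE REVELATION

The integrity of evidence synthesis depends on ongoing vigilance. When an included study is retracted, the ethical obligation is to update and correct, not to stay silent.

"When treatments have never met,
the network builds a bridge of evidence.
But the bridge rests on transitivity:
verify that the populations are comparable."

Have you not seen the AI that predicted cancer
with 99% accuracy on its training set,
yet failed catastrophically
when deployed in the real world?

The Sepsis Algorithm Failure

EPIC SEPSIS MODEL, 2021

Epic's sepsis prediction algorithm was deployed in hundreds of hospitals.

Internal validation showed excellent performance.

But an independent study at Michigan Medicine found that the model missed 67% of sepsis cases and generated excessive false alarms.

The algorithm had been validated on the same population it was trained on, a recipe for overfitting and failure.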

Wong A et al. JAMA Intern Med. 2021;181:1065-1070

AI Validation Decision Tree

Evidence for AI/ML

AI Prediction Model

↓

Validation Level?

Internal only: Same data split

HIGH RISK

Temporal: Different time

MODERATE

External: Different site

BETTER

Impact RCT: Patient outcomes

BEST

PROBAST & TRIPOD

PROBAST

Prediction model
Risk of Bias

TRIPOD

Reporting
guideline

TRIPOD-AI

AI-specific
extension

CALIBRATION VS. DISCRIMINATION

AUC/c-statistic: Can the model rank patients? (discrimination)
Calibration: Are the predicted probabilities accurate?

A model can have good AUC but poor calibration—and harm patients.
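The gap between the two properties is easy to demonstrate. In the invented example below, a model ranks every patient correctly (AUC of 1) while overestimating risk for everyone. The AUC is computed from ranks (the Mann-Whitney formulation), and calibration is summarized by a crude observed-to-expected ratio; both helper names are hypothetical, and a full calibration assessment would also examine a calibration curve.

```r
# Discrimination: probability that a random case is ranked above a random non-case.
auc_from_ranks <- function(predicted, outcome) {
  ranks <- rank(predicted)
  n_pos <- sum(outcome == 1)
  n_neg <- sum(outcome == 0)
  (sum(ranks[outcome == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Calibration-in-the-large: observed event rate over mean predicted risk.
# A ratio of 1 is ideal; below 1 means risk is overestimated on average.
observed_expected_ratio <- function(predicted, outcome) {
  mean(outcome) / mean(predicted)
}

# Illustrative data: 2 events among 10 patients, all predictions too high
outcome <- c(0, 0, 0, 0, 0, 0, 0, 0, 1, 1)
predicted <- c(0.50, 0.55, 0.60, 0.62, 0.65, 0.70, 0.72, 0.75, 0.90, 0.95)

auc_value <- auc_from_ranks(predicted, outcome)
oe_ratio <- observed_expected_ratio(predicted, outcome)
print(c(AUC = auc_value, O_E = round(oe_ratio, 3)))
```

A review of prediction models that reports only the AUC would rate this model as perfect, even though it tells patients with a 20% event rate that their average risk is close to 70%.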

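"The algorithm learned from data,
and the data were all it had seen.
It validated on itself,
and called its reflection truth.
External validation is not optional; it is survival."

Meta-analysis speaks in numbers.

But patients hear fear and hope.

How will you bridge the gap?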

Translating Numbers to Meaning

Communication Decision Tree

Meta-Analysis Result

↓

Effect Size Type?

Relative (RR, OR)

Convert to NNT: More intuitive

Absolute (RD)

Use directly: "X fewer per 1000"

Continuous (MD)

Contextualize: Minimal important diff
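Converting a relative effect into absolute terms requires one extra ingredient: the baseline risk without treatment. The sketch below assumes a baseline risk of 16.7%, a value chosen purely for illustration so that a 30% relative reduction works out to about 5 fewer events per 100 people; `absolute_effect` is a hypothetical helper.

```r
# Translate a risk ratio into absolute terms for a given baseline risk.
absolute_effect <- function(rr, baseline_risk) {
  risk_treated <- baseline_risk * rr
  arr <- baseline_risk - risk_treated      # absolute risk reduction
  c(risk_untreated = baseline_risk,
    risk_treated = risk_treated,
    fewer_per_1000 = 1000 * arr,
    nnt = 1 / arr)                         # number needed to treat
}

# RR 0.70 (a 30% relative reduction) at an assumed baseline risk of 16.7%
effect <- absolute_effect(rr = 0.70, baseline_risk = 0.167)
print(round(effect, 3))
cat("Treat about", ceiling(effect["nnt"]), "people to prevent one event\n")
```

The same risk ratio applied to a baseline risk of 1.67% would give about 5 fewer per 1000 instead, which is why the baseline risk used in any patient-facing statement should be reported alongside it.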

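Patient Scripts

EXPLAINING A POSITIVE RESULT

"This review pooled 15 studies involving 8,000 patients.

It found that this treatment reduces the risk of [outcome] by about 30%.

In practical terms: if we treated 100 people like you, about 5 fewer would have [outcome] than without treatment.

We're moderately confident in this; future research might change it slightly.

What questions do you have about this?"

Questions Patients Should Ask

Empowering Patients

1 "How many studies and patients were included?"

2 "How confident are the researchers in this result?"

3 "What are the benefits and harms?"

4 "Did these studies include people like me?"

5 "Who funded this research?"

6 "What does this mean for my specific situation?"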

Can a spreadsheet error in an academic paper directly shape the economic policy of entire nations?

REAL DATA

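Reinhart and Rogoff's 2010 paper claimed that growth slowed markedly once public debt exceeded 90% of GDP. The finding was widely cited to justify austerity policies across Europe. In 2013, Herndon, Ash, and Pollin found a spreadsheet error: several countries had been accidentally excluded from the calculation. After correction, the 90% threshold disappeared.

Reinhart-Rogoff Policy Impact: 2010-2013

Your meta-analysis results have clear policy implications. Policymakers want a simple message. How do you write the policy brief?

PATH A: Oversimplify for Impact

Offer a clean threshold or headline number; omit caveats and uncertainty ranges for maximum policy impact

↓

Policy is adopted on the basis of simplified findings; when nuance emerges or errors are found, both the research and the resulting policy are called into question

OUTCOME: Policy built on a fragile foundation

PATH B: Communicate with Integrity

Present the evidence with appropriate uncertainty; distinguish strong findings from suggestive ones; provide an actionable summary that preserves nuance

↓

Policymakers understand what the evidence supports and where uncertainty remains; decisions are made with due caution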

OUTCOME: Durable, evidence-informed policy

THE REVELATION

Policy briefs must communicate uncertainty honestly. Oversimplified findings may gain influence quickly but collapse when scrutinized, damaging trust in research-policy relationships.

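"Meta-analysis speaks in numbers.
Patients hear fear and hope.
Your job is to translate,
faithful to the evidence, and with compassion."

A systematic review captures the evidence at one moment in time.

But science does not stop.
How do we keep the evidence alive?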

Living Systematic Reviews

COVID-19 PANDEMIC, 2020-2023

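During the pandemic, evidence emerged faster than traditional reviews could synthesize it.

Living systematic reviews were continuously updated as new trials reported, sometimes within days of publication.

The COVID-NMA consortium maintained living reviews of treatments, vaccines, and diagnostics, updating its recommendations in real time as the evidence evolved.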

Hydroxychloroquine went from "promising" to "ineffective" within months.

Defined by Cochrane: continual updates at ≤monthly intervals

When to Use Living Reviews

Living Review Decision Tree

Review Type Decision

↓

Is the evidence evolving rapidly?

Yes + High Priority

Living Review: Continuous updates

No / Stable

Standard Review: Update every 2-5 years

↓

Resource intensive: Requires ongoing funding

The Future of Evidence Synthesis

Automation

ML-assisted
screening

IPD-MA

Individual patient
data pooling

Real-World

EHR-based
evidence

Adaptive

Platform trials
+ MA

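"The covenant of evidence is not static.
It evolves with every new study, every new question.
Keep your reviews living.
Keep your methods transparent.
Keep truth at the center of everything you do."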