Evidence Reversals: A Meta-Analysis Course

Not every signal is truth.

Module 0: Introduction

🎯 Learning Objectives

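Define meta-analysis and explain its role in evidence synthesis
Identify when studies should not be pooled
Describe the evidence hierarchy and where systematic reviews sit
Recognize that meta-analysis can mislead when done poorly
Recall the 7 principles that underpin this course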

This course exists because

medicine has been wrong.

Not once. Not rarely. Repeatedly. In ways that killed patients who trusted that the evidence was solid.

What is Meta-Analysis?

A statistical technique for combining the results of multiple independent studies that address the same question.

1976

Term coined by Gene Glass

~50,000

Published per year

#1

Evidence hierarchy*

*When well conducted. Quality of conduct matters more than study design alone — as GRADE recognizes.

Why Pool Studies?

1

Increase Statistical Power

Individual studies may be too small to detect effects.

2

Improve Precision

Narrower confidence intervals around effect estimates.

3

Resolve Disagreement

When studies conflict, pooling can clarify the signal.

4

Explore Heterogeneity

Identify why effects differ across populations or settings.

But meta-analysis can also

MISLEAD

When done poorly, it amplifies bias rather than truth.

When NOT to Pool

1

Studies measure fundamentally different things (apples and oranges)

2

Extreme heterogeneity that cannot be explained

3

One study dominates all others (megastudy problem)

4

Studies have a high risk of bias that cannot be adjusted for

Pooling is a privilege, not a right.

The decision to combine must be defended.

The Evidence Hierarchy

Systematic Reviews & Meta-Analyses of RCTs

Randomized Controlled Trials

Cohort Studies

Case-Control Studies

Case Series / Expert Opinion

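Position in the hierarchy depends on methodological quality, not on study type alone.

This course teaches through

evidence reversals.

Each module begins with a story of how medicine got it wrong. Then you learn the methods that could have prevented the harm.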

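The 7 Principles

These phrases will recur throughout the journey:

1. "Not every signal is truth."

2. "Methods are what protect patients from our confidence."

3. "What was hidden in plain sight?"

4. "A number without provenance is not a number."

5. "Heterogeneity is a message, not noise."

6. "Absence of evidence is not evidence of absence."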

7. "Certainty must be earned, not assumed."

Module 0 Quiz

1. Why should studies sometimes NOT be pooled in a meta-analysis?

A. Pooling is always better than single studies

B. When heterogeneity is extreme or studies measure different things

C. Pooling is always appropriate for RCTs

D. Statistical methods handle any situation

2. Where do systematic reviews of RCTs sit in the evidence hierarchy?

A. At the top

B. Same level as individual RCTs

C. Below cohort studies

D. Same as expert opinion

Let's begin the journey.

Module 1: The Question

Not every signal is truth.

This is not a story about error.

This is a story about certainty.

Module 1: The Question

🎯 Learning Objectives

Formulate a focused PICO question for a systematic review
Distinguish surrogate outcomes from patient-important outcomes
Explain why biological plausibility alone is insufficient evidence
Describe the CAST trial and its impact on evidence-based medicine
Apply the principle: "Not every bright signal is a guide"

~9,000

excess deaths per year

From a treatment everyone believed worked.

This is a story of how we believed, and how we were wrong.

The Observation

Patients with frequent PVCs after MI had 2-5x higher mortality.

400,000+

MI survivors/year

~40%

had significant PVCs

160,000

at elevated risk

A massive clinical need. A clear target.

The Response

Antiarrhythmic drugs were developed, FDA approved,
and prescribed to ~200,000 patients per year.

There are no villains in this story.

Everyone acted.

The Logic Everyone Accepted

PREMISE 1

PVCs after MI predict sudden cardiac death

↓

PREMISE 2

Antiarrhythmic drugs suppress PVCs

↓

PREMISE 3

Suppressing PVCs should prevent sudden death

↓

CONCLUSION

Antiarrhythmics save lives in post-MI patients

The chain was logical. The conclusion seemed inevitable.

CAST: The Cardiac Arrhythmia Suppression Trial

Finally, someone asked: "Does suppressing PVCs actually save lives?"

Design

Randomized, double-blind, placebo-controlled

Population

Post-MI patients with asymptomatic PVCs

Intervention

Encainide, flecainide, or moricizine vs placebo

Run-in

Only patients with ≥80% PVC suppression randomized

Primary endpoint

Death or cardiac arrest with resuscitation

Sample size

1,498 patients (encainide/flecainide arms)

The Result: April 1989

The Data and Safety Monitoring Board stopped the trial early.

Outcome	Drug (n=755)	Placebo (n=743)
Arrhythmic deaths	33	9
All cardiac deaths	43	16
Total deaths	56	22
Death rate	7.4%	3.0%

Relative Risk of Death: 2.5

95% CI: 1.6 - 4.5 | p < 0.001

Drugs that suppressed the arrhythmia completely increased mortality by 150%.
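The relative risk above can be reproduced from the total-death row of the table. The following is a minimal sketch, not the trial's own analysis: the function name `risk_ratio` is illustrative, and the Wald interval on the log scale is an assumed method, so its bounds need not match the interval quoted on the slide.

```python
import math

def risk_ratio(events_trt, n_trt, events_ctl, n_ctl, z=1.96):
    """Risk ratio with a Wald confidence interval computed on the log scale."""
    risk_trt = events_trt / n_trt
    risk_ctl = events_ctl / n_ctl
    rr = risk_trt / risk_ctl
    # Standard error of ln(RR) for two independent binomial samples
    se_log_rr = math.sqrt(1 / events_trt - 1 / n_trt + 1 / events_ctl - 1 / n_ctl)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper

# CAST total deaths from the table: 56 of 755 on drug, 22 of 743 on placebo
rr, lower, upper = risk_ratio(56, 755, 22, 743)
print(f"RR = {rr:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
print(f"Relative increase in deaths: {(rr - 1) * 100:.0f}%")
```

A risk ratio of about 2.5 is the same statement as a 150% relative increase, because the increase is (RR - 1) x 100.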

The Human Cost

Before CAST, ~200,000 Americans per year received these drugs.

~9,000

excess deaths per year - possibly more

Vietnam War: ~6,000 US deaths/year • These drugs: ~9,000+ deaths/year

For every number, a name we will never know.

Look again.

The Logic - Revisited

PREMISE 1

PVCs after MI predict sudden cardiac death

↓

PREMISE 2

Antiarrhythmic drugs suppress PVCs

← THE LEAP

↓

PREMISE 3

Suppressing PVCs should prevent sudden death

↓

CONCLUSION

Antiarrhythmics save lives in post-MI patients

The assumption that suppressing the marker would fix the outcome had never been tested.

What Went Wrong: The Surrogate Trap

1

PVCs were a marker of damaged tissue, not the cause of death

2

The drugs had proarrhythmic effects - triggering deadlier rhythms

3

The surrogate improved but the outcome worsened - a dissociated surrogate

The surrogate did not lie. We asked the wrong question.

The PICO Framework

Every answerable clinical question has four components:

P - POPULATION

Who are the patients? What are their characteristics?

I - INTERVENTION

What treatment or exposure is being evaluated?

C - COMPARATOR

What is the alternative? Placebo? Standard care?

O - OUTCOME

What matters to patients? Hard endpoints vs surrogates.

CAST PICO

Post-MI patients with PVCs | Antiarrhythmics | Placebo | Mortality

🔍

Investigation Exercise: The Pre-CAST Evidence

You are a cardiologist in 1988. A patient has survived an MI but has frequent PVCs. The observational literature is clear...

Study	Patients with PVCs	Mortality Risk
Lown (1977)	High-grade PVCs	2.4x higher
Bigger (1984)	>10 PVCs/hour	3.1x higher
Mukharji (1984)	Complex PVCs	4.8x higher

The signal is clear. The mechanism is plausible. Would you prescribe an antiarrhythmic?

Before: Observational Logic

PVCs → Higher mortality

Drugs suppress PVCs

∴ Drugs should reduce mortality

After: CAST RCT (1989)

Death rate on drug: 7.4%

Death rate on placebo: 3.0%

RR = 2.5 (150% increase in deaths)

The surrogate improved. The patients died. This is why we ask: "What outcome matters?"

Lessons for Evidence Synthesis

1

Biological plausibility is not proof

A logical mechanism doesn't guarantee the expected effect.

2

Surrogate endpoints can mislead

Improving a biomarker doesn't prove improvement in outcomes.

3

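Randomized trials provide the strongest causal evidence

Because of confounding, observational data alone can rarely establish the causal effect of an intervention.

4

Consensus is not evidence

200,000 prescriptions, FDA approval, the guidelines: all were wrong.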

This is why we do meta-analysis: to see past apparent truths.

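Story: The DES-II Surrogate Tragedy

What if your question determined who lives and who dies?

REAL DATA

In 1989, cardiologists knew that PVC suppression was achievable with encainide and flecainide. The surrogate endpoint looked perfect: the drugs suppressed PVCs by 80%+. But CAST randomized 1,498 patients to active drug versus placebo. The trial was stopped early: 56 deaths in the drug group vs 22 in placebo. Mortality increased 2.5-fold. An estimated ~9,000 excess American deaths per year were attributed to these drugs.

The Cardiologist's Choice: 1987

Your post-MI patient has frequent PVCs. There is a drug that suppresses them completely. What do you do?

PATH A: Treat the Surrogate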

Prescribe encainide — PVCs vanish, the ECG looks clean

↓

The biomarker improves. Confidence grows. The patient dies.

OUTCOME: An estimated 50,000+ excess deaths across the US during years of use

PATH B: Demand a Mortality Trial

Insist: "Show me that survival improves, not just the ECG."

↓

The trial reveals harm. The drugs are withdrawn. Lives are saved.

OUTCOME: The right PICO question prevents catastrophe

THE REVELATION

The question was never "Can we suppress PVCs?" It was "Does PVC suppression save lives?" The surrogate endpoint answered the wrong question. A proper PICO would have demanded mortality as the outcome from the start.

What appears certain may be wrong.

What everyone believes may be false.

Methods exist so that patients do not pay for our confidence.

This is why you are here.

Module 1 Quiz

1. What was the fundamental error in the antiarrhythmic logic?

A. The trials were not randomized

B. Treating a surrogate (PVCs) was assumed to improve outcomes

C. The sample size was too small

D. FDA approval was rushed

2. In PICO, what does "O" stand for, and why does it matter?

A. Observation - what researchers see

B. Objective - the research goal

C. Outcome - what matters to patients

D. Organization - the study structure

Not every signal is truth.

Methods are what protect patients from our confidence.

What was hidden in plain sight?

This is a story about

observational evidence.

Module 2: The Protocol

🎯 Learning Objectives

Explain why protocol pre-registration prevents bias
Identify key elements of a PROSPERO registration
Distinguish healthy user bias from true treatment effects
Describe why observational studies overestimated HRT benefits
Apply the principle: "Methods are what protect patients from our confidence"

30+

observational studies

All showing hormone replacement therapy protected postmenopausal women from heart disease.

The evidence looked overwhelming. The conclusion seemed certain.

The Nurses' Health Study

122,000 nurses followed for decades. HRT users had 40-50% lower cardiovascular mortality.

RR 0.56

Cardiovascular mortality

122,000

Women followed

20+ years

Follow-up

Landmark study. Impeccable methodology. Wrong conclusion.

The Hidden Biases

1

Healthy User Bias: Women who chose HRT were healthier, wealthier, better educated

2

Compliance Bias: Women who took HRT consistently also took better care of themselves

3

Prescriber Bias: Doctors gave HRT to healthier women with fewer risk factors

The treatment was not protecting them. They were already protected.

WHI: The Women's Health Initiative

The largest randomized trial of HRT ever conducted.

Design

Randomized, double-blind, placebo-controlled

Population

Postmenopausal women aged 50-79

Intervention

Estrogen + Progestin vs Placebo

Sample size

16,608 women

Primary endpoint

Coronary heart disease

Planned duration

8.5 years

The Result: July 2002

Trial stopped early after 5.2 years. Harm exceeded benefits.

Outcome	Hazard Ratio	Direction
Coronary heart disease	1.29	HARM
Stroke	1.41	HARM
Breast cancer	1.26	HARM
Pulmonary embolism	2.13	HARM

Complete Reversal

30 years of observational evidence overturned

The Lesson

PRE-SPECIFY

A protocol written before the search begins prevents fishing, prevents bias, prevents hindsight distortion.

Story: The Hormone Timing Hypothesis

What if the treatment works, but only for some?

REAL DATA

WHI showed HRT increased cardiovascular events overall. But later analyses revealed a critical pattern: women who started HRT within 10 years of menopause had REDUCED cardiovascular risk. Women starting 20+ years after menopause had INCREASED risk. The overall null/harm result hid a timing effect.

The Analyst's Dilemma

You are analyzing the WHI subgroups. The overall result shows harm. Do you look deeper?

PATH A: Report Overall Only

Conclude HRT is harmful for all postmenopausal women

↓

Simple message. Guidelines recommend against HRT universally.

OUTCOME: Deny potential benefit to younger menopausal women

PATH B: Pre-Specify Timing Subgroups

Analyze by years since menopause (biologically plausible)

↓

Discover a "timing window" for safer HRT initiation.

OUTCOME: Enable personalized recommendations

THE REVELATION

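Subgroup analysis is dangerous when it is fishing. It is essential when biology predicts effect modification. The timing hypothesis was biologically plausible and should have been pre-specified.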

PROSPERO Registration

1

Register Before Searching

PROSPERO: International prospective register of systematic reviews

2

Lock Decisions

PICO, search strategy, outcomes, analysis plan - all pre-specified

3

Document Amendments

Changes are allowed, but they must be transparent and justified

4

Prevent Duplication

Check whether a review already exists before starting

Module 2 Quiz

1. Why did the Nurses' Health Study show a benefit of HRT that the WHI did not?

A. Nurses' Health had too few patients

B. Healthy user bias in observational studies

C. Nurses' Health had shorter follow-up

D. Different hormone formulations were used

2. What is the primary purpose of PROSPERO registration?

A. To register clinical trials

B. To speed up completion of the review

C. To pre-specify methods and prevent bias

D. To obtain funding for the review

Pre-specification is not just a requirement.

It is protection.

Against our own tendency to find what we expect.

Methods are what protect patients from our confidence.

What was hidden in plain sight?

Module 3: The Search

What was hidden in plain sight?

This is a story about

what they didn't publish.

Module 3: The Search

🎯 Learning Objectives

Develop a comprehensive search strategy using PRESS guidelines
Search multiple databases including grey literature sources
Identify trial registries and regulatory databases (ClinicalTrials.gov, FDA)
Explain how the rosiglitazone case exposed hidden cardiovascular harms
Apply the principle: "What was hidden in plain sight?"

$3.2B

annual sales at peak

Avandia (rosiglitazone) was the world's best-selling diabetes drug.

The published trials looked reassuring. The unpublished ones told a different story.

The Published Evidence (Before 2007)

Published trials showed rosiglitazone effectively lowered HbA1c. Cardiovascular outcomes were rarely reported.

1999

FDA approval

6M+

Patients treated

~0.7%

HbA1c reduction

The surrogate looked good. But what about actual cardiovascular events?

Nissen's Discovery: May 2007

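Dr. Steven Nissen obtained unpublished trial data from GSK's own website.

Under a legal settlement, GSK had been required to post its clinical trial results online. Nissen and Wolski analyzed 42 trials, many of which had never been published in a journal.

The data were technically public.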

No one had systematically searched for it.

The Meta-Analysis Results

Outcome	Odds Ratio	95% CI
Myocardial Infarction	1.43	1.03 - 1.98
CV Death	1.64	0.98 - 2.74

43% Increased Risk of Heart Attack

p = 0.03 for myocardial infarction

Published in NEJM. The FDA called an emergency advisory committee meeting.

The FDA Advisory Committee: July 2007

22-1

Voted: CV risk exists

20-3

Warning

The committee was divided. Some wanted withdrawal. Others said the meta-analysis was flawed.

But the signal could not be unseen.

The Aftermath

1

Black box warning added for heart failure risk (2007)

2

Severe restrictions on prescribing in the US (2010)

3

Withdrawn entirely from the European market (2010)

4

FDA now requires cardiovascular outcome trials for all diabetes drugs

What a Comprehensive Search Requires

PUBLISHED

PubMed, Embase, CENTRAL, Web of Science

GREY LITERATURE

Conference abstracts, dissertations, regulatory docs

TRIAL REGISTRIES

ClinicalTrials.gov, WHO ICTRP, EU CTR

REGULATORY

FDA, EMA, Health Canada submissions

COMPANY DATA

GSK, Pfizer, Roche clinical trial registries

HAND SEARCH

Reference lists, contact authors, experts

The PRESS Checklist

Peer Review of Electronic Search Strategies

1

Translation of the Research Question

Does the search reflect the PICO elements?

2

Boolean and Proximity Operators

Are AND, OR, and NOT used correctly?

3

Subject Headings

Are MeSH/Emtree terms appropriate and exploded?

4

Text Words

Synonyms, spelling variants, truncation?

PRESS Checklist (continued)

5

Spelling, Syntax, Line Numbers

Are there errors that would cause retrieval to fail?

6

Limits and Filters

Are date, language, and study design limits appropriate?

Peer-reviewed searches substantially improve retrieval of key studies.

PRESS guideline: McGowan et al., 2016

Database Translation

The same search must be translated for each database:

PubMed

"diabetes mellitus, type 2"[MeSH] OR "type 2 diabetes"[tiab]

Embase

'non insulin dependent diabetes mellitus'/exp OR 'type 2 diabetes':ti,ab

Subject headings, field tags, and operators differ between databases.

Story: The Tamiflu Transparency Campaign

What happens when you search and find nothing?

REAL DATA

Governments stockpiled $9 billion of oseltamivir (Tamiflu) against pandemic influenza. The Cochrane Collaboration set out to review the evidence. Of 77 clinical trials, full reports existed for only 20. Roche refused to share the data for 5 years. When the BMJ and Cochrane finally obtained over 160,000 pages of clinical study reports, they found: Tamiflu reduced symptoms by less than 1 day, with no evidence it prevented hospitalizations or complications.

The Reviewer's Dilemma: 2009

You are updating the Cochrane review of Tamiflu. The published trial results look favorable. But for 57 trials, no accessible full report exists. What do you do?

PATH A: Analyze What's Published

Use the 20 available trials. Conclude Tamiflu is effective.

↓

Your review supports continued stockpiling. $9 billion is spent on weak evidence.

OUTCOME: Billions wasted, true efficacy unknown

PATH B: Demand the Full Data

Refuse to publish until all trial data is accessible

↓

5-year campaign. 160,000+ pages finally obtained. Truth emerges.

OUTCOME: Evidence policy changed; EMA now publishes all trial reports

THE REVELATION

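A search is only as good as what can be found. When grey literature is hidden behind corporate walls, even the most comprehensive PubMed search misses the truth. The Tamiflu saga changed global policy: the EMA now publishes clinical study reports for all medicines.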

If Nissen had searched only PubMed,

the signal would have remained hidden.

Comprehensive search is survival.

What was hidden in plain sight?

Module 3 Quiz

1. What type of evidence source revealed the rosiglitazone cardiovascular signal?

A. Published journal articles

B. Cochrane Library

C. Company clinical trial registry

D. FDA approval documents

2. What does PRESS stand for?

A. Publication Review of Evidence Search Standards

B. Peer Review of Electronic Search Strategies

C. Protocol for Reporting Evidence Synthesis Studies

D. Primary Research Evidence Search System

What was hidden in plain sight?

Module 4: Screening

A number without provenance is not a number.

This is a story about

what they chose to report.

Module 4: Screening

🎯 Learning Objectives

Apply PRISMA flow diagram to document study selection
Implement dual-reviewer screening with conflict resolution
Identify selective outcome reporting and data manipulation
Calculate inter-rater reliability (Cohen's kappa)
Apply the principle: "A number without provenance is not a number"

88,000

heart attacks attributed to Vioxx

A blockbuster drug. A hidden signal. A preventable catastrophe.

From 1999 to 2004, millions of people took this painkiller.

The Rise of Vioxx

Rofecoxib (Vioxx) was a COX-2 selective NSAID, marketed as safer for the stomach than traditional painkillers.

1999

FDA approval

$2.5B

Peak annual sales

80M+

Patients prescribed

The VIGOR Trial (2000)

Vioxx Gastrointestinal Outcomes Research

Design

Randomized, double-blind

Comparison

Vioxx vs Naproxen

Population

Rheumatoid arthritis

Sample

8,076 patients

Primary Outcome

GI events

Published

NEJM, November 2000

What VIGOR Published

GI Outcome	Vioxx	Naproxen
Confirmed GI events	2.1 per 100 pt-yrs	4.5 per 100 pt-yrs
Reduction	54% fewer GI events

The headline: Vioxx is safer for the stomach!

This is what doctors were told. This is what patients believed.

What VIGOR Buried

CV Outcome	Vioxx	Naproxen
Myocardial Infarction	20 events	4 events
Relative Risk	5x higher in Vioxx group

5-fold Increase in Heart Attacks

Mentioned only briefly, attributed to naproxen being "cardioprotective"

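Selective Reporting

1

Data cutoff manipulation: 3 additional heart attacks occurred after the cutoff used in publication

2

Spin: The CV signal was explained as naproxen being cardioprotective (without evidence)

3

Outcome switching: CV events were pre-specified but not emphasized

4

Internal knowledge: Merck emails show the company knew about the signal

The APPROVe Trial (2004)

A trial of colorectal polyp prevention - stopped early for safety.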

RR 1.92

CV events vs placebo

Sept 2004

Vioxx withdrawn

Four years after VIGOR showed a 5x risk. Four years too late.

Story: The Vioxx Decision Tree

What happens when the signal is hidden in the noise?

REAL DATA

Vioxx (rofecoxib) was approved in 1999. By 2004, estimates suggest 88,000-140,000 excess heart attacks and 30,000-40,000 deaths. Merck's own VIGOR trial showed 5x cardiovascular risk in 2000, but it was dismissed as a "naproxen cardioprotective effect."

The Fork in the Road

You are an FDA reviewer in 2001. The VIGOR data show a 5-fold higher risk of heart attack with Vioxx than with naproxen.

PATH A: Accept the Explanation

Believe Merck's hypothesis: naproxen is cardioprotective

↓

No additional safety studies required. Drug stays on market at full speed.

OUTCOME: 40,000+ deaths over 4 years

PATH B: Demand Evidence

Require a dedicated CV safety trial before continued marketing

↓

Delay or restrict marketing until cardiovascular safety is established.

OUTCOME: Signal detected early, lives saved

THE REVELATION

The signal was there in 2000. A wrong explanation delayed action by 4 years. An alternative hypothesis accepted without evidence cost tens of thousands of lives.

The PRISMA Flow Diagram

Every step of screening must be documented and transparent.

Identification

Records from databases + other sources

↓

Screening

Title/abstract review (duplicates removed)

↓

Eligibility

Full-text assessment (with exclusion reasons)

↓

Included

Studies in synthesis

Dual Screening: Why Two Reviewers?

1

Reduces Selection Bias

One reviewer might unconsciously favor certain studies

2

Catches Errors

Fatigue, misreading, and mistakes are inevitable

3

Forces Explicit Criteria

Disagreements reveal ambiguity in inclusion rules

Typical agreement: κ = 0.6-0.8

Disagreements resolved by discussion or third reviewer
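The agreement statistic quoted above, Cohen's kappa, corrects raw agreement for the agreement two reviewers would reach by chance. A minimal sketch for include/exclude screening decisions follows; the function name and the counts are hypothetical and chosen only for illustration.

```python
def cohens_kappa(both_include, only_a, only_b, both_exclude):
    """Cohen's kappa for two reviewers making include/exclude decisions."""
    total = both_include + only_a + only_b + both_exclude
    # Observed agreement: proportion of records where the reviewers agree
    observed = (both_include + both_exclude) / total
    # Marginal inclusion rates for reviewer A and reviewer B
    a_include = (both_include + only_a) / total
    b_include = (both_include + only_b) / total
    # Agreement expected by chance if the two reviewers decided independently
    expected = a_include * b_include + (1 - a_include) * (1 - b_include)
    # Undefined when expected == 1 (both reviewers give the same answer every time)
    return (observed - expected) / (1 - expected)

# Hypothetical calibration sample of 100 records
kappa = cohens_kappa(both_include=20, only_a=5, only_b=7, both_exclude=68)
print(f"kappa = {kappa:.2f}")
```

Raw agreement in this hypothetical sample is 88%, yet kappa is noticeably lower because most records are excluded by both reviewers and much of that agreement is expected by chance.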

Calibration: The Pilot Phase

Before screening thousands of records, reviewers should calibrate on a sample of 50-100 records.

1

Screen the same set independently

2

Compare decisions and discuss disagreements

3

Refine inclusion criteria until κ > 0.7

4

Document the calibration process and any rule changes

PRISMA 2020 Updates

New in 2020

Separate reporting of database vs register searches

New in 2020

Use of automation tools must be reported

New in 2020

Citation searching documented separately

New in 2020

Reasons for exclusion at full-text mandatory

PRISMA 2020 substantially revised the checklist, with expanded reporting on synthesis methods, certainty assessment, and protocol registration.

If Vioxx's cardiovascular data had been screened by independent reviewers,

if all pre-specified outcomes had been required to be reported,

88,000 heart attacks might have been prevented.

A number without provenance is not a number.

Module 4 Quiz

1. In the VIGOR trial, what was the relative risk of MI in the Vioxx group compared with naproxen?

A. 1.5x higher

B. 2x higher

C. 5x higher

D. 10x higher

2. Why is dual screening (two independent reviewers) important?

A. It makes screening faster

B. It reduces selection bias and catches errors

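C. It reduces the number of studies to review

D. It allows reviewers to skip full-text review

A number without provenance is not a number.

Module 5: Extraction

A number without provenance is not a number.

This is a story about

numbers that never existed.

Module 5: Extraction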

🎯 Learning Objectives

Design a standardized data extraction form that includes provenance fields
Calculate effect sizes from various reported statistics (OR, RR, HR, SMD)
Implement dual-extraction with discrepancy resolution
Identify red flags for data fabrication and misconduct
Explain how the DECREASE fraud affected clinical guidelines

~10,000

possible excess deaths in Europe

From guidelines based on fabricated clinical trial data.

The DECREASE trials influenced perioperative care worldwide. The data were invented.

Don Poldermans: A Star Researcher

Professor at Erasmus Medical Center, Rotterdam. Author of over 500 papers. Lead author of ESC guidelines on perioperative cardiac care.

500+

Publications

DECREASE

Trial series I-VI

ESC

Guideline chair

A seemingly impeccable source. Until someone looked at the data.

The DECREASE Trials: The Claims

Trial	Finding	Impact
DECREASE-I (1999)	90% reduction in cardiac death	Changed guidelines
DECREASE-IV (2009)	Beta-blockers safe in low-risk	Expanded recommendations

Effect sizes were implausibly large.

90% reduction? Almost nothing in medicine works that well.

The Investigation: 2011

1

Erasmus MC investigated after whistleblower complaints

2

Fabricated patient data: Patients who didn't exist or weren't enrolled

3

No informed consent: Many "participants" never consented

4

Poldermans dismissed: From Erasmus MC in 2011

The Chain of Harm

When DECREASE was removed from the meta-analysis...

Benefit → Harm

Direction reversed

27% ↑

Stroke risk increase

The POISE trial (2008) showed harm. It was dismissed because it contradicted DECREASE.

Why Wasn't This Caught?

1

Trust in authority: Poldermans was a guideline author reviewing his own evidence

2

No data verification: No one asked for individual patient data

3

Publication prestige: Published in top journals, assumed valid

4

Implausible effects accepted: 90% reductions should raise suspicion

Data Extraction: Defense Against Fraud

1

Dual Extraction

Two extractors independently - catches transcription errors and forces scrutiny

2

Record Provenance

Table, page, paragraph - every number traceable to source

3

Verify Against Registry

ClinicalTrials.gov results versus the publication - discrepancies are red flags

4

Request IPD

Individual patient data reveals what aggregate summaries hide

Effect Size Calculation

During extraction, calculate effect sizes from the reported data.

BINARY OUTCOMES

Odds Ratio, Risk Ratio, Risk Difference from 2x2 tables

CONTINUOUS OUTCOMES

Mean Difference, Standardized Mean Difference from means and SDs

Always extract from the most reliable source.

Prefer: ITT results > per-protocol > subgroups

Red Flags During Extraction

!

Implausible effect sizes: 80-90% reductions should prompt scrutiny

!

Baseline imbalances: groups that match "too perfectly"

!

Round numbers: "Exactly 50" or "exactly 100" patients per arm

!

Registry discrepancies: published N differs from registered N

Researcher

Effect Size Conversions

Studies report results in different metrics. Pooling them often requires conversion:

From	To	Formula
SMD (d)	log-OR	log-OR = d × π / √3
log-OR	SMD (d)	d = log-OR × √3 / π
Correlation (r)	Fisher z	z = 0.5 × ln((1+r)/(1−r))
OR	RR	RR = OR / (1 − P₀ + P₀ × OR)
OR	NNT	NNT = 1 / (P₀ − OR×P₀ / (1−P₀+OR×P₀))

P₀ = baseline risk in the control group. These formulas hold only under approximating assumptions. See Borenstein et al. (Ch. 7) for exact derivations.
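The conversions in the table translate directly into code. The sketch below implements each row as a small function; the function names and the example values (OR = 0.5, P₀ = 0.2) are illustrative and not taken from any trial.

```python
import math

def smd_to_log_or(d):
    """SMD (d) to log odds ratio: log-OR = d * pi / sqrt(3)."""
    return d * math.pi / math.sqrt(3)

def log_or_to_smd(log_or):
    """Log odds ratio to SMD (d): d = log-OR * sqrt(3) / pi."""
    return log_or * math.sqrt(3) / math.pi

def fisher_z(r):
    """Correlation r to Fisher z: z = 0.5 * ln((1 + r) / (1 - r))."""
    return 0.5 * math.log((1 + r) / (1 - r))

def or_to_rr(odds_ratio, p0):
    """Odds ratio to risk ratio, given control-group baseline risk p0."""
    return odds_ratio / (1 - p0 + p0 * odds_ratio)

def or_to_nnt(odds_ratio, p0):
    """Odds ratio to number needed to treat, given baseline risk p0."""
    # Risk in the treated group implied by the odds ratio
    p1 = odds_ratio * p0 / (1 - p0 + odds_ratio * p0)
    return 1 / (p0 - p1)

# Hypothetical example: OR = 0.5 with a 20% baseline risk
print(f"RR  = {or_to_rr(0.5, 0.2):.3f}")
print(f"NNT = {or_to_nnt(0.5, 0.2):.2f}")
print(f"d   = {log_or_to_smd(math.log(0.5)):.3f}")
```

The example shows why the baseline risk must be extracted along with the odds ratio: the same OR gives a different RR and NNT at a different P₀.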

Researcher

Time-to-Event (Survival) Data

Many trials report time-to-event outcomes using hazard ratios (HR). Pooling HRs in meta-analysis requires special handling:

1

The log(HR) + SE Method

Extract log(HR) and its SE from the trial. If the SE is not reported, derive it from the CI: SE = (ln(upper) − ln(lower)) / (2 × 1.96). Pool using the standard inverse-variance method.

2

When the HR Is Not Reported

Methods exist to reconstruct IPD from Kaplan-Meier curves (Guyot et al. 2012) or to estimate the HR from p-values and event counts (Parmar et al. 1998). When available, always prefer the directly reported adjusted hazard ratio.

HR < 1 favors treatment; HR > 1 favors control. Do not convert HRs to ORs or RRs—they measure fundamentally different quantities.
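The SE-from-CI step described above can be sketched as follows. The function name and the example hazard ratio with its interval are hypothetical; the calculation assumes a 95% interval that is symmetric on the log scale.

```python
import math

def log_hr_and_se(hr, ci_lower, ci_upper, z=1.96):
    """Return ln(HR) and its standard error derived from a reported CI."""
    log_hr = math.log(hr)
    # The CI spans 2 * z standard errors on the log scale
    se = (math.log(ci_upper) - math.log(ci_lower)) / (2 * z)
    return log_hr, se

# Hypothetical trial report: HR 0.80 (95% CI 0.65 to 0.98)
log_hr, se = log_hr_and_se(0.80, 0.65, 0.98)
weight = 1 / se ** 2  # inverse-variance weight used when pooling
print(f"ln(HR) = {log_hr:.4f}, SE = {se:.4f}, weight = {weight:.1f}")
```

A quick sanity check during extraction is that ln(HR) sits roughly midway between ln(lower) and ln(upper); if it does not, the interval may have been mis-transcribed or computed on a different scale.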

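Story: The Boldt Colloid Affair

What if the data you extracted were never real?

REAL DATA

Joachim Boldt was the most prolific researcher in anesthesia fluid management. More than 180 of his publications have been retracted, one of the largest retraction cases in medical history. His fabricated data showed hydroxyethyl starch (HES) to be safe. Meta-analyses that included his studies concluded HES was harmless. Once Boldt's studies were removed, the pooled effect reversed: HES increased kidney injury by 59% (RR 1.59, 95% CI 1.26-2.00) and mortality by ~9% (RR 1.09). An estimated thousands of patients received a harmful fluid based on fabricated evidence.

The Extractor's Vigilance: 2010

You are extracting data for a fluid resuscitation meta-analysis. Boldt's studies dominate the literature (90+ papers). A whistleblower has raised concerns. What do you do?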

PATH A: Extract as Published

Trust peer-reviewed publications. Extract Boldt's data like any other.

↓

Your meta-analysis shows HES is safe. Guidelines recommend it.

OUTCOME: Thousands receive a nephrotoxic fluid

PATH B: Verify Provenance

Cross-check ethics approvals, request source data, and run a sensitivity analysis excluding the suspect studies

↓

Discover missing ethics approvals. Flag studies. Re-analyze without them.

OUTCOME: True signal emerges — HES causes harm

THE REVELATION

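Provenance is not bureaucracy. It is the difference between evidence and fiction. Every extracted number must trace back to an ethically approved study with verifiable patient data. Without provenance, an orphaned number can become a weapon.

Every number in a meta-analysis

must trace back to a verifiable source.

A number without provenance is not a number.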

Fraudulent data can kill as surely as fraudulent drugs.

Module 5 Quiz

1. What happened when the DECREASE trial data were removed from the beta-blocker meta-analysis?

A. The benefit became even larger

B. No change in conclusions

C. The direction reversed to show potential harm

D. The results became inconclusive

2. Why should dual extraction be standard practice?

A. It catches transcription errors and forces scrutiny

B. It makes extraction faster

C. It helps find more studies

D. It reduces the amount of work needed

A number without provenance is not a number.

Module 6: Bias

Methods are what protect patients from our confidence.

This is a story about

bias you cannot see.

Module 6: Bias

🎯 Learning Objectives

Apply Risk of Bias 2.0 (RoB 2) to randomized trials
Apply ROBINS-I to non-randomized studies
Assess all five RoB 2 domains (randomization, deviations, missing data, measurement, selection)
Distinguish confounding by indication from true treatment effects
Explain how BART revealed hidden harms of aprotinin

20+

years on the market

Aprotinin was the gold standard for reducing surgical bleeding

Then someone ran an RCT. The truth was different.

The Hidden Bias: Confounding by Indication

1

Sicker patients got aprotinin: Surgeons used it in complex, high-risk cases

2

Survivors bias: Dead patients can't report complications

3

Publication bias: Negative studies were not published

Observational studies could not separate the drug's effect from the patients' baseline risk.

BART: The Randomized Truth

Blood Conservation Using Antifibrinolytics in a Randomized Trial

Outcome	Aprotinin	Alternatives
30-day mortality	6.0%	3.9%
Relative Risk	1.53 (53% increased death)

Trial Stopped Early for Harm

Withdrawn from the market in November 2007

🔍

Investigation: Assessing Bias

You are reviewing the observational studies. Apply a risk-of-bias mindset:

Question	Observational	BART (RCT)
Random allocation?	❌ Surgeon choice	✓ Yes
Baseline comparable?	❌ Sicker got drug	✓ Balanced
Blinding?	❌ Open label	✓ Double-blind

Confounding by indication: Surgeons gave aprotinin to the sickest patients. Observational studies credited the drug with survival when they were really measuring survivor bias.

Risk of Bias 2.0: The Five Domains

D1

Randomization Process

D2

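Deviations from Intended Interventions

D3

Missing Outcome Data

D4

Measurement of the Outcome

D5

Selection of the Reported Result

ROBINS-I: For Non-Randomized Studies

When RCTs are not available, use ROBINS-I (Risk Of Bias In Non-randomized Studies of Interventions)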

1

Confounding

Baseline differences between groups

2

Selection of Participants

Exclusions related to intervention

3

Classification of Interventions

Misclassification of exposure status

4

Deviations from Intended Interventions

Co-interventions, contamination

5

Missing Data

Differential loss to follow-up

6

Measurement of Outcomes

Ascertainment bias

7

Selection of Reported Result

Selective reporting

Ratings: Low / Moderate / Serious / Critical / No information

Story: The Aprotinin BART Trial

What if 64 studies agree, and all of them are wrong?

REAL DATA

Aprotinin was used in cardiac surgery to reduce bleeding for 20 years. 64 small randomized trials suggested it was safe and effective. Meta-analyses confirmed the benefit. Then the BART trial (2008) randomized 2,331 patients: aprotinin vs. tranexamic acid vs. aminocaproic acid. Result: aprotinin increased mortality by 53% (RR 1.53, 95% CI 1.06-2.22). The trial was stopped early for harm. Bayer withdrew aprotinin from the market within months.

The Surgeon's Evidence: 2006

You are a cardiac surgeon choosing an antifibrinolytic. 64 small trials favor aprotinin, but none was powered to detect mortality. A large RCT (BART) is enrolling. Do you wait?

PATH A: Trust the Meta-Analysis

64 trials can't all be wrong. Continue prescribing aprotinin.

↓

The small trials measured bleeding, not death. None was adequately powered for mortality. The meta-analysis pooled underpowered surrogate outcomes.

OUTCOME: Excess deaths in cardiac surgery patients

PATH B: Assess Risk of Bias First

Assess all 64 trials with RoB. Note that they are small, use surrogate outcomes, and have high attrition. Wait for an adequately powered RCT.

↓

BART reveals the truth. Switch to safer alternatives.

OUTCOME: Lives saved by demanding adequately powered evidence

THE REVELATION

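Quantity of evidence is not the same as quality. Sixty-four underpowered trials measuring the wrong outcome do not outweigh one adequately powered trial measuring mortality. Risk of bias assessment is not a formality. It is the shield between patients and misleading conclusions drawn from small, surrogate-driven evidence.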

Sixty-four small trials measured bleeding, not death.

One adequately powered trial revealed 53% increased mortality.

Quantity of evidence is no substitute for quality of evidence.

Module 6 Quiz

1. Why did 64 small trials miss aprotinin's harm?

A. Underpowered for mortality; used surrogate outcomes

B. Confounding by indication

C. Outcome measured incorrectly

D. Follow-up too short

Methods are what protect patients from our confidence.

Module 7: Synthesis

Heterogeneity is a message, not noise.

The Magnesium Controversy: 1991-1995

When pooling leads us astray.

Module 7: Synthesis

🎯 Learning Objectives

Calculate pooled effect sizes using fixed-effect and random-effects models
Choose between DerSimonian-Laird and HKSJ estimators appropriately
Interpret forest plots including weights, confidence intervals, and diamonds
Explain why small-study effects can mislead meta-analyses
Apply the principle: "Heterogeneity is a message, not noise"

The Year: 1991

"You stand at the crossroads of hope and evidence..."

Heart disease kills more people worldwide than any other cause. In 1991, a new hope emerges: Could something as simple and cheap as intravenous magnesium save lives after myocardial infarction?

The biological rationale was sound:

Magnesium stabilizes cardiac membranes, prevents arrhythmias, and vasodilates coronary arteries.

LIMIT-2: The Landmark Trial

Leicester Intravenous Magnesium Intervention Trial, 1992

2,316

Patients enrolled

24%

Mortality reduction

p = 0.04

Statistically significant

A cheap, safe intervention that could save 250,000 lives per year globally.

The medical community was excited.

The Meta-Analysis: 1993

Researchers pooled seven randomized trials of IV magnesium in MI:

Trial	Year	N	Odds Ratio
Morton 1984	1984	40	0.10
Rasmussen 1986	1986	273	0.35
Smith 1986	1986	400	0.48
Abraham 1987	1987	94	0.87
Shechter 1990	1990	103	0.27
Ceremuzynski 1989	1989	48	0.22
LIMIT-2	1992	2,316	0.74

🔍

Investigation Exercise: The Meta-Analyst's Dilemma

You have been asked to synthesize the evidence on magnesium for MI. Data from 7 trials are in front of you.

Do you see the pattern in this forest plot?

Pooled OR = 0.44 (95% CI: 0.27–0.71)

55% mortality reduction! Publish in the Lancet?

But wait... did you notice anything about the trial sizes?

The Warning Signs

What should have given us pause?

1

Small sample sizes: Six of seven trials had <500 patients

2

Extreme effects: OR of 0.10 (90% reduction) is implausible for any drug

3

All positive: Where were the negative trials? The file drawer problem...

4

Funnel asymmetry: Small trials showed much larger effects than larger ones

🔍

The Funnel Plot Test

Before pooling, we must check for publication bias. Let's examine the funnel plot.

The Year: 1995 - ISIS-4 Reports

"And then the truth emerged..."

The Fourth International Study of Infarct Survival (ISIS-4) enrolled 58,050 patients across 1,086 hospitals in 31 countries.

58,050

Patients

2,216

Deaths in Mg group

2,103

Deaths in placebo

OR = 1.06 (95% CI: 1.00–1.12)

No benefit. If anything, a trend toward harm.

📊

Before and After: The Full Picture

Watch what happens when the large trial is added to the forest plot...

BEFORE ISIS-4

7 small trials (N = 3,274)

OR = 0.44

Strong benefit signal

AFTER ISIS-4

8 trials (N = 61,324)

OR = 1.02

No effect

Why Did Small Trials Mislead?

1

Publication Bias

Small negative trials were never published—they sat in file drawers

2

Small-Study Effects

Smaller trials tend to show larger effects due to methodological weaknesses

3

Random High Bias

By chance, some small trials produce extreme results, and those are the ones that get published

4

Random-Effects Amplification

Random-effects models give more weight to small trials, amplifying bias

Fixed vs. Random Effects

Which model should you choose?

FIXED EFFECT MODEL

Assumes one true effect. Weights studies by inverse variance (precision). Large trials dominate.

Magnesium result: OR = 0.96 (p = 0.52)

RANDOM EFFECTS MODEL

Assumes distribution of effects. Gives more weight to small trials. Wider confidence intervals.

Magnesium result: OR = 0.59 (p = 0.01)

⚠️ The choice of model determined the conclusion!

Random effects do not correct bias. When small-study effects are present, shifting weight toward the smaller trials can change the conclusion.
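The weight shift can be seen in a small worked sketch. It pools log odds ratios with inverse-variance weights (fixed effect) and with the DerSimonian-Laird estimate of between-study variance (random effects). The four studies are hypothetical, three small trials with large effects and one large trial near the null; they are not the magnesium data, and the function name `pool` is illustrative.

```python
import math

def pool(log_effects, ses):
    """Fixed-effect and DerSimonian-Laird random-effects pooled estimates."""
    w = [1 / se ** 2 for se in ses]  # inverse-variance weights
    fixed = sum(wi * yi for wi, yi in zip(w, log_effects)) / sum(w)
    # Cochran's Q measures dispersion around the fixed-effect estimate
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, log_effects))
    df = len(log_effects) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)  # between-study variance, truncated at zero
    # Adding tau2 to every variance flattens the weights across studies
    w_re = [1 / (se ** 2 + tau2) for se in ses]
    random = sum(wi * yi for wi, yi in zip(w_re, log_effects)) / sum(w_re)
    return fixed, random, tau2

# Hypothetical log odds ratios and standard errors (last study is the large one)
log_ors = [-1.2, -0.9, -1.0, -0.05]
ses = [0.50, 0.45, 0.55, 0.05]

fixed, random, tau2 = pool(log_ors, ses)
print(f"Fixed-effect OR:   {math.exp(fixed):.2f}")
print(f"Random-effects OR: {math.exp(random):.2f}")
print(f"tau^2:             {tau2:.3f}")
```

With these inputs the fixed-effect estimate stays close to the large trial, while the random-effects estimate moves toward the small trials. Neither model knows whether the small trials are biased.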

Lessons from Magnesium

1. Check for publication bias before trusting a pooled estimate. Funnel plots and Egger's test are your tools.

2. Be wary of small-study effects. If only small trials show benefit, wait for a large, well-conducted trial.

3. Model choice matters. Random effects can amplify biased evidence. Examine both models and understand what each implies.

4. One large trial can overturn many small ones. This is why mega-trials like ISIS-4 are so valuable.
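Egger's test, named in the first lesson, regresses each study's standardized effect (effect divided by its SE) on its precision (1 / SE); an intercept far from zero suggests funnel-plot asymmetry. The sketch below computes only the intercept by ordinary least squares, using hypothetical data rather than the magnesium trials. A formal test would also need the intercept's standard error and a t distribution, which are omitted here.

```python
def egger_intercept(log_effects, ses):
    """Intercept of Egger's regression: (effect / SE) on (1 / SE)."""
    x = [1 / se for se in ses]                       # precision
    y = [yi / se for yi, se in zip(log_effects, ses)]  # standardized effect
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    # With no small-study effects the regression line passes near the origin
    return y_mean - slope * x_mean

# Hypothetical log odds ratios and standard errors (last study is the large one)
log_ors = [-1.2, -0.9, -1.0, -0.05]
ses = [0.50, 0.45, 0.55, 0.05]

intercept = egger_intercept(log_ors, ses)
print(f"Egger intercept = {intercept:.2f}")
```

A clearly negative intercept here reflects small, imprecise studies reporting more extreme protective effects than the large study. With only a handful of trials the test has little power, so it supplements visual inspection of the funnel plot and does not replace it.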

Researcher

Special Study Designs in Meta-Analysis

Not all RCTs use the standard parallel-group design. Two common alternatives need special handling when results are pooled.

1

Cluster-Randomized Trials

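Randomize groups (hospitals, schools) rather than individuals. The design effect = 1 + (m−1) × ICC, where m is the average cluster size and ICC is the intracluster correlation coefficient, reduces the effective sample size. Before pooling, divide N by the design effect or use the adjusted SE reported by the trial. Ignoring clustering produces artificially narrow CIs.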

2

Crossover Trials

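Each patient receives both treatments. The paired design reduces variance, but pooling correctly requires the within-patient correlation (or the SE from a paired analysis). Using the parallel-group SE is conservative. Using the wrong N double-counts patients.

For detailed formulas and worked examples, see the Cochrane Handbook v6.4, Chapter 23.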
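
The design-effect adjustment for cluster-randomized trials can be sketched in a few lines. The trial size, cluster size, and ICC below are hypothetical values chosen for illustration, and the function names are not from any particular library.

```python
import math

def design_effect(mean_cluster_size, icc):
    """Design effect = 1 + (m - 1) * ICC."""
    return 1.0 + (mean_cluster_size - 1.0) * icc

def effective_sample_size(n, mean_cluster_size, icc):
    """Sample size after dividing by the design effect."""
    return n / design_effect(mean_cluster_size, icc)

def inflate_se(se_naive, mean_cluster_size, icc):
    """Inflate an SE that ignored clustering by sqrt(design effect)."""
    return se_naive * math.sqrt(design_effect(mean_cluster_size, icc))

# Hypothetical trial: 1,200 patients, 20 per cluster, ICC = 0.05
deff = design_effect(20, 0.05)
n_eff = effective_sample_size(1200, 20, 0.05)
se_adj = inflate_se(0.10, 20, 0.05)
print(f"design effect = {deff:.2f}, effective N = {n_eff:.0f}, adjusted SE = {se_adj:.3f}")
```

Even a small ICC matters when clusters are large: here the design effect is close to 2, so the trial carries roughly half the information its raw N suggests.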

Story: The Early Surfactant Reversal

What if the way you combine studies decides whether a treatment looks life-saving or useless?

REAL DATA

Early surfactant for premature infants was supported by 6 small trials comparing early versus late surfactant, which showed reduced mortality (RR 0.84). A fixed-effect meta-analysis confirmed benefit (p=0.04). But a random-effects model showed no significance (p=0.12): the confidence interval crossed 1.0. Later, SUPPORT (2010) and VON (2012), two large pragmatic trials with ~2,000 neonates combined, found no benefit. Clinical practice had been changed on the basis of small trials and the wrong model.

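The Neonatologist's Model Choice: 2005

You are updating the Cochrane review of early surfactant. Six small trials show benefit under a fixed-effect model. The random-effects model is not significant. Which do you report?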

PATH A: Report Fixed-Effect Only

Fixed-effect is significant. Report the positive result. Change practice.

↓

NICUs adopt early surfactant. Later trials show no benefit. Practice reverses.

OUTCOME: Years of unnecessary intubation of premature infants

PATH B: Report Both Models

Present both the FE and RE results. Flag that significance depends on model choice. Call for a large trial.

↓

Honest uncertainty. Large trials prioritized. True answer emerges faster.

OUTCOME: Premature babies spared unnecessary intervention

THE REVELATION

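If the conclusion changes depending on whether you use fixed or random effects, the conclusion is fragile. Report both. Admit the uncertainty. And remember: a fragile result from small trials is not a mandate to change practice.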

Module 7 Quiz

1. Why did the magnesium meta-analysis show a benefit that ISIS-4 did not find?

A. ISIS-4 had flawed methodology

B. Calculation error in meta-analysis

C. Publication bias in small trials

D. LIMIT-2 was underpowered

2. What warning sign should have alerted reviewers to potential bias?

A. Asymmetric funnel plot (small trials showing larger effects)

B. Low heterogeneity (I² = 0%)

C. Strong biological plausibility

D. Too few trials to analyze

3. When publication bias is suspected, which model may amplify the bias?

A. Fixed effect model

B. Random effects model

C. Bayesian model

D. Network meta-analysis

Small trials can show false signals.

Large trials anchor the truth.

Heterogeneity is a message, not noise.

Module 8: Heterogeneity

Heterogeneity is a message, not noise.

ACCORD: 2008

When the average hides the truth.

Module 8: Heterogeneity

🎯 Learning Objectives

Calculate and interpret I², τ², and prediction intervals
Apply ICEMAN criteria to assess subgroup credibility
Distinguish between clinical, methodological, and statistical heterogeneity
Conduct and interpret leave-one-out sensitivity analyses
Explain how ACCORD revealed differential effects across subgroups

The Year: 2008

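"You are about to witness one of the most shocking trial terminations in history..."

For decades the diabetes community had one guiding principle: lower blood sugar is better. The landmark DCCT (1993) and UKPDS (1998) showed that intensive glucose control reduces microvascular complications (blindness, kidney failure, nerve damage).

The logical extrapolation: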

If controlling glucose prevents complications, shouldn't intensive control prevent cardiovascular disease too?

ACCORD: Action to Control Cardiovascular Risk in Diabetes

The definitive test of intensive glucose control

10,251

Type 2 diabetics

HbA1c <6%

Intensive target

HbA1c 7-7.9%

Standard target

All patients had type 2 diabetes and high cardiovascular risk, with either established cardiovascular disease or multiple risk factors. The trial was planned to run for 5.6 years.

February 6, 2008

The Data Safety Monitoring Board convenes an emergency meeting.

After 3.5 years, they make an unprecedented decision:

Stop the trial.

The Shocking Results

Outcome	Intensive	Standard	HR (95% CI)
Primary CV endpoint	352 events	371 events	0.90 (0.78–1.04)
All-cause mortality	257 deaths	203 deaths	1.22 (1.01–1.46)
Severe hypoglycemia	10.5%	3.5%	3.0× higher

22% increase in mortality

54 excess deaths in the intensive arm

🔍

Investigation Exercise: The Clinician's Dilemma

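You are an endocrinologist with 500 diabetic patients. The ACCORD results have just been published. What do you tell patients who have been working toward HbA1c <6%?

Is intensive control harmful for everyone? Or only for some?

Subgroup analysis revealed: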

Subgroup	Intensive HR	Interpretation
No prior CVD	1.00 (0.76–1.32)	No effect
Prior CVD	1.45 (1.15–1.84)	Significant harm
Baseline HbA1c <8%	1.02 (0.75–1.40)	No effect
Baseline HbA1c ≥8%	1.29 (1.03–1.60)	Harm

The average effect masked critical heterogeneity!

For patients with established CVD or poor baseline control, intensive treatment was harmful.
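
Whether two subgroup estimates truly differ is a question for a test of interaction, not for comparing which confidence interval crosses 1. The sketch below recovers each log hazard ratio and its standard error from the reported 95% CI and compares the two on the log scale. It uses the prior-CVD figures from the table above, and it assumes the subgroups are independent and that the intervals are symmetric on the log scale. This is an approximation, not the trial's own interaction analysis.

```python
import math
from statistics import NormalDist

def log_hr_and_se(hr, lower, upper):
    """Recover log HR and its SE from a reported 95% confidence interval."""
    z975 = NormalDist().inv_cdf(0.975)
    se = (math.log(upper) - math.log(lower)) / (2 * z975)
    return math.log(hr), se

def interaction_test(subgroup_a, subgroup_b):
    """Approximate z-test for a difference between two independent subgroup HRs."""
    log_a, se_a = log_hr_and_se(*subgroup_a)
    log_b, se_b = log_hr_and_se(*subgroup_b)
    diff = log_a - log_b
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    z = diff / se_diff
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return math.exp(diff), z, p

# (HR, lower, upper) as reported in the subgroup table
prior_cvd = (1.45, 1.15, 1.84)
no_prior_cvd = (1.00, 0.76, 1.32)

ratio, z, p = interaction_test(prior_cvd, no_prior_cvd)
print(f"ratio of HRs = {ratio:.2f}, z = {z:.2f}, p = {p:.3f}")
```

A small interaction p-value supports a real difference between subgroups, but it is only one input; the credibility criteria later in this module still apply.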

Understanding Heterogeneity: I² and Beyond

When studies (or subgroups) show different effects, we need to quantify that variation.

I² = 0–25%: Low heterogeneity. Effects are consistent across studies.

I² = 25–50%: Moderate. Look for sources of variation.

I² = 50–75%: Substantial. Consider whether pooling is appropriate.

I² = 75–100%: Considerable. A single pooled estimate may mislead.

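But I² alone does not tell you what to do. It tells you that further investigation is needed.

Tau² (τ²): Between-Study Variance

I² gives the proportion of variance due to heterogeneity, whereas τ² gives the between-study variance itself, in absolute terms.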

I² (percentage)

"What fraction of the total variation is due to true differences between studies?"

Scale: 0% to 100%

τ² (absolute)

"How much do the true effects differ between studies?"

Same scale as the effect measure

Use τ² to calculate prediction intervals

A prediction interval shows the range of effects expected in a new study. It is often much wider than the confidence interval.
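
The quantities above fit together in one short calculation. The sketch below computes Cochran's Q, I², a DerSimonian-Laird τ², the random-effects confidence interval, and an approximate prediction interval. The five log hazard ratios are hypothetical. Two simplifications are assumptions of this sketch: τ² uses the DerSimonian-Laird estimator, and the prediction interval uses a normal quantile where the usual formula uses a t quantile with k−2 degrees of freedom, so the interval shown here is somewhat too narrow for small k.

```python
import math
from statistics import NormalDist

def heterogeneity_summary(effects, variances, level=0.95):
    """Q, I2 (%), DL tau2, random-effects mean, CI and approximate PI."""
    k = len(effects)
    w = [1.0 / v for v in variances]
    fixed = sum(wi * y for wi, y in zip(w, effects)) / sum(w)
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, effects))
    df = k - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    c = sum(w) - sum(wi * wi for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)
    w_re = [1.0 / (v + tau2) for v in variances]
    mu = sum(wi * y for wi, y in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    z = NormalDist().inv_cdf(0.5 + level / 2)  # normal approximation (assumption)
    ci = (mu - z * se, mu + z * se)
    half = z * math.sqrt(tau2 + se ** 2)       # PI adds tau2 to the variance of the mean
    pi = (mu - half, mu + half)
    return {"Q": q, "I2": i2, "tau2": tau2, "mu": mu, "ci": ci, "pi": pi}

# Hypothetical log hazard ratios and variances from five trials
log_hr = [0.30, -0.10, 0.25, -0.05, 0.20]
var = [0.010, 0.012, 0.015, 0.010, 0.020]

summary = heterogeneity_summary(log_hr, var)
print(f"I2 = {summary['I2']:.0f}%, tau2 = {summary['tau2']:.3f}")
print("95% CI (HR):", tuple(round(math.exp(x), 2) for x in summary["ci"]))
print("95% PI (HR):", tuple(round(math.exp(x), 2) for x in summary["pi"]))
```

The confidence interval describes uncertainty about the average effect; the prediction interval also includes τ², which is why it is wider whenever heterogeneity is present.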

📊

The Prediction Interval: What ACCORD Really Tells Us

Consider a meta-analysis of intensive glucose control across multiple trials...

Confidence Interval

HR 1.10 (0.95–1.27)

"The best estimate of the average effect"

Prediction Interval

HR 1.10 (0.70–1.73)

"The range of effects in a new setting"

The prediction interval spans both benefit and harm!

In some settings, intensive control might help. In others, it could kill.

When Is a Subgroup Effect Credible?

Subgroup Credibility Criteria (adapted from ICEMAN, Schandelmaier 2020 & Sun 2012)

1

Was the subgroup analysis pre-specified?

Post-hoc subgroups are prone to data dredging

2

Is there a plausible biological rationale?

The mechanism should be clear and independent of the data

3

Is the effect consistent across related outcomes?

If harm appears in mortality, is there similar harm for MI and stroke?

4

Is there independent replication?

Has the subgroup effect been confirmed in other studies?

ICEMAN Applied to ACCORD

Criterion	Assessment	Score
Pre-specified?	Yes: prior CVD was specified in the protocol	✓
Biological rationale?	Yes—hypoglycemia more dangerous with CVD	✓
Consistent outcomes?	Yes—CV mortality and all-cause mortality aligned	✓
Independent replication?	Partially—ADVANCE, VADT showed similar patterns	~

ICEMAN Rating: High Credibility

The differential harm in high-risk patients appears genuine.

Clinical Implications

For patients without CVD: Moderate glucose control (HbA1c ~7%) remains the goal. Intensive control may reduce microvascular complications.

For patients with established CVD: Avoid intensive targets. Hypoglycemia is dangerous for damaged hearts.

For elderly patients: Relaxed targets. Quality of life matters. Tight control causes falls, confusion, and excess mortality.

"One size fits all" treatment is not patient-centered medicine.

Meta-Regression: Explaining Heterogeneity

When heterogeneity is high, meta-regression can identify study-level covariates that explain variation.

THE QUESTION

Does effect size vary systematically with study characteristics?

Covariates

Year, dose, duration, baseline risk, study quality

Output

Regression coefficient (slope), R², residual heterogeneity

Caution

Meta-regression needs roughly 10 or more studies per covariate. With few studies, it is exploratory only. Ecological fallacy: study-level associations may not hold for individuals.

Example: In ACCORD, meta-regression might test if treatment effect varies by baseline HbA1c, showing harm concentrated in patients with very high levels.

Story: The SPRINT Blood Pressure Revolution

What number saves lives? Who decides?

REAL DATA

For decades the goal was: treat blood pressure to <140 mmHg systolic. Then came SPRINT (2015): 9,361 high-risk patients randomized to intensive (<120) vs standard (<140) targets. Intensive treatment reduced CV events by 25% and death by 27%. Trial stopped early for benefit. Guidelines changed worldwide.

Before SPRINT: The Guidelines Committee

You are setting blood pressure guidelines in 2014. The target has been below 140 for years. Should you wait for better evidence?

PATH A: Maintain Status Quo

Keep <140 target (established practice, minimal controversy)

↓

Guidelines unchanged. Physicians continue treating to <140.

OUTCOME: Miss opportunity to prevent deaths

PATH B: Fund the Definitive Trial

Wait for the SPRINT results before updating the target

↓

SPRINT demonstrates benefit. Update target to <120 for high-risk patients.

OUTCOME: Estimated 100,000+ lives saved globally

JNC 7 (2003): <140

Years of uncertainty

SPRINT (2015): <120 (high risk)

THE REVELATION

"Standard of care" is not fixed. It changes when trials challenge assumptions. For a decade, patients may have been undertreated because nobody tested the obvious question.

Module 8 Quiz

1. Why was the ACCORD trial stopped early?

A. Intensive control showed clear cardiovascular benefit

B. Intensive control increased mortality

C. Enrollment was too slow

D. Budget ran out

2. What does a prediction interval tell us that a confidence interval doesn't?

A. The true effect is more precisely estimated

B. The sample size is adequate

C. The range of effects expected in a new study

D. The formula used

3. According to ICEMAN, which factor is MOST important for subgroup credibility?

A. A pre-specified subgroup hypothesis

B. Large sample size in the subgroup

C. Statistically significant p-value

D. Multiple outcomes showing same direction

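When study results disagree,

listen to the disagreement.

Heterogeneity is a message, not noise.

Absence of evidence is not evidence of absence.

Module 9: The Hidden Studies

Absence of evidence is not evidence of absence.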

Reboxetine: 2010

The 74% that never saw the light of day.

Module 9: The Hidden Studies

🎯 Learning Objectives

Interpret funnel plots for asymmetry detection
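Apply Egger's test and other statistical tests for publication bias
Implement the trim-and-fill method for bias adjustment
Critically appraise the limitations of publication bias tests
Apply the principle: "Absence of evidence is not evidence of absence"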

The Year: 1997

"A new hope for depression patients who cannot tolerate SSRIs..."

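Reboxetine (Edronax) was a novel antidepressant, a selective norepinephrine reuptake inhibitor (NRI). Unlike SSRIs, it targeted a different neurotransmitter system. For patients who had failed on or could not tolerate fluoxetine or sertraline, it offered a new mechanism.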

1997

EU approval

50+

Countries approved

Millions

Prescriptions written

The Published Evidence

What doctors could find in medical journals:

Comparison	Published Trials	Published Result
Reboxetine vs Placebo	3 trials (n=507)	Significantly better (SMD = 0.56)
Reboxetine vs SSRIs	4 trials (n=628)	Equivalent or better

The published literature told a clear story:

Reboxetine works. Patients benefit. Prescribe with confidence.

But what about the trials nobody could see?

In 2010, German researchers at IQWiG made a request to the European Medicines Agency...

They demanded access to all trial data, published and unpublished.

What they found changed everything.

The Full Picture

Eyding et al., BMJ 2010

Comparison	Published Only	ALL DATA
Reboxetine vs Placebo	SMD 0.56 (benefit)	SMD 0.10 (no benefit)
Patients in analysis	507 (14%)	2,731 (100%)
Reboxetine vs SSRIs	Equivalent	Inferior (RR 1.23 for harm)
Patients in analysis	628 (26%)	2,411 (100%)

74% of patient data was never published

The hidden trials showed no benefit, and they showed harm

🔍

Investigation Exercise: The File Drawer

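You are a systematic reviewer in 2008. You search PubMed, Embase, and the Cochrane Library for all reboxetine trials. You find 7 published trials showing benefit.

Can you trust this evidence?

⚠️ The funnel is markedly asymmetric!

All the published studies cluster on one side. Where are the null and negative trials?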

Publication Bias Toolkit

1

Funnel Plot

Plot effect size vs. standard error. A symmetric funnel suggests no bias; asymmetry raises alarms.

2

Egger's Regression Test

Regress effect/SE on 1/SE. A non-zero intercept (P < 0.10) suggests small-study effects. Note: inflated false-positive rate with binary outcomes; use Peters' test instead.

3

Peters' Test

For binary outcomes, regresses log OR on inverse of total sample size. Less prone to false positives.

4

Trim-and-Fill

Impute the "missing" studies to make the funnel symmetric, then recalculate the pooled effect.
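
Egger's regression from the toolkit above is an ordinary least-squares fit of the standardized effect (effect/SE) on precision (1/SE), with the intercept as the quantity of interest. A minimal sketch follows. The seven effect sizes and standard errors are hypothetical values built so that smaller studies show larger effects; they are not the reboxetine data. The function returns the intercept and its t statistic and leaves the p-value lookup (t distribution with n−2 degrees of freedom) to the reader.

```python
import math

def egger_test(effects, ses):
    """OLS of effect/SE on 1/SE; returns intercept, slope, SE of intercept, t."""
    y = [e / s for e, s in zip(effects, ses)]   # standardized effects
    x = [1.0 / s for s in ses]                  # precisions
    n = len(y)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    s2 = sum(r * r for r in residuals) / (n - 2)
    se_intercept = math.sqrt(s2 * (1.0 / n + x_bar ** 2 / sxx))
    return intercept, slope, se_intercept, intercept / se_intercept

# Hypothetical SMDs: the least precise studies report the largest effects
smd = [0.80, 0.70, 0.60, 0.45, 0.35, 0.25, 0.15]
se = [0.40, 0.35, 0.30, 0.22, 0.18, 0.12, 0.08]

intercept, slope, se_int, t_stat = egger_test(smd, se)
print(f"Egger intercept = {intercept:.2f} (SE {se_int:.2f}), t = {t_stat:.1f}")
```

With a symmetric funnel the intercept sits near zero. Here it is far from zero, which is the small-study signal the test is designed to flag. With only seven studies, a real analysis would have low power and should be read with caution.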

📊

Interactive: Trim-and-Fill Analysis

Let's apply trim-and-fill to the reboxetine data and see what the adjusted estimate looks like...

Published Only

7 trials

SMD = 0.56

Significant benefit

Trim-and-Fill

7 + 5 imputed = 12 trials

SMD = 0.23

Reduced, still nominally significant

But even trim-and-fill underestimated the problem!

The true effect in the full data was SMD = 0.10 (essentially zero).
Trim-and-fill under-corrected here: it does not fully adjust for selective publication.

The Best Defense: Trial Registries

Methods for detecting publication bias are imperfect. The real solution is prospective registration.

ClinicalTrials.gov

US registry (2000)

WHO ICTRP

Global portal

PROSPERO

Review registration

When searching for trials, always check the registries. Compare the number of registered trials with the number published. A gap is a warning sign.

Since 2005, ICMJE requires trial registration as a condition of publication.

The AllTrials Campaign

"All trials registered. All results reported."

The reboxetine scandal, along with similar cases involving other drugs, sparked a global movement:

✓

2013: EMA clinical data policy

European Medicines Agency commits to publishing clinical study reports

✓

2016: FDA Amendments Act enforcement

Mandatory results reporting on ClinicalTrials.gov within 12 months

✓

AllTrials Coalition

Over 90,000 supporters, 700+ organizations demanding transparency

The Reboxetine Aftermath

!

Germany's IQWiG recommended against reboxetine for depression

!

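The UK's NICE downgraded reboxetine to "not recommended"

!

The FDA had declined to approve reboxetine in 2001 (it had access to the unpublished data)

For more than a decade, patients were given reboxetine, a drug no better than placebo.

Because only the positive trials were published.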

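Story: The Paroxetine Study 329 Deception

What if the published conclusion were the opposite of the actual data?

REAL DATA

GlaxoSmithKline's Study 329 tested paroxetine in adolescent depression. The published paper (2001) concluded that paroxetine was "generally well tolerated and effective." The actual data: paroxetine failed on all 8 pre-specified outcomes. When re-analyzed (RIAT 2015), suicidal/self-harm events numbered 23 in the paroxetine group versus 5 in the placebo group. The published paper had redefined outcomes post hoc to manufacture significance. In 2015, the RIAT (Restoring Invisible and Abandoned Trials) re-analysis, using the original clinical study reports, concluded that paroxetine was neither safe nor effective for adolescents.

The Prescriber's Puzzle: 2003

You are a child psychiatrist. Study 329, the only large trial, says paroxetine works in teenagers. But the FDA has not approved it for adolescents. A parent asks you to prescribe it. What do you do?

PATH A: Trust the Publication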

A peer-reviewed JAACAP paper says it works. Prescribe off-label.

↓

Millions of prescriptions worldwide. Suicidal events in adolescents.

OUTCOME: FDA issues black box warning for SSRIs in youth (2004)

PATH B: Check the Trial Registry

Look up the original endpoints on ClinicalTrials.gov. Notice that the published outcomes do not match the registered protocol.

↓

Red flag: outcome switching detected. You withhold the drug. The patient is safer.

OUTCOME: Publication bias identified before harm

THE REVELATION

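Publication bias is not only about missing studies. It is also about the truth missing from within published studies. Outcome switching, ghostwriting, and selective reporting can turn a failed trial into a marketing tool. Always compare published outcomes against the trial registry protocol.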

Module 9 Quiz

1. What percentage of reboxetine trial data was hidden from the published literature?

A. 25%

B. 50%

C. 74%

D. 90%

2. Why can trim-and-fill underestimate the correction needed?

A. It assumes effects are normally distributed

B. It only imputes studies to achieve symmetry, which may not fully reflect reality

C. It requires at least 20 studies

D. It only works with very large studies

3. What is the best prospective defense against publication bias?

A. Funnel plots in all meta-analyses

B. Egger's test before pooling

C. Prospective trial registration

D. More medical journals

What you cannot see

may be more important than what you can.

Absence of evidence is not evidence of absence.

Certainty must be earned, not assumed.

Module 10: Certainty

Certainty must be earned, not assumed.

Early Surfactant: 2012

When high-quality evidence evolves.

Module 10: Certainty

🎯 Learning Objectives

Apply the full GRADE framework to rate the certainty of evidence
Evaluate all five downgrade factors (RoB, inconsistency, indirectness, imprecision, publication bias)
Identify when to upgrade for large effect, dose-response, or confounding
Construct Summary of Findings tables with absolute effect estimates
Apply the principle: "Certainty must be earned, not assumed"

The Year: 1990s

"A revolution in neonatal care..."

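Respiratory distress syndrome (RDS) was the leading cause of death in premature infants. The development of exogenous surfactant (a substance that keeps the alveoli from collapsing) was one of the great advances in neonatal medicine.

The question became when to give surfactant.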

Prophylactically (to all high-risk infants) or selectively (only after RDS develops)?

The Original Cochrane Review (2003)

Multiple RCTs conducted before the era of routine CPAP

Outcome	Prophylactic vs Selective	Certainty
Neonatal mortality	RR 0.73 (favors prophylactic)	High
BPD or death	RR 0.84 (favors prophylactic)	High

Recommendation: Give surfactant prophylactically

Guidelines worldwide adopted this approach

But the world of neonatal care was changing...

A new technology emerged: Continuous Positive Airway Pressure (CPAP)

Non-invasive support that could help preterm lungs without intubation.

Does the old evidence still apply?

The 2012 Cochrane Update

New trials conducted in the CPAP era

Outcome	Old Trials	New Trials
BPD or death	RR 0.84 (favors prophylactic)	RR 1.12 (favors selective)
Need for mechanical ventilation	Lower with prophylactic	Higher with prophylactic!

Complete Reversal

In the CPAP era, prophylactic surfactant causes more harm

🔍

Investigation: Why Did Evidence Evolve?

You are a neonatologist. A colleague asks: "How can randomized trials contradict each other?"

Was the original evidence wrong?

1

Indirectness Changed

Old trials: No CPAP available. New trials: CPAP standard of care.

2

The Comparator Improved

Selective surfactant + CPAP is better than prophylactic intubation.

3

Context Matters

Evidence from one era may not apply in the next

This is why GRADE assesses Indirectness!

High-quality evidence can become inapplicable when context changes.

The GRADE Framework

Grading of Recommendations, Assessment, Development and Evaluations

GRADE answers the question: how confident are we in this estimate?

⊕⊕⊕⊕ HIGH: Very confident. True effect is close to the estimate.

⊕⊕⊕◯ MODERATE: Moderately confident. True effect likely close, but may differ substantially.

⊕⊕◯◯ LOW: Limited confidence. True effect may differ substantially.

⊕◯◯◯ VERY LOW: Very little confidence. True effect likely substantially different.

GRADE: Factors That Downgrade Certainty

Evidence from RCTs starts at HIGH. It can be downgraded for:

1

Risk of Bias

Flawed randomization, lack of blinding, incomplete follow-up, selective reporting

2

Inconsistency

Unexplained heterogeneity across studies (large I², non-overlapping CIs)

3

Indirectness

Population, intervention, comparator, or outcomes differ from those in the question

4

Imprecision

Wide confidence intervals, small sample size, few events

GRADE: The Fifth Factor

5

Publication Bias

Asymmetric funnel plot, missing registered trials, sponsor influence

Each factor can downgrade by one or two levels

High → Moderate → Low → Very Low

Example: A meta-analysis of RCTs (starting at HIGH) with high risk of bias (↓1) and serious indirectness (↓1) would be rated LOW.
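
The bookkeeping behind a GRADE rating can be sketched as simple arithmetic on an ordered scale. The function below is a hypothetical teaching aid, not an official GRADE tool: the names are invented for illustration, and real ratings rest on structured judgment rather than a mechanical sum.

```python
LEVELS = ["VERY LOW", "LOW", "MODERATE", "HIGH"]

def grade_certainty(start, downgrades=None, upgrades=None):
    """Move along the GRADE scale by the net number of levels, clamped to its ends.

    start      -- "HIGH" for RCT evidence, "LOW" for observational evidence
    downgrades -- dict of factor name -> levels (1 = serious, 2 = very serious)
    upgrades   -- dict of factor name -> levels
    """
    index = LEVELS.index(start)
    index -= sum((downgrades or {}).values())
    index += sum((upgrades or {}).values())
    return LEVELS[max(0, min(index, len(LEVELS) - 1))]

# The worked example from the text: RCTs, high risk of bias, serious indirectness
example = grade_certainty("HIGH", {"risk_of_bias": 1, "indirectness": 1})
print(example)

# Pre-CPAP surfactant trials judged against today's practice
old_trials = grade_certainty("HIGH", {"indirectness": 1})
print(old_trials)
```

Clamping at both ends reflects that certainty cannot fall below VERY LOW or rise above HIGH, however many factors apply.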

📊

Interactive: Apply GRADE to Surfactant

Let's rate the certainty of evidence for prophylactic surfactant using the old trials and the new trials.

OLD TRIALS (Pre-CPAP)

Starting: HIGH (RCTs)

Risk of Bias: Low (−0)

Inconsistency: None (−0)

Indirectness: Serious (−1)

Different standard of care today

Final: ⊕⊕⊕◯ MODERATE

NEW TRIALS (CPAP Era)

Starting: HIGH (RCTs)

Risk of Bias: Low (−0)

Inconsistency: None (−0)

Indirectness: None (−0)

Matches current practice

Final: ⊕⊕⊕⊕ HIGH

GRADE: Factors That Upgrade Certainty

Observational evidence starts at LOW. It can be upgraded for:

+1

Large Magnitude of Effect

RR >2 or <0.5, with no plausible confounding

+1

Dose-Response Gradient

Higher exposure = larger effect in a consistent pattern

+1

Residual Confounding

All plausible confounders would reduce the effect (strengthens causal inference)

Communicating Certainty

GRADE requires transparent language about confidence:

HIGH: "Prophylactic surfactant reduces mortality..."

MODERATE: "Prophylactic surfactant probably reduces mortality..."

LOW: "Prophylactic surfactant may reduce mortality..."

VERY LOW: "We are uncertain whether prophylactic surfactant reduces mortality..."

This language ensures that clinicians understand the strength of the evidence.

Story: The Oxygen Paradox in Premature Infants

Can too much of a lifesaver become a killer?

REAL DATA

1940s-50s: High oxygen concentrations saved premature babies from respiratory failure. Then came an epidemic of blindness: retrolental fibroplasia (now called ROP). Doctors reduced oxygen dramatically. Blindness dropped. But then: increased deaths and brain damage. The optimal oxygen level took decades of trials to find. Recent SUPPORT/BOOST II trials finally defined the therapeutic window: SpO2 91-95%.

The Neonatologist's Dilemma: 1955

You are a neonatologist. Premature infants on high oxygen are going blind. What do you do?

PATH A: Dramatic Reduction

Drastically reduce oxygen to prevent blindness

↓

Blindness rates drop. But some babies die or suffer brain damage from hypoxia.

OUTCOME: Trading one harm for another

PATH B: Systematic Study

Titrate oxygen carefully and study the dose-response relationship

↓

Takes decades but eventually identifies the optimal range.

OUTCOME: Optimize both survival and vision

1940s: High O2 saves lives

1950s: Blindness epidemic

1960s-70s: Deaths from hypoxia

2010s: SUPPORT/BOOST define optimal range

THE REVELATION

Every intervention has a therapeutic window. Finding it requires measurement, not assumption. The pendulum swung for 60 years before evidence settled the balance.

Module 10 Quiz

1. Why did the surfactant recommendation reverse between 2003 and 2012?

A. The original trials were fraudulent

B. CPAP changed the comparator (indirectness)

C. Not enough patients in original trials

D. Outcomes were measured differently

2. Which of the following is NOT a GRADE downgrade factor?

A. Risk of bias

B. Imprecision

C. Publication bias

D. Large magnitude of effect

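3. What wording should be used for low-certainty evidence?

A. "The intervention reduces..."

B. "The intervention probably reduces..."

C. "The intervention may reduce..."

D. "We are uncertain whether..."

Numbers alone are not enough.

You must communicate how certain you are.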

Certainty must be earned, not assumed.

Methods protect patients from our confidence.

Module 11: Living Reviews

Methods protect patients from our confidence.

COVID-19 Hydroxychloroquine: 2020

When urgency surged.

Module 11: Living Reviews

🎯 Learning Objectives

Apply trial sequential analysis to judge whether the evidence is sufficient
Design and maintain living systematic reviews
Establish update triggers and futility/harm boundaries
Manage multiplicity and alpha-spending in sequential analyses
Explain how rapid evidence synthesis evolved during COVID-19

March 2020: A World in Crisis

"The virus is spreading faster than our understanding..."

COVID-19 was killing thousands. ICUs overflowed. There was no vaccine and no treatment. Then, a glimmer of hope: hydroxychloroquine (HCQ), an old malaria drug, showed antiviral activity in lab studies.

March 20

Gautret study (France)

36 pts

Non-randomized

Viral

Clearance improved

The Rush to Adopt

Within weeks of the Gautret study:

!

March 28: FDA issues Emergency Use Authorization for HCQ

!

April 4: India bans HCQ export (hoarding fears)

!

Global: Shortages affect lupus and rheumatoid arthritis patients

Millions received HCQ based on a 36-patient observational study

What could go wrong?

🔍

Investigation: The Gautret Study

You are an EBM expert asked to appraise the French HCQ study. You examine the design...

Issue	Impact
Non-randomized	Selection bias—who got HCQ?
6 patients excluded	3 went to ICU, 1 died, 1 withdrew, 1 had nausea
Surrogate outcome	Viral load, not clinical outcomes
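Controls from a different hospital	Different care, different testing
No blinding	Expectation bias in lab testing

This study would be rated at high risk of bias. (RoB 2 is designed for randomized trials; for a non-randomized study like this one, ROBINS-I is the applicable tool.)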

GRADE certainty: VERY LOW. Yet it changed global policy.

Why Observational COVID Studies Misled

1

Immortal Time Bias

Patients must survive long enough to receive treatment. Survivors are compared to non-survivors.

2

Confounding by Indication

Sicker patients may get different treatments. Healthier patients received HCQ early.

3

Healthy User Effect

Patients who seek treatment tend to be healthier overall.

4

Outcome Reporting

Studies with positive results were published faster.

June 2020: The RCTs Report

Large, rigorous trials completed at remarkable speed

Trial	N	Result
RECOVERY (UK)	4,716	No benefit on mortality (RR 1.09)
WHO SOLIDARITY	954	No benefit (RR 1.19)
ORCHID (US)	479	Stopped for futility

HCQ provided no benefit—and may have caused harm

June 15, 2020: FDA revokes Emergency Use Authorization

📊

Timeline: Observational Evidence vs. RCT Evidence

March-May 2020

Observational: ~20 studies

Suggest benefit

Pooled OR ~0.65

June-July 2020

RCTs: RECOVERY, SOLIDARITY

Show no benefit/harm

Pooled RR ~1.10

From "promising" to "ineffective" in 3 months

This is why we need randomization, and living reviews to track evolving evidence.

Living Systematic Reviews

A new approach for rapidly evolving evidence:

1

Continuous Surveillance

Search the literature weekly or daily for new evidence

2

Cumulative Meta-Analysis

Update pooled estimates as each new trial reports

3

Trial Sequential Analysis (TSA)

Determine when sufficient information has accumulated to conclude

4

Transparent Versioning

Track every change, maintain full audit trail

Trial Sequential Analysis (TSA)

When have we learned enough?

TSA applies stopping boundaries to meta-analysis, much like the interim analyses of a single trial. It calculates the required information size (RIS) needed to detect or exclude a clinically meaningful effect.

RIS

Required sample size

α-spending

Controls type I error

Boundaries

Benefit / Harm / Futility

For HCQ in COVID-19, TSA showed that the futility boundary had been crossed by June 2020.
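
To make α-spending concrete, the sketch below computes how much type I error may be spent at each look as information accumulates toward the RIS. It uses the Lan-DeMets O'Brien-Fleming-type spending function, α(t) = 2(1 − Φ(z_{α/2}/√t)). That function is an assumed choice for illustration, since the text above does not say which one a given TSA uses, and the RIS and patient counts are hypothetical.

```python
from statistics import NormalDist

_NORMAL = NormalDist()

def obf_alpha_spent(information_fraction, alpha=0.05):
    """Cumulative two-sided alpha spent at a given information fraction (0 < t <= 1)."""
    if not 0 < information_fraction <= 1:
        raise ValueError("information_fraction must be in (0, 1]")
    z = _NORMAL.inv_cdf(1 - alpha / 2)
    return 2 * (1 - _NORMAL.cdf(z / information_fraction ** 0.5))

# Hypothetical living review: RIS of 8,000 patients, four successive updates
required_information_size = 8000
accrued_patients = [2000, 4000, 6000, 8000]

spent = []
for n in accrued_patients:
    fraction = n / required_information_size
    spent.append(obf_alpha_spent(fraction))
    print(f"N = {n:5d}  information fraction = {fraction:.2f}  alpha spent = {spent[-1]:.5f}")
```

Very little alpha is available early, so an early meta-analysis needs an extreme result to cross a boundary. The full 0.05 becomes available only once the required information size is reached.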

Lessons from the HCQ Saga: When Bias Runs Rampant

1. Observational studies can mislead spectacularly. Even many studies pointing in the same direction can be wrong.

2. RCTs can be conducted quickly when the will exists. RECOVERY enrolled 5,000+ patients in weeks.

3. Living reviews are essential for evolving topics. Fixed-point-in-time reviews become obsolete instantly.

4. Political pressure doesn't change biology. Rigorous methods protect patients even under pressure.

Story: The LEAP Peanut Allergy Revolution

What if the prevention is the cause?

REAL DATA

For decades, pediatric guidelines recommended: avoid peanuts in infancy to prevent allergy. Meanwhile, peanut allergy rates tripled from 1997 to 2008. Then came LEAP (2015): 640 high-risk infants randomized to early peanut introduction vs. avoidance. Result: early introduction reduced peanut allergy by 81% (1.9% vs. 13.7%). The prevention strategy had been causing the epidemic.

The Allergist's Crossroads: 2010

You are a pediatric allergist. Peanut allergy is rising despite the avoidance guidelines. Do you question the dogma?

PATH A: Follow Guidelines

Continue recommending peanut avoidance in high-risk infants

↓

Guidelines are "evidence-based." Safe to follow consensus.

OUTCOME: Peanut allergies continue to rise

PATH B: Question the Dogma

Design a trial to test if early introduction might be protective

↓

LEAP trial reveals the truth. Guidelines reverse worldwide.

OUTCOME: Prevent an epidemic

2000: AAP recommends avoidance

2008: Allergy rates triple

2015: LEAP overturns the evidence

2017: Guidelines flip to early introduction

THE REVELATION

"First, do no harm" requires evidence. Assumptions, even well-intentioned ones, can cause harm at scale. The immune system needs exposure to develop tolerance; avoidance promotes sensitization.

Module 11 Quiz

1. What was the main flaw of the Gautret hydroxychloroquine study?

A. Too few patients

B. No blinding

C. Excluding patients who deteriorated

D. Too short follow-up

2. What does Trial Sequential Analysis help determine?

A. Which studies have high risk of bias

B. When enough evidence has accumulated

C. The degree of heterogeneity

D. Which treatment is best

3. Why did observational COVID studies show a benefit of HCQ when RCTs found none?

A. RCTs enrolled sicker patients

B. RCTs used different outcomes

C. Bias in the observational studies

D. The observational studies had better data

Speed cannot replace rigor.

But rigor can be fast.

Living reviews balance both.

Not every signal is true.

Module 12: Advanced Methods

Not every signal is true.

Advanced Methods

Beyond pairwise meta-analysis.

Module 12: Advanced Methods

🎯 Learning Objectives

Interpret network meta-analysis geometry and SUCRA rankings
Apply bivariate models for diagnostic test accuracy meta-analysis
Conduct dose-response meta-analysis with flexible splines
Understand when individual patient data (IPD) meta-analysis is needed
Recognize the assumptions and limitations of each advanced method

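When Pairwise Is Not Enough

"Sometimes the question is more complex than A versus B..."

The methods you have learned so far form the foundation. But clinical reality often demands more. Which of 10 antidepressants is best? What's the optimal dose of statin? Does this test accurately diagnose early cancer?

This module introduces four advanced methods, each answering a different kind of complex question.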

Network Meta-Analysis (NMA)

When you have many treatments but few head-to-head trials

NMA combines direct evidence (A vs B) with indirect evidence (A vs C, B vs C → inferred A vs B) to compare multiple treatments simultaneously.

SUCRA

Ranking probabilities, not effect size

Consistency

Direct = Indirect?

Networks

Visualize evidence
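
The simplest form of indirect evidence is an adjusted indirect comparison through a common comparator (the Bucher method): if A and B have each been compared with C, the A-versus-B effect is the difference of the two effects on the log scale, and the variances add. A minimal sketch with hypothetical log odds ratios follows. It is valid only under the transitivity assumption, and a full NMA generalizes this idea across the whole network.

```python
import math
from statistics import NormalDist

def indirect_comparison(d_ac, se_ac, d_bc, se_bc, level=0.95):
    """Indirect A-vs-B effect from A-vs-C and B-vs-C, all on the log scale."""
    d_ab = d_ac - d_bc
    se_ab = math.sqrt(se_ac ** 2 + se_bc ** 2)  # variances add, so precision is lost
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return d_ab, se_ab, (d_ab - z * se_ab, d_ab + z * se_ab)

# Hypothetical log odds ratios versus placebo (C)
d_ab, se_ab, ci_ab = indirect_comparison(d_ac=-0.50, se_ac=0.15, d_bc=-0.20, se_bc=0.20)
print(f"indirect log OR (A vs B) = {d_ab:.2f}, SE = {se_ab:.2f}")
print("95% CI (OR):", tuple(round(math.exp(x), 2) for x in ci_ab))
```

The indirect standard error exceeds either input, which is why indirect evidence is less precise than a head-to-head trial of similar size.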

🔍

NMA Example: Antidepressants

The landmark Cipriani 2018 NMA compared 21 antidepressants using 522 trials.

The Challenge

21 drugs, but not every pair tested head-to-head

Many vs. placebo, few vs. each other

The Solution

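NMA combines direct and indirect evidence across the whole network

Ranks all 21 drugs on efficacy and acceptability

Result: some drugs rank higher on efficacy, others rank higher on acceptability

No single drug is universally "best." Interpret rankings alongside confidence intervals, transitivity, and clinical trade-offs.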

NMA: Critical Assumptions

1

Transitivity

Effect modifiers should be similarly distributed across comparisons; otherwise indirect comparisons may be biased

2

Consistency

Direct and indirect evidence agree (testable)

3

Connected Network

All treatments linked through at least one common comparator

When assumptions fail, NMA can mislead

Always assess transitivity and test for inconsistency.

Dose-Response Meta-Analysis

Finding the optimal dose

Uses the Greenland-Longnecker method with restricted cubic splines to model non-linear relationships between dose and effect.

1

Non-linear patterns

J-shaped (alcohol & mortality), U-shaped (vitamin D), threshold (aspirin)

2

Clinical relevance

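Not simply "more is better": find the dose that best balances benefit and harm

Individual Patient Data (IPD)

The gold standard for subgroup analysis

Instead of published summary data, obtain raw patient-level data from the trialists. This enables precise subgroup analysis, time-to-event modeling, and standardized definitions.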

One-Stage

Single hierarchical model (not mega-trial)

Two-Stage

Analyze, then pool

80%+ target

Data availability goal

The Early Breast Cancer Trialists' Collaborative Group pioneered IPD meta-analysis in the 1980s.

Diagnostic Test Accuracy (DTA)

When the "intervention" is a test

DTA meta-analysis synthesizes sensitivity (true positive rate) and specificity (true negative rate), two correlated outcomes requiring bivariate models.

1

Bivariate/HSROC Model

Accounts for the correlation between sensitivity and specificity

2

SROC Curve

Summary ROC curve with 95% confidence and prediction regions

3

QUADAS-2

Quality Assessment of Diagnostic Accuracy Studies

Choosing the Right Method

Question	Method
Does A beat B?	Pairwise MA
Which of many treatments is best?	Network MA (NMA)
What is the optimal dose?	Dose-Response MA
Who benefits most? (subgroups)	IPD MA
How accurate is this test?	DTA MA
How does the effect change over time?	Survival/Time-to-Event MA

The method must match the question. Do not force a question into the wrong method.

Story: Steroids in Sepsis

Three large trials. Three different answers. What do you believe?

REAL DATA

CORTICUS (2008): 499 patients. Hydrocortisone in septic shock. No mortality benefit. ADRENAL (2018): 3,658 patients. Hydrocortisone. No mortality benefit. APROCCHSS (2018): 1,241 patients. Hydrocortisone + fludrocortisone. Mortality reduced (43% vs 49.1%, p=0.03). Same class of intervention. Different protocols. Different results.

The Guideline Writer's Challenge

You are writing sepsis guidelines. Three major trials disagree. What do you recommend?

PATH A: Simple Average

Pool all three trials. Overall effect uncertain. Conclude "evidence unclear."

↓

Guidelines say steroids are optional. No strong recommendation.

OUTCOME: Clinicians left without clear guidance

PATH B: Investigate Heterogeneity

Analyze why APROCCHSS differed (fludrocortisone, longer duration, different population)

↓

Identify how the effective and ineffective protocols differ.

OUTCOME: Recommend the specific effective protocol

THE REVELATION

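Conflicting trials are not failures. They are a map showing where a treatment works and where it does not. The differences between trials (dose, duration, co-interventions, population) are the key to understanding.

Module 12 Quiz

1. What is the main advantage of network meta-analysis over pairwise meta-analysis?

A. It requires no data extraction

B. It compares treatments not directly tested against each other

C. It removes the need for risk-of-bias assessment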

D. It produces better forest plots

2. Why does DTA meta-analysis require bivariate models?

A. To handle more than two studies

B. To adjust for publication bias

C. Sensitivity and specificity are correlated

D. To generate forest plots

3. What does the "consistency" assumption in NMA require?

A. All studies must be high quality

B. Direct and indirect evidence must agree

C. Sample sizes must be similar

D. No missing studies

The Course Ecosystem

This course covers the entire systematic review workflow. To go deeper, see the related courses:

DTA Course
Bivariate/HSROC, SROC curves, QUADAS-2

Risk of Bias Mastery
RoB 2, ROBINS-I/E, domain-level assessment

GRADE Certainty
Full SoF tables, GRADE-CERQual

IPD Meta-Analysis
One-stage/two-stage, mixed-effects models

Publication Bias Detective
Copas, PET-PEESE, p-curve, selection models

Umbrella Reviews
AMSTAR 2, ROBIS, overlap correction

Prognostic Reviews
CHARMS, PROBAST, c-statistic pooling

Living Reviews + Rapid Reviews
TSA, update triggers, abbreviated methods

Module 12 Complete

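"The method must match the question. Advanced methods answer advanced questions. But the fundamentals never change."

You have now mastered the core workflow. The next 10 modules explore the frontiers: Bayesian inference, network meta-analysis, individual patient data, dose-response modeling, robustness and fragility, equity, AI-assisted synthesis, qualitative evidence, multivariate methods, and reproducibility.

Not every signal is true.

Module 13: The Bayesian Turn

Not every signal is true.

Module 13: The Bayesian Turn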

🎯 Learning Objectives

Explain the difference between frequentist and Bayesian inference
Interpret prior distributions, likelihoods, and posterior distributions
Distinguish credible intervals from confidence intervals
Understand when Bayesian meta-analysis offers advantages
Recognize how prior choice affects conclusions

Story Opener: STAMPEDE

In 2005, a trial began

that would never truly end.

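The STAMPEDE trial in prostate cancer used a multi-arm, multi-stage (MAMS) platform design. Arms could be added or dropped as evidence accumulated. Its statistics were frequentist, but its adaptive philosophy embodied the Bayesian spirit of updating decisions as data accumulate.

The Frequentist Worldview

In frequentist statistics, probability means long-run frequency. A 95% CI does not mean "there is a 95% probability that the true effect lies in this interval." It means that if the study were repeated indefinitely, 95% of the intervals would contain the true value.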

p-value

P(data | H₀), not P(H₀ | data)

95% CI

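A coverage property, not a belief

Fixed

The true parameter is fixed

The Bayesian Worldview

In Bayesian statistics, probability represents degree of belief. We start with a prior (what we believe before the data), update it with the likelihood (what the data tell us), and obtain the posterior (updated belief).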

1

Posterior ∝ Prior × Likelihood

Bayes' theorem: P(θ|data) ∝ P(data|θ) × P(θ)

2

Credible Intervals

A 95% credible interval has a direct probability interpretation, conditional on the specified model and prior.
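
For a single effect with a normal prior and a normal likelihood, Bayes' theorem has a closed form: precisions (inverse variances) add, and the posterior mean is a precision-weighted average of the prior mean and the data. The sketch below applies that update to a hypothetical pooled log odds ratio. The estimate, its standard error, and the Normal(0, 1) prior are illustration values, and the function name is not from any library.

```python
import math
from statistics import NormalDist

def normal_posterior(prior_mean, prior_sd, estimate, se):
    """Conjugate normal-normal update; returns posterior mean and SD."""
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = 1.0 / se ** 2
    post_var = 1.0 / (prior_precision + data_precision)
    post_mean = post_var * (prior_precision * prior_mean + data_precision * estimate)
    return post_mean, math.sqrt(post_var)

# Hypothetical pooled log OR of -0.40 (SE 0.20), weakly informative prior Normal(0, 1)
post_mean, post_sd = normal_posterior(0.0, 1.0, -0.40, 0.20)
posterior = NormalDist(post_mean, post_sd)

z = NormalDist().inv_cdf(0.975)
credible = (post_mean - z * post_sd, post_mean + z * post_sd)
prob_benefit = posterior.cdf(0.0)  # P(log OR < 0 | data, model, prior)

print(f"posterior mean = {post_mean:.3f}, SD = {post_sd:.3f}")
print(f"95% credible interval (OR): {math.exp(credible[0]):.2f} to {math.exp(credible[1]):.2f}")
print(f"P(OR < 1) = {prob_benefit:.3f}")
```

The weak prior pulls the estimate only slightly toward zero, and the result supports a direct statement such as "the probability that OR < 1 is about 0.97", which a confidence interval cannot provide.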

Choosing Priors

1

Non-informative (Vague)

Normal(0, 10000) or uniform. Lets the data dominate. Mimics frequentist results.

2

Weakly Informative

Normal(0, 1) for log-OR. Regularizes extreme estimates while remaining flexible.

3

Informative

Based on previous evidence. Powerful but controversial. Must be pre-specified.

4

Half-Cauchy for τ

Recommended for heterogeneity. Half-Cauchy(0, 0.5) allows large τ but concentrates near zero.

MCMC Sampling

Most Bayesian models cannot be solved analytically. We use Markov Chain Monte Carlo (MCMC) to draw samples from the posterior distribution. Tools: JAGS, Stan, brms (R), PyMC (Python).

Chains

Multiple independent chains (typically 4)

R̂

Convergence: R̂ < 1.01 (strict; older texts use < 1.1)

ESS

Bulk-ESS > 400 (for means); tail-ESS > 400 (for interval endpoints)
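The convergence check can be illustrated with the classic Gelman-Rubin statistic, which compares between-chain and within-chain variance. This is a simplified sketch: current tools such as Stan report a rank-normalized split version, and the simulated chains and seed below are purely illustrative.

```python
import random
from statistics import mean, variance

def gelman_rubin(chains):
    """Classic (non-split) R-hat for equal-length chains of one parameter."""
    n = len(chains[0])
    within = mean([variance(c) for c in chains])        # W: mean within-chain variance
    between = n * variance([mean(c) for c in chains])   # B: n times variance of chain means
    pooled = (n - 1) / n * within + between / n         # pooled estimate of posterior variance
    return (pooled / within) ** 0.5

rng = random.Random(13)  # arbitrary seed so the illustration is repeatable
well_mixed = [[rng.gauss(0.0, 1.0) for _ in range(2000)] for _ in range(4)]
# Shift one chain by 3 to mimic a chain stuck in a different region
stuck = well_mixed[:3] + [[x + 3.0 for x in well_mixed[3]]]

print(f"well mixed:      R-hat = {gelman_rubin(well_mixed):.3f}")
print(f"one stuck chain: R-hat = {gelman_rubin(stuck):.3f}")
```

When all chains sample the same distribution the ratio sits very close to 1; a single chain exploring a different region pushes it well above the 1.01 threshold.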

Methodologist

Bayesian Model Averaging

Instead of choosing between fixed-effect and random-effects models, Bayesian model averaging (BMA) weights each model by its posterior probability. This accounts for model uncertainty in the final estimate.

BF

Bayes Factors

BF₁₀ > 10 = strong evidence for H₁. BF₁₀ < 1/10 = strong evidence for H₀.

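Interactive: Posterior Visualizer

Adjust the strength of the prior and see how it affects the posterior. Watch how more data overwhelm the prior.

The STAMPEDE Story

STAMPEDE launched in 2005 with five research arms comparing treatments for advanced prostate cancer. By 2016 it had added abiraterone, which was shown to reduce deaths by 37% (HR 0.63, 95% CI 0.52 to 0.76).

The platform design embodies Bayesian-style adaptive thinking: interim analyses guide arm selection, new arms are introduced as treatments emerge, and futile arms are dropped early, sparing patients from ineffective treatment.

STAMPEDE enrolled more than 10,000 patients at over 100 sites and fundamentally changed prostate cancer care. Bayesian thinking lets evidence accumulate and inform decisions in real time.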

Decision Tree: When to Go Bayesian?

Frequentist vs Bayesian Meta-Analysis

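Choose Bayesian when: (1) you have genuine prior information, (2) you need probabilistic statements ("80% probability that the effect is > 0"), (3) there are so few studies that frequentist properties are unreliable, or (4) you want to perform model averaging.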

Bayesian with weakly informative prior

A common practical default. Regularizes extreme estimates without forcing strong prior conclusions.

Bayesian with informative prior

Only when the prior evidence is strong and pre-specified. A sensitivity analysis is required.

Stay frequentist

Simpler, well-understood. Preferred when k is large and no prior information.

Remember Module 1?

CAST Through a Bayesian Lens

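Had a Bayesian analysis of CAST used an informative prior from basic science (antiarrhythmic drugs suppress PVCs), the posterior would still have shifted strongly toward harm. With enough data, the likelihood overwhelms even a strong prior. The lesson: Bayesian methods do not protect against bad priors, but they do make assumptions transparent.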

Module 13 Quiz

Q1. What does a 95% Bayesian credible interval mean?

A. 95% of repeated experiments would produce intervals containing the true value

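B. There is a 95% probability that the true parameter lies within this interval

C. The interval has a 95% chance of being correct

D. 95% of future data will fall within this interval

Q2. What is the recommended prior for between-study heterogeneity (τ)?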

A. Uniform(0, 100)

B. Normal(0, 1)

C. Half-Cauchy(0, 0.5)

D. Fixed at 0.5

Module 13 Complete

"The Bayesian turn is not about mathematics. It is about honesty: making assumptions visible."

Not every signal is true.

Module 14: The Network

Methods protect patients from our confidence.

🎯 Learning Objectives

Explain why pairwise comparisons are insufficient when many treatments exist
Interpret network geometry (nodes, edges, thickness)
Understand transitivity, consistency, and the role of indirect evidence
Interpret SUCRA rankings and league tables
Recognize when NMA assumptions are violated

A clinician faces a patient

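with depression. Which drug?

There are 21 commonly prescribed antidepressants. Most head-to-head trials compare only two or three. Cipriani et al. (2018, Lancet) connected 522 trials and 116,477 patients into a single network.

The Logic of Network Meta-Analysis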

1

Direct Evidence

Trials directly comparing A vs B give the most reliable estimate.

2

Indirect Evidence

If A vs C and B vs C trials exist, A vs B can be inferred. This rests on the transitivity assumption.

3

Mixed Evidence

NMA combines both, weighted by precision, to rank all treatments simultaneously.
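The simplest case of indirect evidence is an adjusted indirect comparison through a common comparator C: the A vs B effect is the difference of the two direct effects, and their variances add. The sketch below assumes effects on the log-OR scale and independent trials; the numbers are hypothetical.

```python
from math import exp, sqrt

def indirect_comparison(d_ac, se_ac, d_bc, se_bc):
    """Indirect A vs B estimate from A vs C and B vs C (log scale, independent trials)."""
    d_ab = d_ac - d_bc                     # difference of the two direct effects
    se_ab = sqrt(se_ac ** 2 + se_bc ** 2)  # variances add, so precision is lost
    return d_ab, se_ab

# Hypothetical log-ORs versus the common comparator C
d_ab, se_ab = indirect_comparison(d_ac=-0.50, se_ac=0.15, d_bc=-0.20, se_bc=0.20)
low, high = d_ab - 1.96 * se_ab, d_ab + 1.96 * se_ab
print(f"indirect log-OR A vs B: {d_ab:.2f} (SE {se_ab:.2f})")
print(f"OR {exp(d_ab):.2f}, 95% CI {exp(low):.2f} to {exp(high):.2f}")
```

The indirect standard error is always larger than either direct one, which is why direct evidence, where it exists, usually dominates the mixed estimate.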

Interactive: Network Graph

Each node is a treatment. Edge thickness represents the number of studies comparing those two treatments.

Researcher

Transitivity & Consistency

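Transitivity: The indirect estimate (through a common comparator) must be a valid approximation of the direct estimate. This requires effect modifiers to be similarly distributed across comparisons.

Consistency: Statistical agreement between direct and indirect evidence. Global (design-by-treatment interaction) and local (node-splitting) tests help identify inconsistent loops.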

Researcher

SUCRA & P-scores

SUCRA

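Surface Under the Cumulative RAnking curve. Higher values indicate a higher probability of ranking well, but they do not guarantee superiority.

P-score

A frequentist analogue of the rank-probability summary. Interpret it alongside effect sizes and their uncertainty.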

Caution: Ranking is seductive but misleading when differences between treatments are small or uncertain. Always report credible/confidence intervals alongside ranks.
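SUCRA is the average of a treatment's cumulative rank probabilities over the first a − 1 ranks, where a is the number of treatments. A minimal sketch, with hypothetical rank probabilities for three treatments:

```python
def sucra(rank_probs):
    """SUCRA from rank probabilities; rank_probs[j] = P(rank j+1), rank 1 = best."""
    n_treatments = len(rank_probs)
    cumulative, total = 0.0, 0.0
    for p in rank_probs[:-1]:      # cumulative probability at ranks 1 .. a-1
        cumulative += p
        total += cumulative
    return total / (n_treatments - 1)

# Hypothetical rank probabilities (each row sums to 1)
ranks = {
    "A": [0.7, 0.2, 0.1],
    "B": [0.2, 0.5, 0.3],
    "C": [0.1, 0.3, 0.6],
}
for name, probs in ranks.items():
    print(name, round(sucra(probs), 3))
```

A treatment that is certain to rank first scores 1 and one that is certain to rank last scores 0; values in between say nothing about how large the differences in effect actually are.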

Methodologist

Component NMA

When interventions are complex (e.g., behavioral + pharmacological), component NMA decomposes multi-component treatments to estimate the individual contribution of each component. The additive model assumes effect(A+B) = effect(A) + effect(B); interaction models add an interaction term.

Cipriani Network

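The 2018 Lancet analysis found all 21 antidepressants more effective than placebo. Amitriptyline, mirtazapine, and venlafaxine ranked highest for efficacy. Agomelatine, fluoxetine, and escitalopram ranked highest for acceptability (fewest dropouts).

No single drug "won" on every outcome. The network revealed trade-offs invisible to pairwise analysis.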

Decision Tree: Is NMA Appropriate?

NMA Feasibility Check

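You have 15 RCTs comparing six different statins. Some pairs have direct evidence; others do not.

Check transitivity, then fit NMA

Confirm that patient populations and study designs are sufficiently similar across comparisons.

Ignore indirect evidence

Loses statistical power and leaves gaps in the evidence base.

Pool all into one pairwise comparison

Violates the structure of the evidence. Statins are different drugs.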

Module 14 Quiz

Q1. What assumption is required for indirect evidence to be valid in NMA?

A. Transitivity — effect modifiers are balanced across comparisons

B. Homogeneity — I² must be below 25%

C. All studies must have similar sample sizes

D. All studies must be double-blind

Module 14 Complete

"A network sees what pairwise comparisons cannot: the full picture of treatment choice."

Not every signal is true.

Module 15: The Individual

What was hidden in plain sight?

🎯 Learning Objectives

Explain why aggregate data can mask treatment–covariate interactions
Distinguish one-stage from two-stage IPD models
Recognize ecological bias in aggregate meta-analysis
Understand the practical challenges of IPD collection
Interpret treatment–covariate interaction plots

For decades, breast cancer trials

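published summaries. Not patients.

The Early Breast Cancer Trialists' Collaborative Group (EBCTCG) collected individual records from more than 100,000 women across hundreds of trials. Their IPD meta-analysis showed that the benefit of tamoxifen depends heavily on estrogen receptor status, something invisible in aggregate data.

What the Summaries Hid

Every published tamoxifen trial reported an overall result. Across hundreds of studies, tamoxifen appeared to have a modest benefit. But "modest benefit" was an average that concealed a critical truth.

The Hidden Subgroup Split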

RR 0.59

ER-positive subgroup: 41% reduction in recurrence

RR 0.97

ER-negative subgroup: essentially no benefit at all

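The overall pooled effect, a blend of patients who respond and patients who do not, was a statistical fiction: a "modest" average that understated the benefit in one group and implied a benefit in the other where none existed.

Aggregate vs Individual Patient Data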

AD

Aggregate: published effect + CI only

IPD

Individual: raw patient-level records

IPD enables (1) consistent outcome definitions, (2) subgroup analyses by patient characteristics, (3) time-to-event modeling, and (4) checks for ecological bias. It is the gold standard for exploring treatment effect modification.

Researcher

One-Stage vs Two-Stage IPD

1

Two-Stage

Analyze each study separately, then combine estimates (like standard MA). Simple but loses information.

2

One-Stage

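Fit a single mixed-effects model to all patient data simultaneously. More powerful for interactions and rare events.

Key: Both approaches must account for clustering within studies. Never pool IPD as though it came from one large trial; doing so introduces confounding (Simpson's paradox).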

Methodologist

Ecological Bias

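A meta-regression using study-level mean age might show older patients benefit more. But this could be ecological bias: a study-level association need not reflect the patient-level truth. Only IPD can separate within-study from between-study effects.

When the Whole Lies About Its Parts

Simpson's paradox: a trend that appears in aggregated data reverses when the data are grouped by a confounding variable.

The Paradox in Practice

A mega-trial analysis found Treatment X beneficial overall. But within each study, it was harmful. How? Differences in baseline risk between studies created the illusion: the studies contributed unequal shares of treated and control patients, so the crude pooled comparison mixed a between-study difference in risk with the treatment effect.

Cates (2002, BMJ) showed that pooling across studies without accounting for clustering can reverse the apparent direction of an effect.

This is why one-stage IPD models include study as a clustering variable: to prevent between-study confounding from masquerading as a treatment effect.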
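The reversal can be reproduced with two invented studies. In the sketch below, the counts are hypothetical and chosen so that treatment is harmful within each study, yet naive pooling of all patients suggests benefit; a Mantel-Haenszel risk ratio, which stratifies by study, recovers the within-study direction.

```python
# Hypothetical counts: (events_treated, n_treated, events_control, n_control)
studies = [
    (30, 300, 5, 100),   # low-risk population, mostly treated
    (40, 100, 90, 300),  # high-risk population, mostly control
]

def risk_ratio(events_t, n_t, events_c, n_c):
    return (events_t / n_t) / (events_c / n_c)

def naive_pooled_rr(studies):
    """Collapse all patients into one table, ignoring which study they came from."""
    et = sum(s[0] for s in studies)
    nt = sum(s[1] for s in studies)
    ec = sum(s[2] for s in studies)
    nc = sum(s[3] for s in studies)
    return risk_ratio(et, nt, ec, nc)

def mantel_haenszel_rr(studies):
    """Risk ratio stratified by study (Mantel-Haenszel weights)."""
    num = sum(et * nc / (nt + nc) for et, nt, ec, nc in studies)
    den = sum(ec * nt / (nt + nc) for et, nt, ec, nc in studies)
    return num / den

for i, s in enumerate(studies, start=1):
    print(f"study {i}: RR = {risk_ratio(*s):.2f}")
print(f"naive pooled RR = {naive_pooled_rr(studies):.2f}")
print(f"Mantel-Haenszel RR = {mantel_haenszel_rr(studies):.2f}")
```

The naive figure falls below 1 only because most treated patients come from the low-risk study; keeping study as a stratum removes that artifact.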

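The EBCTCG Legacy

EBCTCG's IPD meta-analyses have defined breast cancer treatment for 40 years. Their 2005 analysis of tamoxifen versus no tamoxifen showed a clear benefit in ER-positive tumors (RR 0.59) and none in ER-negative tumors (RR 0.97).

Without IPD, the overall effect would have been pooled across both groups, diluting the benefit and potentially obscuring the size of the gain for ER-positive patients.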

Decision Tree: When Is IPD Worth Pursuing?

Do you suspect treatment–covariate interactions?

Yes →

Can you obtain IPD from at least 80% of trials?

Yes → One-stage IPD meta-analysis with interaction terms

No → Two-stage: request the IPD that is available and use aggregate data for the rest

No →

Is ecological bias a concern?

Yes → IPD preferred even without interactions

No → Aggregate data meta-analysis may suffice

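EBCTCG collected data from hundreds of trials over 40 years. Most IPD meta-analyses include 5 to 20 trials. The decision is driven by the question, not by ambition.

Methodologist

The Pattern Repeats

Remember Module 3? HRT looked beneficial in observational studies and harmful in RCTs. The same aggregate masking was at work: an overall benefit hid harm in subgroups.

Later IPD analyses of the Women's Health Initiative showed that timing mattered: women who started HRT within 10 years of menopause had different outcomes from those who started later. The "timing hypothesis" was invisible in the published aggregate summaries.

The lesson repeats: aggregate data can obscure important treatment–covariate interactions. Whether it is ER status in breast cancer or timing in HRT, individual-level data reveal what summaries conceal.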

Module 15 Quiz

Q1. What is the main advantage of IPD over aggregate-data meta-analysis?

A. It always includes more studies

B. It is cheaper and faster

C. It can explore treatment–covariate interactions without ecological bias

D. It eliminates the need for random-effects models

Module 15 Complete

"Behind every pooled estimate are individual stories that the aggregate cannot tell."

Heterogeneity is a message, not noise.

Module 16: The Dose

🎯 Learning Objectives

Explain why simple pairwise comparisons miss dose–response relationships
Distinguish linear, quadratic, and spline dose–response models
Interpret restricted cubic splines with knots
Identify threshold effects and J/U-shaped curves
Understand model comparison with AIC/BIC

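For decades, moderate drinking

was thought to protect the heart.

The "J-shaped curve" showed higher cardiovascular mortality in non-drinkers than in moderate drinkers. But Stockwell et al. (2016) demonstrated that the J-curve was an artifact of misclassifying former drinkers (people who quit because of illness) as "abstainers."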

A Scientific Consensus Built on Sand

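By 2010, more than 100 observational studies had confirmed the J-curve. Medical textbooks taught it. Cardiologists cited it. Wine-industry lobbyists funded conferences around it.

100+

Observational studies confirming the J-curve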

15–25%

Lower cardiovascular mortality in moderate drinkers vs abstainers

The evidence looked overwhelming. But what if the comparison group, the "abstainers," was contaminated?

The Sick Quitters

A Hidden Confounder

The Problem

People who stop drinking often do so because they are already ill: liver disease, drug interactions, a cancer diagnosis. These "former drinkers" were classified as "abstainers" in most studies.

The Effect: The reference group (abstainers) appeared less healthy, not because abstaining is harmful but because sick people had joined it.

When Stockwell et al. (2016, J Stud Alcohol Drugs) removed former drinkers and applied appropriate study-quality corrections, the J-curve disappeared. The protective effect was an illusion.

Dose–Response Meta-Analysis

Standard meta-analysis asks: "Does treatment X work?" Dose–response meta-analysis asks: "At what dose is treatment X most effective?" It models the relationship between dose level and outcome across multiple studies.

Linear

Simplest: log(RR) = β × dose

Spline

Flexible: piecewise polynomials with knots

Fractional

Polynomial: log(RR) = β₁ × dose^p1 + β₂ × dose^p2

Researcher

Restricted Cubic Splines

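RCS place knots at pre-specified dose points and fit smooth polynomials between them. Typically 3 to 5 knots are placed at quantiles of the dose distribution. The curve is constrained to be linear beyond the boundary knots. The test for non-linearity compares the spline model with the simple linear model.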
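One common way to build the spline terms is the truncated-power basis described by Harrell; the sketch below assumes that parameterization (including its usual scaling by the squared knot range) and uses hypothetical knot positions. With k knots it returns the dose itself plus k − 2 non-linear terms.

```python
def rcs_basis(x, knots):
    """Restricted cubic spline basis at dose x: [x, s_1(x), ..., s_{k-2}(x)]."""
    t_last, t_prev = knots[-1], knots[-2]
    scale = (knots[-1] - knots[0]) ** 2   # keeps the spline terms on the dose scale

    def cube_plus(u):
        return u ** 3 if u > 0 else 0.0   # truncated cubic: zero left of the knot

    terms = [x]
    for t_j in knots[:-2]:
        s = (cube_plus(x - t_j)
             - cube_plus(x - t_prev) * (t_last - t_j) / (t_last - t_prev)
             + cube_plus(x - t_last) * (t_prev - t_j) / (t_last - t_prev))
        terms.append(s / scale)
    return terms

# Hypothetical knots (e.g. grams of alcohol per day) and a few dose levels
knots = [5.0, 20.0, 40.0]
for dose in (0.0, 10.0, 30.0, 50.0, 60.0):
    print(dose, [round(v, 3) for v in rcs_basis(dose, knots)])
```

The correction terms at the last two knots cancel the cubic and quadratic parts beyond the final knot, which is what makes the fitted curve linear in the tails.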

AIC

Model Comparison

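AIC/BIC compare the linear and spline fits. Lower is better. Also test for departure from linearity (the p-value for the spline terms).

Interactive: Dose–Response Builder

See how the shape of the model changes under different assumptions.

The Alcohol J-Curve Debunked

Stockwell's 2016 reanalysis found that the protective effect of moderate drinking vanished once former drinkers were correctly excluded from the "abstainer" reference group. The J-curve was driven by sick-quitter bias.

Dose–response meta-analysis revealed the truth: the shape of the curve depends heavily on how "zero dose" is defined. The wrong reference category created a phantom benefit.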

When Curves Shape Policy

The phantom J-curve influenced alcohol guidelines worldwide:

UK

NHS Guidance (until 2016)

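"Moderate drinking may protect the heart" appeared in official guidelines. After Stockwell's correction, the UK revised the limit to 14 units per week for all drinkers (previously 21 for men). No amount was declared "safe."

US

Dietary Guidelines Advisory Committee

J-curve research was cited through 2015. The 2020 committee, acknowledging reference-group bias, recommended lowering the limit for men to one drink per day.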

AU

Australian Guidelines

Safe drinking limits were delayed by industry-funded J-curve research promoting “cardioprotective” moderate intake.

Decision Tree: Is Dose-Response Analysis Appropriate?

Do you have three or more exposure levels (not just exposed vs unexposed)?

Yes →

Is the relationship plausibly non-linear?

Yes → Restricted cubic splines (3–5 knots). Compare AIC with linear model.

No → Linear dose-response meta-regression may suffice

No →

Standard pairwise meta-analysis (no dose-response possible with only two levels)

Warning: Always check whether the reference category is clean. The J-curve lesson: a contaminated reference group creates phantom non-linearity.

Module 16 Quiz

Q1. What makes restricted cubic splines useful in dose–response meta-analysis?

A. They always produce a straight line

B. They flexibly capture non-linear dose–response curves

C. They reduce the number of studies needed

D. They simplify the model to fewer parameters

Module 16 Complete

"The dose makes the poison. And the shape of the curve tells you whether the poison is real."

Absence of evidence is not evidence of absence.

Module 17: Fragility

🎯 Learning Objectives

Calculate and interpret the fragility index
Use GOSH plots to identify influential studies and subset effects
Interpret contour-enhanced funnel plots
Apply Copas selection models and PET-PEESE to publication bias
Understand how sensitivity analyses strengthen meta-analytic conclusions

Governments stockpiled billions

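on evidence they could not see.

After H1N1, governments spent billions stockpiling oseltamivir (Tamiflu). A Cochrane team (Jefferson et al. 2014) fought for years to access unpublished data. When they finally got it, the evidence for preventing complications evaporated.

The Fragility Index

The fragility index asks: "How many patients would need to change outcome to flip a statistically significant result to non-significant?" Add events one at a time to the group with fewer events (converting non-events to events) until the result is no longer significant (p ≥ 0.05, conventionally by Fisher's exact test).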

FI = 1

Extremely fragile. One patient flip changes conclusion.

FI > 8

Reasonably robust. Less sensitive to individual outcomes.
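The procedure can be sketched in a few lines, assuming a two-sided Fisher exact test and a 0.05 threshold. The helper names and the example table (5/100 vs 20/100 events) are hypothetical.

```python
from math import comb

def fisher_exact_p(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):  # hypergeometric probability of x events in row 1
        return comb(row1, x) * comb(row2, col1 - x) / total

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table at least as unlikely as the observed one
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= observed * (1 + 1e-7))

def fragility_index(events_a, n_a, events_b, n_b, alpha=0.05):
    """Events added to the lower-event arm before significance is lost (0 if never significant)."""
    if fisher_exact_p(events_a, n_a - events_a, events_b, n_b - events_b) >= alpha:
        return 0
    if events_a <= events_b:
        low, n_low, high, n_high = events_a, n_a, events_b, n_b
    else:
        low, n_low, high, n_high = events_b, n_b, events_a, n_a
    flips = 0
    while low < n_low:
        low += 1      # convert one non-event into an event
        flips += 1
        if fisher_exact_p(low, n_low - low, high, n_high - high) >= alpha:
            break
    return flips

print("fragility index:", fragility_index(5, 100, 20, 100))
```

Other tests or thresholds give different indices for the same table, so the test and alpha should be reported alongside the number.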

Interactive: Fragility Calculator

Enter a 2×2 table to calculate the fragility index. Watch events shift until significance flips.

Events

Total N

Treatment

Control

Researcher

GOSH Plots

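Graphical Display of Study Heterogeneity (GOSH) fits the meta-analysis model to every possible subset of studies. Each dot plots the pooled effect and I² for one subset. Clusters suggest distinct subgroups. An outlying cloud suggests that one study is driving the heterogeneity.

With k studies there are 2^k − 1 subsets. For k > 15, random sampling is used.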
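The subset enumeration behind a GOSH plot can be sketched as follows. For brevity this version computes only a fixed-effect pooled estimate per subset (a full GOSH plot would also record I²), and the effect sizes and variances are hypothetical.

```python
from itertools import combinations

def fixed_effect(effects, variances):
    """Inverse-variance weighted pooled estimate."""
    weights = [1.0 / v for v in variances]
    return sum(w * y for w, y in zip(weights, effects)) / sum(weights)

def all_subset_estimates(effects, variances):
    """Pooled estimate for every non-empty subset of studies."""
    k = len(effects)
    results = []
    for size in range(1, k + 1):
        for idx in combinations(range(k), size):
            pooled = fixed_effect([effects[i] for i in idx],
                                  [variances[i] for i in idx])
            results.append((idx, pooled))
    return results

# Hypothetical log-ORs and variances for five studies; the last one is an outlier
effects = [-0.30, -0.25, -0.35, -0.20, 0.40]
variances = [0.04, 0.05, 0.03, 0.06, 0.02]
subsets = all_subset_estimates(effects, variances)
print(len(subsets), "subsets")  # 2**5 - 1
with_outlier = [p for idx, p in subsets if 4 in idx]
without_outlier = [p for idx, p in subsets if 4 not in idx]
print("max pooled effect without study 5:", round(max(without_outlier), 3))
print("min pooled effect with study 5:   ", round(min(with_outlier), 3))
```

Because the count doubles with every added study, exhaustive enumeration quickly becomes impractical, which is why larger meta-analyses sample subsets at random.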

Researcher

Contour-Enhanced Funnel Plots

Standard funnel plots show effect size vs standard error. Contour-enhanced versions add shaded regions for p < 0.01, p < 0.05, and p < 0.10. If the missing studies lie in non-significant regions, publication bias is a likely explanation. If they lie in significant regions, the asymmetry may have other causes (such as study quality).

Methodologist

Copas Selection & PET-PEESE

1

Copas Selection Model

Models the probability that a study is published as a function of its SE and effect size. Jointly estimates the true effect and the selection mechanism.

2

PET-PEESE

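Precision-Effect Test (PET): regress effects on SE, weighting by precision. The intercept estimates the effect at SE = 0; if it does not differ from zero, there is no evidence of a true effect. PEESE uses SE² instead of SE for better performance when a true effect exists.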
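Both regressions are weighted least-squares lines with weights 1/SE²; only the predictor differs. A minimal sketch follows, with hypothetical effect sizes in which smaller studies report larger effects. It returns point estimates only; a real analysis would also need the standard error of the intercept to decide between PET and PEESE.

```python
def wls_line(y, x, weights):
    """Weighted least-squares fit of y = intercept + slope * x."""
    total = sum(weights)
    x_bar = sum(w * xi for w, xi in zip(weights, x)) / total
    y_bar = sum(w * yi for w, yi in zip(weights, y)) / total
    sxy = sum(w * (xi - x_bar) * (yi - y_bar) for w, xi, yi in zip(weights, x, y))
    sxx = sum(w * (xi - x_bar) ** 2 for w, xi in zip(weights, x))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def pet(effects, ses):
    """PET: regress effects on SE; the intercept is the bias-adjusted effect."""
    return wls_line(effects, ses, [1.0 / s ** 2 for s in ses])

def peese(effects, ses):
    """PEESE: same regression with SE squared as the predictor."""
    return wls_line(effects, [s ** 2 for s in ses], [1.0 / s ** 2 for s in ses])

# Hypothetical log-ORs: the imprecise studies show the largest effects
ses = [0.10, 0.15, 0.25, 0.35, 0.45]
effects = [0.12, 0.20, 0.31, 0.45, 0.60]
print("PET intercept:  ", round(pet(effects, ses)[0], 3))
print("PEESE intercept:", round(peese(effects, ses)[0], 3))
```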

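The Oseltamivir Story

The original Roche-funded meta-analysis (Kaiser 2003) showed oseltamivir reducing influenza complications by 67%. But 8 of the 10 trials were unpublished. After Cochrane obtained the clinical study reports, the benefit on complications shrank to a non-significant 11%.

The fragility was not merely statistical; it was informational. The evidence base itself was missing most of the data.

Decision Tree: Interpreting Fragility Results

You have calculated the fragility index. What does the number mean?

FI ≤ 3

Highly fragile. A handful of different events would overturn the conclusion. Interpret with great caution.

FI 4–8

Moderately fragile. Sensitive to small fluctuations. Are there unpublished trials that could change this?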

FI > 8

Relatively robust. But remember: fragility is only one dimension. Publication bias can undermine even robust results.

Walsh et al. (2014, J Clin Epidemiol) found that among 399 RCTs published in top journals, the median fragility index was just 8, and 25% or more had FI ≤ 3. Landmark trials that shaped clinical practice often hung by a statistical thread.

Methodologist

Beyond the Index: Structural Fragility

The oseltamivir story revealed three types of fragility. The fragility index measures only the first.

1

Statistical Fragility (FI)

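How many events would flip the p-value? This is what the fragility index measures. It quantifies sensitivity to individual patient outcomes.

2

Informational Fragility

How much evidence is hidden? Eight of Roche's ten oseltamivir trials were unpublished. The evidence base was structurally incomplete.

3

Analytical Fragility

How many researcher degrees of freedom could change the conclusion? Different outcome definitions, analysis populations, or statistical methods.

Callback to Module 10 (paroxetine): a reanalysis using different outcome definitions completely reversed the conclusion. That was analytical fragility. The FI was never calculated because the endpoint itself was in dispute. A full robustness assessment examines all three dimensions.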

Module 17 Quiz

Q1. A trial has 200 patients per arm, with 12 events in the treatment arm and 25 in the control arm (p=0.03). The fragility index is 3. What does this mean?

A. The effect size is exactly 3

B. Changing just 3 patient outcomes would flip the result to non-significant

C. The result is very robust across three confirmatory studies

D. The study needs at least 3 patients

Module 17 Complete

"A number that survives every attempt to break it is a number worth trusting."

Not every signal is true.

Module 18: Equity

Certainty must be earned, not assumed.

🎯 Learning Objectives

Identify how trial exclusion criteria create evidence gaps
Apply the PROGRESS-Plus framework to assess equity in the evidence
Use PRISMA-Equity reporting guidelines
Understand transportability: when trial findings fail in practice
Design equity-sensitive search and synthesis strategies

SPRINT proved tight blood pressure control

saves lives. But whose lives?

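The landmark SPRINT trial excluded patients with diabetes, stroke, or heart failure. More than 75% of US adults with hypertension would not have been eligible. The evidence was strong, but its scope was narrow.

Slide A: The Missing Majority

The Trial That Excluded Most Patients

SPRINT enrolled 9,361 patients and showed that intensive blood pressure control (target <120 mmHg) reduced cardiovascular events by 25% (HR 0.75, 95% CI 0.64 to 0.89). But the eligibility criteria told a different story.

Who was excluded:

Diabetes: 35% of US adults with hypertension
Prior stroke: 8% of the hypertensive population
Symptomatic heart failure: 6% of hypertensive adults
Expected survival <3 years: the frailest patients
Nursing home residents: excluded entirely
GFR <20 mL/min: advanced kidney disease

The result: more than 75% of US adults with hypertension would not have qualified. The evidence was strong. But for whom?

Slide B: The Geography of Evidence

Where the Evidence Comes From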

78%

of cardiovascular mega-trial participants came from high-income countries (2000–2020).

6%

from sub-Saharan Africa — where cardiovascular disease is rising fastest.

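Polypill trials: 4 of 5 were conducted in populations with a mean BMI <25. The mean BMI in the US is 30. Drug metabolism, comorbidity patterns, healthcare access, and genetic variation all differ across populations. Efficacy in one population does not guarantee effectiveness in another.

Reference: Multinational trials and the PROGRESS-Plus gap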

PROGRESS-Plus Framework

P

Place of residence

R

Race / ethnicity

O

Occupation

G

Gender / sex

R

Religion

E

Education

S

SES (socioeconomic)

S

Social capital

Plus: Age, disability, sexual orientation, other vulnerable groups.

Researcher

PRISMA-Equity & Transportability

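PRISMA-Equity extends PRISMA to require reporting on how equity was addressed in the review: population characteristics, subgroup analyses by disadvantage, and an assessment of applicability to underserved populations.

Transportability: Efficacy in a trial is not the same as effectiveness in the real world. Methods exist to re-weight trial data to match the distribution of the target population.

Slide C: The Transportability Question

Researcher

From Trial to Real World: Transportability

Transportability = Can results from trial population X be applied to target population Y? This is not a philosophical question; formal methods exist.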

1

Inverse Probability of Participation Weighting (IPPW)

Re-weights trial participants so they resemble the target population on key covariates.

2

Generalizability Index

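Quantifies how similar the trial sample is to the target population on observed characteristics.

Stuart et al. (2015, Stat Med): when SPRINT results were re-weighted to the US hypertensive population, the estimated effect attenuated to HR 0.82 (vs 0.75 in the trial). The treatment still works, but the magnitude changes as the population changes.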
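With a single categorical covariate, participation weighting reduces to giving each stratum the weight (share in target population) / (share in trial). The sketch below uses invented counts and strata, not SPRINT data, to show how an effect can attenuate when an under-represented group with a smaller benefit is weighted up.

```python
# Hypothetical trial counts per stratum: (events_treated, n_treated, events_control, n_control)
trial = {
    "younger": (20, 400, 40, 400),
    "older":   (18, 100, 20, 100),
}
target_share = {"younger": 0.5, "older": 0.5}   # assumed target population mix

def participation_weights(trial, target_share):
    """Weight per stratum = share in the target population / share in the trial."""
    n_total = sum(nt + nc for _, nt, _, nc in trial.values())
    return {s: target_share[s] / ((nt + nc) / n_total)
            for s, (_, nt, _, nc) in trial.items()}

def risk_ratio(trial, weights=None):
    """Treated vs control risk ratio, optionally re-weighted by stratum."""
    if weights is None:
        weights = {s: 1.0 for s in trial}
    et = sum(weights[s] * v[0] for s, v in trial.items())
    nt = sum(weights[s] * v[1] for s, v in trial.items())
    ec = sum(weights[s] * v[2] for s, v in trial.items())
    nc = sum(weights[s] * v[3] for s, v in trial.items())
    return (et / nt) / (ec / nc)

w = participation_weights(trial, target_share)
print("weights:", w)
print(f"trial RR = {risk_ratio(trial):.3f}")
print(f"re-weighted RR = {risk_ratio(trial, w):.3f}")
```

Real applications model participation with several covariates at once (for example by logistic regression), and no weighting can recover groups the trial excluded entirely.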

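SPRINT and the Missing Majority

SPRINT was a well-designed trial of 9,361 patients. Its finding (HR 0.75 for intensive vs standard blood pressure control) changed guidelines worldwide. But subsequent analyses showed the effect was strongest in subgroups closest to the trial population and uncertain for the excluded groups.

Equity in evidence synthesis means asking not just "Does it work?" but "For whom does it work?"

Decision Tree: Equity Assessment for Your Review

ROOT: Does your review's evidence come from populations similar to your target?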

YES → Good. But check: Are subgroups (age, sex, ethnicity, SES) reported separately?

Yes: Use subgroup effects for population-specific recommendations
No: Flag as limitation — equity gap in reporting

NO → Does PROGRESS-Plus analysis reveal differential effects?

Yes: Population-specific recommendations needed. Consider transportability re-weighting.
No: Cautious generalization with explicit equity statement in discussion

Slide E: Callback to Module 3

Methodologist

Callback: The HRT Lesson Revisited

Remember Module 3? The HRT story showed healthy-user bias making a harmful treatment look beneficial. SPRINT may have the opposite problem: a "healthy volunteer" effect can make an effective treatment appear more effective than it would be in the real world.

Every meta-analysis must ask: Who was included? Who was excluded? Does it matter?

Module 18 Quiz

Q1. What does the PROGRESS-Plus framework help reviewers assess?

A. Statistical heterogeneity

B. Equity and applicability across disadvantaged populations

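C. Internal validity of the included studies

D. Overall certainty of the evidence

Module 18 Complete

"Evidence that excludes the vulnerable cannot claim to serve them."

Not every signal is true.

Module 19: The Machine

A number without provenance is not a number.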

🎯 Learning Objectives

Describe how AI/ML is used in systematic review screening
Explain active learning and human-in-the-loop workflows
Assess automation validation: recall, workload savings, and risk
Recognize the limitations and biases of algorithmic screening
Apply a framework for responsible AI use in evidence synthesis

When COVID-19 hit,

papers arrived faster than humans could read.

By 2021, more than 300,000 COVID papers existed. Cochrane used machine-learning classifiers to triage studies for rapid reviews, cutting the screening workload by up to 70% while keeping recall above 95%.

The Flood

By April 2020, 4,000 COVID preprints appeared every week.

PubMed indexed 500 new COVID articles per day.

Cochrane's screening queue hit 10,000 unreviewed titles.

🔍 The Math of Impossibility

A pair of reviewers screens ~200 titles per day.

At 500 new articles/day, they fell further behind with every hour.

Living reviews were dying before they could live.

The First Attempts

The idea was not new. Cohen et al. (2006, JAMIA) first showed that machine learning could cut screening workload by 50% with less than a 5% loss in recall.

📅

2006: Cohen et al. — SVM classifiers for drug class reviews. Proof of concept.

📅

2016: RobotReviewer (Marshall et al., JMLR) — ML for risk of bias assessment. Inter-rater reliability comparable to human reviewers.

📅

2021: ASReview (van de Schoot et al., Nature Machine Intelligence) — active learning that simulated 95% workload reduction.

But simulation is not reality. COVID became the first true test at scale.

AI in Systematic Reviews

1

Screening Prioritization

Active learning ranks citations by relevance. Reviewers screen the most likely relevant first.

2

Data Extraction Assist

NLP extracts PICO elements, outcomes, and results. Human verification is always required.

3

Risk of Bias Assessment

ML classifiers predict RoB domains. Experimental—human judgment remains gold standard.

Researcher

Validating Automation

Recall

>95% required. Missing 1 study can change conclusions.

WSS@95%

Work Saved over Sampling at 95% recall.

Stopping

When to stop screening? Consecutive irrelevant threshold.

The fundamental tension: automation saves time but introduces new sources of error. Always report the tool, version, training data, and stopping criterion.
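Work Saved over Sampling can be computed from a ranked screening list once the true labels are known: find how far down the ranking a reviewer must read to reach the target recall, then subtract the saving that random-order screening would give at the same recall. The labels below are invented for illustration.

```python
def wss_at_recall(ranked_labels, target_recall=0.95):
    """Work Saved over Sampling; ranked_labels are 1 (relevant) / 0 in classifier order."""
    total = len(ranked_labels)
    relevant = sum(ranked_labels)
    if relevant == 0:
        raise ValueError("no relevant records in the list")
    found = 0
    for screened, label in enumerate(ranked_labels, start=1):
        found += label
        if found / relevant >= target_recall:
            # Fraction left unscreened, minus the saving random screening would give
            return (total - screened) / total - (1 - target_recall)
    raise ValueError("target recall not reachable")

# Hypothetical ranking of 100 records with 20 relevant; one relevant record is ranked last
ranking = [1] * 15 + [0] * 5 + [1] * 4 + [0] * 75 + [1]
print(f"WSS@95% = {wss_at_recall(ranking):.2f}")
```

In this example 95% recall is reached after 24 of 100 records, but the one relevant study ranked last is exactly the kind of miss the recall threshold tolerates.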

The Validation Crisis

🔍 The Validation Paradox

To check whether the machine missed a relevant study, you need a human to screen everything.

But if humans screen everything, why use the machine?

The solution: prospective holdout validation.

Random 10% sample screened by both human and machine
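Compare: did the machine miss anything the human found?
If recall drops below 95%, retrain and expand human screening

Trust, but verify. The machine earns its role; it does not inherit it.

Cochrane's COVID Response

Cochrane built its COVID-19 Study Register using machine-learning classifiers trained on millions of records. The system achieved 99% sensitivity while cutting manual screening from weeks to days.

But the machine was a tool, not a replacement. Every included study was still verified by a human reviewer. The lesson: AI augments reviewers; it does not replace them.

The Study That Was Almost Missed

In June 2020, the RECOVERY trial announced its dexamethasone results: the first treatment proven to reduce COVID mortality (28-day mortality: 22.9% vs 25.7%, RR 0.83).

The preprint appeared on medRxiv under a non-standard title. Scenarios like this recurred throughout the pandemic: ML classifiers trained on existing terminology ranked unfamiliar framings low.

In several real reviews, human reviewers scanning flagged titles recognized key drug names and escalated studies the classifier had deprioritized.

Without those humans, a landmark treatment result might have taken weeks longer to reach the reviews.

Machines read faster. Humans read deeper. Neither alone is enough.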

Decision Tree: When Should You Use AI?

Does your review screen more than 5,000 titles?

Yes → Consider AI-assisted screening

Active learning prioritization. Dual-screen random 10% holdout. Stop when 3 consecutive batches yield 0 relevant studies.

Report: classifier type, training data, recall on holdout, stopping rule.

No → Manual screening is feasible

For <5,000 titles, dual human screening remains gold standard. AI adds complexity without proportionate benefit.

Is this a living review?

If yes → AI is especially valuable. Continuous classifier retraining on new evidence. But: never let the machine make the final inclusion decision.

Methodologist

The Pattern Repeats

Remember Module 6? Poldermans fabricated the DECREASE data that guided perioperative beta-blocker guidelines for a decade.

AI can now detect statistical anomalies automatically:

GRIM test: Are reported means consistent with integer sample sizes?
SPRITE: Can reported summary statistics be reconstructed from plausible individual data?
Statcheck: Do reported p-values match the test statistics?
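The GRIM check is simple enough to sketch directly: when the underlying data are integers (counts, Likert responses), the sum of n values is an integer, so only certain means are possible. The function below assumes non-negative integer data and half-up rounding by the authors; both are assumptions, and a failed check is a prompt for questions rather than proof of error.

```python
from decimal import Decimal, ROUND_HALF_UP

def grim_consistent(reported_mean, n):
    """True if some integer total of n integer-valued items rounds to reported_mean.

    reported_mean is passed as a string (e.g. "5.19") to keep its decimal places.
    """
    mean = Decimal(reported_mean)
    quantum = Decimal(1).scaleb(mean.as_tuple().exponent)   # e.g. 0.01 for two decimals
    lowest = int((mean - quantum) * n)
    highest = int((mean + quantum) * n) + 1
    for total in range(lowest, highest + 1):
        candidate = (Decimal(total) / n).quantize(quantum, rounding=ROUND_HALF_UP)
        if candidate == mean:
            return True
    return False

print(grim_consistent("5.18", 28))   # 145 / 28 = 5.1786, which rounds to 5.18
print(grim_consistent("5.19", 28))   # no integer total over 28 items gives 5.19
```

The test loses power as n grows: with two reported decimals and more than 100 items, every mean becomes attainable.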

These tools have found anomalies in hundreds of published papers, faster than any human auditor.

But machines flag. Humans judge. The decision to retract remains a human one.

Module 19 Quiz

Q1. What is the minimum acceptable recall for AI-assisted screening in systematic reviews?

A. 80%

B. 90%

C. >95%

D. 100%

Module 19 Complete

"Machines read faster. Humans read deeper. Together, they read the truth."

Not every signal is true.

Module 20: The Qualitative

Methods protect patients from our confidence.

🎯 Learning Objectives

Explain why some questions require qualitative evidence synthesis
Describe meta-ethnography (Noblit & Hare) and thematic synthesis
Apply the CERQual framework to assess confidence in qualitative findings
Understand mixed-methods synthesis approaches
Recognize when qualitative evidence changes practice

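The WHO asked a question

no RCT could answer.

Why do women around the world experience disrespect and abuse during childbirth? Bohren et al. (2015) synthesized 65 qualitative studies from 34 countries into a framework of seven domains of mistreatment.

Slide A: A Question Beyond Randomization

In 2014, the WHO convened a panel to address a global crisis. Women were being physically abused, verbally humiliated, and denied care during childbirth. These were not rare events: reports came from 34 countries.

They needed to understand WHY. What drives disrespect and abuse in maternity care?

No RCT could answer this. You cannot randomize women to abusive versus respectful care. You cannot blind midwives. You cannot measure "dignity" on a Likert scale. The evidence had to be qualitative.

Meta-Ethnography

Developed by Noblit & Hare (1988), meta-ethnography translates concepts across studies rather than aggregating numbers. It generates a new interpretive framework (third-order constructs) from primary data (participant quotes) and secondary data (author interpretations).

Reciprocal

Studies confirm one another

Refutational

Studies contradict one another

Line of argument

Studies build a new theory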

What Bohren Found: A Taxonomy of Mistreatment

1. Physical abuse

Hitting, pinching, slapping during labor

2. Sexual abuse

Inappropriate touching, non-consensual procedures

3. Verbal abuse

Shouting, threats, judgmental comments

4. Stigma & discrimination

Based on HIV status, ethnicity, age, poverty

5. Professional standards failure

Neglect, lack of informed consent

6. Poor rapport

Poor communication, dismissiveness

7. Health system conditions

Overcrowding, understaffing, lack of supplies

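65 studies. 34 countries. The same patterns recurring across languages, cultures, and systems. This was not anecdote. This was synthesized evidence.

Researcher

CERQual: Confidence in Qualitative Evidence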

CERQual assesses confidence in qualitative review findings across four components:

1

Methodological Limitations

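The quality of the contributing studies.

2

Coherence

How well the data support the finding.

3

Adequacy

The richness of the data (not just the quantity).

4

Relevance

Applicability to the context of the review question.

Slide C: From Evidence to Action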

When Qualitative Evidence Changes Practice

Bohren's synthesis informed the WHO's 2018 Recommendations on Intrapartum Care for a Positive Childbirth Experience. Specific changes grounded in qualitative evidence:

Rec. 15

Companionship during labor

Rec. 1

Respectful maternity care

Rec. 3

Effective communication

Rec. 12

Emotional support

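These recommendations, grounded in qualitative evidence, now guide maternity care across the WHO's 194 member states. No forest plot could have produced them. No I² statistic could have revealed them.

Bohren's Framework of Mistreatment

The 2015 qualitative synthesis identified seven domains: physical abuse, sexual abuse, verbal abuse, stigma and discrimination, failure to meet professional standards, poor rapport, and health system conditions. The framework informed the WHO recommendations on intrapartum care (2018).

No p-value can capture the experience of being slapped during labor. Qualitative synthesis gave voice to what numbers cannot express.

Decision Tree: When Is Qualitative Synthesis Appropriate?

ROOT: Is your research question about experiences, perceptions, barriers, or facilitators?

YES → Is your question about HOW or WHY, not just WHETHER?

Yes: Qualitative evidence synthesis (meta-ethnography, thematic synthesis, or framework synthesis)
No: Consider mixed methods: quantitative for effects + qualitative for mechanisms

NO → Is your question about efficacy/effectiveness?

Yes: Quantitative meta-analysis
But: Complement it with a qualitative review of implementation barriers (assessed with CERQual)

Key insight: The strongest systematic reviews answer both questions: Does it work? (quantitative) and Why does it work, or fail? (qualitative)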

Module 20 Quiz

Q1. What distinguishes meta-ethnography from quantitative meta-analysis?

A. It includes only 3 to 5 studies

B. It translates concepts across studies rather than pooling numbers

C. It does not require a systematic search

D. It is less rigorous than quantitative synthesis

Module 20 Complete

"Not everything that counts can be counted. Not everything that can be counted counts."

Heterogeneity is a message, not noise.

Module 21: The Multivariate

🎯 Learning Objectives

Recognize when outcomes within a study are correlated
Explain multivariate random-effects models
Apply robust variance estimation (RVE) for dependent effect sizes
Understand three-level models for nested data
Choose between multivariate approaches based on data structure

Cardiovascular trials report

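mortality, myocardial infarction, stroke, and more.

These outcomes are correlated within patients: a patient who has died cannot go on to reach an MI endpoint. Standard meta-analysis treats each outcome independently, ignoring the dependence and the risk of double-counting evidence.

Slide A: The Convenient Lie

The Assumption Nobody Questions

Open any standard meta-analysis textbook. The models assume that each study contributes one independent effect size. But reality is different.

A single cardiovascular trial reports mortality, myocardial infarction, stroke, and revascularization. A single psychotherapy study reports depression, anxiety, and quality of life at 3, 6, and 12 months.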

30 trials

× 4 outcomes

= 120

effect sizes

Most analysts either: (a) treat all 120 as independent (inflating precision by a factor of up to √4 = 2), or (b) pick one outcome and discard the rest. Both approaches are wrong.

The Dependence Problem

In standard pairwise meta-analysis, each study contributes one effect size. But many studies report multiple outcomes, subgroups, timepoints, or arms, creating dependent effect sizes. Ignoring this inflates precision and distorts inference.

RVE

Robust Variance Estimation. Sandwich estimator handles unknown correlation.

3-Level

Study → Outcome nesting modeled explicitly.

Researcher

Robust Variance Estimation

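RVE (Hedges, Tipton & Johnson, 2010) uses a sandwich-type estimator that gives valid standard errors regardless of the true correlation between dependent effects. There is no need to know or estimate the within-study correlations. It works best with 20 or more studies.

Small-sample correction: Tipton & Pustejovsky (2015) developed a small-sample correction for RVE (CR2) that uses Satterthwaite degrees of freedom when the number of clusters is small.

Slide B: The Mathematical Truth

Researcher

What Dependence Does to Your Confidence Intervals

If 4 outcomes from the same study have a within-study correlation ρ = 0.5:

Treating as independent

CI width = X

Accounting for dependence

CI width = 1.58X

The confidence interval should be 58% wider. Every meta-analysis that ignored this published falsely precise results.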

RVE (Hedges, Tipton & Johnson, 2010): Uses a “sandwich” variance estimator that produces correct standard errors without needing to know the exact within-study correlation.
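One reading that reproduces the 1.58 figure is the standard design effect for the average of m equally correlated estimates with equal variances: the correct variance is the naive one multiplied by 1 + (m − 1)ρ. The sketch below assumes exactly that simplified setting.

```python
from math import sqrt

def ci_width_ratio(m, rho):
    """Correct / naive CI width for the mean of m estimates with common correlation rho.

    Assumes equal variances and a single within-study correlation (compound symmetry).
    """
    return sqrt(1 + (m - 1) * rho)

for rho in (0.0, 0.5, 1.0):
    print(f"m = 4, rho = {rho}: CI width ratio = {ci_width_ratio(4, rho):.2f}")
```

At ρ = 0 nothing is lost by assuming independence, at ρ = 0.5 the interval must be about 58% wider, and at ρ = 1 the four outcomes carry the information of one, giving the √4 = 2 upper bound.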

Researcher

Three-Level Models: Making Structure Explicit

1

Level 1: Sampling Variance

Measurement error within each effect size estimate.

2

Level 2: Within-Study Variance

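Outcomes and timepoints differ even within a single study.

3

Level 3: Between-Study Variance

Studies differ from one another in population, setting, and methods.

Example: In a meta-analysis of psychotherapy for depression (k = 50 studies, 180 effect sizes), 35% of the variance was within studies (different outcomes) and 65% was between studies (different treatments, populations). When effects are nested (multiple outcomes within a study, or studies within research groups), this decomposition shows how much heterogeneity lies within vs between studies.

Methodologist

Three-Level Models: Formal Framework

The three-level model partitions variance into (1) sampling variance (level 1), (2) within-study variance (level 2), and (3) between-study variance (level 3). This preserves correct inference while borrowing strength across levels.

The Cardiovascular Challenge

A statin meta-analysis might include 30 trials reporting mortality, myocardial infarction, stroke, and revascularization: 120 effect sizes from 30 clusters. Treating them as 120 independent estimates inflates precision by a factor that depends on the within-study correlation.

RVE or multivariate models handle this correctly, producing wider, honest confidence intervals.

Decision Tree: Which Approach for Dependent Effect Sizes?

ROOT: Does your meta-analysis have multiple effects per study?

YES → Do you know (or can you estimate) the within-study correlations?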

Yes: Multivariate random-effects model (most efficient)
No: RVE with small-sample correction (robust to unknown correlations)

NO → Standard univariate random-effects model

Sub-question: Do the multiple effects come from different outcomes, timepoints, or subgroups?

Different outcomes → Three-level model or RVE with clustering
Different timepoints → Network of timepoints with temporal correlation
Different subgroups → Consider if subgroups are meaningful or should be averaged

Module 21 Quiz

Q1. What problem does Robust Variance Estimation (RVE) solve?

A. Publication bias

B. Dependence among multiple effect sizes from the same study

C. Between-study heterogeneity

D. Small-study effects

Module 21 Complete

"When outcomes are entangled, pretending they are independent is a lie of convenience."

A number without provenance is not a number.

Module 22: The Proof

🎯 Learning Objectives

Understand how computational errors propagate through policy
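Define reproducibility and distinguish it from replicability
Apply evidence hashing and proof-carrying numbers
Use reproducibility checklists for meta-analysis
Recognize the role of pre-registration and open data

A graduate student opened a spreadsheet

and discovered that the age of austerity had been built on an error.

In 2010, Reinhart and Rogoff claimed that countries with debt-to-GDP ratios above 90% had negative growth. This influenced austerity policy across Europe. In 2013, Thomas Herndon found an Excel error that excluded five countries from the average. The corrected result: modest positive growth, not collapse.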

Reproducibility vs Replicability

Reproducible

Same data + same code = same result

Replicable

New data + same method = consistent result

Reproducibility is the minimum standard. If others cannot reproduce the pooled estimate from the reported data, the analysis cannot be verified. A meta-analysis should share its extracted data, analysis scripts, software versions, and random seeds.

Researcher

Proof-Carrying Numbers

Every number in a meta-analysis should carry its provenance: where the data came from, how it was transformed, and which code produced it. Evidence hashing creates a cryptographic fingerprint of inputs so any change (accidental or deliberate) is detectable.

SHA

Input Hash

A SHA-256 hash of the extracted data. Change one cell and the hash changes. Provenance chain: data → code → result → hash.
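A minimal sketch of an input hash, using Python's standard `hashlib` and `csv` modules. The extraction table and the choice of a canonical CSV serialization are illustrative assumptions; any fixed, documented serialization would serve.

```python
import csv
import hashlib
import io

def evidence_hash(rows):
    """SHA-256 fingerprint of an extraction table, serialized as canonical CSV."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")  # fixed line endings across platforms
    writer.writerows(rows)
    return hashlib.sha256(buffer.getvalue().encode("utf-8")).hexdigest()

# Hypothetical extraction sheet
extracted = [
    ["study", "events_t", "n_t", "events_c", "n_c"],
    ["Trial A", 12, 200, 25, 200],
    ["Trial B", 30, 410, 41, 405],
]
original = evidence_hash(extracted)

# Change a single cell, as a transcription slip might
tampered = [row[:] for row in extracted]
tampered[1][1] = 13

print("original:", original)
print("tampered:", evidence_hash(tampered))
print("match:", original == evidence_hash(tampered))
```

Recording the hash next to each reported result lets a reader confirm that the shared data file is the one the analysis actually used.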

Interactive: Reproducibility Checklist

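Check each item to assess the reproducibility of your meta-analysis. How does your review score?

The Excel Error That Changed Economies

Reinhart and Rogoff's "Growth in a Time of Debt" was cited in congressional testimony, European Commission reports, and IMF policy briefs. The Excel error (rows 30 to 34 excluded from the AVERAGE formula) meant that five countries (Australia, Austria, Belgium, Canada, and Denmark) were simply missing.

The corrected average went from -0.1% to +2.2%. Austerity policies affected millions of people. Reproducibility is not academic perfectionism. It is a safeguard against catastrophe.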

Remember Module 5?

DECREASE Through the Lens of Reproducibility

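The DECREASE trials led by Don Poldermans were discredited over fabricated data. Had proof-carrying numbers (hashed inputs, provenance chains, verified calculations) been in place, the fabrication might have been detectable before the evidence entered meta-analyses and changed surgical guidelines.

Module 22 Quiz

Q1. What was the Reinhart–Rogoff error?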

A. They used too small a sample

B. An Excel formula excluded 5 countries, reversing the conclusion

C. They studied the wrong time period

D. They used the wrong statistical test

Module 22 Complete

「出所のない数字は数字ではありません。再現性のない分析は証拠ではありません。」

Certainty must be earned, not assumed.

モジュール 23: 初めてのメタスプリント

🎯 Learning Objectives

Understand the 40-day systematic review workflow
Map the Seven Principles to real practice phases
Recognize Definition-of-Done (DoD) gates as quality checkpoints
Appreciate why structure prevents the failures you've studied
Graduate ready to conduct (not just understand) meta-analysis

The Journey Completes

You have learned the stories.

Now you must walk the path.

Every evidence reversal you studied happened because teams knew the methods but did not follow them systematically.

The META-SPRINT Framework

A structured 40-day workflow with five phase gates. Each gate is a Definition-of-Done (DoD) checkpoint that blocks progress until quality is assured.

40

Days to Completion

5

DoD Phase Gates

Day 34

Hard Freeze

Why 40 days? Long enough to be rigorous, short enough to prevent scope creep. The rosiglitazone cardiac signal stayed buried for years because no deadline forced transparency.

The Five Phase Gates

A

DoD-A: Protocol Lock (Days 1-3)

PICOS defined, timepoint rules set, model choices pre-specified. No moving target.

B

DoD-B: Search Lock (Days 6-10)

All databases searched, grey literature checked, PRESS validated. No hidden studies.

C

DoD-C: Extraction Lock (Days 10-28)

Dual extraction, provenance linked, RoB assessed. No fabricated numbers.

The Five Phase Gates (continued)

D

DoD-D: Analysis Lock (Days 21-33)

Forest plots generated, sensitivity analyses run, heterogeneity explored. No cherry-picking.

E

DoD-E: Submission Lock (Days 33-40)

GRADE certainty rated, clinical summary written, manuscript finalized. No overconfidence.

Day 34 Freeze: No new studies can be added after Day 34. This prevents the "weaponized scope creep" that plagued the BMP spine surgery meta-analyses, where industry kept "discovering" favorable studies.
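The gate schedule and the freeze rule can be written down as data plus two small checks. This is a sketch, not the META-SPRINT application's code: the `Gate` class and function names are hypothetical, the day ranges are copied from the gates listed above, and treating Day 34 itself as the last day on which a study may be added is one reading of the freeze rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gate:
    code: str
    name: str
    first_day: int
    last_day: int


# Day ranges as listed for the five gates (ranges overlap in the source).
GATES = [
    Gate("DoD-A", "Protocol Lock", 1, 3),
    Gate("DoD-B", "Search Lock", 6, 10),
    Gate("DoD-C", "Extraction Lock", 10, 28),
    Gate("DoD-D", "Analysis Lock", 21, 33),
    Gate("DoD-E", "Submission Lock", 33, 40),
]
HARD_FREEZE_DAY = 34


def active_gates(day: int) -> list[str]:
    """Return the codes of every gate whose window contains this day."""
    return [g.code for g in GATES if g.first_day <= day <= g.last_day]


def can_add_study(day: int) -> bool:
    """Assumption: studies may be added through Day 34, never after."""
    return 1 <= day <= HARD_FREEZE_DAY


print(active_gates(25))   # ['DoD-C', 'DoD-D']
print(can_add_study(34))  # True
print(can_add_study(35))  # False
```

Encoding the freeze as a hard check rather than a guideline means a late "discovered" study is rejected by the workflow itself, not by whoever happens to be reviewing that day.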

The Seven Principles in Practice

Every principle you learned maps to a specific phase gate:

DoD-A "Not every signal is true": Pre-specify what counts as evidence

DoD-B "What was hidden in plain sight?": Search comprehensively

DoD-C "A number without provenance is not a number": Link every data point

DoD-D "Heterogeneity is a message, not noise": Investigate, don't ignore

DoD-E "Certainty must be earned, not assumed": GRADE everything

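The Red-Team Principle

Your own team tries to break it

Every day, two rotating team members spend 12 minutes acting as adversaries and checking data quality. This is how Boldt's fraud was caught: not by friendly review, but by skeptical checking that noticed impossible recruitment rates.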

CondGO: When Things Go Wrong

What happens when you discover a critical problem mid-sprint?

CondGO = Conditional Go

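A bounded rescue protocol. You have exactly 72 hours to fix the problem using only permitted actions. If you cannot fix it, the review must be halted.

📖 The Avandia lesson: GSK saw the cardiovascular signal in 2000, but there was no enforced deadline. They "kept watching" for seven years. Tens of thousands were harmed. CondGO exists because "we'll deal with it eventually" kills people.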
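The 72-hour rule is simple enough to express as a deadline check. The sketch below is illustrative only: the status labels `GO`, `CONDITIONAL`, and `HALT` and the function name are assumptions, and the list of permitted actions is not modeled.

```python
from datetime import datetime, timedelta
from typing import Optional

CONDGO_WINDOW = timedelta(hours=72)


def condgo_status(
    opened_at: datetime,
    now: datetime,
    resolved_at: Optional[datetime] = None,
) -> str:
    """Classify a critical problem against the 72-hour rescue window."""
    deadline = opened_at + CONDGO_WINDOW
    if resolved_at is not None and resolved_at <= deadline:
        return "GO"  # fixed inside the window
    if now > deadline:
        return "HALT"  # window expired without a timely fix
    return "CONDITIONAL"  # clock still running


opened = datetime(2024, 1, 1, 9, 0)  # hypothetical timestamp
print(condgo_status(opened, opened + timedelta(hours=10)))
print(condgo_status(opened, opened + timedelta(hours=73)))
print(condgo_status(opened, opened + timedelta(hours=80),
                    resolved_at=opened + timedelta(hours=71)))
```

The point of the design is that there is no fourth state: an open problem either has time left on the clock or stops the review.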

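You began this course with stories.

You finish it ready for practice.

The META-SPRINT workflow takes everything you have learned and builds it into a 40-day system that prevents failure.

When you are ready to conduct a real systematic review, open the META-SPRINT application. The stories you learned here will appear as reminders at every step to guide you.

Story: The CTT Collaboration, When Methods Save Millions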

What does it look like when every principle is followed?

REAL DATA

The Cholesterol Treatment Trialists' (CTT) Collaboration is the gold standard of meta-analysis. They obtained individual patient data from more than 170,000 participants across 26 statin trials. Pre-specified protocol. IPD from all major trials. Standardized outcomes. Result: statins reduce major vascular events by 21% per mmol/L LDL reduction (RR 0.79, 95% CI 0.77-0.81), regardless of baseline risk. This finding, replicated across five meta-analyses over 15 years, is estimated to have prevented millions of heart attacks and strokes worldwide.
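The arithmetic behind "21% per mmol/L" is a one-line conversion from the risk ratio. The sketch below shows it, and also shows how a per-mmol/L ratio would scale to a larger LDL reduction if the relationship is log-linear. That log-linear scaling is an assumption made here for illustration, and the function names are hypothetical.

```python
def relative_risk_reduction(rr: float) -> float:
    """Convert a risk ratio into a relative risk reduction (1 - RR)."""
    return 1.0 - rr


def rr_for_ldl_reduction(rr_per_mmol: float, ldl_reduction_mmol: float) -> float:
    """Scale a per-mmol/L risk ratio, assuming a log-linear relationship."""
    return rr_per_mmol ** ldl_reduction_mmol


rr = 0.79  # risk ratio per 1 mmol/L LDL reduction, from the text above
print(round(relative_risk_reduction(rr) * 100))  # 21

# Under the log-linear assumption, a 2 mmol/L reduction gives RR = 0.79 ** 2.
rr_2 = rr_for_ldl_reduction(rr, 2.0)
print(round(rr_2, 4))                                # 0.6241
print(round(relative_risk_reduction(rr_2) * 100))    # 38
```

Note that the reductions do not add: two 21% steps give roughly 38%, not 42%, because each step applies to the risk that remains.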

The Seven Principles Applied

The CTT story shows what happens when every principle in this course is followed. Consider the alternative:

Path A: No Principles

No protocol. Published data only. No RoB. No heterogeneity investigation. No GRADE.

↓

Conflicting small trials. Statin controversy persists. Millions untreated.

OUTCOME: Preventable cardiovascular deaths continue

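Path B: The CTT Way

Pre-registered protocol. IPD from all trials. Standardized outcomes. Transparent methods. High GRADE certainty.

↓

A clear answer. Global guidelines change. Statins prescribed to those who benefit.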

OUTCOME: Millions of lives saved by rigorous evidence synthesis

THE REVELATION

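Every principle in this course exists because its absence causes harm. The CTT Collaboration shows that when methods are rigorous, data carry provenance, bias is assessed, and certainty is earned, meta-analysis becomes the most powerful tool in medicine. You now hold these principles. Use them.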

Capstone Quiz

1. What is the purpose of the Day 34 "hard freeze" in META-SPRINT?

A. To allow time for peer review

B. To prevent manipulation of results through late-added studies

C. To speed up publication

D. To align with journal deadlines

2. The CondGO protocol gives teams how long to fix critical problems?

A. 24 hours

B. 48 hours

C. 72 hours

D. 1 week

3. Red-team adversarial QA caught Joachim Boldt's fraud by noticing:

A. Impossible patient recruitment rates

B. p-hacking in statistical tests

C. Inconsistent effect sizes

D. Whistleblower testimony

The stories you learned are not history.

They are warnings that guard your future work.

When you conduct your first meta-analysis,
remember CAST before you trust a signal,
remember Poldermans before you skip provenance,
remember reboxetine before you ignore the funnel.

You are ready. Follow the structure. Act with humility. Follow the Seven Principles.

Not every signal is true.

Module 24: Final Examination

Certainty must be earned, not assumed.

Final Examination

Final Exam: Part 1 of 2

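Test your mastery of meta-analysis principles. Each question addresses a core concept from the course.

Q1. A researcher wants to study "the effect of exercise on health." What is the main problem with this research question?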

A. It lacks randomization

B. Sample size is too small

C. It is not answerable—lacks specific PICO elements

D. It lacks ethical approval

Q2. A funnel plot shows marked asymmetry, with studies missing from the lower-left region. What does this suggest?

A. Large studies have more precise estimates

B. Small negative studies may be unpublished

C. The true effect is stronger than estimated

D. Random sampling error

Q3. A meta-analysis reports I² = 85% and τ² = 0.42. What is the most appropriate interpretation?

A. There is an 85% chance of a true effect

B. The effect size is very large

C. Substantial between-study variance exists; investigate sources

D. The result is clinically important

Q4. In GRADE, what is the starting certainty for a body of evidence from randomized controlled trials?

A. High

B. Moderate

C. Low

D. Very low

Q5. In RoB 2.0, which domain assesses whether outcome assessors knew the treatment allocation?

A. D1: Randomization process

B. D2: Deviations from intended interventions

C. D3: Missing outcome data

D. D4: Measurement of the outcome

Final Exam: Part 1 of 2 (continued)

Q6. The CAST trial showed that antiarrhythmic drugs increased mortality even though they suppressed arrhythmias. This is an example of:

A. Random sampling error

B. Surrogate outcome failure

C. Confounding by indication

D. Reverse causation

Q7. When should a random-effects model be preferred over a fixed-effect model?

A. When sample sizes are large

B. When the outcome is binary

C. When between-study heterogeneity is expected

D. When publication bias is suspected

Q8. According to ICEMAN criteria, which makes a subgroup analysis MORE credible?

A. Hypothesis specified a priori

B. Large number of subgroups tested

C. No biological rationale

D. Inconsistent effects across trials within subgroup

Q9. What assumption must be checked in network meta-analysis to ensure valid indirect comparisons?

A. All studies have equal sample sizes

B. All studies measure the same outcome

C. Transitivity (consistency of effect modifiers)

D. Double-blinding in all trials

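Q10. In trial sequential analysis (TSA), what does crossing the futility boundary indicate?

A. The treatment causes harm

B. Further studies are unlikely to show a meaningful effect

C. The evidence is conclusive for benefit

D. The meta-analysis is underpowered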

Part 1 Complete — continue to Part 2 (Advanced Modules)

Part 2: Advanced Module Questions (Q11-Q20)

Final Exam: Part 2 of 2 (Advanced)

Questions 11-20 cover Modules 13-22 (Bayesian, NMA, IPD, Dose-Response, Fragility, Equity, AI, Qualitative, Multivariate, Reproducibility).

Q11. In a Bayesian meta-analysis, what happens when a vague prior is used with many studies?

A. The posterior closely matches the frequentist result

B. The prior dominates the posterior

C. The credible interval becomes infinitely wide

D. The model fails to converge

Q12. In Cipriani's antidepressant NMA, why was no single drug declared the "winner"?

A. There were too few studies

B. Different drugs ranked best on different outcomes

C. No indirect evidence was available

D. SUCRA could not be calculated

Q13. Why should IPD not be pooled as if it came from one large trial?

A. IPD always has fewer studies than aggregate

B. It ignores clustering within studies and introduces confounding

C. It cannot handle time-to-event data

D. Binary outcomes cannot be pooled

Q14. What caused the alcohol "J-curve" to disappear in Stockwell's reanalysis?

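A. New studies showing no benefit were added

B. Former drinkers were correctly removed from the abstainer reference group

C. The sample size increased

D. Adjustment for confounders improved

Q15. In the oseltamivir story, what did Cochrane find after gaining access to unpublished clinical study reports?

A. The drug had no effect at all

B. The effect was larger than originally thought

C. The benefit for complications largely disappeared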

D. Side effects were more common than reported

Q16. What percentage of US patients with hypertension would not have been eligible for the SPRINT trial?

A. About 25%

B. About 50%

C. Over 75%

D. Nearly 100%

Q17. Why is AI considered an "augmenter" rather than a "replacer" in systematic reviews?

A. AI is slower than human reviewers

B. AI has perfect recall

C. AI screens fast but cannot make human-level contextual judgments

D. AI is too expensive for most reviews

Q18. What does the "adequacy" component of CERQual assess?

A. Only the number of studies

B. The richness and quantity of data supporting a finding

C. The consistency of findings across studies

D. Generalizability to other populations

Q19. A meta-analysis includes 30 statin trials, each reporting 4 correlated outcomes (120 effect sizes). Which approach is correct?

A. Treat all 120 as independent effect sizes

B. Use RVE with small-sample correction

C. Pick only one outcome per study

D. Average the 4 outcomes within each study

Q20. In the Reinhart-Rogoff error, what was the corrected mean growth rate for high-debt countries?

A. −0.1% (same as claimed)

B. +2.2%

C. 0%

D. +5%

Passing Score: 15/20 across both parts

Return to the relevant modules to review the questions you missed. Each question tests a core concept.

Not every signal is true.

Methods protect patients from our confidence.

Congratulations

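You have completed Evidence Reversals: Meta-Analysis Course.

May your synthesis be guided by truth, your searching by wisdom,
and your conclusions by humility.

The Seven Principles:

"Not every signal is true."

"Methods protect patients from our confidence."

"What was hidden in plain sight?"

"A number without provenance is not a number."

"Heterogeneity is a message, not noise."

"Absence of evidence is not evidence of absence."

"Certainty must be earned, not assumed."

"Guide us to the straight path..."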

Module 0: Opening

🎯 Learning Objectives

What is Meta-Analysis?

Why Pool Studies?

Increase Statistical Power

Improve Precision

Resolve Disagreement

Explore Heterogeneity

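When Not to Pool

The Evidence Hierarchy

The Seven Principles

Module 0 Quiz

1. Why should studies sometimes not be pooled in a meta-analysis?

2. Where do systematic reviews of RCTs sit in the evidence hierarchy?

Module 1: The Question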

🎯 Learning Objectives

The Observation

The Response

The Logic That Convinced Everyone

CAST: The Cardiac Arrhythmia Suppression Trial

The Results: April 1989

The Human Cost

The Logic, Revisited

What Went Wrong: The Surrogate Trap

The PICO Framework

Investigation Exercise: The Evidence Before CAST

Before: Observational Logic

After: CAST RCT (1989)

Lessons for Evidence Synthesis

Biological plausibility is not proof

Surrogate endpoints can mislead

Randomized trials provide the strongest causal evidence

Consensus is not evidence

REAL DATA

Module 1 Quiz

1. What was the fundamental error in the antiarrhythmic logic?

2. In PICO, what does the "O" stand for? Why does it matter?

Module 2: The Protocol

🎯 Learning Objectives

The Nurses' Health Study

The Hidden Bias

WHI: The Women's Health Initiative

The Results: July 2002

REAL DATA

PROSPERO Registration

Register Before You Search

Lock Your Decisions

Document Amendments

Prevent Duplication

Module 2 Quiz

1. Why did the Nurses' Health Study show an HRT benefit that the WHI did not?

2. What is the primary purpose of PROSPERO registration?

Module 3: The Search

🎯 Learning Objectives

The Published Evidence (Before 2007)

Nissen's Discovery: May 2007

The Meta-Analysis Results

The FDA Advisory Committee: July 2007

The Aftermath

What a Comprehensive Search Requires

The PRESS Checklist

Translation of the Research Question

Boolean and Proximity Operators

Subject Headings

Text Words

PRESS Checklist (continued)

Spelling, Syntax, Line Numbers

Limits and Filters

Database Translation

REAL DATA

Module 3 Quiz

1. What type of evidence source revealed the rosiglitazone cardiovascular signal?

2. What does PRESS stand for?

Module 4: Screening

🎯 Learning Objectives

The Rise of Vioxx