Evidence Reversals: A Meta-Analysis Course

Not every signal is truth.

Module 0: Getting Started

🎯 Learning Objectives

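Define meta-analysis and explain its role in evidence synthesis
Identify when studies should not be pooled
Describe the evidence hierarchy and where systematic reviews sit
Recognize that meta-analysis can mislead when done poorly
Recall the 7 principles that underpin this course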

This course exists

because medicine got it wrong.

Not once. Not rarely. Repeatedly. In ways that killed patients who trusted that the evidence was sound.

What is Meta-Analysis?

A statistical method for combining the results of multiple independent studies that address the same question.

1976

Term coined by Gene Glass

~50,000

Published per year

#1

Evidence hierarchy*

*When well conducted. Quality of conduct matters more than study design alone — as GRADE recognizes.

Why Pool Studies?

1

Increase Statistical Power

Individual studies may be too small to detect effects.

2

Improve Precision

Narrower confidence intervals around effect estimates.

3

Resolve Disagreement

When studies conflict, pooling can clarify the signal.

4

Explore Heterogeneity

Identify why effects differ across populations or settings.

But meta-analysis can also

MISLEAD

When done poorly, it amplifies bias rather than truth.

When NOT to Pool

1

Studies measure fundamentally different things (apples and oranges)

2

Extreme heterogeneity that cannot be explained

3

One study dominates all others (megastudy problem)

4

Studies have a high risk of bias that cannot be adjusted for

Pooling is a privilege, not a right.

The decision to combine must be defended.

The Evidence Hierarchy

Systematic Reviews & Meta-Analyses of RCTs

Randomized Controlled Trials

Cohort Studies

Case-Control Studies

Case Series / Expert Opinion

Position in the hierarchy depends on methodological quality, not on study type alone.

This course is about

evidence reversals.

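Each module opens with a story of how medicine got it wrong. Then you learn the methods that could have prevented the harm.

The 7 Principles

These phrases will return throughout the journey.

1. "Not every signal is truth."

2. "Methods protect patients from our confidence."

3. "What was hidden in plain sight?"

4. "A number without provenance is not a number."

5. "Heterogeneity is a message, not noise."

6. "Absence of evidence is not evidence of absence."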

7. "Certainty must be earned, not assumed."

Module 0 Quiz

1. Why should studies sometimes NOT be pooled in a meta-analysis?

A. Pooling is always better than single studies

B. When heterogeneity is extreme or studies measure different things

C. Pooling is always appropriate for RCTs

D. Statistical methods handle any situation

2. Where do systematic reviews of RCTs sit in the evidence hierarchy?

A. At the top

B. Same level as individual RCTs

C. Below cohort studies

D. Same as expert opinion

Begin the journey.

Module 1: The Question

Not every signal is truth.

This is not a story about error.

It is a story about certainty.

Module 1: The Question

🎯 Learning Objectives

Formulate a focused PICO question for a systematic review
Distinguish surrogate outcomes from patient-important outcomes
Explain why biological plausibility alone is insufficient evidence
Describe the CAST trial and its impact on evidence-based medicine
Apply the principle: "Not every bright sign is guidance"

~9,000

excess deaths per year

From a treatment everyone believed worked.

This is the story of how we believed, and how we were wrong.

The Observation

Patients with frequent PVCs after MI had 2-5x higher mortality.

400,000+

MI survivors/year

~40%

with significant PVCs

160,000

at elevated risk

A massive clinical need. A clear target.

The Response

Antiarrhythmic drugs were developed, FDA approved,
and prescribed to ~200,000 patients per year.

There are no villains in this story.

Everyone acted on the best available evidence.

The Logic That Convinced Everyone

PREMISE 1

PVCs after MI predict sudden cardiac death

↓

PREMISE 2

Antiarrhythmic drugs suppress PVCs

↓

PREMISE 3

Suppressing PVCs should prevent sudden death

↓

CONCLUSION

Antiarrhythmics save lives in post-MI patients

The chain was logical. The conclusion felt inevitable.

CAST: The Cardiac Arrhythmia Suppression Trial

Finally, someone asked: "Does suppressing PVCs actually save lives?"

Design

Randomized, double-blind, placebo-controlled

Population

Post-MI patients with asymptomatic PVCs

Intervention

Encainide, flecainide, or moricizine vs placebo

Run-in

Only patients with ≥80% PVC suppression randomized

Primary endpoint

Death or cardiac arrest with resuscitation

Sample size

1,498 patients (encainide/flecainide arms)

The Result: April 1989

The Data Safety Monitoring Board stopped the trial early.

Outcome	Drug (n=755)	Placebo (n=743)
Arrhythmic deaths	33	9
All cardiac deaths	43	16
Total deaths	56	22
Death rate	7.4%	3.0%

Relative Risk of Death: 2.5

95% CI: 1.6 - 4.5 | p < 0.001

Drugs that perfectly suppressed the arrhythmia increased mortality by 150%.
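The relative risk can be recomputed from the total-death row of the table above. The sketch below is an illustration, not the trial's own analysis: the function name is hypothetical, and the interval is a simple Wald interval on the log scale, so it need not match the published interval exactly.

```python
import math

def risk_ratio(events_trt, n_trt, events_ctl, n_ctl, z=1.96):
    """Risk ratio with a Wald confidence interval computed on the log scale."""
    rr = (events_trt / n_trt) / (events_ctl / n_ctl)
    # Standard error of ln(RR) for two independent binomial samples
    se = math.sqrt(1 / events_trt - 1 / n_trt + 1 / events_ctl - 1 / n_ctl)
    lower = math.exp(math.log(rr) - z * se)
    upper = math.exp(math.log(rr) + z * se)
    return rr, lower, upper

# Total deaths from the CAST table: 56 of 755 on drug, 22 of 743 on placebo
rr, lower, upper = risk_ratio(56, 755, 22, 743)
print(f"RR = {rr:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

The point estimate rounds to the 2.5 reported on the slide. The Wald interval is somewhat narrower than the quoted 1.6 to 4.5, which presumably came from a different method.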

The Human Cost

Before CAST, ~200,000 Americans per year received these drugs.

~9,000

excess deaths per year - possibly more

Vietnam War: ~6,000 US deaths/year • These drugs: ~9,000+ deaths/year

For every number, a name we will never know.

Look again.

The Logic, Re-examined

PREMISE 1

PVCs after MI predict sudden cardiac death

↓

PREMISE 2

Antiarrhythmic drugs suppress PVCs

← THE LEAP

↓

PREMISE 3

Suppressing PVCs should prevent sudden death

↓

CONCLUSION

Antiarrhythmics save lives in post-MI patients

The assumption that suppressing the marker would fix the outcome was never tested.

What Went Wrong: The Surrogate Trap

1

PVCs were a marker of damaged tissue, not the cause of death

2

The drugs had proarrhythmic effects - triggering deadlier rhythms

3

The surrogate improved while the outcome worsened: a dissociated surrogate

The surrogate did not lie. We asked it the wrong question.

The PICO Framework

Every answerable clinical question has four components:

P - POPULATION

Who are the patients? What are their characteristics?

I - INTERVENTION

What treatment or exposure is being evaluated?

C - COMPARATOR

What is the alternative? Placebo? Standard care?

O - OUTCOME

What matters to patients? Hard endpoints vs surrogates.

CAST PICO

Post-MI patients with PVCs | Antiarrhythmics | Placebo | Mortality

🔍

Investigation Exercise: The Evidence Before CAST

You are a cardiologist in 1988. A patient has survived an MI but has frequent PVCs. The observational literature is clear...

Study	PVC Patients	Mortality Risk
Lown (1977)	High-grade PVCs	2.4x higher
Bigger (1984)	>10 PVCs/hour	3.1x higher
Mukharji (1984)	Complex PVCs	4.8x higher

The signal is clear. The mechanism is plausible. Would you prescribe antiarrhythmics?

Before: Observational Logic

PVCs → Higher mortality

Drugs suppress PVCs

∴ Drugs should reduce mortality

After: CAST RCT (1989)

Death rate on drug: 7.4%

Death rate on placebo: 3.0%

RR = 2.5 (150% increase in deaths)

The surrogate improved. The patients died. "What is the outcome that matters?"

Lessons for Evidence Synthesis

1

Biological plausibility is not proof

A logical mechanism doesn't guarantee the expected effect.

2

Surrogate endpoints can mislead

Improving a biomarker doesn't prove improvement in outcomes.

3

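Randomized trials provide the strongest causal evidence

Because of confounding, observational data alone can rarely establish that an intervention causes an outcome.

4

Consensus is not evidence

200,000 prescriptions, FDA approval, and guidelines were all wrong.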

This is why we do meta-analysis: to see past apparent truths.

Story: The DES-II Surrogate Tragedy

What if the question you ask determines who lives and who dies?

REAL DATA

In 1989, cardiologists knew that encainide and flecainide could suppress PVCs. The surrogate endpoint looked perfect: the drugs suppressed PVCs by 80%+. But CAST randomized 1,498 patients to active drug or placebo. The trial was stopped early: 56 deaths in the drug group vs 22 in placebo. Mortality increased 2.5-fold. An estimated ~9,000 excess American deaths per year were attributed to these drugs.

The Cardiologist's Choice: 1987

Your post-MI patient has frequent PVCs. A drug exists that suppresses them completely. What do you do?

PATH A: Treat the Surrogate

Prescribe encainide — PVCs vanish, the ECG looks clean

↓

The biomarker improves. You feel confident. The patient dies.

OUTCOME: An estimated 50,000+ excess deaths across the US during years of use

PATH B: Demand a Mortality Trial

Insist: "Show me that it improves survival, not just the ECG."

↓

The trial reveals harm. The drugs are withdrawn. Lives are saved.

OUTCOME: The right PICO question prevents catastrophe

THE REVELATION

The question was never "Can we suppress PVCs?" It was "Does suppressing PVCs save lives?" The surrogate endpoint answered the wrong question. A correct PICO would have demanded death as the outcome from the start.

What appears certain may be wrong.

What everyone believes may be false.

Methods exist so that patients do not pay for our confidence.

That is why you are here.

Module 1 Quiz

1. What was the fundamental flaw in the antiarrhythmic logic?

A. The trials were not randomized

B. Treating a surrogate (PVCs) was assumed to improve outcomes

C. The sample sizes were too small

D. FDA approval was rushed

2. What does the "O" in PICO stand for, and why does it matter?

A. Observation - what researchers see

B. Objective - the study goals

C. Outcome - what matters to patients

D. Organization - the study structure

Not every signal is truth.

Methods protect patients from our confidence.

What was hidden in plain sight?

This is about

observational evidence.

Module 2: The Protocol

🎯 Learning Objectives

Explain why protocol pre-registration prevents bias
Identify key elements of a PROSPERO registration
Distinguish healthy user bias from true treatment effects
Describe why observational studies overestimated HRT benefits
Apply the principle: "Methods protect patients from our confidence"

30+

observational studies

All showing hormone replacement therapy protected postmenopausal women from heart disease.

The evidence seemed overwhelming. The conclusion seemed certain.

The Nurses' Health Study

122,000 nurses followed for decades. HRT users had 40-50% lower cardiovascular mortality.

RR 0.56

Cardiovascular mortality

122,000

Women followed

20+ years

Follow-up

Landmark study. Impeccable methodology. Wrong conclusion.

The Hidden Biases

1

Healthy User Bias: Women who chose HRT were healthier, wealthier, better educated

2

Compliance Bias: Women who took HRT consistently also took better care of themselves

3

Prescriber Bias: Doctors gave HRT to healthier women with fewer risk factors

The treatment did not protect them. They were already protected.

WHI: The Women's Health Initiative

The largest randomized trial of HRT ever conducted.

Design

Randomized, double-blind, placebo-controlled

Population

Postmenopausal women aged 50-79

Intervention

Estrogen + Progestin vs Placebo

Sample size

16,608 women

Primary endpoint

Coronary heart disease

Planned duration

8.5 years

The Result: July 2002

Trial stopped early after 5.2 years. Harm exceeded benefits.

Outcome	Hazard Ratio	Direction
Coronary heart disease	1.29	HARM
Stroke	1.41	HARM
Breast cancer	1.26	HARM
Pulmonary embolism	2.13	HARM

Complete Reversal

Thirty years of observational evidence, overturned

The Lesson

PRE-SPECIFY

A protocol written before the search begins prevents fishing, prevents bias, prevents hindsight distortion.

Story: The Hormone Timing Hypothesis

What if the treatment works, but only sometimes?

REAL DATA

WHI showed HRT increased cardiovascular events overall. But later analyses revealed a critical pattern: women who started HRT within 10 years of menopause had REDUCED cardiovascular risk. Women starting 20+ years after menopause had INCREASED risk. The overall null/harm result hid a timing effect.

The Analyst's Dilemma

You are analyzing the WHI subgroups. The overall result shows harm. Do you dig deeper?

PATH A: Report Overall Only

Conclude HRT is harmful for all postmenopausal women

↓

Simple message. Guidelines recommend against HRT universally.

OUTCOME: Deny potential benefit to younger menopausal women

PATH B: Pre-Specify Timing Subgroups

Analyze by years since menopause (biologically plausible)

↓

Discover a "timing window" for safe HRT initiation.

OUTCOME: Enable personalized recommendations

THE REVELATION

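Subgroup analysis is dangerous when it is a fishing expedition. It is essential when biology predicts effect modification. The timing hypothesis was biologically plausible and should have been pre-specified.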

PROSPERO Registration

1

Register Before You Search

PROSPERO: International prospective register of systematic reviews

2

Lock In Decisions

PICO, search strategy, outcomes, analysis plan - all pre-specified

3

Document Amendments

Changes are allowed, but they must be transparent and justified

4

Prevent Duplication

Check whether a review already exists before you start

Module 2 Quiz

1. Why did the Nurses' Health Study show a benefit of HRT that WHI did not?

A. Nurses' Health had too few patients

B. Healthy user bias in observational studies

C. Nurses' Health had shorter follow-up

D. Different hormone formulations were used

2. What is the primary purpose of PROSPERO registration?

A. To register clinical trials

B. To speed up review completion

C. To pre-specify methods and prevent bias

D. To secure review funding

Pre-specification is not bureaucracy.

It is protection.

Against our own tendency to find what we expect.

Methods protect patients from our confidence.

What was hidden in plain sight?

Module 3: The Search

What was hidden in plain sight?

This is about

what they didn't publish.

Module 3: The Search

🎯 Learning Objectives

Develop a comprehensive search strategy using PRESS guidelines
Search multiple databases including grey literature sources
Identify trial registries and regulatory databases (ClinicalTrials.gov, FDA)
Explain how the rosiglitazone case exposed hidden cardiovascular harms
Apply the principle: "What was hidden in plain sight?"

$3.2B

annual sales at peak

Avandia (rosiglitazone) was one of the world's best-selling diabetes drugs.

The published trials looked reassuring. The unpublished ones told a different story.

The Published Evidence (Before 2007)

Published trials showed rosiglitazone effectively lowered HbA1c. Cardiovascular outcomes were rarely reported.

1999

FDA approval

6M+

Patients treated

~0.7%

HbA1c reduction

The surrogate looked good. But what about actual cardiovascular events?

Nissen's Discovery: May 2007

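Dr. Steven Nissen obtained unpublished trial data from GSK's own website.

Under a legal settlement, GSK had been required to post its clinical trial results online. Nissen and Wolski analyzed 42 trials, many of which had never been published in a journal.

The data were technically public.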

No one had systematically searched for it.

The Meta-Analysis Result

Outcome	Odds Ratio	95% CI
Myocardial Infarction	1.43	1.03 - 1.98
CV Death	1.64	0.98 - 2.74

43% Increased Risk of Heart Attack

p = 0.03 for myocardial infarction

Published in NEJM. The FDA called an emergency advisory committee meeting.
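The reported p-value can be cross-checked against the confidence interval in the table above. The sketch below assumes a normal approximation on the log odds ratio scale; the function name is hypothetical, and the original analysis may have used a different method.

```python
import math

def p_from_ci(estimate, lower, upper, z_crit=1.96):
    """Approximate two-sided p-value for a ratio measure from its 95% CI."""
    # The CI is symmetric on the log scale, so its width gives the standard error
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(estimate) / se
    # Two-sided tail area of the standard normal distribution
    p = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p

# Myocardial infarction row: OR 1.43, 95% CI 1.03 to 1.98
se, z, p = p_from_ci(1.43, 1.03, 1.98)
print(f"SE = {se:.3f}, z = {z:.2f}, p = {p:.3f}")
```

Under this approximation the result is consistent with the reported p = 0.03. The same check is useful during extraction: a p-value that disagrees with its own interval deserves a second look.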

The FDA Advisory Committee: July 2007

22-1

Voted: CV risk exists

20-3

Keep on market with warnings

The committee was split. Some wanted withdrawal. Some said the meta-analysis was flawed.

But the signal could not be unseen.

The Aftermath

1

Black box warning added for heart failure risk (2007)

2

Severe restrictions on prescribing in the US (2010)

3

Withdrawn from the entire European market (2010)

4

FDA now requires cardiovascular outcome trials for all diabetes drugs

What a Comprehensive Search Requires

PUBLISHED

PubMed, Embase, CENTRAL, Web of Science

GREY LITERATURE

Conference abstracts, dissertations, regulatory docs

TRIAL REGISTRIES

ClinicalTrials.gov, WHO ICTRP, EU CTR

REGULATORY

FDA, EMA, Health Canada submissions

COMPANY DATA

GSK, Pfizer, Roche clinical trial registries

HAND SEARCH

Reference lists, contact authors, experts

The PRESS Checklist

Peer Review of Electronic Search Strategies

1

Translation of the Research Question

Does the search reflect the PICO elements?

2

Boolean and Proximity Operators

Are AND, OR, and NOT used correctly?

3

Subject Headings

Are MeSH/Emtree terms appropriate and exploded?

4

Text Words

Synonyms, spelling variants, truncation?

PRESS Checklist (continued)

5

Spelling, Syntax, Line Numbers

Any errors that could cause the search to fail?

6

Limits and Filters

Are date, language, and study-design limits appropriate?

Peer-reviewed searches substantially improve retrieval of key studies.

PRESS guideline: McGowan et al., 2016

Database Translation

The same search must be adapted for each database.

PubMed

"diabetes mellitus, type 2"[MeSH] OR "type 2 diabetes"[tiab]

Embase

'non insulin dependent diabetes mellitus'/exp OR 'type 2 diabetes':ti,ab

Subject headings, field tags, and operators differ between databases.

Story: The Tamiflu Transparency Campaign

What happens if you search and find nothing?

REAL DATA

Governments stockpiled $9 billion worth of oseltamivir (Tamiflu) for pandemic flu. The Cochrane Collaboration attempted to review the evidence. Of 77 clinical trials, full reports existed for only 20. Roche refused to share the data for 5 years. When the BMJ and Cochrane finally obtained over 160,000 pages of clinical study reports, they found that Tamiflu reduced symptoms by less than 1 day, with no evidence that it prevented hospitalizations or complications.

The Reviewer's Dilemma: 2009

You are updating the Cochrane review of Tamiflu. The published trials look positive. But 57 trials have no accessible full report. What do you do?

PATH A: Analyze What's Published

Use the 20 available trials. Conclude Tamiflu is effective.

↓

Your review supports continued stockpiling. $9 billion spent on fragile evidence.

OUTCOME: Billions wasted, true efficacy unknown

PATH B: Demand Complete Data

Refuse to publish until all trial data is accessible

↓

5-year campaign. 160,000+ pages finally obtained. Truth emerges.

OUTCOME: Evidence policy changed; EMA now publishes all trial reports

THE REVELATION

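A search is only as good as what it can find. When the grey literature is hidden behind corporate walls, even the most comprehensive PubMed search will miss the truth. The Tamiflu case changed global policy: the EMA now publishes clinical study reports for all medicines.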

If Nissen had searched only PubMed,

the signal would have remained hidden.

Comprehensive search is survival.

What was hidden in plain sight?

Module 3 Quiz

1. What type of evidence source revealed the rosiglitazone cardiovascular signal?

A. Published journal articles

B. Cochrane Library

C. Company clinical trial registry

D. FDA approval documents

2. What does PRESS stand for?

A. Publication Review of Evidence Search Standards

B. Peer Review of Electronic Search Strategies

C. Protocol for Reporting Evidence Synthesis Studies

D. Primary Research Evidence Search System

What was hidden in plain sight?

Module 4: Screening

A number without provenance is not a number.

This is about

what they chose to report.

Module 4: Screening

🎯 Learning Objectives

Apply PRISMA flow diagram to document study selection
Implement dual-reviewer screening with conflict resolution
Identify selective outcome reporting and data manipulation
Calculate inter-rater reliability (Cohen's kappa)
Apply the principle: "A number without provenance is not a number"

88,000

heart attacks attributed to Vioxx

A blockbuster drug. A hidden signal. A preventable catastrophe.

Between 1999 and 2004, millions took this painkiller. Some never came home.

The Rise of Vioxx

Rofecoxib (Vioxx) was a COX-2 selective NSAID, marketed as safer for the stomach than traditional painkillers.

1999

FDA approval

$2.5B

Peak annual sales

80M+

Patients prescribed

The VIGOR Trial (2000)

Vioxx Gastrointestinal Outcomes Research

Design

Randomized, double-blind

Comparison

Vioxx vs Naproxen

Population

Rheumatoid arthritis

Sample

8,076 patients

Primary Outcome

GI events

Published

NEJM, November 2000

What VIGOR Published

GI Outcome	Vioxx	Naproxen
Confirmed GI events	2.1 per 100 pt-yrs	4.5 per 100 pt-yrs
Reduction	54% fewer GI events

The headline: Vioxx is safer for the stomach!

This is what doctors were told. This is what patients believed.

What VIGOR Buried

CV Outcome	Vioxx	Naproxen
Myocardial Infarction	20 events	4 events
Relative Risk	5x higher in Vioxx group

5-fold Increase in Heart Attacks

Mentioned only briefly, attributed to naproxen being "cardioprotective"

Selective Reporting

1

Data cutoff manipulation: 3 additional heart attacks occurred after the cutoff used in publication

2

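Spin: The CV signal was explained as naproxen being cardioprotective (without evidence)

3

Outcome switching: CV events were pre-specified but not emphasized

4

Internal knowledge: Merck emails show they knew about the signal

The APPROVe Trial (2004)

A trial of colorectal polyp prevention, stopped early for safety.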

RR 1.92

CV events vs placebo

Sept 2004

Vioxx withdrawn

Four years after VIGOR showed a 5x risk. Four years too late.

Story: The Vioxx Decision Tree

What if the signal is hiding in the noise?

REAL DATA

Vioxx (rofecoxib) was approved in 1999. By 2004, estimates suggest 88,000-140,000 excess heart attacks and 30,000-40,000 deaths. Merck's own VIGOR trial showed 5x cardiovascular risk in 2000, but it was dismissed as a "naproxen cardioprotective effect."

The Fork in the Road

You are an FDA reviewer in 2001. The VIGOR data show a 5-fold higher risk of heart attack with Vioxx compared with naproxen.

PATH A: Accept the Explanation

Believe Merck's hypothesis: naproxen is cardioprotective

↓

No additional safety studies required. Drug stays on market at full speed.

OUTCOME: More than 40,000 deaths over 4 years

PATH B: Demand Evidence

Require a dedicated CV safety trial before continued marketing

↓

Delay or restrict marketing until cardiovascular safety is established.

OUTCOME: Signal detected early, lives saved

THE REVELATION

The signal was there in 2000. A wrong explanation delayed action by 4 years. An alternative hypothesis accepted without evidence cost tens of thousands of lives.

The PRISMA Flow Diagram

Every step of screening must be documented and transparent.

Identification

Records from databases + other sources

↓

Screening

Title/abstract review (duplicates removed)

↓

Eligibility

Full-text assessment (with exclusion reasons)

↓

Included

Studies in synthesis

Dual Screening: Why Two Reviewers?

1

Reduces Selection Bias

One reviewer might unconsciously favor certain studies

2

Catches Errors

Fatigue, misreading, and mistakes are inevitable

3

Forces Explicit Criteria

Disagreements reveal ambiguity in inclusion rules

Typical agreement: κ = 0.6-0.8

Disagreements resolved by discussion or third reviewer

Calibration: The Pilot Phase

Before screening thousands of records, reviewers should calibrate on a sample of 50-100 records.

1

Screen the same set independently

2

Compare decisions and discuss disagreements

3

Refine inclusion criteria until κ > 0.7

4

Document the calibration process and any rule changes
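Cohen's kappa, the agreement statistic used in the calibration steps above, can be computed from a 2x2 table of the two reviewers' include/exclude decisions. This is a minimal sketch: the function name and the pilot counts are invented for illustration.

```python
def cohens_kappa(both_include, only_a, only_b, both_exclude):
    """Cohen's kappa for two reviewers making include/exclude decisions."""
    n = both_include + only_a + only_b + both_exclude
    observed = (both_include + both_exclude) / n
    # Chance agreement from each reviewer's marginal include rate
    a_include = (both_include + only_a) / n
    b_include = (both_include + only_b) / n
    expected = a_include * b_include + (1 - a_include) * (1 - b_include)
    return (observed - expected) / (1 - expected)

# Hypothetical pilot of 100 records (counts invented for illustration)
kappa = cohens_kappa(both_include=20, only_a=5, only_b=10, both_exclude=65)
print(f"kappa = {kappa:.3f}")
```

With these invented counts the reviewers agree on 85% of records, yet kappa is 0.625, below the 0.7 target. Raw percent agreement overstates reliability when most records are obvious excludes, which is why the criteria would be refined and the pilot repeated.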

PRISMA 2020 Updates

New in 2020

Separate reporting of database vs register searches

New in 2020

Use of automation tools must be reported

New in 2020

Citation searching documented separately

New in 2020

Reasons for exclusion at full-text mandatory

PRISMA 2020 substantially revised the checklist, expanding reporting on synthesis methods, certainty assessment, and protocol registration.

If Vioxx's cardiovascular data had been screened by independent reviewers,

if all pre-specified outcomes had been required to be reported,

88,000 heart attacks might have been prevented.

A number without provenance is not a number.

Module 4 Quiz

1. In the VIGOR trial, what was the relative risk of MI in the Vioxx group compared with naproxen?

A. 1.5x higher

B. 2x higher

C. 5x higher

D. 10x higher

2. Why is dual screening (two independent reviewers) important?

A. It makes screening faster

B. It reduces selection bias and catches errors

C. It reduces the number of studies to review

D. It allows reviewers to skip full-text review

A number without provenance is not a number.

Module 5: Extraction

A number without provenance is not a number.

This is about

numbers that never existed.

Module 5: Extraction

🎯 Learning Objectives

Design a standardized data extraction form with provenance fields
Calculate effect sizes from various reported statistics (OR, RR, HR, SMD)
Implement dual-extraction with discrepancy resolution
Identify red flags for data fabrication and misconduct
Explain how the DECREASE fraud affected clinical guidelines

~10,000

possible excess deaths in Europe

From guidelines based on fabricated clinical trials

The DECREASE trials influenced perioperative care worldwide. The data were made up.

Don Poldermans: A Star Researcher

Professor at Erasmus Medical Center, Rotterdam. Author of over 500 papers. Lead author of ESC guidelines on perioperative cardiac care.

500+

Publications

DECREASE

Trial series I-VI

ESC

Guideline chair

A source that looked impeccable. Until someone looked at the data.

The DECREASE Trials: The Claims

Trial	Finding	Impact
DECREASE-I (1999)	90% reduction in cardiac death	Changed guidelines
DECREASE-IV (2009)	Beta-blockers safe in low-risk	Expanded recommendations

Effect sizes were implausibly large.

90% reduction? Almost nothing in medicine works that well.

The Investigation: 2011

1

Erasmus MC investigated after whistleblower complaints

2

Fabricated patient data: Patients who didn't exist or weren't enrolled

3

No informed consent: Many "participants" never consented

4

Poldermans dismissed: From Erasmus MC in 2011

The Fallout

When DECREASE was removed from the meta-analysis...

Benefit → Harm

Direction reversed

27% ↑

Stroke risk increase

The POISE trial (2008) showed harm. It was dismissed because it conflicted with DECREASE.

Why Was This Not Caught?

1

Trust in authority: Poldermans was a guideline author reviewing his own evidence

2

No data verification: No one requested individual patient data

3

Publication prestige: Published in top journals, assumed valid

4

Implausible effects accepted: 90% reductions should raise suspicion

Data Extraction: Defense Against Fraud

1

Dual Extraction

Two extractors independently - catches transcription errors and forces scrutiny

2

Record Provenance

Table, page, paragraph - every number traceable to source

3

Verify Against Registry

ClinicalTrials.gov results vs publication: discrepancies are red flags

4

Request IPD

Individual patient data reveals what aggregate summaries hide
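The first two defenses above, dual extraction and recorded provenance, can be sketched as a small data structure. Everything here is hypothetical: the field names, the study identifier, and the comparison helper are illustrations, not a standard extraction form.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ExtractedValue:
    """One extracted number plus the provenance needed to trace it (hypothetical fields)."""
    study_id: str
    outcome: str
    value: float
    source_document: str
    table_or_figure: str
    page: int
    extractor: str

def find_discrepancies(first, second, tolerance=1e-9):
    """Compare two extractors' records, keyed by (study_id, outcome)."""
    a = {(r.study_id, r.outcome): r for r in first}
    b = {(r.study_id, r.outcome): r for r in second}
    issues = []
    for key in sorted(a.keys() | b.keys()):
        ra, rb = a.get(key), b.get(key)
        if ra is None or rb is None:
            issues.append((key, "missing in one extraction"))
        elif abs(ra.value - rb.value) > tolerance:
            issues.append((key, f"{ra.value} vs {rb.value}"))
    return issues

# Invented example: the two extractors disagree on one event count
reviewer_a = [ExtractedValue("TrialX", "deaths_treatment", 56, "main paper", "Table 2", 409, "A")]
reviewer_b = [ExtractedValue("TrialX", "deaths_treatment", 65, "main paper", "Table 2", 409, "B")]
issues = find_discrepancies(reviewer_a, reviewer_b)
print(issues)
```

Because each record carries its document, table, and page, a flagged discrepancy can be resolved by going back to the source rather than by negotiation between extractors.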

Effect Size Calculation

During extraction, calculate effect sizes from the reported data.

BINARY OUTCOMES

Odds Ratio, Risk Ratio, Risk Difference from 2x2 tables

CONTINUOUS OUTCOMES

Mean Difference, Standardized Mean Difference from means and SDs

Always extract from the most reliable source.

Prefer: ITT results > per-protocol > subgroups

Red Flags During Extraction

!

Implausible effect sizes: 80-90% reductions should prompt scrutiny

!

Baseline imbalances: Groups that match "too perfectly"

!

Round numbers: "Exactly 50" or "exactly 100" patients per arm

!

Registry discrepancies: Published N differs from registered N

Researcher

Effect Size Conversions

Studies report results in different metrics. Pooling them often requires conversion.

From	To	Formula
SMD (d)	log-OR	log-OR = d × π / √3
log-OR	SMD (d)	d = log-OR × √3 / π
Correlation (r)	Fisher z	z = 0.5 × ln((1+r)/(1−r))
OR	RR	RR = OR / (1 − P₀ + P₀ × OR)
OR	NNT	NNT = 1 / (P₀ − OR×P₀ / (1−P₀+OR×P₀))

P₀ = baseline risk in the control group. These formulas are approximations that hold only under certain assumptions. See Borenstein et al. (Ch. 7).
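The conversions in the table above translate directly into code. This is a minimal sketch with hypothetical function names; the example values (OR = 0.5, P₀ = 0.2) are invented, and the same approximation caveats apply.

```python
import math

def smd_to_log_or(d):
    """Standardized mean difference to log odds ratio."""
    return d * math.pi / math.sqrt(3)

def log_or_to_smd(log_or):
    """Log odds ratio to standardized mean difference."""
    return log_or * math.sqrt(3) / math.pi

def fisher_z(r):
    """Fisher z-transformation of a correlation coefficient."""
    return 0.5 * math.log((1 + r) / (1 - r))

def or_to_rr(odds_ratio, p0):
    """Odds ratio to risk ratio, given the control-group risk p0."""
    return odds_ratio / (1 - p0 + p0 * odds_ratio)

def or_to_nnt(odds_ratio, p0):
    """Number needed to treat for a beneficial effect (OR below 1)."""
    p1 = odds_ratio * p0 / (1 - p0 + odds_ratio * p0)  # implied treated-group risk
    return 1 / (p0 - p1)

# Invented example: OR = 0.5 with a 20% control-group risk
print(f"RR  = {or_to_rr(0.5, 0.2):.3f}")
print(f"NNT = {or_to_nnt(0.5, 0.2):.2f}")
```

The example shows why the baseline risk matters: the same odds ratio of 0.5 corresponds to a risk ratio of about 0.56 here, and the gap between OR and RR widens as P₀ grows.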

Researcher

Time-to-Event (Survival) Data

Many trials report time-to-event outcomes using hazard ratios (HR). Pooling HRs in meta-analysis requires special handling:

1

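The log(HR) + SE Method

Extract log(HR) and its SE from the trial. If the SE is not reported, derive it from the CI: SE = (ln(upper) − ln(lower)) / (2 × 1.96). Pool using the standard inverse-variance method.

2

When the HR Is Not Reported

Methods exist to reconstruct IPD from Kaplan-Meier curves (Guyot et al. 2012) or to estimate the HR from p-values and event counts (Parmar et al. 1998). Whenever possible, prefer the directly reported adjusted HR.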

HR < 1 favors treatment; HR > 1 favors control. Do not convert HRs to ORs or RRs—they measure fundamentally different quantities.
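The log(HR) + SE method described above can be sketched in a few lines. The three trials below are invented for illustration, the function names are hypothetical, and the pooling shown is a plain fixed-effect inverse-variance average.

```python
import math

def se_from_ci(lower, upper, z=1.96):
    """Standard error of ln(HR) recovered from a 95% confidence interval."""
    return (math.log(upper) - math.log(lower)) / (2 * z)

def pool_hazard_ratios(trials, z=1.96):
    """Fixed-effect inverse-variance pooling of (hr, ci_lower, ci_upper) tuples."""
    weights = []
    weighted_logs = []
    for hr, lower, upper in trials:
        se = se_from_ci(lower, upper, z)
        weight = 1 / se ** 2  # inverse-variance weight
        weights.append(weight)
        weighted_logs.append(weight * math.log(hr))
    pooled_log = sum(weighted_logs) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    return (math.exp(pooled_log),
            math.exp(pooled_log - z * pooled_se),
            math.exp(pooled_log + z * pooled_se))

# Hypothetical trials, invented for illustration: (HR, CI lower, CI upper)
trials = [(0.80, 0.64, 1.00), (0.90, 0.70, 1.16), (0.70, 0.49, 1.00)]
hr, lower, upper = pool_hazard_ratios(trials)
print(f"Pooled HR = {hr:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

All averaging happens on the log scale and only the final result is exponentiated. Averaging raw hazard ratios would give a different and incorrect answer.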

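Story: The Boldt Colloid Scandal

What if the data you extract were never real?

REAL DATA

Joachim Boldt was one of the most prolific researchers in anesthesia fluid management. More than 180 of his publications were retracted, one of the largest retraction cases in medical history. His fabricated data showed that hydroxyethyl starch (HES) was safe. Meta-analyses that included his studies concluded that HES was harmless. When Boldt's studies were removed, the pooled effect reversed: HES increased kidney injury by 59% (RR 1.59, 95% CI 1.26-2.00) and mortality by ~9% (RR 1.09). Thousands of patients are estimated to have received a harmful fluid based on fabricated evidence.

The Extractor's Vigilance: 2010

You are extracting data for a meta-analysis of fluid resuscitation. Boldt's studies dominate the literature (90+ papers). A whistleblower has raised concerns. What do you do?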

PATH A: Extract as Published

Trust peer-reviewed publications. Extract Boldt's data like any other.

↓

Your meta-analysis shows HES is safe. Guidelines recommend it.

OUTCOME: Thousands receive a nephrotoxic fluid

PATH B: Verify Provenance

Cross-check ethics approvals, request source data, and run a sensitivity analysis excluding the suspect studies

↓

Discover missing ethics approvals. Flag studies. Re-analyze without them.

OUTCOME: True signal emerges — HES causes harm

THE REVELATION

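Provenance is not bureaucracy. It is the difference between evidence and fiction. Every extracted number must trace back to an ethically approved study with verifiable patient data. Without provenance, a number with no owner can become a weapon.

Every number in a meta-analysis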

must trace back to a verifiable source.

A number without provenance is not a number.

Fraudulent data can kill as surely as fraudulent drugs.

Module 5 Quiz

1. What happened when the DECREASE trial data were removed from the beta-blocker meta-analysis?

A. The benefit became even larger

B. No change in conclusions

C. The direction reversed to show potential harm

D. The results became inconclusive

2. Why should dual extraction be standard practice?

A. It catches transcription errors and forces scrutiny

B. It makes extraction faster

C. It helps find more studies

D. It reduces the amount of work needed

A number without provenance is not a number.

Module 6: Bias

Methods protect patients from our confidence.

This is about

the bias we cannot see.

Module 6: Bias

🎯 Learning Objectives

Apply Risk of Bias 2.0 (RoB 2) to randomized trials
Apply ROBINS-I to non-randomized studies
Assess all five RoB 2 domains (randomization, deviations, missing data, measurement, selection)
Distinguish confounding by indication from true treatment effects
Explain how BART revealed hidden harms of aprotinin

20+

years on the market

Aprotinin was the standard for reducing surgical bleeding.

Then someone ran an RCT. The truth was different.

The Hidden Bias: Confounding by Indication

1

Sicker patients got aprotinin: Surgeons used it in complex, high-risk cases

2

Survivor bias: Dead patients can't report complications

3

Publication bias: Negative studies were not published

Observational studies could not separate the drug's effect from the patients' underlying risk.

BART: The Randomized Truth

Blood Conservation Using Antifibrinolytics in a Randomized Trial

Outcome	Aprotinin	Alternatives
30-day mortality	6.0%	3.9%
Relative Risk	1.53 (53% increased death)

Trial Stopped Early for Harm

Withdrawn from the market, November 2007

🔍

Investigation: Assessing Bias

You are reviewing the observational studies. Apply risk-of-bias thinking:

Question	Observational	BART (RCT)
Random allocation?	❌ Surgeon choice	✓ Yes
Baseline comparable?	❌ Sicker got drug	✓ Balanced
Blinding?	❌ Open label	✓ Double-blind

Confounding by indication: Surgeons gave aprotinin to the sickest patients. Observational studies attributed survival to the drug when what they were measuring was survivor bias.

Risk of Bias 2.0: The Five Domains

D1

Randomization Process

D2

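Deviations from Intended Interventions

D3

Missing Outcome Data

D4

Measurement of the Outcome

D5

Selection of the Reported Result

ROBINS-I: For Non-Randomized Studies

When RCTs are not available, use ROBINS-I (Risk Of Bias In Non-randomized Studies of Interventions)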

1

Confounding

Baseline differences between groups

2

Selection of Participants

Exclusions related to intervention

3

Classification of Interventions

Misclassification of exposure status

4

Deviations from Intended Interventions

Co-interventions, contamination

5

Missing Data

Differential loss to follow-up

6

Measurement of Outcomes

Ascertainment bias

7

Selection of Reported Result

Selective reporting

Ratings: Low / Moderate / Serious / Critical / No information

Story: The Aprotinin BART Trial

What happens when 64 studies agree and all of them are wrong?

REAL DATA

Aprotinin was used in cardiac surgery to reduce bleeding for 20 years. 64 small randomized trials suggested it was safe and effective. Meta-analyses confirmed the benefit. Then the BART trial (2008) randomized 2,331 patients: aprotinin vs. tranexamic acid vs. aminocaproic acid. Result: aprotinin increased mortality by 53% (RR 1.53, 95% CI 1.06-2.22). The trial was stopped early for harm. Bayer withdrew aprotinin from the market within months.

The Surgeon's Evidence: 2006

You are a cardiac surgeon choosing an antifibrinolytic. 64 small trials favor aprotinin, but none had the power to detect a difference in mortality. A large RCT (BART) is enrolling. Do you wait?

PATH A: Trust the Meta-Analysis

64 trials can't all be wrong. Continue prescribing aprotinin.

↓

The small trials measured bleeding, not death. None was adequately powered for mortality. The pooled meta-analysis rests on underpowered surrogate outcomes.

OUTCOME: Excess deaths in cardiac surgery patients

PATH B: Assess Risk of Bias First

Assess all 64 trials with RoB 2. Note that they are small, use surrogate outcomes, and have high attrition. Wait for an adequately powered RCT.

↓

BART reveals the truth. Switch to safer alternatives.

OUTCOME: Lives saved by demanding adequately powered evidence

THE REVELATION

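Quantity of evidence is not quality. Sixty-four underpowered trials measuring the wrong outcome count for less than one adequately powered trial measuring mortality. Risk-of-bias assessment is not a formality. It is the shield between patients and the misleading conclusions drawn from small, surrogate-driven evidence.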

Sixty-four small trials measured bleeding, not death.

One adequately powered trial revealed 53% increased mortality.

Quantity of evidence cannot substitute for quality and power.

Module 6 Quiz

1. Why did 64 small trials miss aprotinin's harm?

A. Underpowered for mortality; used surrogate outcomes

B. Confounding by indication

C. Outcome measured incorrectly

D. Follow-up too short

Methods protect patients from our confidence.

Module 7: Synthesis

Heterogeneity is a message, not noise.

The Magnesium Controversy: 1991-1995

When pooling leads us astray.

Module 7: Synthesis

🎯 Learning Objectives

Calculate pooled effect sizes using fixed-effect and random-effects models
Choose between DerSimonian-Laird and HKSJ estimators appropriately
Interpret forest plots including weights, confidence intervals, and diamonds
Explain why small-study effects can mislead meta-analyses
Apply the principle: "Heterogeneity is a message, not noise"

The Year: 1991

"You stand at the crossroads of hope and evidence..."

Heart disease kills more people worldwide than any other cause. In 1991, a new hope emerges: Could something as simple and cheap as intravenous magnesium save lives after myocardial infarction?

The biological rationale was sound.

Magnesium stabilizes cardiac membranes, prevents arrhythmias, and vasodilates coronary arteries.

LIMIT-2: The Breakthrough Trial

Leicester Intravenous Magnesium Intervention Trial, 1992

2,316

Patients enrolled

24%

Mortality reduction

p = 0.04

Statistically significant

A cheap, safe intervention that could save 250,000 lives per year globally.

The medical community

The Meta-Analysis: 1993

Researchers pooled seven randomized trials of IV magnesium in MI:

Trial	Year	N	Odds Ratio
Morton 1984	1984	40	0.10
Rasmussen 1986	1986	273	0.35
Smith 1986	1986	400	0.48
Abraham 1987	1987	94	0.87
Shechter 1990	1990	103	0.27
Ceremuzynski 1989	1989	48	0.22
LIMIT-2	1992	2,316	0.74

🔍

Investigation Exercise: The Meta-Analyst's Dilemma

You are a Cochrane reviewer in 1993. You have been asked to synthesize the evidence on magnesium for MI. The data from seven trials are in front of you.

Do you see a pattern in this forest plot?

Pooled OR = 0.44 (95% CI: 0.27–0.71)

55% mortality reduction! Publish in the Lancet?

But wait... did you notice anything about the trial sizes?

The Warning Signs

What should have given us pause?

1

Small sample sizes: Six of seven trials had <500 patients

2

Extreme effects: OR of 0.10 (90% reduction) is implausible for any drug

3

All positive: Where were the negative trials? The file drawer problem...

4

Funnel asymmetry: Small trials showed much larger effects than larger ones

🔍

The Funnel Plot Test

Before pooling, you should check for publication bias. Let's examine the funnel plot.

The Year: 1995, ISIS-4 Reports

"And then the truth emerged..."

The Fourth International Study of Infarct Survival (ISIS-4) enrolled 58,050 patients across 1,086 hospitals in 31 countries.

58,050

Patients

2,216

Deaths in Mg group

2,103

Deaths in placebo

OR = 1.06 (95% CI: 1.00–1.12)

No benefit. If anything, a trend toward harm.

📊

Before and After: The Full Picture

Watch what happens when we add the mega-trial to our forest plot...

BEFORE ISIS-4

7 small trials (N = 3,274)

OR = 0.44

Strong benefit signal

AFTER ISIS-4

8 trials (N = 61,324)

OR = 1.02

No effect

Why Did Small Trials Mislead?

1

Publication Bias

Small negative trials were never published—they sat in file drawers

2

Small-Study Effects

Smaller trials tend to show larger effects due to methodological weaknesses

3

Random High Bias

By chance, some small trials got extreme results, and those were the ones that got published.

4

Random-Effects Amplification

Random-effects models give more weight to small trials, amplifying bias

Fixed vs. Random Effects

Which model should you choose?

FIXED EFFECT MODEL

Assumes one true effect. Weights studies by inverse variance (precision). Large trials dominate.

Magnesium result: OR = 0.96 (p = 0.52)

RANDOM EFFECTS MODEL

Assumes distribution of effects. Gives more weight to small trials. Wider confidence intervals.

Magnesium result: OR = 0.59 (p = 0.01)

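⚠️ The choice of model determined the conclusion!

Random effects do not correct for bias. When small-study effects are present, the model shifts weight toward the small trials, and the conclusion can change.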
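The contrast between the two models can be sketched in a few lines. This is a minimal illustration, not the magnesium analysis itself: the log odds ratios and standard errors are hypothetical, the function name `pool` is an assumption, and the random-effects weights use the DerSimonian-Laird estimate of τ² as one common choice.

```python
import math

def pool(log_ors, ses):
    """Return (fixed, random, tau2): pooled log-ORs and the DL estimate of tau^2."""
    w = [1 / se**2 for se in ses]                       # inverse-variance weights
    fixed = sum(wi * y for wi, y in zip(w, log_ors)) / sum(w)
    # Cochran's Q and the DerSimonian-Laird estimate of tau^2
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, log_ors))
    df = len(log_ors) - 1
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)
    # Adding tau^2 to every variance flattens the weights,
    # which raises the relative share of the small trials.
    w_re = [1 / (se**2 + tau2) for se in ses]
    random_ = sum(wi * y for wi, y in zip(w_re, log_ors)) / sum(w_re)
    return fixed, random_, tau2

# Hypothetical data: three small "positive" trials and one large null trial.
log_ors = [math.log(0.30), math.log(0.40), math.log(0.50), math.log(1.05)]
ses = [0.50, 0.45, 0.40, 0.03]

fixed, random_, tau2 = pool(log_ors, ses)
print(f"fixed-effect OR   = {math.exp(fixed):.2f}")
print(f"random-effects OR = {math.exp(random_):.2f}")
```

With these made-up inputs the fixed-effect estimate stays close to the large trial, while the random-effects estimate is pulled toward the small trials.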

Lessons from Magnesium

1. Check for publication bias before trusting a pooled estimate. Funnel plots and Egger's test are your tools.

2. Be wary of small-study effects. If only small trials show benefit, wait for a large, well-conducted trial.

3. Model choice matters. Random effects can amplify biased evidence. Consider both models and understand what each implies.

4. One large trial can overturn many small ones. This is why mega-trials like ISIS-4 are so valuable.

Researcher

Special Study Designs in Meta-Analysis

Not all RCTs use a standard parallel-group design. Two common alternatives need special handling when their results are pooled.

1

Cluster-Randomized Trials

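Randomize groups (hospitals, schools) rather than individuals. The design effect = 1 + (m−1) × ICC, where m is the average cluster size and ICC is the intracluster correlation coefficient, reduces the effective sample size. Before pooling, divide N by the design effect or use the clustering-adjusted SE reported by the trial. Ignoring clustering makes CIs artificially narrow.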

2

Crossover Trials

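Each patient receives both treatments. The paired design reduces variance, but pooling it correctly requires the within-patient correlation (or the SE from the paired analysis). Using the parallel-group SE is conservative. Using the wrong N double-counts patients.

For detailed formulas and worked examples, see the Cochrane Handbook v6.4, Chapter 23.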
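The cluster adjustment above is simple arithmetic. The sketch below assumes a hypothetical trial of 1,000 patients in clusters of 50 with an ICC of 0.02. The function names are illustrative, and inflating the SE by the square root of the design effect is the usual approximation rather than an exact correction.

```python
import math

def design_effect(mean_cluster_size, icc):
    """Design effect = 1 + (m - 1) * ICC."""
    return 1 + (mean_cluster_size - 1) * icc

def effective_sample_size(n, mean_cluster_size, icc):
    """Sample size the trial is 'worth' once clustering is accounted for."""
    return n / design_effect(mean_cluster_size, icc)

def inflate_se(se, mean_cluster_size, icc):
    """Approximate clustering-adjusted SE from an unadjusted one."""
    return se * math.sqrt(design_effect(mean_cluster_size, icc))

# Hypothetical cluster trial: 1,000 patients, 50 per cluster, ICC = 0.02
deff = design_effect(50, 0.02)
n_eff = effective_sample_size(1000, 50, 0.02)
print(f"design effect = {deff:.2f}, effective N = {n_eff:.0f}")
```

Even a small ICC roughly halves the effective sample size here, because the clusters are large.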

Story: The Early Surfactant Reversal

What if the way you combine studies decides whether a treatment looks life-saving or useless?

REAL DATA

Early surfactant for premature infants: early vs. late surfactant was supported by 6 small trials showing reduced mortality (RR 0.84). A fixed-effect meta-analysis confirmed benefit (p=0.04). But a random-effects model showed no significance (p=0.12): the confidence interval crossed 1.0. Later, SUPPORT (2010) and VON (2012), two large pragmatic trials with ~2,000 neonates combined, found no benefit. Clinical practice had been changed on the basis of small trials and the wrong model.

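The Neonatologist's Model Choice: 2005

You are updating the Cochrane review on early surfactant. Six small trials show a benefit under the fixed-effect model. Under the random-effects model the result is not significant. Which do you report?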

PATH A: Report Fixed-Effect Only

Fixed-effect is significant. Report the positive result. Change practice.

↓

NICUs adopt early surfactant. Later trials show no benefit. Practice reverses.

OUTCOME: Years of unnecessary intubation of premature infants

PATH B: Report Both Models

Present both FE and RE results. Flag that significance depends on model choice. Call for a large trial.

↓

Honest uncertainty. Large trials prioritized. True answer emerges faster.

OUTCOME: Premature babies spared unnecessary intervention

THE REVELATION

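If your conclusion changes depending on whether you use fixed or random effects, the conclusion is fragile. Report both. Acknowledge the uncertainty. And remember: a fragile result from small trials is not a mandate to change practice.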

Module 7 Quiz

1. Why did the magnesium meta-analysis show a benefit that ISIS-4 did not find?

A. ISIS-4's methodology was flawed

B. Calculation error in meta-analysis

C. Publication bias in small trials

D. LIMIT-2 was underpowered

2. What warning sign should have alerted reviewers to potential bias?

A. Asymmetric funnel plot (small trials showing larger effects)

B. Low heterogeneity (I² = 0%)

C. Strong biological plausibility

D. Too few trials to analyze

3. When publication bias is suspected, which model may amplify the bias?

A. Fixed effect model

B. Random effects model

C. Bayesian model

D. Network meta-analysis

Small trials can show false signals.

Large trials anchor the truth.

Heterogeneity is a message, not noise.

Module 8: Heterogeneity

ACCORD: 2008

When the average hides the truth.

🎯 Learning Objectives

Calculate and interpret I², τ², and prediction intervals
Apply ICEMAN criteria to assess subgroup credibility
Distinguish between clinical, methodological, and statistical heterogeneity
Conduct and interpret leave-one-out sensitivity analyses
Explain how ACCORD revealed differential effects across subgroups

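The Year: 2008

"You are about to witness one of the most shocking trial terminations in history..."

For decades, the diabetes community had one guiding principle: lower blood sugar is better. The landmark DCCT (1993) and UKPDS (1998) showed that intensive glucose control reduced microvascular complications (blindness, kidney failure, nerve damage).

The logical extrapolation: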

If controlling glucose prevents complications, shouldn't intensive control prevent cardiovascular disease too?

ACCORD: Action to Control Cardiovascular Risk in Diabetes

The definitive test of intensive glucose control

10,251

Type 2 diabetics

HbA1c <6%

Intensive target

HbA1c 7-7.9%

Standard target

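All patients had type 2 diabetes and were at high cardiovascular risk, with either established cardiovascular disease or multiple risk factors. The trial was designed to run for 5.6 years.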

February 6, 2008

The Data Safety Monitoring Board convenes an emergency meeting.

After 3.5 years, they make an unprecedented decision:

Stop the trial.

The Shocking Results

Outcome	Intensive	Standard	HR (95% CI)
Primary CV endpoint	352 events	371 events	0.90 (0.78–1.04)
All-cause mortality	257 deaths	203 deaths	1.22 (1.01–1.46)
Severe hypoglycemia	10.5%	3.5%	3.0× higher

22% increase in mortality

54 excess deaths in the intensive arm

🔍

Investigation Exercise: The Clinician's Dilemma

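You are an endocrinologist caring for 500 patients with diabetes. The ACCORD results are published. What do you tell the patients who have been working toward HbA1c <6%?

Is intensive control harmful for everyone? Or only for some?

Subgroup analyses reveal: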

Subgroup	Intensive HR	Interpretation
No prior CVD	1.00 (0.76–1.32)	No effect
Prior CVD	1.45 (1.15–1.84)	Significant harm
Baseline HbA1c <8%	1.02 (0.75–1.40)	No effect
Baseline HbA1c ≥8%	1.29 (1.03–1.60)	Harm

The average effect masked critical heterogeneity!

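Intensive treatment was harmful in patients with established cardiovascular disease or poor baseline control.

Understanding Heterogeneity: I² and Beyond

When studies (or subgroups) show different effects, we need to quantify that variation.

I² = 0–25%: Low heterogeneity. Effects are consistent across studies.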

I² = 25–50%: Moderate. Look for sources of variation.

I² = 50–75%: Substantial. Consider whether pooling is appropriate.

I² = 75–100%: Considerable. A single pooled estimate may mislead.

But I² alone does not tell you what to do. It only signals that further investigation is needed.
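I² is derived from Cochran's Q as max(0, (Q − df)/Q) × 100. The sketch below shows that calculation; the effect estimates and standard errors are hypothetical, and the function name `i_squared` is illustrative.

```python
def i_squared(effects, ses):
    """I^2 (in percent) from Cochran's Q, using inverse-variance weights."""
    w = [1 / s**2 for s in ses]
    pooled = sum(wi * y for wi, y in zip(w, effects)) / sum(w)
    q = sum(wi * (y - pooled) ** 2 for wi, y in zip(w, effects))
    df = len(effects) - 1
    if q <= df:                 # no more variation than chance alone predicts
        return 0.0
    return 100 * (q - df) / q

# Hypothetical log hazard ratios from four subgroups or studies
effects = [0.00, 0.37, 0.02, 0.25]
ses = [0.14, 0.12, 0.16, 0.11]
print(f"I² = {i_squared(effects, ses):.0f}%")
```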

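Tau² (τ²): The Between-Study Variance

I² tells you the proportion of variance due to heterogeneity, while τ² tells you its magnitude.

I² (percentage)

"What proportion of the total variance is due to real differences between studies?"

Scale: 0% to 100%

τ² (absolute)

"How much do the true effects differ between studies?"

Same scale as the effect measure

Use τ² to calculate prediction intervals

The prediction interval shows the range of effects to expect in a new study. It is often much wider than the confidence interval.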

📊

The Prediction Interval: What ACCORD Really Tells Us

Consider a meta-analysis of intensive glucose control across multiple trials...

Confidence Interval

HR 1.10 (0.95–1.27)

"Our best estimate of the average effect"

Prediction Interval

HR 1.10 (0.70–1.73)

"The range of effects in a new setting"

The prediction interval spans both benefit and harm!

In some settings, intensive control might help. In others, it could kill.
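The step from confidence interval to prediction interval can be sketched as follows. The pooled HR and its CI are the ones quoted above. The value τ² = 0.048 is an assumption chosen for illustration (the slide does not report it), and the critical value 1.96 is a normal approximation; the usual formula uses a t quantile with k − 2 degrees of freedom, which is wider when there are few studies.

```python
import math

def prediction_interval(log_est, se, tau2, crit=1.96):
    """Approximate 95% prediction interval on the log scale.

    crit is the critical value; 1.96 is a normal approximation,
    a t quantile with k - 2 df is the more usual choice.
    """
    half_width = crit * math.sqrt(tau2 + se**2)
    return log_est - half_width, log_est + half_width

# Pooled HR 1.10 (95% CI 0.95-1.27) as quoted above
log_hr = math.log(1.10)
se = (math.log(1.27) - math.log(0.95)) / (2 * 1.96)   # SE recovered from the CI
tau2 = 0.048                                          # assumed, for illustration only

lo, hi = prediction_interval(log_hr, se, tau2)
pi_low, pi_high = math.exp(lo), math.exp(hi)
print(f"95% PI: {pi_low:.2f} to {pi_high:.2f}")
```

Under these assumptions the interval comes out close to the 0.70 to 1.73 shown above: the extra width comes entirely from τ².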

When Is a Subgroup Effect Credible?

Subgroup Credibility Criteria (adapted from ICEMAN, Schandelmaier 2020 & Sun 2012)

1

Was the subgroup analysis pre-specified?

Post-hoc subgroups are prone to data dredging

2

Is there a plausible biological rationale?

The mechanism should be clear and independent of the data

3

Is the effect consistent across related outcomes?

If mortality shows harm, do MI and stroke show similar harm?

4

Is there independent replication?

Have other studies confirmed the subgroup effect?

ICEMAN Applied to ACCORD

Criterion	Assessment	Score
Pre-specified?	Yes: prior CVD was a protocol-specified subgroup	✓
Biological rationale?	Yes—hypoglycemia more dangerous with CVD	✓
Consistent outcomes?	Yes—CV mortality and all-cause mortality aligned	✓
Independent replication?	Partially—ADVANCE, VADT showed similar patterns	~

ICEMAN Rating: High Credibility

The differential harm in high-risk patients appears genuine.

Clinical Implications

For patients without CVD: Moderate glucose control (HbA1c ~7%) remains the goal. Intensive control may reduce microvascular complications.

For patients with established CVD: Avoid intensive targets. Hypoglycemia is dangerous for damaged hearts.

For elderly patients: Relaxed targets. Quality of life matters. Tight control causes falls, confusion, and excess mortality.

"One size fits all" treatment is not patient-centered medicine.

Meta-Regression: Explaining Heterogeneity

When heterogeneity is high, meta-regression can identify study-level covariates that explain variation.

THE QUESTION

Does effect size vary systematically with study characteristics?

Covariates

Year, dose, duration, baseline risk, study quality

Output

Regression coefficient (slope), R², residual heterogeneity

Caution

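Meta-regression needs at least 10 studies per covariate. With fewer studies it is exploratory only. Beware the ecological fallacy: study-level associations may not hold for individual patients.

Example: Across the intensive glucose-control trials (ACCORD, ADVANCE, VADT), meta-regression might test whether the treatment effect varies with each trial's mean baseline HbA1c. With only three trials, any such analysis would be exploratory.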
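At its simplest, meta-regression is a weighted regression of study effects on a study-level covariate. The sketch below uses inverse-variance weights and one covariate. All numbers are hypothetical, the function name is illustrative, and residual heterogeneity (τ²) is ignored, which a real analysis would need to model.

```python
def weighted_meta_regression(x, y, ses):
    """Inverse-variance weighted least squares with one covariate.

    Returns (intercept, slope). Residual heterogeneity is not modeled.
    """
    w = [1 / s**2 for s in ses]
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Hypothetical: mean baseline HbA1c per study vs. log hazard ratio
baseline_hba1c = [7.2, 7.5, 8.1, 8.3, 9.4]
log_hr = [-0.05, 0.00, 0.10, 0.12, 0.30]
ses = [0.10, 0.08, 0.09, 0.12, 0.15]

intercept, slope = weighted_meta_regression(baseline_hba1c, log_hr, ses)
print(f"change in log HR per 1-point higher baseline HbA1c: {slope:.3f}")
```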

Story: The SPRINT Blood Pressure Revolution

What number saves lives? Who decides?

REAL DATA

For decades the goal was to treat blood pressure to <140 mmHg systolic. Then came SPRINT (2015): 9,361 high-risk patients randomized to intensive (<120) vs. standard (<140) targets. Intensive treatment reduced CV events by 25% and death by 27%. The trial was stopped early for benefit. Guidelines changed worldwide.

Before SPRINT: The Guidelines Committee

It is 2014 and you are setting blood pressure guidelines. The target has been <140 for years. Should you wait for better evidence?

PATH A: Maintain Status Quo

Keep <140 target (established practice, minimal controversy)

↓

Guidelines unchanged. Physicians continue treating to <140.

OUTCOME: Miss opportunity to prevent deaths

PATH B: Fund the Definitive Trial

Wait for the SPRINT results before updating the target

↓

SPRINT demonstrates benefit. Update target to <120 for high-risk patients.

OUTCOME: Estimated 100,000+ lives saved globally

JNC 7 (2003): <140

Years of uncertainty

SPRINT (2015): <120 for high-risk patients

THE REVELATION

The "standard of care" is not fixed. It changes when trials challenge assumptions. For a decade, patients were undertreated because no one had tested the obvious question.

Module 8 Quiz

1. Why was the ACCORD trial stopped early?

A. Intensive control showed clear cardiovascular benefit

B. Intensive control increased mortality

C. Enrollment was too slow

D. Budget ran out

2. What does a prediction interval tell us that a confidence interval doesn't?

A. The true effect is more precisely estimated

B. The sample size is adequate

C. The range of effects to expect in a new study

D. It uses a mathematical formula

3. According to ICEMAN, which factor is MOST important for subgroup credibility?

A. Pre-specification of the subgroup hypothesis

B. Large sample size in the subgroup

C. Statistically significant p-value

D. Multiple outcomes showing same direction

When studies disagree,

listen to the disagreement.

Heterogeneity is a message, not noise.

Absence of evidence is not evidence of absence.

Module 9: The Hidden Studies

Reboxetine: 2010

The 74% that never saw the light of day

🎯 Learning Objectives

Interpret funnel plots for asymmetry detection
Apply Egger's test and other statistical tests for publication bias
Implement the trim-and-fill method for bias adjustment
Critically appraise the limitations of publication bias tests
Apply the principle "Absence of evidence is not evidence of absence"

The Year: 1997

"A new hope for depression patients who cannot tolerate SSRIs..."

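Reboxetine (Edronax) was a new antidepressant, a selective norepinephrine reuptake inhibitor (NRI). Unlike the SSRIs, it targeted a different neurotransmitter system. It offered a new mechanism for patients who had failed on, or could not tolerate, fluoxetine or sertraline.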

1997

EU approval

50+

Countries approved

Millions

Prescriptions written

The Published Evidence

What doctors could find in medical journals:

Comparison	Published Trials	Published Result
Reboxetine vs Placebo	3 trials (n=507)	Significantly better (SMD = 0.56)
Reboxetine vs SSRIs	4 trials (n=628)	Equivalent or better

The published literature told a clear story:

Reboxetine works. Patients benefit. Prescribe with confidence.

But what about the trials you could not see?

In 2010, German researchers at IQWiG made a request to the European Medicines Agency...

They demanded access to all data, published and unpublished.

What they found changed everything.

The Complete Picture

Eyding et al., BMJ 2010

Comparison	Published Only	ALL DATA
Reboxetine vs Placebo	SMD 0.56 (benefit)	SMD 0.10 (no benefit)
Patients in analysis	507 (14%)	2,731 (100%)
Reboxetine vs SSRIs	Equivalent	Inferior (RR 1.23 for harms)
Patients in analysis	628 (26%)	2,411 (100%)

74% of the patient data was never published

The hidden trials showed no benefit, and more harm

🔍

Investigation Exercise: The File Drawer

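You are a systematic reviewer in 2008. You search PubMed, Embase, and the Cochrane Library for every reboxetine trial. The 7 published trials show benefit.

Can you trust this evidence?

⚠️ The funnel is strongly asymmetric!

All the published studies cluster on one side. Where are the null and negative trials?

The Publication Bias Toolkit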

1

Funnel Plot

Plot effect size vs. standard error. A symmetric funnel suggests no bias; asymmetry raises alarms.

2

Egger's Regression Test

Regress effect/SE on 1/SE. A non-zero intercept (P < 0.10) suggests small-study effects. Note: inflated false-positive rate with binary outcomes; use Peters' test instead.

3

Peters' Test

For binary outcomes, regresses log OR on inverse of total sample size. Less prone to false positives.

4

Trim-and-Fill

Imputes the "missing" studies needed to make the funnel symmetric, then recalculates the pooled effect.

📊

Interactive: Trim-and-Fill Analysis

Let's apply trim-and-fill to the reboxetine data and see what the adjusted estimate looks like...

Published Only

7 trials

SMD = 0.56

Significant benefit

Trim-and-Fill

7 + 5 imputed = 12 trials

SMD = 0.23

Reduced, still nominally significant

But even trim-and-fill underestimated the problem!

The true effect from all data was SMD = 0.10 (essentially null).
Trim-and-fill is conservative—it doesn't fully correct for selective publication.
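Of the detection tools in the toolkit, Egger's regression is the easiest to sketch: regress each study's effect/SE on its precision 1/SE and inspect the intercept. The example below is a minimal ordinary-least-squares version. The SMDs and standard errors are hypothetical (they are not the reboxetine data), the function name is illustrative, and the p-value step is left out because it needs a t distribution.

```python
import math

def egger_intercept(effects, ses):
    """Egger's regression: OLS of effect/SE on 1/SE. Returns (intercept, its SE)."""
    z = [y / s for y, s in zip(effects, ses)]   # standardized effects
    prec = [1 / s for s in ses]                 # precisions
    n = len(z)
    mean_x, mean_z = sum(prec) / n, sum(z) / n
    sxx = sum((x - mean_x) ** 2 for x in prec)
    slope = sum((x - mean_x) * (zi - mean_z) for x, zi in zip(prec, z)) / sxx
    intercept = mean_z - slope * mean_x
    # Residual variance with n - 2 degrees of freedom
    resid_var = sum((zi - intercept - slope * x) ** 2
                    for x, zi in zip(prec, z)) / (n - 2)
    se_intercept = math.sqrt(resid_var * (1 / n + mean_x**2 / sxx))
    return intercept, se_intercept

# Hypothetical SMDs: the smaller (less precise) studies show larger effects
effects = [0.80, 0.65, 0.55, 0.30, 0.15]
ses = [0.40, 0.30, 0.25, 0.12, 0.08]

b0, se_b0 = egger_intercept(effects, ses)
print(f"Egger intercept = {b0:.2f} (SE {se_b0:.2f})")
```

An intercept well away from zero, relative to its SE, points to small-study effects. It does not by itself prove publication bias.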

The Best Defense: Trial Registries

Methods for detecting publication bias are imperfect. The real solution is prospective registration.

ClinicalTrials.gov

US registry (2000)

WHO ICTRP

Global portal

PROSPERO

Review registration

When searching for trials, always check the registries. Compare the number of registered trials with the number published. A gap is a warning sign.

Since 2005, ICMJE requires trial registration as a condition of publication.

The AllTrials Campaign

"All trials registered. All results reported."

The reboxetine scandal, together with similar cases involving other drugs, sparked a global movement.

✓

2013: EMA clinical data policy

European Medicines Agency commits to publishing clinical study reports

✓

2016: FDA Amendments Act enforcement

Mandatory results reporting on ClinicalTrials.gov within 12 months

✓

AllTrials Coalition

Over 90,000 supporters, 700+ organizations demanding transparency

The Reboxetine Aftermath

!

Germany's IQWiG recommended against reboxetine for depression

!

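UK NICE downgraded it to "not recommended"

!

The FDA rejected reboxetine in 2001 (it had access to the unpublished data)

For more than a decade, patients received a drug that was no better than placebo.

Because only the positive trials were published.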

Story: The Paroxetine Study 329 Deception

What if the published conclusion is the opposite of what the data show?

REAL DATA

GlaxoSmithKline's Study 329 tested paroxetine in adolescent depression. The published paper (2001) concluded that paroxetine was "generally well tolerated and effective." The actual data: paroxetine failed on all 8 pre-specified outcomes. When re-analyzed (RIAT 2015), suicidal/self-harm events: 23 in the paroxetine group vs. 5 in the placebo group. The published paper had redefined outcomes post hoc to manufacture significance. In 2015, the RIAT (Restoring Invisible and Abandoned Trials) reanalysis, which used the original clinical study reports, concluded that paroxetine was neither safe nor effective for adolescents.

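The Prescriber's Dilemma: 2003

You are a child psychiatrist. Study 329, the only large trial, says paroxetine works in teenagers. But the FDA has not approved it for adolescents. The parents ask you to prescribe it. What do you do?

PATH A: Trust the Publication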

A peer-reviewed JAACAP paper says it works. Prescribe off-label.

↓

Millions of prescriptions worldwide. Suicidal events in adolescents.

OUTCOME: FDA issues black box warning for SSRIs in youth (2004)

PATH B: Check the Trial Registry

Search ClinicalTrials.gov for the original endpoints. The published outcomes do not match the registered protocol.

↓

Red flag: outcome switching detected. You withhold the drug. Your patient is safer.

OUTCOME: Publication bias identified before harm

THE REVELATION

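Publication bias is not only about missing studies. It is also about the truth missing from published studies. Outcome switching, ghostwriting, and selective reporting can turn a failed trial into a marketing tool. Always compare published results against the registered trial protocol.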

Module 9 Quiz

1. What proportion of reboxetine trial data was hidden from the published literature?

A. 25%

B. 50%

C. 74%

D. 90%

2. Why can trim-and-fill underestimate the correction needed?

A. It assumes effects are normally distributed

B. It imputes only the studies needed to achieve symmetry, which may not fully reflect reality

C. It requires at least 20 studies

D. It only works with very large studies

3. What is the best prospective defense against publication bias?

A. Funnel plots in all meta-analyses

B. Egger's test before pooling

C. Prospective trial registration

D. More medical journals

What you can't see

may be more important than what you can.

Absence of evidence is not evidence of absence.

Certainty must be earned, not assumed.

Module 10: Certainty


Early Surfactant: 2012

When high-quality evidence evolves

🎯 Learning Objectives

Apply the full GRADE framework to rate the certainty of evidence
Evaluate all five downgrade factors (RoB, inconsistency, indirectness, imprecision, publication bias)
Identify when to upgrade for large effect, dose-response, or confounding
Construct Summary of Findings tables with absolute effect estimates
Apply the principle "Certainty must be earned, not assumed"

The Year: 1990s

"A revolution in neonatal care..."

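Respiratory distress syndrome (RDS) was the leading cause of death in premature infants. The development of exogenous surfactant, a substance that prevents alveolar collapse, was one of the great advances of neonatal medicine.

The question: when should surfactant be given?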

Prophylactically (to all high-risk infants) or selectively (only after RDS develops)?

The Original Cochrane Review (2003)

Multiple RCTs conducted before the era of routine CPAP

Outcome	Prophylactic vs Selective	Certainty
Neonatal mortality	RR 0.73 (favors prophylactic)	High
BPD or death	RR 0.84 (favors prophylactic)	High

Recommendation: Give surfactant prophylactically

Guidelines worldwide adopted this approach

But the world of neonatal care was changing...

A new technology emerged: Continuous Positive Airway Pressure (CPAP)

Non-invasive support that could help preterm lungs without intubation.

Would the old evidence still apply?

The 2012 Cochrane Update

New trials conducted in the CPAP era

Outcome	Old Trials	New Trials
BPD or death	RR 0.84 (favors prophylactic)	RR 1.12 (favors selective)
Need for mechanical ventilation	Lower with prophylactic	Higher with prophylactic!

Complete Reversal

In the CPAP era, prophylactic surfactant causes more harm

🔍

Investigation: Why Did Evidence Evolve?

You are a neonatologist. A colleague asks: "How can randomized trials contradict each other?"

Was the original evidence wrong?

1

Indirectness Changed

Old trials: No CPAP available. New trials: CPAP standard of care.

2

The Comparator Improved

Selective surfactant + CPAP is better than prophylactic intubation.

3

Context Matters

Evidence from one era may not apply to the next.

This is why GRADE assesses Indirectness!

High-quality evidence can become inapplicable when context changes.

The GRADE Framework

Grading of Recommendations, Assessment, Development and Evaluations

GRADE answers the question: how confident are we in this estimate?

⊕⊕⊕⊕ HIGH: Very confident. True effect is close to the estimate.

⊕⊕⊕◯ MODERATE: Moderately confident. True effect likely close, but may differ substantially.

⊕⊕◯◯ LOW: Limited confidence. True effect may differ substantially.

⊕◯◯◯ VERY LOW: Very little confidence. True effect likely substantially different.

GRADE: Factors That Downgrade Certainty

RCT evidence starts at HIGH. It can be downgraded for:

1

Risk of Bias

Flawed randomization, lack of blinding, incomplete follow-up, selective reporting

2

Inconsistency

Unexplained heterogeneity across studies (large I², non-overlapping CIs)

3

Indirectness

Differences in population, intervention, comparator, or outcome from the question of interest

4

Imprecision

Wide confidence intervals, small sample size, few events

GRADE: The Fifth Factor

5

Publication Bias

Asymmetric funnel plot, missing registered trials, sponsor influence

Each factor can downgrade by one or two levels

High → Moderate → Low → Very Low

Example: A meta-analysis of RCTs (starting at HIGH) with high risk of bias (↓1) and serious indirectness (↓1) would be rated LOW.
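The downgrading arithmetic can be written as a small lookup. This is only a sketch of the bookkeeping: the function and dictionary keys are illustrative names, and real GRADE ratings rest on judgment rather than on mechanical subtraction.

```python
LEVELS = ["VERY LOW", "LOW", "MODERATE", "HIGH"]

def grade_certainty(start, downgrades=None, upgrades=0):
    """Move down one level per downgrade point and up one per upgrade point."""
    total_down = sum((downgrades or {}).values())
    index = LEVELS.index(start) - total_down + upgrades
    index = max(0, min(index, len(LEVELS) - 1))   # clamp to the four levels
    return LEVELS[index]

# The example above: RCTs with high risk of bias and serious indirectness
rating = grade_certainty("HIGH", {"risk_of_bias": 1, "indirectness": 1})
print(rating)
```

Observational evidence would start from `"LOW"` and could move up through the `upgrades` argument.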

📊

Interactive: Apply GRADE to Surfactant

Let's rate the certainty of the evidence for prophylactic surfactant, comparing the old trials with the new ones.

OLD TRIALS (Pre-CPAP)

Starting: HIGH (RCTs)

Risk of Bias: Low (−0)

Inconsistency: None (−0)

Indirectness: Serious (−1)

Different standard of care today

Final: ⊕⊕⊕◯ MODERATE

NEW TRIALS (CPAP Era)

Starting: HIGH (RCTs)

Risk of Bias: Low (−0)

Inconsistency: None (−0)

Indirectness: None (−0)

Matches current practice

Final: ⊕⊕⊕⊕ HIGH

GRADE: Factors That Upgrade Certainty

Observational evidence starts at LOW. It can be upgraded for:

+1

Large Magnitude of Effect

RR >2 or <0.5 with no plausible confounding

+1

Dose-Response Gradient

Higher exposure = larger effect in a consistent pattern

+1

Residual Confounding

All plausible confounders would reduce the effect (strengthens causal inference)

Communicating Certainty

GRADE requires transparent language about confidence:

HIGH: "Prophylactic surfactant reduces mortality..."

MODERATE: "Prophylactic surfactant probably reduces mortality..."

LOW: "Prophylactic surfactant may reduce mortality..."

VERY LOW: "We are uncertain whether prophylactic surfactant reduces mortality..."

This language helps clinicians understand the strength of the evidence.

Story: The Premature Infant Oxygen Paradox

Can too much of a lifesaver become a killer?

REAL DATA

1940s-50s: High oxygen concentrations saved premature babies from respiratory failure. Then came an epidemic of blindness: retrolental fibroplasia (now called ROP). Doctors reduced oxygen dramatically. Blindness dropped. But then: increased deaths and brain damage from hypoxia. The optimal oxygen level took decades of trials to find. The recent SUPPORT/BOOST II trials finally defined the therapeutic window: SpO2 91-95%.

The Neonatologist's Dilemma: 1955

You are a neonatologist. Premature infants on high oxygen are going blind. What do you do?

PATH A: Dramatic Reduction

Drastically reduce oxygen to prevent blindness

↓

Blindness rates drop. But some babies die or suffer brain damage from hypoxia.

OUTCOME: Trading one harm for another

PATH B: Systematic Study

Titrate oxygen carefully and study the dose-response relationship

↓

Takes decades but eventually identifies the optimal range.

OUTCOME: Optimize both survival and vision

1940s: High O2 saves lives

1950s: Blindness epidemic

1960s-70s: Deaths from low O2

2010s: SUPPORT/BOOST define optimal range

THE REVELATION

Every intervention has a therapeutic window. Finding it takes measurement, not assumption. The pendulum swung for 60 years before the evidence defined the balance.

Module 10 Quiz

1. Why did the surfactant recommendation reverse between 2003 and 2012?

A. The original trials were fraudulent

B. CPAP changed the comparator (indirectness)

C. Not enough patients in original trials

D. Outcomes were measured differently

2. Which of the following is NOT a GRADE downgrade factor?

A. Risk of bias

B. Imprecision

C. Publication bias

D. Large magnitude of effect

3. What language should be used for low-certainty evidence?

A. "The intervention reduces..."

B. "The intervention probably reduces..."

C. "The intervention may reduce..."

D. "We are uncertain whether..."

Numbers alone are not enough.

You must communicate how certain you are.

Certainty must be earned, not assumed.

Methods protect patients from overconfidence.

Module 11: Living Reviews

COVID-19 Hydroxychloroquine: 2020

When urgency met evidence

🎯 Learning Objectives

Apply trial sequential analysis to judge whether the evidence is sufficient
Design and maintain a living systematic review
Establish update triggers and futility/harm boundaries
Manage multiplicity and alpha-spending in sequential analyses
Explain how rapid evidence synthesis evolved during COVID-19

March 2020: A World in Crisis

"The virus is spreading faster than our understanding..."

COVID-19 had killed thousands. ICUs were overflowing. There was no vaccine and no treatment. Then a glimmer of hope: hydroxychloroquine (HCQ), an old malaria drug, showed antiviral activity in lab studies.

March 20

Gautret study (France)

36 pts

Non-randomized

Viral

Clearance improved

The Rush to Adopt

Within weeks of the Gautret study:

!

March 28: FDA issues Emergency Use Authorization for HCQ

!

April 4: India bans HCQ export (hoarding fears)

!

Global: Shortages affect lupus and rheumatoid arthritis patients

Millions received HCQ based on a 36-patient observational study

What could go wrong?

🔍

Investigation: The Gautret Study

You are an EBM expert asked to appraise the French HCQ study. Reviewing the design...

Issue	Impact
Non-randomized	Selection bias—who got HCQ?
6 patients excluded	3 went to ICU, 1 died, 1 withdrew, 1 had nausea
Surrogate outcome	Viral load, not clinical outcomes
Controls from other hospitals	Different care, different testing
No blinding	Expectation bias in lab testing

This study would be rated at high risk of bias. Because it was not randomized, the applicable tool is ROBINS-I rather than RoB 2.

GRADE certainty: VERY LOW. Yet it changed global policy.

Why Observational COVID Studies Misled

1

Immortal Time Bias

Patients must survive long enough to receive treatment. Survivors are compared to non-survivors.

2

Confounding by Indication

Sicker patients may get different treatments. Healthier patients received HCQ early.

3

Healthy User Effect

Patients who seek treatment tend to be healthier overall.

4

Outcome Reporting

Studies with positive results were published faster.

June 2020: The RCTs Report

Large, rigorous trials completed at remarkable speed

Trial	N	Result
RECOVERY (UK)	4,716	No benefit on mortality (RR 1.09)
WHO SOLIDARITY	954	No benefit (RR 1.19)
ORCHID (US)	479	Stopped for futility

HCQ provided no benefit—and may have caused harm

June 15, 2020: FDA revokes Emergency Use Authorization

📊

Timeline: Observational vs. RCT Evidence

March-May 2020

Observational: ~20 studies

Suggest benefit

Pooled OR ~0.65

June-July 2020

RCTs: RECOVERY, SOLIDARITY

Show no benefit/harm

Pooled RR ~1.10

From "promising" to "ineffective" in 3 months

This is why we need randomization, and living reviews to track evolving evidence.

Living Systematic Reviews

A new approach for rapidly evolving evidence:

1

Continuous Surveillance

Weekly or daily literature searches for new evidence

2

Cumulative Meta-Analysis

Update pooled estimates as each new trial reports

3

Trial Sequential Analysis (TSA)

Determine when sufficient information has accumulated to conclude

4

Transparent Versioning

Track every change, maintain full audit trail
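Cumulative meta-analysis, the second component above, amounts to re-pooling each time a trial reports. The sketch below does this with a fixed-effect inverse-variance model. The trial results are hypothetical (not the HCQ trials) and the function name is illustrative; a living review would also reassess heterogeneity and risk of bias at every update.

```python
import math

def cumulative_fixed_effect(log_rrs, ses):
    """Pooled RR and 95% CI after each successive trial (in reporting order)."""
    results = []
    sum_w = sum_wy = 0.0
    for y, se in zip(log_rrs, ses):
        w = 1 / se**2
        sum_w += w
        sum_wy += w * y
        est, se_pooled = sum_wy / sum_w, math.sqrt(1 / sum_w)
        results.append((math.exp(est),
                        math.exp(est - 1.96 * se_pooled),
                        math.exp(est + 1.96 * se_pooled)))
    return results

# Hypothetical sequence: early small studies favor treatment, later large trials do not
log_rrs = [math.log(0.55), math.log(0.70), math.log(1.08), math.log(1.10)]
ses = [0.40, 0.35, 0.06, 0.10]

for i, (rr, lo, hi) in enumerate(cumulative_fixed_effect(log_rrs, ses), start=1):
    print(f"after trial {i}: RR {rr:.2f} ({lo:.2f} to {hi:.2f})")
```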

Trial Sequential Analysis (TSA)

When have we learned enough?

TSA applies stopping boundaries to a meta-analysis, analogous to the interim analyses of a single trial. It calculates the required information size (RIS) needed to detect or exclude a clinically meaningful effect.

RIS

Required sample size

α-spending

Controls type I error

Boundaries

Benefit / Harm / Futility

For HCQ in COVID-19, TSA showed that the futility boundary had been crossed by June 2020.
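How α-spending rations the type I error can be sketched with the Lan-DeMets O'Brien-Fleming-type spending function, one common choice. It gives the cumulative α allowed once a fraction t of the required information size has accrued: α(t) = 2 − 2Φ(z₁₋α/₂ / √t). The function name is illustrative, and the sketch returns the cumulative α spent, not the boundary z-values, which need numerical integration.

```python
import math
from statistics import NormalDist

def obf_alpha_spent(info_fraction, alpha=0.05):
    """Cumulative two-sided alpha spent at a given information fraction (0 < t <= 1)."""
    if not 0 < info_fraction <= 1:
        raise ValueError("information fraction must be in (0, 1]")
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return 2 * (1 - NormalDist().cdf(z / math.sqrt(info_fraction)))

# Fraction of the required information size (RIS) accrued at each update
for t in (0.25, 0.50, 0.75, 1.00):
    print(f"information fraction {t:.2f}: cumulative alpha = {obf_alpha_spent(t):.5f}")
```

Very little α is available early on, so an early "significant" pooled result has to clear a much higher bar than p < 0.05.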

Lessons from the Bias-Ridden HCQ Saga

1. Observational studies can mislead spectacularly. Even many studies pointing in the same direction can be wrong.

2. RCTs can be conducted quickly when the will exists. RECOVERY enrolled 5,000+ patients in weeks.

3. Living reviews are essential for evolving topics. Fixed-point-in-time reviews become obsolete instantly.

4. Political pressure doesn't change biology. Rigorous methods protect patients even under pressure.

Story: The LEAP Peanut Allergy Revolution

What if the prevention was the cause?

REAL DATA

For decades, pediatric guidelines recommended: avoid peanuts in infancy to prevent allergy. Meanwhile, peanut allergy rates tripled between 1997 and 2008. Then came LEAP (2015): 640 high-risk infants randomized to early peanut introduction vs. avoidance. Result: early introduction reduced peanut allergy by 81% overall (in the skin-prick-negative cohort, 1.9% vs. 13.7%). The prevention strategy was causing the epidemic.

The Allergist's Crossroads: 2010

You are a pediatric allergist. Peanut allergy is rising despite the avoidance guidelines. Do you question the dogma?

PATH A: Follow Guidelines

Continue recommending peanut avoidance in high-risk infants

↓

Guidelines are "evidence-based." Safe to follow consensus.

OUTCOME: Peanut allergies continue to rise

PATH B: Question the Dogma

Design a trial to test if early introduction might be protective

↓

LEAP trial reveals the truth. Guidelines reverse worldwide.

OUTCOME: Prevent an epidemic

2000: AAP recommends avoidance

2008: Allergy rates triple

2015: LEAP overturns the evidence

2017: Guidelines flip to early introduction

THE REVELATION

"First, do no harm" requires evidence. Even well-intentioned assumptions can cause harm at scale. The immune system needed exposure to build tolerance. Avoidance caused sensitization.

Module 11 Quiz

1. What was the biggest problem with the Gautret hydroxychloroquine study?

A. Too few patients

B. No blinding

C. Excluding patients who deteriorated

D. Too short follow-up

2. What does Trial Sequential Analysis help determine?

A. Which studies have high risk of bias

B. When enough evidence has accumulated

C. The degree of heterogeneity

D. Which treatment is best

3. Why did observational studies show an HCQ benefit when RCTs did not?

A. RCTs enrolled sicker patients

B. RCTs used different outcomes

C. Bias in the observational studies

D. Observational studies had better data

Speed cannot replace rigor.

But rigor can be fast.

Living reviews balance both.

Not every signal is truth.

Module 12: Advanced Methods

Advanced Methods

Beyond pairwise meta-analysis.


🎯 Learning Objectives

Interpret network meta-analysis geometry and SUCRA rankings
Apply bivariate models for diagnostic test accuracy meta-analysis
Conduct dose-response meta-analysis with flexible splines
Understand when individual patient data (IPD) meta-analysis is needed
Recognize when each advanced method is appropriate

When Pairwise Is Not Enough

"Sometimes the question is more complex than A versus B..."

The methods you have learned form the foundation. But clinical reality often demands more. Which of 10 antidepressants is best? What's the optimal dose of statin? Does this test accurately diagnose early cancer?

This module introduces four advanced methods, each answering a different kind of complex question.

Network Meta-Analysis (NMA)

When you have many treatments but few head-to-head trials

NMA combines direct evidence (A vs B) with indirect evidence (A vs C, B vs C → inferred A vs B) to compare multiple treatments simultaneously.

SUCRA

Ranking probabilities, not effect size

Consistency

Direct = Indirect?

Networks

Visualize evidence
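The indirect step (A vs C and B vs C giving an inferred A vs B) can be sketched with the adjusted indirect comparison often attributed to Bucher: subtract the two log effects and add their variances. The odds ratios below are hypothetical and the function name is illustrative. A full NMA fits all comparisons jointly rather than one pair at a time.

```python
import math

def indirect_comparison(d_ac, se_ac, d_bc, se_bc):
    """Indirect A-vs-B log effect via common comparator C, with its SE."""
    d_ab = d_ac - d_bc
    se_ab = math.sqrt(se_ac**2 + se_bc**2)   # variances add, so precision is lost
    return d_ab, se_ab

# Hypothetical direct evidence, both against a common comparator C (e.g. placebo)
d_ab, se_ab = indirect_comparison(math.log(0.50), 0.30, math.log(0.80), 0.40)
or_ab = math.exp(d_ab)
lo, hi = math.exp(d_ab - 1.96 * se_ab), math.exp(d_ab + 1.96 * se_ab)
print(f"indirect OR (A vs B) = {or_ab:.3f} ({lo:.2f} to {hi:.2f})")
```

The indirect estimate is always less precise than either direct comparison, which is one reason rankings need to be read alongside their intervals.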

🔍

NMA Example: Antidepressants

The landmark Cipriani 2018 NMA compared 21 antidepressants using 522 trials.

The Challenge

21 drugs, but not every pair tested head-to-head

Many vs. placebo, few vs. each other

The Solution

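NMA combines direct and indirect evidence across the network.

It ranks all 21 on efficacy and acceptability.

Result: Some drugs rank higher on efficacy, others on acceptability.

No single drug is universally "best." Interpret rankings in light of credible intervals, transitivity, and clinical trade-offs.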

NMA: Critical Assumptions

1

Transitivity

Effect modifiers should be similarly distributed across comparisons; otherwise indirect comparisons may be biased

2

Consistency

Direct and indirect evidence agree (testable)

3

Connected Network

All treatments linked through at least one common comparator

When assumptions fail, NMA can mislead

Always assess transitivity and test for inconsistency.

Dose-Response Meta-Analysis

Finding the Optimal Dose

Uses the Greenland-Longnecker method with restricted cubic splines to model non-linear relationships between dose and effect.

1

Non-linear patterns

J-shaped (alcohol & mortality), U-shaped (vitamin D), threshold (aspirin)

2

Clinical relevance

Finds the dose with the best benefit-harm balance, not simply "more is better."

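Individual Patient Data (IPD)

The gold standard for subgroup analysis

Instead of published summary data, obtain raw patient-level data from the trialists. This enables accurate subgroup analyses, time-to-event modeling, and standardized definitions.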

One-Stage

Single hierarchical model (not mega-trial)

Two-Stage

Analyze, then pool

80%+ target

Data availability target

The Early Breast Cancer Trialists' Collaborative Group pioneered IPD MA in the 1980s.

Diagnostic Test Accuracy (DTA)

When the "Intervention" Is a Test

DTA meta-analysis synthesizes sensitivity (true positive rate) and specificity (true negative rate), two correlated outcomes that require bivariate models.

1

Bivariate/HSROC Model

Accounts for the correlation between sensitivity and specificity

2

SROC Curve

Summary ROC curve with 95% confidence and prediction regions

3

QUADAS-2

Quality Assessment of Diagnostic Accuracy Studies
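The inputs to a bivariate model are each study's sensitivity and specificity, usually on the logit scale. The sketch below computes them from 2×2 counts. The counts are hypothetical and the function names are illustrative; fitting the bivariate or HSROC model itself requires mixed-model software and is not shown.

```python
import math

def sens_spec(tp, fp, fn, tn):
    """Sensitivity and specificity from a 2x2 table of test vs. reference standard."""
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    return sensitivity, specificity

def logit(p):
    """Log-odds transform used for the two outcomes in the bivariate model."""
    return math.log(p / (1 - p))

# Hypothetical studies as (TP, FP, FN, TN)
studies = [(45, 10, 5, 140), (30, 20, 10, 140), (80, 15, 20, 185)]
for tp, fp, fn, tn in studies:
    sens, spec = sens_spec(tp, fp, fn, tn)
    print(f"sens {sens:.2f} (logit {logit(sens):.2f}), "
          f"spec {spec:.2f} (logit {logit(spec):.2f})")
```

Pooling the two columns separately would ignore that studies with a lower positivity threshold tend to have higher sensitivity and lower specificity, which is the correlation the bivariate model accounts for.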

Choosing the Right Method

Question	Method
Does A beat B?	Pairwise MA
Which of many treatments is best?	Network MA (NMA)
What is the optimal dose?	Dose-Response MA
Who benefits most? (subgroups)	IPD MA
How accurate is this test?	DTA MA
How does the effect unfold over time?	Survival/Time-to-Event MA

The method must match the question. Don't force a question into the wrong method.

Story: Steroids in Sepsis

Three large trials. Three different answers. What do you believe?

REAL DATA

CORTICUS (2008): 499 patients. Hydrocortisone in septic shock. No mortality benefit. ADRENAL (2018): 3,658 patients. Hydrocortisone. No mortality benefit. APROCCHSS (2018): 1,241 patients. Hydrocortisone + fludrocortisone. Mortality reduced (43% vs 49.1%, p=0.03). Same class of intervention. Different protocols. Different results.

The Guideline Writer's Challenge

You are writing sepsis guidelines. The three major trials disagree. What do you recommend?

PATH A: Simple Average

Pool all three trials. Overall effect uncertain. Conclude "evidence unclear."

↓

Guidelines say steroids are optional. No strong recommendation.

OUTCOME: Clinicians left without clear guidance

PATH B: Investigate Heterogeneity

Analyze why APROCCHSS differed (fludrocortisone, longer duration, different population)

↓

Identify how the effective protocol differs from the ineffective ones.

OUTCOME: Recommend the specific effective protocol

THE REVELATION

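Conflicting trials are not failures. They are a map of where a treatment works and where it does not. The differences between trials (dose, duration, co-interventions, population) are the key to understanding.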

Module 12 Quiz

1. What is the main advantage of network meta-analysis over pairwise analysis?

A. It requires no data extraction

B. It compares treatments not directly tested against each other

C. It requires no risk-of-bias assessment

D. It produces better forest plots

2. Why does DTA meta-analysis require bivariate models?

A. To handle more than two studies

B. To adjust for publication bias

C. Sensitivity and specificity are correlated

D. To generate forest plots

3. What does the "consistency" assumption in NMA require?

A. All studies must be high quality

B. Direct and indirect evidence must agree

C. Sample sizes must be similar

D. No missing studies

Methodologist

Course Ecosystem

This course covers the full systematic review workflow. For deeper dives, explore the companion courses.

DTA Course
Bivariate/HSROC, SROC curves, QUADAS-2

Risk of Bias Mastery
RoB 2, ROBINS-I/E, domain-level assessment

GRADE Certainty
Full SoF tables, GRADE-CERQual

IPD Meta-Analysis
One-stage/two-stage, mixed-effects models

Publication Bias Detective
Copas, PET-PEESE, p-curve, selection models

Umbrella Reviews
AMSTAR 2, ROBIS, overlap correction

Prognostic Reviews
CHARMS, PROBAST, c-statistic pooling

Living Reviews + Rapid Reviews
TSA, update triggers, abbreviated methods

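Module 12 Complete

"The method must match the question. Advanced methods answer advanced questions, but the fundamentals never change."

You have mastered the core workflow. The next 10 modules explore frontier territory: Bayesian inference, network meta-analysis, individual patient data, dose-response modeling, robustness and fragility, equity, AI-assisted synthesis, qualitative evidence, multivariate methods, and reproducibility.

Not every signal is truth.

Module 13: The Bayesian Turn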

🎯 Learning Objectives

Explain the difference between frequentist and Bayesian inference
Interpret prior distributions, likelihoods, and posterior distributions
Distinguish credible intervals from confidence intervals
Understand when Bayesian meta-analysis offers advantages
Recognize how prior choice affects conclusions

The Story Begins: STAMPEDE

In 2005, a trial began

that would never truly end.

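The STAMPEDE trial in prostate cancer used a multi-arm, multi-stage (MAMS) platform design. Arms could be added or dropped as evidence accumulated. The statistics were frequentist, but the adaptive philosophy embodied the Bayesian spirit of updating decisions as data accumulate.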

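The Frequentist Worldview

In frequentist statistics, probability means long-run frequency. A 95% CI does not mean "there is a 95% probability that the true effect lies inside." It means that if the study were repeated indefinitely, 95% of the intervals would contain the truth.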

p-value

P(data | H₀), not P(H₀ | data)

95% CI

A coverage property, not a belief

Fixed

The true parameter is fixed

The Bayesian Worldview

In Bayesian statistics, probability represents degree of belief. We start with a prior (what we believe before seeing the data), update it with the likelihood (what the data tell us), and obtain a posterior (updated belief).

1

Posterior ∝ Prior × Likelihood

Bayes' theorem: P(θ|data) ∝ P(data|θ) × P(θ)

2

Credible Intervals

A 95% credible interval has a direct probability interpretation, conditional on the specified model and prior.
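The update rule above can be made concrete in the simplest case. The sketch below assumes a Normal prior and a Normal likelihood for a pooled log-OR, where the posterior is available in closed form as a precision-weighted average. The function names, the pooled estimate of -0.40 (SE 0.20), and the three prior standard deviations are illustrative assumptions, not values from this course.

```python
import math

def normal_posterior(prior_mean, prior_sd, estimate, se):
    """Conjugate update: Normal prior x Normal likelihood -> Normal posterior."""
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = 1.0 / se ** 2
    post_var = 1.0 / (prior_precision + data_precision)
    # The posterior mean is a precision-weighted average of prior and data.
    post_mean = post_var * (prior_precision * prior_mean + data_precision * estimate)
    return post_mean, math.sqrt(post_var)

def credible_interval(mean, sd, z=1.959964):
    """Central 95% credible interval of a Normal posterior."""
    return mean - z * sd, mean + z * sd

# Hypothetical data: pooled log-OR of -0.40 with standard error 0.20.
for label, prior_sd in [("vague", 100.0), ("weakly informative", 1.0), ("sceptical", 0.2)]:
    mean, sd = normal_posterior(0.0, prior_sd, -0.40, 0.20)
    low, high = credible_interval(mean, sd)
    print(f"{label:>18}: posterior mean {mean:.3f}, 95% CrI ({low:.3f}, {high:.3f})")
```

With a vague prior the posterior essentially reproduces the data; the tighter the prior around zero, the more the estimate is pulled toward it. Real meta-analytic models also place a prior on τ and usually need MCMC rather than a closed form.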

Researcher

Choosing Priors

1

Non-informative (Vague)

Normal(0, 10000) or uniform. Lets the data dominate. Mimics frequentist results.

2

Weakly Informative

Normal(0, 1) for log-OR. Regularizes extreme estimates while remaining flexible.

3

Informative

Based on previous evidence. Powerful but controversial. Must be pre-specified.

4

Half-Cauchy for τ

Recommended for heterogeneity. Half-Cauchy(0, 0.5) allows large τ but concentrates near zero.

Researcher

MCMC Sampling

Most Bayesian models cannot be solved analytically. We use Markov Chain Monte Carlo (MCMC) to draw samples from the posterior. Tools: JAGS, Stan, brms (R), PyMC (Python).

Chains

Multiple independent chains (typically 4)

R̂

Convergence: R̂ < 1.01 (strict; older texts use < 1.1)

ESS

Bulk-ESS > 400 (for means); tail-ESS > 400 for CIs

Methodologist

Bayesian Model Averaging

Instead of choosing between fixed-effect and random-effects models, Bayesian model averaging (BMA) weights each model by its posterior probability. This accounts for model uncertainty in the final estimate.

BF

Bayes Factors

BF₁₀ > 10 = strong evidence for H₁. BF₁₀ < 1/10 = strong evidence for H₀.

Interactive: Posterior Visualizer

Adjust the prior strength to see how it affects the posterior. Watch how more data overwhelm the prior.

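The STAMPEDE Story

STAMPEDE began in 2005 with five research arms comparing treatments for advanced prostate cancer. By 2016, adding abiraterone had been shown to reduce mortality by 37% (HR 0.63, 95% CI 0.52–0.76).

The platform design embodies Bayesian adaptive thinking: interim analyses guide arm selection, new arms can enter as therapies become available, and futile arms are stopped early, sparing patients from ineffective treatment.

STAMPEDE enrolled more than 10,000 patients at more than 100 centers and fundamentally changed prostate cancer care. A Bayesian mindset lets decisions be made in real time as evidence accumulates.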

Decision Tree: When to Go Bayesian?

Frequentist vs Bayesian Meta-Analysis

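Choose Bayesian when (1) you have genuine prior information, (2) you need probabilistic statements ("80% probability that the effect > 0"), (3) there are few studies and frequentist properties are unreliable, or (4) you want model averaging.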

Bayesian with weakly informative prior

A common practical default. Regularizes extreme estimates without forcing strong prior conclusions.

Bayesian with informative prior

Only when the prior evidence is strong and pre-specified. A sensitivity analysis is required.

Stay frequentist

Simpler, well-understood. Preferred when k is large and no prior information.

Remember Module 1?

CAST Through a Bayesian Lens

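Had a Bayesian analysis of CAST used an informative prior from basic science (antiarrhythmics suppress PVCs), the posterior would still have shifted strongly toward harm. With enough data, the likelihood overwhelms even a strong prior. Lesson: Bayesian methods do not protect against bad priors, but they make assumptions transparent.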

Module 13 Quiz

Q1. What does a 95% Bayesian credible interval mean?

A. 95% of repeated experiments would produce intervals containing the true value

B. There is a 95% probability that the true parameter lies within this interval

C. The interval has a 95% chance of being correct

D. 95% of future data will fall within this range

Q2. Which prior is recommended for between-study heterogeneity (τ)?

A. Uniform(0, 100)

B. Normal(0, 1)

C. Half-Cauchy(0, 0.5)

D. Fixed at 0.5

Module 13 Complete

"The Bayesian turn is not about mathematics. It is about honesty: making assumptions visible."

Not every signal is truth.

Module 14: The Network

Methods protect patients from confidence.

Module 14: The Network

🎯 Learning Objectives

Explain why pairwise comparisons are insufficient when many treatments exist
Interpret network geometry (nodes, edges, thickness)
Understand the role of indirect evidence
Interpret SUCRA rankings and league tables
Recognize when NMA assumptions are violated

A clinician faces a patient

with depression. Which drug?

There are 21 commonly prescribed antidepressants. Most head-to-head trials compare only two or three. Cipriani et al. (2018, Lancet) linked 522 trials and 116,477 patients into a single network.

The Logic of Network Meta-Analysis

1

Direct Evidence

Trials directly comparing A vs B give the most reliable estimate.

2

Indirect Evidence

If A vs C and B vs C trials exist, A vs B can be inferred. This is the "transitivity" assumption.

3

Mixed Evidence

NMA combines both, weighted by precision, to rank all treatments simultaneously.
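The three kinds of evidence above can be sketched numerically. The example below assumes log odds ratios and uses the standard adjusted indirect comparison (often attributed to Bucher): the indirect A vs B estimate is the difference of the two comparisons against C, and its variance is the sum of their variances. The direct and indirect estimates are then combined with inverse-variance weights. All numbers and function names are hypothetical, and a full NMA would fit one model to the whole network rather than one loop.

```python
import math

def indirect_comparison(log_or_ac, se_ac, log_or_bc, se_bc):
    """Indirect A vs B estimate through the common comparator C."""
    log_or_ab = log_or_ac - log_or_bc
    # Variances add, so indirect evidence is always less precise than its inputs.
    se_ab = math.sqrt(se_ac ** 2 + se_bc ** 2)
    return log_or_ab, se_ab

def inverse_variance_pool(estimates):
    """Combine (estimate, se) pairs with inverse-variance (precision) weights."""
    weights = [1.0 / se ** 2 for _, se in estimates]
    pooled = sum(w * est for w, (est, _) in zip(weights, estimates)) / sum(weights)
    return pooled, math.sqrt(1.0 / sum(weights))

# Hypothetical inputs on the log-OR scale.
indirect = indirect_comparison(log_or_ac=-0.50, se_ac=0.15, log_or_bc=-0.20, se_bc=0.20)
direct = (-0.25, 0.30)
mixed = inverse_variance_pool([direct, indirect])

print("indirect A vs B:", indirect)
print("mixed A vs B:   ", mixed)
```

Combining the two sources is only defensible when direct and indirect estimates agree, which is what the consistency checks later in this module assess.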

Interactive: Network Graph

Each node is a treatment. Edge thickness represents the number of studies comparing the two treatments.

Researcher

Transitivity & Consistency

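Transitivity: The indirect estimate (through a common comparator) should approximate what a direct comparison would show. This requires effect modifiers to be similarly distributed across comparisons.

Consistency: Statistical tests that compare direct with indirect evidence. Global (design-by-treatment interaction) and local (node-splitting) tests help identify inconsistent loops.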

Researcher

SUCRA & P-scores

SUCRA

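Surface Under the Cumulative Ranking curve. Higher values mean a higher probability of ranking well; they do not guarantee superiority.

P-score

A frequentist analogue of the rank-probability summary. Interpret it alongside effect sizes and their uncertainty.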

Caution: Ranking is seductive but misleading when differences between treatments are small or uncertain. Always report credible/confidence intervals alongside ranks.
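As a small illustration of what SUCRA summarizes, the sketch below computes it from a treatment's rank probabilities: the average of the cumulative probabilities of being among the best 1, 2, ..., a-1 treatments. The rank probabilities and the function name are hypothetical; in practice they come from the posterior draws of a fitted NMA.

```python
def sucra(rank_probs):
    """SUCRA from rank probabilities (index 0 = probability of being ranked best)."""
    n_treatments = len(rank_probs)
    cumulative = 0.0
    area = 0.0
    # Sum the cumulative ranking curve over ranks 1 .. a-1.
    for p in rank_probs[:-1]:
        cumulative += p
        area += cumulative
    return area / (n_treatments - 1)

# Hypothetical rank probabilities for three treatments.
rank_probabilities = {
    "Drug A": [0.55, 0.35, 0.10],
    "Drug B": [0.40, 0.45, 0.15],
    "Placebo": [0.05, 0.20, 0.75],
}
for name, probs in rank_probabilities.items():
    print(f"{name}: SUCRA = {sucra(probs):.2f}")
```

Drug A and Drug B end up with similar SUCRA values even though one of them "ranks first", which is the situation the caution above warns about.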

Methodologist

Component NMA

When interventions are complex (e.g., behavioral + pharmacological), component NMA decomposes multi-component treatments to estimate the individual contribution of each component. Uses additive models: effect(A+B) = effect(A) + effect(B) + interaction.

The Cipriani Network

The 2018 Lancet analysis found all 21 antidepressants more effective than placebo. Amitriptyline, mirtazapine, and venlafaxine ranked highest for efficacy. Agomelatine, fluoxetine, and escitalopram ranked highest for acceptability (fewest dropouts).

No single drug "won" on every outcome. The network revealed trade-offs that are invisible to pairwise analysis.

Decision Tree: Is NMA Appropriate?

NMA Feasibility Check

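You have 15 RCTs comparing 6 different statins. Some pairs have direct evidence; others do not.

Check transitivity, then fit NMA

Verify that patient populations and study designs are sufficiently similar across comparisons.

Ignore indirect evidence

Loses statistical power and leaves gaps in the evidence base.

Pool all into one pairwise comparison

Violates the structure of the evidence. The statins are different drugs.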

Module 14 Quiz

Q1. What assumption must hold for indirect evidence in NMA to be valid?

A. Transitivity — effect modifiers are balanced across comparisons

B. Homogeneity — I² must be below 25%

C. All studies must have similar sample sizes

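D. All studies must be double-blind

Module 14 Complete

"The network sees what pairwise comparison cannot: the whole landscape of treatment choices."

Not every signal is truth.

Module 15: The Individual

What was hidden in plain sight?

Module 15: The Individual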

🎯 Learning Objectives

Explain why aggregate data can mask treatment–covariate interactions
Distinguish one-stage from two-stage IPD models
Recognize ecological bias in aggregate meta-analysis
Understand the practical challenges of IPD collection
Interpret treatment–covariate interaction plots

For decades, breast cancer trials

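published summaries. Not patients.

The Early Breast Cancer Trialists' Collaborative Group (EBCTCG) collected individual records from more than 100,000 women across hundreds of trials. The IPD meta-analysis revealed that tamoxifen's efficacy depends strongly on estrogen receptor status, a finding invisible in aggregate data.

What the Summaries Concealed

Every published tamoxifen trial reported overall results. Across hundreds of studies, tamoxifen appeared to offer a moderate benefit. But "moderate benefit" was an average that hid a profound truth.

The Hidden Subgroup Split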

RR 0.59

ER-positive subgroup: 41% reduction in recurrence

RR 0.97

ER-negative subgroup: essentially no benefit at all

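The overall pooled effect, which mixed responders with non-responders, was a statistical fiction: a "moderate" average that understated the benefit for one group and implied a benefit that did not exist for the other.

Aggregate vs Individual Patient Data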

AD

Aggregate: published effect + CI only

IPD

Individual: raw patient-level records

IPD allows (1) consistent outcome definitions, (2) subgroup analyses by patient characteristics, (3) time-to-event modeling, and (4) checks for ecological bias. It is the gold standard for exploring treatment effect modification.

Researcher

One-Stage vs Two-Stage IPD

1

Two-Stage

Analyze each study separately, then combine estimates (like standard MA). Simple but loses information.

2

One-Stage

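Fits a single mixed-effects model to all patient data simultaneously. More powerful for interactions and rare events.

Key: Both approaches must account for clustering by study. Never pool IPD as if it came from one large trial; doing so introduces confounding (Simpson's paradox).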

Methodologist

Ecological Bias

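A meta-regression using study-level mean age might show older patients benefit more. But this could be ecological bias: the study-level association may not reflect the patient-level truth. Only IPD can separate within-study from between-study effects.

When the Whole Lies About the Parts

Simpson's paradox: a trend that appears in aggregated data reverses when the data are grouped by a confounding variable.

A Real-World Paradox

A mega-trial analysis found Treatment X beneficial overall. But within each study it was harmful. How? Differences in baseline risk between studies created the illusion. Sicker populations received more treatment, inflating the aggregate benefit.

Cates (2002, BMJ) showed that pooling multiple studies without accounting for clustering can reverse the apparent direction of effect.

This is why one-stage IPD models include study as a clustering variable: to prevent between-study confounding from masquerading as a treatment effect.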
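The reversal can be reproduced with a few lines of arithmetic. The sketch below uses two hypothetical studies (the counts are invented for illustration, with deliberately unbalanced arms) in which the treatment raises risk by 25% within each study, yet collapsing all patients into one table makes it look strongly protective. A Mantel-Haenszel risk ratio, which stratifies by study, recovers the within-study answer. Function names are illustrative.

```python
def risk_ratio(events_t, n_t, events_c, n_c):
    """Risk ratio for one 2x2 table (treated vs control)."""
    return (events_t / n_t) / (events_c / n_c)

def mantel_haenszel_rr(studies):
    """Risk ratio stratified by study, so each trial is compared only with itself."""
    num = sum(et * nc / (nt + nc) for et, nt, ec, nc in studies)
    den = sum(ec * nt / (nt + nc) for et, nt, ec, nc in studies)
    return num / den

# Hypothetical studies: (events_treated, n_treated, events_control, n_control)
studies = [
    (90, 900, 8, 100),    # low baseline risk, most patients treated
    (50, 100, 360, 900),  # high baseline risk, most patients in control
]

for i, study in enumerate(studies, start=1):
    print(f"Study {i} RR: {risk_ratio(*study):.2f}")

# Naive pooling: add up all patients as if they came from one big trial.
totals = [sum(column) for column in zip(*studies)]
print(f"Naive pooled RR: {risk_ratio(*totals):.2f}")
print(f"Stratified (MH) RR: {mantel_haenszel_rr(studies):.2f}")
```

The naive pooled estimate is driven by which study contributed most of the treated patients, not by the treatment itself.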

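The EBCTCG Legacy

EBCTCG's IPD meta-analyses have defined breast cancer treatment for 40 years. The 2005 analysis of tamoxifen versus no treatment showed a clear benefit in ER-positive tumors (RR 0.59) but none in ER-negative tumors (RR 0.97).

Without IPD, the overall pooled effect would have averaged the benefit across both groups.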

Decision Tree: When Is IPD Worth Pursuing?

Do you suspect treatment–covariate interactions?

Yes →

Can you obtain IPD from 80% or more of the trials?

Yes → One-stage IPD meta-analysis with interaction terms

No → Two-stage: request the available IPD + aggregate data for the rest

No →

Is ecological bias a concern?

Yes → IPD preferred even without interactions

No → Aggregate data meta-analysis may suffice

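EBCTCG collected data from hundreds of trials over 40 years. Most IPD meta-analyses include 5 to 20 trials. The decision depends on the question, not on ambition.

Methodologist

The Pattern Repeats

Remember Module 3? HRT appeared beneficial in observational studies but harmful in RCTs. The same aggregate masking occurred: the overall benefit hid subgroup harm.

Later, IPD analyses from the Women's Health Initiative showed that timing mattered: women who started HRT within 10 years of menopause had different outcomes from those who started later. The "timing hypothesis" was invisible in the published aggregate summaries.

The lesson repeats. Aggregate data can obscure important treatment–covariate interactions. Whether it is ER status in breast cancer or the timing of HRT, individual-level data reveal what summaries hide.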

Module 15 Quiz

Q1. What is the main advantage of IPD over aggregate data meta-analysis?

A. It always includes more studies

B. It is cheaper and faster

C. It can explore treatment–covariate interactions without ecological bias

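D. It does not require random-effects models

Module 15 Complete

"Behind every pooled estimate are individuals whose stories the aggregate cannot tell."

Heterogeneity is a message, not noise.

Module 16: The Dose

Heterogeneity is a message, not noise.

Module 16: The Dose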

🎯 Learning Objectives

Explain why simple pairwise comparisons miss dose–response relationships
Distinguish linear, quadratic, and spline dose–response models
Interpret restricted cubic splines with knots
Identify threshold effects and J/U-shaped curves
Understand model comparison with AIC/BIC

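For decades, moderate drinking

was thought to protect the heart.

The "J-shaped curve" showed non-drinkers with higher cardiovascular mortality than moderate drinkers. But Stockwell et al. (2016) demonstrated that the J-curve resulted from misclassifying former drinkers (people who quit because of illness) as "abstainers."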

A Scientific Consensus Built on Sand

By 2010, more than 100 observational studies had confirmed the J-curve. Medical textbooks taught it. Cardiologists cited it. Wine industry lobbyists funded conferences about it.

100+

Observational studies confirming the J-curve

15–25%

Lower cardiovascular mortality in moderate drinkers vs abstainers

The evidence seemed overwhelming. But what if the comparison group, the "abstainers," was contaminated?

The Abstainers

A Hidden Confounder

The Problem

People who stop drinking often do so because they are already ill: liver disease, drug interactions, a cancer diagnosis. Most studies classified these "former drinkers" as "abstainers."

The Effect: The reference group (abstainers) appeared less healthy, not because abstinence was harmful but because sick people had joined it.

When Stockwell et al. (2016, J Stud Alcohol Drugs) removed former drinkers and applied appropriate study-quality corrections, the J-curve disappeared. The protective effect was an illusion.

Dose–Response Meta-Analysis

Standard meta-analysis asks: "Does treatment X work?" Dose–response meta-analysis asks: "At what dose does treatment X work best?" It models the relationship between dose level and outcome across studies.

Linear

Simplest: log(RR) = β × dose

Spline

Flexible: piecewise polynomials with knots

Fractional

Polynomial: dose^p1 + dose^p2
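The linear model above is the easiest of the three to sketch. The example below fits log(RR) = β × dose by inverse-variance weighted least squares through the origin (the reference dose has log RR = 0 by definition). It is a simplified sketch: it treats the estimates as independent, whereas estimates that share a reference group are correlated and are usually handled with a Greenland-Longnecker style covariance. The data and function name are hypothetical.

```python
import math

def linear_dose_response(doses, log_rrs, ses):
    """Weighted least-squares slope through the origin: log(RR) = beta * dose."""
    weights = [1.0 / se ** 2 for se in ses]
    numerator = sum(w * d * y for w, d, y in zip(weights, doses, log_rrs))
    denominator = sum(w * d * d for w, d in zip(weights, doses))
    beta = numerator / denominator
    se_beta = math.sqrt(1.0 / denominator)
    return beta, se_beta

# Hypothetical dose categories (e.g. drinks per day) relative to the zero-dose reference.
doses = [1.0, 2.0, 4.0]
log_rrs = [0.05, 0.12, 0.30]
ses = [0.04, 0.05, 0.08]

beta, se_beta = linear_dose_response(doses, log_rrs, ses)
print(f"log RR per unit dose: {beta:.3f} (SE {se_beta:.3f})")
print(f"RR per unit dose:     {math.exp(beta):.3f}")
```

A spline or fractional polynomial model replaces the single dose term with several transformed dose terms; the weighting idea stays the same.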

Researcher

Restricted Cubic Splines

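RCS place knots at pre-specified dose points and fit smooth polynomials between them. Typically 3 to 5 knots are placed at quantiles of the dose distribution, and the curve is constrained to be linear beyond the boundary knots. A test of non-linearity compares the spline model with the simpler linear model.

AIC

Model Comparison

AIC/BIC compare linear and spline fits. Lower = better. Also test for departure from linearity (p-value for the spline terms).

Interactive: Dose–Response Builder

Compare linear, quadratic, and spline fits.

The Alcohol J-Curve Unmasked

Stockwell's 2016 reanalysis found that the protective effect of moderate drinking disappeared once former drinkers were correctly excluded from the "abstainer" reference group. The J-curve was driven by sick-quitter bias.

Dose-response meta-analysis revealed the truth: the shape of the curve depends critically on how "zero dose" is defined. A wrong reference category created a phantom benefit.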

When Curves Shape Policy

The phantom J-curve influenced alcohol guidelines worldwide:

UK

NHS Guidance (until 2016)

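Official guidance stated that "moderate drinking may protect the heart." After Stockwell's correction, the UK revised its limit to 14 units per week for all drinkers (previously 21 units for men). No amount was declared "safe."

US

Dietary Guidelines Advisory Committee

J-curve studies were cited through 2015. The 2020 committee acknowledged reference-group bias and recommended lowering the limit for men to 1 drink per day.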

AU

Australian Guidelines

Safe drinking limits were delayed by industry-funded J-curve research promoting “cardioprotective” moderate intake.

Decision Tree: Is Dose-Response Analysis Appropriate?

Do you have 3 or more exposure levels (not just exposed vs unexposed)?

Yes →

Is the relationship non-linear?

Yes → Restricted cubic splines (3–5 knots). Compare AIC with linear model.

No → Linear dose-response meta-regression may suffice

No →

Standard pairwise meta-analysis (no dose-response possible with only two levels)

Warning: Always check whether the reference category is clean. The J-curve lesson: a contaminated reference group creates phantom non-linearity.

Module 16 Quiz

Q1. What makes restricted cubic splines useful in dose–response meta-analysis?

A. They always produce a straight line

B. They flexibly capture non-linear dose–response curves

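C. They reduce the number of studies required

D. They simplify the model to fewer parameters

Module 16 Complete

"The dose makes the poison. And the shape of the curve reveals whether the poison is real."

Absence of evidence is not evidence of absence.

Module 17: Fragility

Absence of evidence is not evidence of absence.

Module 17: Fragility

🎯 Learning Objectives

Calculate and interpret the fragility index
Use GOSH plots to identify influential studies and subset effects
Interpret contour-enhanced funnel plots
Apply Copas selection models and PET-PEESE for publication bias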
Understand how sensitivity analyses strengthen meta-analytic conclusions

Governments stockpiled billions

based on evidence they could not see.

After H1N1, governments spent billions of dollars stockpiling oseltamivir (Tamiflu). A Cochrane team (Jefferson et al. 2014) fought for years to access unpublished data. When they finally did, the evidence that it prevents complications evaporated.

The Fragility Index

The fragility index asks: "How many patients would need to change outcome to flip a statistically significant result to non-significant?" Events are added one at a time to the group with fewer events (converting non-events to events) until p > 0.05.

FI = 1

Extremely fragile. One patient flip changes conclusion.

FI > 8

Reasonably robust. Less sensitive to individual outcomes.
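The procedure described above can be sketched in plain Python. The example assumes a two-sided Fisher exact test (computed here from the hypergeometric distribution with `math.comb`) and a threshold of 0.05; both the function names and the choice of test are assumptions of this sketch, and published calculators may differ in detail. The trial counts are the ones from this module's quiz question.

```python
from math import comb

def fisher_exact_p(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is as likely as, or less likely than, the observed one.
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= p_observed * (1 + 1e-7))

def fragility_index(events_1, n_1, events_2, n_2, alpha=0.05):
    """Number of non-event -> event flips in the lower-event arm until p >= alpha."""
    e1, e2 = events_1, events_2
    add_to_first = events_1 <= events_2
    flips = 0
    while fisher_exact_p(e1, n_1 - e1, e2, n_2 - e2) < alpha:
        if add_to_first and e1 < n_1:
            e1 += 1
        elif not add_to_first and e2 < n_2:
            e2 += 1
        else:
            raise ValueError("significance cannot be removed by this procedure")
        flips += 1
    return flips  # 0 means the result was not significant to begin with

print("p-value:", round(fisher_exact_p(12, 188, 25, 175), 4))
print("fragility index:", fragility_index(12, 200, 25, 200))
```

Because the index depends on the test used, always report which test and threshold produced it.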

Interactive: Fragility Calculator

Enter a 2×2 table to calculate the fragility index. Watch events shift until significance flips.

Researcher

GOSH Plots

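The Graphical Display of Study Heterogeneity (GOSH) fits the meta-analysis model to every possible subset of studies. Each point plots the pooled effect against I² for one subset. Clusters suggest distinct subgroups; an outlying cloud suggests a single study driving heterogeneity.

For k studies there are 2^k − 1 subsets. For k > 15, random sampling is used.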
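To make the subset enumeration concrete, the sketch below computes the points a GOSH plot would display: one (pooled effect, I²) pair for every non-empty subset of studies. It assumes a fixed-effect (inverse-variance) model for each subset and uses four hypothetical studies; the function names are illustrative, and real implementations switch to random sampling of subsets when k is large.

```python
from itertools import combinations

def pooled_and_i2(effects, variances):
    """Inverse-variance pooled effect and I-squared for one set of studies."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) if df > 0 and q > 0 else 0.0
    return pooled, i2

def gosh_points(effects, variances):
    """One (pooled effect, I2) point per non-empty subset of studies."""
    k = len(effects)
    points = []
    for size in range(1, k + 1):
        for subset in combinations(range(k), size):
            points.append(pooled_and_i2([effects[i] for i in subset],
                                        [variances[i] for i in subset]))
    return points

# Hypothetical log-ORs: three concordant studies and one outlier.
effects = [-0.30, -0.25, -0.35, 0.40]
variances = [0.02, 0.03, 0.025, 0.02]

points = gosh_points(effects, variances)
print(len(points), "subsets")  # 2**4 - 1 = 15
print("max I2 without the outlier:",
      max(i2 for _, i2 in gosh_points(effects[:3], variances[:3])))
```

Subsets that contain the outlying study form a separate cloud with high I², which is the visual signature described above.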

Researcher

Contour-Enhanced Funnel Plots

Standard funnel plots show effect size vs standard error. Contour-enhanced versions add shaded regions for p < 0.01, p < 0.05, and p < 0.10. If the missing studies would fall in non-significant regions, publication bias is likely. If they would fall in significant regions, other causes (e.g., study quality) may explain the asymmetry.

Methodologist

Copas Selection & PET-PEESE

1

Copas Selection Model

Models the probability that a study is published as a function of its SE and effect size. Jointly estimates the true effect and the selection mechanism.

2

PET-PEESE

Precision-Effect Test (PET): regress effects on SE. If intercept = 0, no true effect. PEESE uses SE² for better performance when a true effect exists.
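Both regressions can be sketched with a small weighted least-squares helper. In the example below, PET regresses the effects on SE and PEESE regresses them on SE², each with weights 1/SE²; the intercept is the estimated effect of a hypothetical study with SE = 0. The data and function names are invented for illustration, and the sketch omits the significance test on the PET intercept that is commonly used to decide whether to report PET or PEESE.

```python
def wls_line(x, y, weights):
    """Weighted least-squares fit of y = intercept + slope * x."""
    total = sum(weights)
    x_bar = sum(w * xi for w, xi in zip(weights, x)) / total
    y_bar = sum(w * yi for w, yi in zip(weights, y)) / total
    sxx = sum(w * (xi - x_bar) ** 2 for w, xi in zip(weights, x))
    sxy = sum(w * (xi - x_bar) * (yi - y_bar) for w, xi, yi in zip(weights, x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def pet_peese(effects, ses):
    """Return the PET and PEESE intercepts (bias-adjusted effect estimates)."""
    weights = [1.0 / se ** 2 for se in ses]
    pet_intercept, _ = wls_line(ses, effects, weights)
    peese_intercept, _ = wls_line([se ** 2 for se in ses], effects, weights)
    return pet_intercept, peese_intercept

# Hypothetical studies in which smaller (noisier) studies report larger effects.
effects = [0.10, 0.22, 0.35, 0.41, 0.60]
ses = [0.05, 0.10, 0.15, 0.20, 0.30]

pet, peese = pet_peese(effects, ses)
print(f"PET intercept:   {pet:.3f}")
print(f"PEESE intercept: {peese:.3f}")
```

A positive slope with an intercept well below the naive pooled effect is the pattern that suggests small-study effects.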

Oseltamivir Saga

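The original Roche-funded meta-analysis (Kaiser 2003) found that oseltamivir reduced influenza complications by 67%. But 8 of the 10 trials had never been published. After Cochrane obtained the clinical study reports, the benefit for complications fell to a non-significant 11%.

The fragility was not merely statistical; it was informational. The evidence base itself was missing most of its data.

Decision Tree: Interpreting Fragility Results

You have calculated a fragility index. What does the number mean?

FI ≤ 3

Highly fragile. A few different events would reverse the conclusion. Interpret with great caution.

FI 4–8

Moderately fragile. Sensitive to small perturbations. Are there unpublished trials that could change it?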

FI > 8

Relatively robust. But remember: fragility is only one dimension. Publication bias can undermine even robust results.

Walsh et al. (2014, J Clin Epidemiol) found a median fragility index of only 8 across 399 RCTs published in top journals. More than 25% had FI ≤ 3. Landmark trials that shape clinical practice often hung by a statistical thread.

Methodologist

Beyond the Index: Structural Fragility

The oseltamivir saga revealed three types of fragility; the fragility index captures only the first.

1

Statistical Fragility (FI)

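How many events would flip the p-value? This is what the Fragility Index measures. It quantifies sensitivity to individual patient outcomes.

2

Informational Fragility

How much evidence is hidden? 8 of the 10 Roche oseltamivir trials were unpublished. The evidence base was structurally incomplete.

3

Analytical Fragility

How many researcher degrees of freedom would it take to change the conclusion? Different outcome definitions, analysis populations, or statistical methods.

Callback to Module 10 (Paroxetine): A reanalysis using different outcome definitions completely reversed the conclusion. That was analytical fragility. No FI was calculated because the endpoint itself was in dispute. A complete robustness assessment examines all three dimensions.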

Module 17 Quiz

Q1. A trial has 200 patients in each arm, with 12 events in the treatment arm and 25 in the control arm (p=0.03). The fragility index is 3. What does this mean?

A. The effect size is exactly 3

B. Changing just 3 patient outcomes would flip the result to non-significant

C. With 3 confirmatory studies, the result is very robust

D. The study needs at least 3 patients

Module 17 Complete

"A number that survives every attempt to break it is a number worth trusting."

Not every signal is truth.

Module 18: Equity

Certainty must be earned, not assumed.

Module 18: Equity

🎯 Learning Objectives

Identify how trial exclusion criteria create evidence gaps
Apply the PROGRESS-Plus framework to assess equity in evidence
Use PRISMA-Equity reporting guidelines
Understand transportability: when trial findings fail in practice
Design equity-sensitive search and synthesis strategies

SPRINT proved tight blood pressure control

saves lives. But whose lives?

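The landmark SPRINT trial excluded patients with diabetes, prior stroke, and heart failure. More than 75% of US adults with hypertension would not have qualified. The evidence was strong, but its applicability was narrow.

Slide A: The Missing Majority

A Trial That Excluded Most Patients

SPRINT enrolled 9,361 patients and demonstrated that intensive blood pressure control (target <120 mmHg) reduced cardiovascular events by 25% (HR 0.75, 95% CI 0.64–0.89). But the inclusion criteria tell a different story.

Excluded:

Diabetes: 35% of US adults with hypertension
Prior stroke: 8% of the hypertensive population
Symptomatic heart failure: 6% of hypertensive adults
Expected survival <3 years: the frailest patients
Nursing home residents: excluded entirely
GFR <20 mL/min: advanced kidney disease

Result: More than 75% of US adults with hypertension would not have qualified. The evidence was strong. But for whom?

Slide B: The Geography of Evidence

Where the Evidence Comes From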

78%

of cardiovascular mega-trial participants came from high-income countries (2000–2020).

6%

from sub-Saharan Africa — where cardiovascular disease is rising fastest.

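Polypill trials: 4 of 5 were conducted in populations with a mean BMI below 25. The US mean BMI is 30. Drug metabolism, comorbidity patterns, healthcare access, and genetic variation all differ across populations. Efficacy in one population does not guarantee effectiveness in another.

Note: Multinational trials and PROGRESS-Plus gaps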

PROGRESS-Plus Framework

P

Place of residence

R

Race / ethnicity

O

Occupation

G

Gender / sex

R

Religion

E

Education

S

SES (socioeconomic)

S

Social capital

Plus: Age, disability, sexual orientation, other vulnerable groups.

Researcher

PRISMA-Equity & Transportability

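PRISMA-Equity extends PRISMA to require reporting of how equity was addressed in the review (population characteristics, subgroup analyses by disadvantage, and assessment of applicability to disadvantaged populations).

Transportability: Trial efficacy is not the same as real-world effectiveness. Methods exist to re-weight trial data so that they match the distribution of the target population.

Slide C: The Transportability Question

Researcher

From Trial to Real World: Transportability

Transportability = Can results from trial population X be applied to target population Y? This is not a philosophical question; formal methods exist.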

1

Inverse Probability of Participation Weighting (IPPW)

Re-weights trial participants so they resemble the target population on key covariates.

2

Generalizability Index

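Quantifies how similar the trial sample is to the target population on observed characteristics.

Stuart et al. (2015, Stat Med): When SPRINT results were re-weighted to match the US hypertensive population, the estimated benefit attenuated to HR 0.82 (vs 0.75 in the trial). The treatment still works, but when the population changes, so does the magnitude.

SPRINT and the Missing Majority

SPRINT was a well-designed trial of 9,361 patients. Its finding (HR 0.75 for intensive versus standard blood pressure control) changed guidelines worldwide. But subsequent analyses showed that the benefit was strongest in the subgroups most similar to the trial population and uncertain in the excluded groups.

Equity in evidence synthesis means asking not just "Does it work?" but "For whom does it work?"

Decision Tree: Equity Assessment for Your Review

ROOT: Does the evidence in your review come from populations similar to your target?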

YES → Good. But check: Are subgroups (age, sex, ethnicity, SES) reported separately?

Yes: Use subgroup effects for population-specific recommendations
No: Flag as limitation — equity gap in reporting

NO → Does PROGRESS-Plus analysis reveal differential effects?

Yes: Population-specific recommendations needed. Consider transportability re-weighting.
No: Cautious generalization with explicit equity statement in discussion

Slide E: Callback to Module 3

Methodologist

Callback: The HRT Lesson Revisited

Remember Module 3? The HRT story showed how healthy-user bias made a harmful treatment look beneficial. SPRINT may have the opposite problem: a "healthy volunteer" effect can make an effective treatment appear more effective than it would be in the real world.

Every meta-analysis should ask: Who was included? Who was excluded? Does it matter?

Module 18 Quiz

Q1. What does the PROGRESS-Plus framework help reviewers assess?

A. Statistical heterogeneity

B. Equity and applicability across disadvantaged populations

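C. Internal validity of the included studies

D. Overall certainty of the evidence

Module 18 Complete

"Evidence that excludes the vulnerable cannot claim to serve them."

Not every signal is truth.

Module 19: The Machine

A number without provenance is not a number.

Module 19: The Machine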

🎯 Learning Objectives

Describe how AI/ML is used in systematic review screening
Explain active learning and human-in-the-loop workflows
Assess automation validation: recall, workload savings, and risk
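Recognize the limitations and biases of algorithmic screening
Apply a framework for responsible AI use in evidence synthesis

When COVID-19 hit,

papers arrived faster than humans could read.

By 2021, more than 300,000 COVID-19 papers existed. Cochrane used machine learning classifiers to triage studies for rapid reviews, reducing screening workload by up to 70% while maintaining recall above 95%.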

The Flood

By April 2020, 4,000 COVID preprints appeared every week.

PubMed indexed 500 new COVID articles per day.

Cochrane's screening queue hit 10,000 unreviewed titles.

🔍 The Impossible Math

A pair of reviewers screens ~200 titles per day.

At 500 new articles/day, they fell further behind with every hour.

The living review was dying before it began.

The First Attempts

The idea was not new. Cohen et al. (2006, JAMIA) first showed that machine learning could cut screening workload by up to 50% with less than 5% loss of recall.

📅

2006: Cohen et al. — SVM classifiers for drug class reviews. Proof of concept.

📅

2016: RobotReviewer (Marshall et al., JMLR) — ML for risk of bias assessment. Inter-rater reliability comparable to human reviewers.

📅

2021: ASReview (van de Schoot et al., Nature Machine Intelligence) — active learning that simulated 95% workload reduction.

But simulation is not reality. COVID-19 would be the first large-scale test.

AI in Systematic Reviews

1

Screening Prioritization

Active learning ranks citations by relevance. Reviewers screen the most likely relevant first.

2

Data Extraction Support

NLP extracts PICO elements, outcomes, and results. Human verification is always required.

3

Risk of Bias Assessment

ML classifiers predict RoB domains. Experimental—human judgment remains gold standard.

Researcher

Validating Automation

Recall

>95% required. Missing 1 study can change conclusions.

WSS@95%

Work Saved over Sampling at 95% recall.

Stopping

When to stop screening? Consecutive irrelevant threshold.
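The WSS@95% metric above can be written as (records left unscreened) / (all records) − 0.05. The sketch below computes it retrospectively from a ranked list of known labels (1 = relevant, 0 = irrelevant), which is how the metric is typically evaluated in simulation studies. The ranking is hypothetical and the function name is an assumption of this sketch.

```python
import math

def wss_at_recall(ranked_labels, target_recall=0.95):
    """Work Saved over Sampling when screening stops once target recall is reached."""
    n_total = len(ranked_labels)
    n_relevant = sum(ranked_labels)
    if n_relevant == 0:
        raise ValueError("no relevant records: recall is undefined")
    # Small tolerance guards against floating-point error before rounding up.
    needed = math.ceil(target_recall * n_relevant - 1e-9)
    found = 0
    for screened, label in enumerate(ranked_labels, start=1):
        found += label
        if found >= needed:
            return (n_total - screened) / n_total - (1.0 - target_recall)
    return 0.0  # not reached for valid 0/1 labels

# Hypothetical classifier ranking of 1,000 records containing 40 relevant studies:
# most relevant records rise to the top, a few are buried further down.
ranking = [1] * 30 + [0] * 70 + [1] * 8 + [0] * 392 + [1] * 2 + [0] * 498
print(f"WSS@95%: {wss_at_recall(ranking):.3f}")
```

A value near zero means the ranking saved no more work than random screening would have; the metric says nothing about which relevant records were missed.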

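The fundamental tension: automation saves time but introduces new sources of error. Always report the tool, version, training data, and stopping criteria.

The Validation Crisis

🔍 The Validation Paradox

To check whether the machine missed relevant studies, you need a human to screen everything.

But if humans screen everything, why use the machine?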

The solution: prospective holdout validation.

Random 10% sample screened by both human and machine
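Compare: did the machine miss anything the human found?
If recall drops below 95%, retrain and expand human screening

Trust, but verify. The machine earns its role; it does not inherit it.

Cochrane's COVID Response

Cochrane built its COVID-19 Study Register using machine learning classifiers trained on millions of records. The system achieved 99% sensitivity while cutting manual screening from weeks to days.

But the machine was a tool, not a replacement. Every included study was still verified by a reviewer. Lesson: AI augments reviewers; it does not replace them.

The Study That Was Almost Missed

In June 2020, the RECOVERY trial released its dexamethasone results: the first treatment proven to reduce COVID mortality (28-day mortality: 22.9% vs 25.7%, RR 0.83).

The preprint appeared on medRxiv with a non-standard title. Scenarios like this occurred repeatedly during the pandemic: ML classifiers trained on established terminology ranked unfamiliar framing low.

In several living reviews, human reviewers scanning flagged titles recognized key drug names and escalated studies that the classifier had deprioritized.

Without those humans, landmark treatment results would have taken weeks to reach the living review.

Machines read faster. Humans read deeper. Neither alone is enough.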

Decision Tree: When Should You Use AI?

Will your review screen more than 5,000 titles?

Yes → Consider AI-assisted screening

Active learning prioritization. Dual-screen random 10% holdout. Stop when 3 consecutive batches yield 0 relevant studies.

Report: classifier type, training data, recall on holdout, stopping rule.

No → Manual screening is feasible

For <5,000 titles, dual human screening remains gold standard. AI adds complexity without proportionate benefit.

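Is it a living review or a rapid review?

If yes → AI is especially valuable. Continuous classifier retraining on new evidence. But: never let the machine make final inclusion decisions.

The Pattern Repeats

Methodologist

The Pattern Repeats

Remember Module 6? Poldermans fabricated the DECREASE data that guided perioperative beta-blocker guidelines for a decade.

AI can now detect statistical anomalies automatically:

GRIM test: Are reported means possible given the sample size and integer-valued data?
SPRITE: Can the reported summary statistics be reconstructed from plausible individual data?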
Statcheck: Do reported p-values match the test statistics?
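Of the three checks listed above, GRIM is simple enough to sketch in a few lines. The idea: if every response is an integer, the sum of n responses is an integer, so only certain means are possible for a given n. The sketch below assumes integer-valued data and a mean reported to two decimals; the function name and example values are illustrative and do not come from any particular paper or from the published GRIM tooling.

```python
import math

def grim_consistent(reported_mean, n, decimals=2):
    """True if some integer total divided by n rounds to the reported mean."""
    half_unit = 0.5 * 10 ** (-decimals)
    approx_total = reported_mean * n
    # Only the integer totals on either side of mean * n can possibly match.
    for total in (math.floor(approx_total), math.ceil(approx_total)):
        if abs(total / n - reported_mean) <= half_unit + 1e-12:
            return True
    return False

# Hypothetical reported results from integer-scale (e.g. Likert) items.
for mean, n in [(5.18, 28), (5.19, 28), (3.50, 10)]:
    verdict = "consistent" if grim_consistent(mean, n) else "IMPOSSIBLE for this n"
    print(f"mean {mean:.2f}, n = {n}: {verdict}")
```

A failed check is a flag for follow-up, not proof of misconduct: typos, unreported exclusions, or non-integer scoring can all produce inconsistent means.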

These tools have found anomalies in hundreds of published papers, faster than any human auditor.

But machines flag. Humans judge. The decision to retract is fundamentally a human one.

Module 19 Quiz

Q1. What is the minimum acceptable recall for AI-assisted screening in systematic reviews?

A. 80%

B. 90%

C. >95%

D. 100%

Module 19 Complete

"Machines read faster. Humans read deeper. Together, they read truth."

Not every signal is truth.

Module 20: The Qualitative

Methods protect patients from confidence.

Module 20: The Qualitative

🎯 Learning Objectives

Explain why some questions require qualitative evidence synthesis
Describe meta-ethnography (Noblit & Hare) and thematic synthesis
Apply the CERQual framework to assess confidence in qualitative findings
Understand mixed-methods synthesis approaches
Recognize when qualitative evidence changes practice

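The WHO asked a question

no RCT could answer.

Why do women around the world experience disrespect and abuse during childbirth? Bohren et al. (2015) synthesized 65 qualitative studies from 34 countries into a seven-domain framework of mistreatment.

Slide A: Questions Beyond Randomization

Questions Beyond Randomization

In 2014, the WHO convened a panel to address a global crisis: women were being physically abused, verbally humiliated, and denied care during childbirth. These were not rare events. Reports came from 34 countries.

They needed to understand WHY. What drives disrespect and abuse in maternity care?

No RCT could answer this. You cannot randomize women to abusive versus respectful care. You cannot blind the midwife. You cannot measure "dignity" on a Likert scale. The evidence had to be qualitative.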

Meta-Ethnography

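Developed by Noblit & Hare (1988), meta-ethnography translates concepts across studies rather than aggregating numbers. It generates a new interpretive framework (third-order constructs) from first-order (participant quotes) and second-order (author interpretations) data.

Reciprocal

Studies confirm one another

Refutational

Studies contradict one another

Line of argument

Studies build a new theory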

What Bohren Found: A Taxonomy of Mistreatment

1. Physical abuse

Hitting, pinching, slapping during labor

2. Sexual abuse

Inappropriate touching, non-consensual procedures

3. Verbal abuse

Shouting, threats, judgmental comments

4. Stigma & discrimination

Based on HIV status, ethnicity, age, poverty

5. Professional standards failure

Neglect, lack of informed consent

6. Poor rapport

Poor communication, dismissiveness

7. Health system conditions

Overcrowding, understaffing, lack of supplies

65 studies. 34 countries. The same patterns recurred across languages, cultures, and systems. This was not anecdote. It was synthesized evidence.

Researcher

CERQual: Confidence in Qualitative Evidence

CERQual assesses confidence in qualitative review findings across four components:

1

Methodological Limitations

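Quality of the contributing studies.

2

Coherence

How well the data support the finding.

3

Adequacy

Richness of the data (not just the number of studies).

4

Relevance

Applicability to the context of the review question.

Slide C: From Evidence to Action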

When Qualitative Evidence Changes Practice

Bohren's synthesis informed the WHO's 2018 Recommendations on Intrapartum Care for a Positive Childbirth Experience. Specific changes grounded in qualitative evidence:

Rec. 15

Companionship during labor

Rec. 1

Respectful maternity care

Rec. 3

Effective communication

Rec. 12

Emotional support

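These recommendations, grounded in qualitative evidence, now guide maternity care in 194 WHO member states. No forest plot could have produced them. No I² statistic could have revealed them.

Bohren's Framework of Mistreatment

The 2015 qualitative synthesis identified seven domains: physical abuse, sexual abuse, verbal abuse, stigma and discrimination, failure to meet professional standards, poor rapport, and health system conditions. The framework informed the WHO recommendations on intrapartum care (2018).

No p-value can capture the experience of being slapped during labor. Qualitative synthesis expressed what numbers could not.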

Decision Tree: When Is Qualitative Synthesis Appropriate?

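ROOT: Is your research question about experiences, perceptions, barriers, or facilitators?

YES → Is the question about HOW or WHY rather than WHETHER?

Yes: Qualitative evidence synthesis (meta-ethnography, thematic synthesis, or framework synthesis)
No: Consider mixed methods: quantitative for effects + qualitative for mechanisms

NO → Is the question about effectiveness/efficacy?

Yes: Quantitative meta-analysis
But: Complement with a qualitative review of implementation barriers (CERQual assessment)

Key insight: The strongest systematic reviews answer both: Does it work? (quantitative) AND Why does it work or fail? (qualitative)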

Module 20 Quiz

Q1. What distinguishes meta-ethnography from quantitative meta-analysis?

A. It includes only 3 to 5 studies

B. It translates concepts across studies rather than pooling numbers

C. It does not require a systematic search

D. It is less rigorous than quantitative synthesis

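Module 20 Complete

"Not everything that counts can be counted. Not everything that can be counted counts."

Heterogeneity is a message, not noise.

Module 21: The Multivariate

Heterogeneity is a message, not noise.

Module 21: The Multivariate

🎯 Learning Objectives

Recognize when outcomes within a study are correlated
Explain multivariate random-effects models
Apply robust variance estimation (RVE) for dependent effect sizes
Understand three-level models for nested data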
Choose between multivariate approaches based on data structure

Cardiovascular trials report

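mortality, myocardial infarction, stroke, and more.

These outcomes are correlated within patients. A patient who dies cannot go on to reach an MI endpoint. Standard meta-analysis treats each outcome independently, ignoring the dependence and potentially double-counting evidence.

Slide A: The Convenient Lie

The Assumption Nobody Questions

Open any standard meta-analysis textbook. The model assumes each study contributes one independent effect size. But reality is different.

A single cardiovascular trial reports mortality, myocardial infarction, stroke, and revascularization. A single psychotherapy study reports depression, anxiety, and quality of life at 3, 6, and 12 months.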

30 trials

× 4 outcomes

= 120

effect sizes

Most analysts either: (a) treat all 120 as independent (inflating precision by up to a factor of √4 when outcomes are highly correlated), or (b) pick one outcome and discard the rest. Both approaches are wrong.

The Dependence Problem

In standard pairwise meta-analysis, each study contributes one effect size. But many studies report multiple outcomes, subgroups, timepoints, or arms, creating dependent effect sizes. Ignoring this inflates precision and distorts inference.

RVE

Robust Variance Estimation. Sandwich estimator handles unknown correlation.

3-Level

Study → Outcome nesting modeled explicitly.

Researcher

Robust Variance Estimation

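RVE (Hedges, Tipton & Johnson, 2010) uses a sandwich-type estimator that yields valid standard errors regardless of the true correlation among dependent effects. The within-study correlations need not be known or estimated. It works well with 20 or more studies.

Small-sample correction: Tipton & Pustejovsky (2015) developed a small-sample correction (CR2) for RVE, using Satterthwaite degrees of freedom when the number of clusters is small.

Slide B: The Mathematical Truth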

Researcher

What Dependence Does to Your Confidence Intervals

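With 4 outcomes from the same study and a within-study correlation of ρ = 0.5:

Treating as independent

CI width = X

Accounting for dependence

CI width = 1.58X

The confidence interval should be 58% wider. Any meta-analysis that ignored this reported overly precise results.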
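The 1.58 figure follows from the variance of an average of correlated estimates. Assuming m effect sizes with equal variance and a common pairwise correlation ρ (an exchangeable structure, which is an assumption of this sketch), the variance of their mean is inflated by 1 + (m − 1)ρ relative to the independent case, so the CI width is inflated by the square root of that factor. The function name is illustrative.

```python
import math

def ci_width_inflation(m, rho):
    """Ratio of the correct CI width to the naive (independence) CI width."""
    variance_inflation = 1.0 + (m - 1) * rho
    return math.sqrt(variance_inflation)

for rho in (0.0, 0.5, 1.0):
    print(f"m = 4, rho = {rho:.1f}: CI should be {ci_width_inflation(4, rho):.2f}x the naive width")
```

With ρ = 0.5 the factor is √2.5 ≈ 1.58; with perfectly correlated outcomes it reaches √4 = 2, because four copies of the same information have been counted as four independent studies.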

RVE (Hedges, Tipton & Johnson, 2010): Uses a “sandwich” variance estimator that produces correct standard errors without needing to know the exact within-study correlation.

Researcher

Three-Level Models: Making Structure Explicit

1

Level 1: Sampling Variance

Measurement error within each effect size estimate.

2

Level 2: Within-Study Variance

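Outcomes and timepoints vary within a single study.

3

Level 3: Between-Study Variance

Studies differ in populations, settings, and methods.

Example: In a meta-analysis of psychotherapy for depression (k=50 studies, 180 effect sizes), 35% of the variation was within studies (different outcomes) and 65% was between studies (different therapies, populations). This decomposition shows how much heterogeneity lies within vs between studies when effects are nested (e.g., multiple outcomes within a study, or studies within research groups).

Methodologist

Three-Level Models: Formal Framework

When effects are nested (e.g., multiple outcomes within a study, or studies within research groups), a three-level model partitions the variance into (1) sampling variance (level 1), (2) within-study variance (level 2), and (3) between-study variance (level 3). This preserves correct inference while borrowing strength across levels.

The Cardiovascular Problem

A meta-analysis of statins might include 30 trials reporting mortality, MI, stroke, and revascularization. That is 120 effect sizes in 30 clusters. Treating them as 120 independent estimates inflates precision by a factor related to the within-study correlation.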

RVE or multivariate models handle this correctly—producing wider, honest confidence intervals.

Decision Tree: Which Approach for Dependent Effect Sizes?

ROOT: Does your meta-analysis have multiple effects per study?

YES → Are the within-study correlations known (or estimable)?

Yes: Multivariate random-effects model (most efficient)
No: RVE with small-sample correction (robust to unknown correlations)

NO → Standard univariate random-effects model

Sub-question: Do the multiple effects come from different outcomes, timepoints, or subgroups?

Different outcomes → Three-level model or RVE with clustering
Different timepoints → Network of timepoints with temporal correlation
Different subgroups → Consider if subgroups are meaningful or should be averaged

Module 21 Quiz

Q1. What problem does Robust Variance Estimation (RVE) solve?

A. Publication bias

B. Dependence among multiple effect sizes from the same study

C. Between-study heterogeneity

D. Small-study effects

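Module 21 Complete

"When outcomes are entangled, pretending they are independent is a lie of convenience."

A number without provenance is not a number.

Module 22: The Proof

A number without provenance is not a number.

Module 22: The Proof

🎯 Learning Objectives

Understand how computational errors propagate through policy
Define reproducibility and distinguish it from replicability
Apply evidence hashing and proof-carrying numbers
Use reproducibility checklists for meta-analysis
Recognize the role of pre-registration and open data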

A graduate student opened a spreadsheet

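and discovered that an era of austerity had been built on an error.

In 2010, Reinhart and Rogoff claimed that countries with debt-to-GDP ratios above 90% experienced negative growth. The claim influenced austerity policies across Europe. In 2013, Thomas Herndon found an Excel error that excluded five countries from the average. The corrected result: modest positive growth, not collapse.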

Reproducibility vs Replicability

Reproducible

Same data + same code = same result

Replicable

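New data + same methods = consistent result

Reproducibility is the minimum standard. If others cannot reproduce your pooled estimate from the data you report, the analysis cannot be verified. A meta-analysis should share its extracted data, analysis scripts, software versions, and random seeds.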
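One lightweight way to capture software versions and seeds is a small run manifest saved next to the results. The sketch below uses only the Python standard library; the manifest layout, the function names, and the package entry are hypothetical illustrations rather than a prescribed format.

```python
import platform
import random


def build_manifest(seed: int, packages: dict[str, str]) -> dict:
    """Record what is needed to rerun an analysis: interpreter version,
    operating system, random seed, and the package versions the caller supplies."""
    return {
        "python_version": platform.python_version(),
        "platform": platform.platform(),
        "seed": seed,
        "packages": dict(sorted(packages.items())),
    }


def seeded_draws(seed: int, n: int = 3) -> list[float]:
    """Draw n pseudo-random numbers from a generator fixed by the seed."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]


# Hypothetical package name and version, for illustration only
manifest = build_manifest(seed=20240101, packages={"example-meta-package": "1.2.3"})
print(manifest["seed"], manifest["python_version"])

# The same seed gives the same draws, so bootstrap or simulation steps can be rerun
print(seeded_draws(20240101) == seeded_draws(20240101))
```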

Researcher

Proof-Carrying Numbers

Every number in a meta-analysis should carry its provenance: where it came from, how it was transformed, and which code produced it. Evidence hashing creates a cryptographic fingerprint of inputs so any change (accidental or deliberate) is detectable.

SHA

Input Hash

A SHA-256 hash of the extracted data. If a single cell changes, the hash changes. Provenance chain: data → code → result → hash
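A minimal sketch of evidence hashing with Python's standard `hashlib` and `json` modules follows. The trial rows are made-up numbers, the function names are hypothetical, and the way the chain is combined (joining the data hash, code hash, and result with a separator) is one possible convention, not a standard.

```python
import hashlib
import json


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the bytes as a 64-character hex string."""
    return hashlib.sha256(data).hexdigest()


def hash_extraction(rows: list[dict]) -> str:
    """Hash extracted data after serializing it canonically (sorted keys,
    fixed separators) so equal content always gives an equal hash."""
    canonical = json.dumps(rows, sort_keys=True, separators=(",", ":"),
                           ensure_ascii=False)
    return sha256_hex(canonical.encode("utf-8"))


def provenance_hash(data_hash: str, code_text: str, result: dict) -> str:
    """Chain data -> code -> result into a single fingerprint."""
    code_hash = sha256_hex(code_text.encode("utf-8"))
    result_json = json.dumps(result, sort_keys=True, separators=(",", ":"))
    return sha256_hex("|".join([data_hash, code_hash, result_json]).encode("utf-8"))


# Made-up extraction table for illustration
rows = [
    {"study": "Trial A", "events_trt": 12, "n_trt": 100, "events_ctl": 20, "n_ctl": 100},
    {"study": "Trial B", "events_trt": 8, "n_trt": 90, "events_ctl": 15, "n_ctl": 95},
]
original = hash_extraction(rows)

rows[1]["events_ctl"] = 16  # change a single cell
tampered = hash_extraction(rows)

print(original)
print(tampered)
print("hashes differ:", original != tampered)
```

Row order is part of the hashed content here, so re-sorting the table also changes the hash. Whether that is desirable depends on how the extraction sheet is maintained.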

Interactive: Reproducibility Checklist

Check each item to assess the reproducibility of your meta-analysis. How does your review score?

The Excel Error That Changed Economies

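Reinhart and Rogoff's "Growth in a Time of Debt" was cited in congressional testimony, European Commission reports, and IMF policy briefs. The Excel error (the AVERAGE formula covered rows 30 to 44 instead of rows 30 to 49) meant that five countries were omitted: Australia, Austria, Belgium, Canada, and Denmark.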

The corrected mean changed from -0.1% to +2.2%. Austerity policies affected millions. Reproducibility is not academic perfectionism; it is a safeguard against catastrophe.
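A scripted analysis can guard against this kind of silent truncation by checking how many rows actually enter a summary. The sketch below is a generic illustration: the function name is hypothetical and the growth figures are invented, not the Reinhart-Rogoff data.

```python
def guarded_mean(values: list[float], expected_n: int) -> float:
    """Average the values, but refuse to proceed if the number of rows
    differs from the number the protocol says should be included."""
    if len(values) != expected_n:
        raise ValueError(f"expected {expected_n} values, got {len(values)}")
    return sum(values) / len(values)


# Invented growth rates for 20 hypothetical countries
growth = [2.0] * 15 + [3.0] * 5

# A range that stops five rows short, like a mis-dragged spreadsheet formula
truncated = growth[:15]

print("full mean:", guarded_mean(growth, expected_n=20))
print("truncated mean (unguarded):", sum(truncated) / len(truncated))

try:
    guarded_mean(truncated, expected_n=20)
except ValueError as err:
    print("caught:", err)
```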

Remember Module 5?

DECREASE Through the Lens of Reproducibility

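Don Poldermans's DECREASE trials were retracted because of data fabrication. Had proof-carrying numbers existed (hashed inputs, provenance chains, verified computations), the fabrication would have been detected before the evidence entered meta-analyses and changed surgical guidelines.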

Module 22 Quiz

Q1. What was the Reinhart-Rogoff error?

A. They used too small a sample

B. An Excel formula excluded 5 countries, reversing the conclusion

C. They studied the wrong time period

D. They used the wrong statistical test

Module 22 Complete

"A number without provenance is not a number. An analysis without reproducibility is not evidence."

Certainty must be earned, not assumed.

Module 23: Your First Meta-Sprint

🎯 Learning Objectives

Understand the 40-day systematic review workflow
Map the Seven Principles to real practice phases
Recognize Definition-of-Done (DoD) gates as quality checkpoints
Appreciate why structure prevents the failures you've studied
Graduate ready to conduct (not just understand) meta-analysis

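Journey Complete

You have learned the stories.

Now you must walk the path.

Every evidence reversal you studied happened because teams knew the methods but did not follow them systematically.

The META-SPRINT Framework

A 40-day structured workflow with five phase gates. Each gate is a Definition-of-Done (DoD) checkpoint that blocks progress until quality is assured.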

40

Days to Completion

5

DoD Phase Gates

Day 34

Hard Freeze

Why 40 days? Long enough to be rigorous, short enough to prevent scope creep. The rosiglitazone cardiac signal stayed buried for years because no deadline forced transparency.

The Five Phase Gates

A

DoD-A: Protocol Lock (Days 1-3)

PICOS defined, timepoint rules set, model choices pre-specified. No moving target.

B

DoD-B: Search Lock (Days 6-10)

All databases searched, grey literature checked, PRESS validated. No hidden studies.

C

DoD-C: Extraction Lock (Days 10-28)

Dual extraction, provenance linked, RoB assessed. No fabricated numbers.

The Five Phase Gates (continued)

D

DoD-D: Analysis Lock (Days 21-33)

Forest plots generated, sensitivity analyses run, heterogeneity explored. No cherry-picking.

E

DoD-E: Submission Lock (Days 33-40)

GRADE certainty rated, clinical summary written, manuscript finalized. No overconfidence.

Day 34 Freeze: No new studies can be added. This prevents the "weaponized scope creep" that plagued the BMP spinal surgery meta-analyses, where industry kept "discovering" favorable studies.
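The gate schedule can be kept as data so that a script, rather than memory, decides what is allowed on a given day. The sketch below copies the day ranges exactly as listed above, including their overlaps and the unassigned days 4-5. The class and function names are hypothetical, and treating the freeze as taking effect at the start of day 34 is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gate:
    code: str
    name: str
    first_day: int
    last_day: int


# Day ranges copied from the five phase gates as listed
GATES = [
    Gate("DoD-A", "Protocol Lock", 1, 3),
    Gate("DoD-B", "Search Lock", 6, 10),
    Gate("DoD-C", "Extraction Lock", 10, 28),
    Gate("DoD-D", "Analysis Lock", 21, 33),
    Gate("DoD-E", "Submission Lock", 33, 40),
]
SPRINT_LENGTH = 40
HARD_FREEZE_DAY = 34  # assumption: the freeze applies from the start of this day


def gates_active_on(day: int) -> list[str]:
    """Return the codes of every gate whose day range covers the given day."""
    if not 1 <= day <= SPRINT_LENGTH:
        raise ValueError(f"day must be between 1 and {SPRINT_LENGTH}")
    return [g.code for g in GATES if g.first_day <= day <= g.last_day]


def can_add_study(day: int) -> bool:
    """New studies may be added only before the hard freeze."""
    return day < HARD_FREEZE_DAY


for day in (4, 10, 33, 34):
    print(day, gates_active_on(day), "add studies:", can_add_study(day))
```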

The Seven Principles in Practice

Every principle you learned maps to a specific phase gate:

DoD-A "Not every signal is truth": pre-specify what counts as evidence

DoD-B "What was hidden in plain sight?": search comprehensively

DoD-C "A number without provenance is not a number": link every data point to its source

DoD-D "Heterogeneity is a message, not noise": investigate, don't ignore

DoD-E "Certainty must be earned, not assumed": GRADE everything

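The Red-Team Principle

Your team tries to break your work.

Each day, two rotating team members spend 12 minutes checking data quality as adversaries. Skeptical verification, not friendly review, is what caught Boldt's fraud: the reported recruitment was impossible.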

CondGO: When Things Go Wrong

What happens when you discover a critical problem mid-sprint?

CondGO = Conditional Go

A bounded rescue protocol. You have exactly 72 hours to fix the problem using only permitted actions. If the problem cannot be fixed, the review must be stopped.
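The 72-hour window is easy to enforce mechanically. The sketch below is one possible encoding: the function names and status labels are hypothetical, and counting a fix made exactly at the 72-hour mark as on time is an assumption.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

CONDGO_WINDOW = timedelta(hours=72)


def condgo_deadline(discovered_at: datetime) -> datetime:
    """The moment the rescue window closes."""
    return discovered_at + CONDGO_WINDOW


def condgo_status(discovered_at: datetime, now: datetime,
                  resolved_at: Optional[datetime] = None) -> str:
    """Return 'go' if fixed within the window, 'fixing' while the window
    is still open, and 'stop review' otherwise."""
    deadline = condgo_deadline(discovered_at)
    if resolved_at is not None and resolved_at <= deadline:
        return "go"
    if resolved_at is None and now <= deadline:
        return "fixing"
    return "stop review"


found = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)  # illustrative timestamp
print(condgo_deadline(found).isoformat())
print(condgo_status(found, now=found + timedelta(hours=48)))
print(condgo_status(found, now=found + timedelta(hours=73)))
```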

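📖 The Avandia lesson: GSK saw the cardiovascular signal in 2000, but no deadline forced action. They "watched and waited" for seven years. Tens of thousands were harmed. CondGO exists because "we'll deal with it eventually" kills people.

You began this course with stories.

You end it ready to practice.

The META-SPRINT workflow takes everything you have learned and organizes it into a 40-day system that prevents failure.

When you are ready to conduct a real systematic review, open the META-SPRINT application. The stories you learned here will appear as reminders at every phase to guide you.

Story: When Methods Save Millions (the CTT Collaboration)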

What does it look like when every principle is followed?

REAL DATA

The Cholesterol Treatment Trialists' (CTT) Collaboration is the gold standard of meta-analysis: individual patient data from more than 170,000 participants across 26 statin trials. Pre-specified protocol. IPD from all major trials. Standardized outcomes. Result: statins reduce major vascular events by 21% per mmol/L LDL reduction (RR 0.79, 95% CI 0.77-0.81), regardless of baseline risk. This finding, replicated across five meta-analyses over 15 years, has prevented an estimated millions of heart attacks and strokes worldwide.

The Seven Principles Applied

The CTT case shows what happens when every principle in this course is followed. Consider the alternative.

Path A: No Principles

No protocol. Published data only. No RoB. No heterogeneity investigation. No GRADE.

↓

Conflicting small trials. Statin controversy persists. Millions untreated.

OUTCOME: Preventable cardiovascular deaths continue

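Path B: The CTT Way

Pre-registered protocol. IPD from all trials. Standardized outcomes. Transparent methods. GRADE high certainty.

↓

A definitive answer. Global guidelines change. Statins are prescribed to those who benefit.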

OUTCOME: Millions of lives saved by rigorous evidence synthesis

THE REVELATION

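Every principle in this course exists because its absence caused harm. The CTT Collaboration proves that when methods are rigorous, data carry provenance, bias is assessed, and certainty is earned, meta-analysis becomes the most powerful tool in medicine. You now hold these principles. Use them.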

Capstone Quiz

1. What is the purpose of the Day 34 "Hard Freeze" in META-SPRINT?

A. To allow time for peer review

B. To prevent manipulation of results through late-added studies

C. To speed up publication

D. To meet a journal deadline

2. The CondGO protocol gives teams how long to fix critical problems?

A. 24 hours

B. 48 hours

C. 72 hours

D. 1 week

3. Red-team adversarial QA caught Joachim Boldt's fraud by noticing:

A. Impossible patient recruitment rates

B. p-hacking in statistical tests

C. Inconsistent effect sizes

D. Whistleblower testimony

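The stories you have learned are not history.

They are warnings that protect your future work.

When you conduct your first meta-analysis,
remember CAST before you trust a signal,
remember Poldermans before you skip provenance,
remember reboxetine before you ignore the funnel.

You are ready now. Go with structure. Go with humility. Follow the Seven Principles.

Not every signal is truth.

Module 24: Final Examination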

Certainty must be earned, not assumed.

Final Examination

Final Exam: Part 1 of 2

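Test your mastery of meta-analysis principles. Each question covers a core concept from the course.

Q1. A researcher wants to study "the effect of exercise on health." What is the main problem with this research question?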

A. It lacks randomization

B. Sample size is too small

C. It is not answerable—lacks specific PICO elements

D. It lacks ethical approval

Q2. A funnel plot shows marked asymmetry, with studies missing from the lower-left region. What does this suggest?

A. Large studies have more precise estimates

B. Small negative studies have probably gone unpublished

C. The true effect is stronger than estimated

D. Random sampling error

Q3. A meta-analysis reports I² = 85% and τ² = 0.42. What is the most appropriate interpretation?

A. There is an 85% chance of a true effect

B. The effect size is very large

C. Substantial between-study variance exists; investigate sources

D. The result is clinically important

Q4. In GRADE, what is the starting certainty for evidence from randomized controlled trials?

A. High

B. Moderate

C. Low

D. Very low

Q5. In RoB 2.0, which domain assesses whether outcome assessors knew the treatment allocation?

A. D1: Randomization process

B. D2: Deviations from intended interventions

C. D3: Missing outcome data

D. D4: Measurement of the outcome

Final Exam: Part 2 of 2

Q6. The CAST trial showed that antiarrhythmic drugs suppressed arrhythmias yet increased mortality. This is an example of:

A. Random sampling error

B. Surrogate outcome failure

C. Confounding by indication

D. Reverse causation

Q7. When should a random-effects model be preferred over a fixed-effect model?

A. When sample sizes are large

B. When outcomes are dichotomous

C. When between-study heterogeneity is expected

D. When publication bias is suspected

Q8. According to ICEMAN criteria, which makes a subgroup analysis MORE credible?

A. Hypothesis specified a priori

B. Large number of subgroups tested

C. No biological rationale

D. Inconsistent effects across trials within subgroup

Q9. What assumption must be checked in network meta-analysis to ensure valid indirect comparisons?

A. All studies have equal sample sizes

B. All studies measure the same outcome

C. Transitivity (consistency of effect modifiers)

D. Double-blinding in all trials

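Q10. In trial sequential analysis (TSA), what does crossing the futility boundary mean?

A. The treatment causes harm

B. Further studies are unlikely to show a meaningful effect

C. The evidence is conclusive for benefit

D. The meta-analysis is underpowered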

Part 1 Complete — continue to Part 2 (Advanced Modules)

Part 2: Advanced Module Questions (Q11-Q20)

Final Exam: Part 2 of 2 (Advanced)

Questions 11–20 cover Modules 13–22 (Bayesian, NMA, IPD, Dose-Response, Fragility, Equity, AI, Qualitative, Multivariate, Reproducibility).

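Q11. In a Bayesian meta-analysis, what happens when a vague prior is used with many studies?

A. The posterior closely matches the frequentist result

B. The prior dominates the posterior

C. The credible interval becomes infinitely wide

D. The model fails to converge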

Q12. In Cipriani's antidepressant NMA, why was no single drug declared the "winner"?

A. Too few studies

B. Different drugs ranked best on different outcomes

C. No indirect evidence

D. SUCRA could not be calculated

Q13. Why shouldn't IPD be pooled as if it came from one large trial?

A. IPD always has fewer studies than aggregate

B. It ignores clustering within studies and introduces confounding

C. It cannot handle time-to-event data

D. Binary outcomes cannot be pooled

Q14. What caused the alcohol "J-curve" to disappear in Stockwell's reanalysis?

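A. Benefit

B. Former drinkers were correctly removed from the abstainer reference group

C. The sample size increased

D. Better adjustment for confounders

Q15. In the oseltamivir saga, what did Cochrane find when it gained access to unpublished clinical study reports?

A. The drug had no effect at all

B. The effect was larger than first thought

C. The benefit on complications largely disappeared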

D. Side effects were more common than reported

Q16. What percentage of US patients with hypertension would not be eligible for the SPRINT trial?

A. About 25%

B. About 50%

C. Over 75%

D. Nearly 100%

Q17. Why is AI considered an "augmenter" rather than a "replacer" in systematic reviews?

A. AI is slower than human reviewers

B. AI has perfect recall

C. AI screens fast but cannot make human-level contextual judgments

D. AI is too expensive for most reviews

Q18. What does the "adequacy" component of CERQual assess?

A. The number of studies only

B. The richness and quantity of data supporting a finding

C. The consistency of findings across studies

D. Generalizability to other populations

Q19. A meta-analysis includes 30 statin trials, each reporting 4 correlated outcomes (120 effect sizes). Which approach is correct?

A. Treat all 120 as independent effect sizes

B. Use RVE with small-sample correction

C. Pick only one outcome per study

D. Average the 4 outcomes within each study

Q20. In the Reinhart-Rogoff error, what was the corrected average growth rate for high-debt countries?

A. −0.1% (same as claimed)

B. +2.2%

C. 0%

D. +5%

Passing Score: 15/20 across both parts

Return to the relevant modules to review any questions you missed. Each question tests a core concept.

Not every signal is truth.

Methods protect patients from misplaced trust.

Congratulations

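You have completed Evidence Reversal: A Meta-Analysis Course.

May you synthesize with truth, pool with wisdom,
and conclude with humility.

The Seven Principles:

"Not every signal is truth."

"Methods protect patients from misplaced trust."

"What was hidden in plain sight?"

"A number without provenance is not a number."

"Heterogeneity is a message, not noise."

"Absence of evidence is not evidence of absence."

"Certainty must be earned, not assumed."

"Guide us to the straight path..."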
