can be reduced to a single number,
and in that number, lives are erased?
But which patients benefited? The young or the old? Those with mild disease or severe? Men or women?
The aggregate cannot answer.
For within the average, some patients were saved—and some were harmed.
[Figure: treatment response — ● Responders ● Non-responders]
But who are the 30%? Without individual data, we cannot distinguish responders from non-responders. We cannot practice precision medicine.
Not summaries. Not averages. Each patient, each tumor, each outcome.
What he discovered changed breast cancer treatment forever.
"Tamoxifen works."
But the individual data revealed:
Giving tamoxifen to ER− patients was useless.
In the individual, the truth appears.
This is why we seek the hidden patient."
This is Individual Participant Data Meta-Analysis.
What is found when we look closer?
Aggregate Data (AD)
- Study-level summaries
- Effect sizes from publications
- Mean age, % male, etc.
- Quick and accessible
- Cannot see within-study variation
Individual Participant Data (IPD)
- Patient-level raw data
- Every participant's characteristics
- Actual ages, actual outcomes
- Time-intensive to obtain
- Can see who responds and who doesn't
Trial A: Mean age 55 years, Effect size 0.70 (benefit)
Trial B: Mean age 75 years, Effect size 0.90 (less benefit)
Tempting conclusion: "The drug works better in younger patients."
But this could be completely wrong.
Perhaps in Trial A, older patients within that trial responded better. Perhaps in Trial B, younger patients within that trial responded better.
You cannot know without individual data.
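A toy numeric sketch of this trap, with all numbers invented: the effect sizes are risk ratios (lower = more benefit), and within each trial the ratio falls with age, yet the trial-level means alone would suggest the opposite.

```python
# Hypothetical illustration of the ecological fallacy (made-up numbers).
# Each patient: (age, effect size as a risk ratio; lower = more benefit).
trials = {
    "A (mean age 55)": [(45, 0.80), (55, 0.70), (65, 0.60)],
    "B (mean age 75)": [(65, 1.00), (75, 0.90), (85, 0.80)],
}

def slope(points):
    """Ordinary least-squares slope of effect size on age."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    num = sum((x - mx) * (y - my) for x, y in points)
    den = sum((x - mx) ** 2 for x, _ in points)
    return num / den

# Within each trial the ratio FALLS with age: older patients benefit MORE.
for name, pts in trials.items():
    print(name, "within-trial slope:", round(slope(pts), 3))

# But comparing only trial-level means (55 -> 0.70, 75 -> 0.90) gives a
# POSITIVE slope, inviting the opposite conclusion.
print("between-trial slope:", round(slope([(55, 0.70), (75, 0.90)]), 3))
```

The sign flip between the within-trial and between-trial slopes is exactly why trial-level averages cannot answer patient-level questions.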
True Effect Modification
Test whether treatment effect varies by patient characteristics (age, biomarkers, disease severity)
Time-to-Event Analysis
Use actual survival curves, not just hazard ratios. Handle censoring properly.
Consistent Definitions
Standardize outcome definitions, exposure timing, covariate categories across studies
Subgroup Credibility
Test interactions within studies, avoiding the ecological fallacy
Individual data shows each tree.
If you need to know which trees are sick—
you must walk among them."
gathered data on 170,000 patients
and answered questions no single trial could ask?
27 trials. 174,149 patients. Every baseline characteristic. Every cardiovascular event. Every death.
The published trials asked: "Do statins work?"
The CTT asked: "For whom do statins work?"
But the questions you can answer are worth the investment.
Benefit Proportional to LDL Reduction
Every 1 mmol/L LDL reduction = 22% lower CV events. True across all subgroups.
No Age Threshold
Benefit continues even in patients >75 years (contradicting earlier AD analyses)
Primary Prevention Works
Patients without prior CVD benefit proportionally to their baseline risk
No Cancer Signal
Concerns about statins causing cancer were definitively refuted with IPD follow-up
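The first finding above is multiplicative: if each 1 mmol/L of LDL reduction scales the event rate by about 0.78, a larger reduction compounds. A minimal sketch of that arithmetic (the 0.78 rate ratio is from the CTT finding quoted above; the function name is ours):

```python
# "22% lower CV events per 1 mmol/L" compounds multiplicatively.
RR_PER_MMOL = 0.78  # rate ratio per 1 mmol/L LDL reduction (CTT 2010)

def relative_risk_reduction(ldl_drop_mmol):
    """Expected proportional reduction in major vascular events."""
    return 1 - RR_PER_MMOL ** ldl_drop_mmol

print(f"{relative_risk_reduction(1.0):.0%}")  # 22%
print(f"{relative_risk_reduction(2.0):.0%}")  # 39%
```

So an intensive regimen achieving a 2 mmol/L reduction is expected to cut events by roughly 39%, not 44%: proportional reductions compound, they do not add.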
Trial A: "High risk" = 10-year CVD risk >20%
Trial B: "High risk" = Prior MI
Trial C: "High risk" = Diabetes
You cannot compare or combine what is defined differently.
IPD allowed the CTT to redefine everyone consistently.
IPD provides the translation.
What seemed contradictory becomes clear:
the same truth, measured differently."
Aggregate data takes weeks to gather; IPD can take months or years.
When is the investment worth it?
Should You Pursue IPD?
Time-to-Event Outcomes
When survival curves matter, not just final hazard ratios. When you need to handle censoring properly.
Continuous Effect Modifiers
Testing whether treatment effect varies by age, BMI, biomarker level (not just "high" vs "low")
Outcome Definition Problems
When trials define outcomes differently and you need to standardize
Longer Follow-Up Available
When trialists have unpublished follow-up data you want to include
Your key question is: "Does benefit vary by disease severity and timing of treatment?"
Overall Treatment Effect
When your only question is "Does it work?" not "For whom?"
Homogeneous Population
When trials enrolled similar patients and effect modification is unlikely
Binary Outcomes, Short Follow-up
When censoring isn't an issue and outcomes are simple yes/no
IPD Unobtainable
When trialists won't share, data is lost, or resources are unavailable
But every question about which individuals
demands their presence in your data."
Now: do you analyze it as one combined dataset
or trial by trial, then combine?
Two-Stage Approach
- Stage 1: Analyze each trial separately
- Stage 2: Meta-analyze the results
- Preserves trial structure
- Familiar (like standard MA)
- Cannot handle sparse data well
One-Stage Approach
- Analyze all data simultaneously
- Mixed-effects regression model
- Random effects for clustering
- Better for sparse data
- More flexible modeling
Each trial's estimate is transparent.
Familiar forest plots and I² statistics.
1. More powerful for detecting interactions
2. Can model complex covariate relationships
3. Exact likelihood (no normal approximations)
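The two-stage pipeline can be sketched in a few lines. All numbers below are invented; DerSimonian-Laird is one common stage-2 random-effects estimator, not the only choice.

```python
import math

# Stage 1 (already done per trial): one log effect estimate + SE each.
trials = [
    (math.log(0.70), 0.12),
    (math.log(0.85), 0.10),
    (math.log(0.78), 0.15),
]

def dersimonian_laird(estimates):
    """Stage 2: pool (estimate, SE) pairs with DL random effects."""
    y = [e for e, _ in estimates]
    w = [1 / se ** 2 for _, se in estimates]          # fixed-effect weights
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(y) - 1)) / c)           # between-trial variance
    w_re = [1 / (se ** 2 + tau2) for _, se in estimates]
    pooled = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    return pooled, math.sqrt(1 / sum(w_re)), tau2

pooled, se, tau2 = dersimonian_laird(trials)
print("pooled ratio:", round(math.exp(pooled), 3), " tau^2:", round(tau2, 4))
```

Because stage 1 is just "analyze each trial as its investigators would", the two-stage route keeps every trial's contribution visible before anything is pooled.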
One-Stage or Two-Stage?
The one-stage hears all voices at once.
When data is sparse and events are rare—
the one-stage catches what the two-stage misses."
that saved babies' lives—
but only if given at the right time?
But a puzzle remained: When should they be given?
24 hours before birth? 48 hours? A week?
Published trials couldn't answer—they didn't report timing consistently.
Maximum benefit: 24 hours to 7 days before birth
Reduced benefit: >7 days (lung maturity effect fades)
No benefit: <24 hours (not enough time to work)
After IPD: Guidelines now recommend repeat dosing if delivery hasn't occurred within 7 days of the first course.
This precision—impossible without individual data— has saved thousands of premature babies.
Trial B reported: "Steroid given antenatally"
Trial C reported: "Steroid-to-delivery interval: median 3 days"
Different categories. Different definitions. Incompatible summaries.
Only by examining each baby's actual steroid-to-delivery time could the optimal window be identified.
But when to give it was unknown.
IPD turned 'sometime' into 'the right time'—
and in that precision, children lived."
Somewhere, in files and databases,
each patient's story is recorded.
The question is: will they share it?
This is expected. Plan for it.
Trialist Collaboration
Direct contact with trial investigators. Build relationships. Offer co-authorship.
Data Sharing Platforms
YODA Project, ClinicalStudyDataRequest.com, Vivli, ICPSR
Regulatory Agencies
EMA Policy 0070, FDA (limited), Health Canada
Journal Requirements
Many journals now require data sharing; check supplementary materials
1. Offer co-authorship—make sharing worthwhile
2. Describe data security—how you'll protect their patients
3. Provide data dictionaries—specify exactly what you need
4. Set clear timelines—respect their time
Industry trials: less likely to share
Negative trials: less likely to share
Older trials: data may be lost
Your IPD sample may be biased.
Will they share what they have guarded?
Build the bridge carefully—
for on that bridge, patients' futures cross."
from twelve trials, five countries, three decades.
But Trial A calls it "cardiovascular death"
and Trial B calls it "cardiac mortality".
Are they the same?
Trial from USA: Age in decimal years (e.g., 65.7)
Trial from UK: Age bands ("65-74")
Diabetes: HbA1c ≥ 6.5% vs. fasting glucose ≥ 126 vs. "physician diagnosis"
Outcome: "Major adverse cardiac event" (one trial includes stroke, another doesn't)
Before analysis: harmonize everything.
Create a Master Data Dictionary
Define every variable you need: name, type, permitted values, derivation rules
Map Each Trial's Variables
Document how each trial's coding maps to your standardized definitions
Check with Trialists
Verify your interpretations. They know their data better than you.
Validate Transformations
Reproduce published results from the IPD. If they don't match, investigate.
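Step 2 above (mapping each trial's coding to the master dictionary) can be as simple as a documented lookup table. The trial IDs, variable names, and the band-midpoint rule here are our own illustrative assumptions, using the age example quoted earlier.

```python
# Hypothetical mapping sketch: harmonize "age" to decimal years.
# Assumption: UK age bands are converted to their midpoints (lossy;
# document this choice in the master data dictionary).
AGE_BAND_MIDPOINTS = {"65-74": 69.5, "75-84": 79.5}

def harmonize_age(trial_id, raw_value):
    """Return age under the master definition (decimal years)."""
    if trial_id == "USA":   # already decimal years, e.g. 65.7
        return float(raw_value)
    if trial_id == "UK":    # age bands -> midpoint
        return AGE_BAND_MIDPOINTS[raw_value]
    raise ValueError(f"no documented mapping for trial {trial_id}")

print(harmonize_age("USA", 65.7))    # 65.7
print(harmonize_age("UK", "65-74"))  # 69.5
```

Raising an error for unmapped trials, rather than guessing, is the point: every transformation should trace back to an explicit, trialist-verified rule.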
Six trials used a standard definition (ICD codes). Two trials from the 1990s used "investigator-assessed cardiac death" with no standardized criteria. These two trials show larger treatment effects.
Reproduce each trial's published results from the IPD.
If your analysis gives RR = 0.78 but the publication says RR = 0.85,
something is wrong.
Find the discrepancy. Fix it. Then proceed.
different ways to name the same disease.
Before you can combine, you must translate.
Before you translate, you must understand."
But does it work the same for the young and the old?
For the mild and the severe?
For the one with the biomarker and the one without?
They asked: "Does the effect differ by estrogen receptor status?"
The interaction was massive:
ER-positive: 47% reduction in recurrence
ER-negative: No benefit at all
If significant: the treatment effect differs between subgroups defined by the covariate.
If not: the treatment effect is similar across subgroups (or you lack power to detect a difference).
Between-Study Interaction
- Compares trial-level averages
- Ecological fallacy risk
- Confounded by trial design
- Low statistical power
- Can do with aggregate data
Within-Study Interaction
- Compares patients within each trial
- No ecological fallacy
- Randomization preserved
- Much higher power
- Requires IPD
This is the gold standard for effect modification.
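A simplified sketch of the within-study approach: estimate the subgroup effects inside each trial, take their difference (the interaction), then pool the differences across trials. A real analysis would fit a treatment-by-covariate interaction term in a regression per trial; the subgroup-difference version below, with invented numbers, shows the same logic.

```python
import math

# (log OR, SE) for the treatment effect in each subgroup WITHIN each trial.
per_trial = [
    {"younger": (math.log(0.60), 0.20), "older": (math.log(0.95), 0.22)},
    {"younger": (math.log(0.65), 0.18), "older": (math.log(0.90), 0.25)},
]

def pooled_interaction(trials):
    """Inverse-variance pool of within-trial subgroup differences."""
    ests, weights = [], []
    for t in trials:
        (y1, se1), (y2, se2) = t["younger"], t["older"]
        diff = y2 - y1                 # within-trial interaction
        var = se1 ** 2 + se2 ** 2      # independent subgroups
        ests.append(diff)
        weights.append(1 / var)
    pooled = sum(w * e for w, e in zip(weights, ests)) / sum(weights)
    return pooled, math.sqrt(1 / sum(weights))

inter, se = pooled_interaction(per_trial)
print("ratio of ORs (older vs younger):", round(math.exp(inter), 2))
```

Because each difference is computed inside a single randomized trial, trial-level confounders cancel out: this is what protects the comparison from the ecological fallacy.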
You tested 8 potential effect modifiers in your analysis.
The interaction reveals who benefits.
Test it within studies, not between—
for that is where the truth is found."
When you seek to predict who will die,
who will recover, who will relapse—
the individual is everything.
"Will they ever wake up?"
Individual trials were too small to develop accurate prediction models.
IMPACT gathered IPD from 11 studies, 9,205 patients, and built a model that predicts 6-month outcomes from initial clinical features.
External Validation Across Populations
Develop in some studies, validate in others. True test of generalizability.
Non-linear Relationships
Explore how predictors relate to outcome: linear? threshold? U-shaped?
Multiple Predictor Interactions
Age + GCS + pupil reactivity may interact in ways aggregate data cannot reveal
Proper Handling of Missing Predictors
Multiple imputation at the patient level, not at the study level
TRIPOD
- Prediction model reporting
- Development and validation
- Calibration and discrimination
- 22-item checklist
PRISMA-IPD
- IPD meta-analysis reporting
- Data acquisition details
- Harmonization process
- Integrity checking
When a patient arrives with traumatic brain injury, the model provides a probability of survival and probability of favorable outcome.
This guides conversations with families. This informs treatment intensity. This helps allocate ICU resources.
Built from individual data. Serving individual patients.
you must learn from thousands of pasts.
IPD holds those stories—
each one a teacher, if you will listen."
that has been given to millions for their hearts?
They were told: "Take this, and you shall be protected."
But was every heart equally in need of protection?
When Netflix released "anonymized" movie ratings for its Prize competition, researchers quickly re-identified users by matching ratings with public IMDb reviews.
One lawsuit alleged a closeted lesbian was outed through her viewing patterns.
The lesson: removing names isn't anonymization.
IPD contains enough combinations of age, diagnosis, treatment response, and dates to uniquely identify individuals. Privacy requires more than deleting the name column—it requires understanding how data combinations become fingerprints.
The published trials said: "Aspirin prevents heart attacks."
But the ATT asked the forbidden question:
"At what cost? And for whom?"
2-3 heart attacks prevented
2-3 major bleeds caused
The benefit and harm cancel out.
Should This Patient Take Aspirin for Primary Prevention?
- Higher baseline CV risk: consider it (benefit > harm)
- Intermediate risk: individualize (benefit ≈ harm)
- Low risk: avoid (harm > benefit)
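The risk-dependent trade-off can be made concrete with simple net-benefit arithmetic. The rates below are invented placeholders, not ATT figures: we assume aspirin prevents a fixed fraction of CV events while causing major bleeds at a roughly flat rate regardless of CV risk.

```python
# Hypothetical net-benefit sketch (all rates are assumptions, not ATT data).
RRR_CV = 0.12         # assumed relative reduction in CV events on aspirin
BLEED_EXCESS = 0.001  # assumed extra major bleeds per person-year

def net_events_prevented(baseline_cv_risk_per_year):
    """CV events prevented minus bleeds caused, per person-year."""
    return baseline_cv_risk_per_year * RRR_CV - BLEED_EXCESS

for risk in (0.005, 0.01, 0.03):  # low, intermediate, high baseline risk
    net = net_events_prevented(risk) * 1000
    print(f"baseline {risk:.1%}/yr -> net {net:+.1f} events per 1000/yr")
```

Because the benefit scales with baseline risk while the harm does not, the sign of the net effect flips as risk rises: harm dominates at low risk, benefit at high risk.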
They could not show that high-risk patients gained
while low-risk patients lost.
Only by examining each patient's baseline risk,
each patient's outcomes,
could the interaction be revealed.
He read online that "aspirin prevents heart attacks" and wants your advice.
may harm the secure.
Know your patient's risk before you prescribe.
This is what the individual data taught us."
95,000 patients. One truth: risk determines benefit.
"Should the old be treated the same as the young?"
Some said: "Lower is always better."
Others said: "The old are fragile. Be cautious."
Who was right?
The Aggressive Camp: "Every mmHg of BP reduction saves lives.
Treat everyone to target 120/80."
The Conservative Camp: "In the elderly, low BP causes falls, strokes, death.
There's a J-curve—too low is dangerous."
Individual trials were too small to settle it. Until the BPLTTC gathered the individual data.
Is there a J-curve at very low pressures?
Do the very old (>80 years) still benefit?
Age 55-64: 10% lower major CV events
Age 65-74: 10% lower major CV events
Age 75-84: 10% lower major CV events
Age ≥85: Still 10% lower
P for interaction = 0.85. No evidence of age modification.
who achieved very low blood pressures.
Result: No J-curve in randomized comparisons.
The apparent J-curve in observational data was reverse causation— sick patients have low BP because they're sick, not sick because their BP is low.
Only IPD from RCTs could untangle this.
Should This Elderly Patient Get BP Treatment?
Proportional benefit maintained across all ages
Consider frailty, life expectancy, patient preference—but not age.
You remember the BPLTTC IPD meta-analysis.
But the individual data showed otherwise:
At every age, the benefit endures.
Do not let age alone deny protection."
when a clot blocks the brain?
Every minute, two million neurons die.
But when does the window close?
When is it too late to intervene?
But given too late, it causes bleeding into dying brain tissue.
What is the time window? 3 hours? 4.5 hours? 6 hours?
Individual trials disagreed. Guidelines were uncertain.
They knew exactly when each patient's stroke began. They knew exactly when thrombolysis was given. They knew exactly who lived, who died, who recovered.
They could map benefit against time, minute by minute.
Treated at 90 min: 1 in 4 achieve excellent outcome
Treated at 180 min: 1 in 7 achieve excellent outcome
Treated at 270 min: 1 in 14 achieve excellent outcome
Acute Ischemic Stroke: Give Thrombolysis?
- Early presentation: treat URGENTLY
- Within the approved window: treat if eligible
- Extended window: consider, guided by advanced imaging
- Beyond the window: avoid
Trial B: "Patients treated within 6 hours" (average: 4.2 hours)
These overlapping, inconsistent windows couldn't be compared.
Only by knowing each patient's exact time
could the continuous decay of benefit be mapped.
Door-to-needle time will add 30 minutes, making total time ~4.5 hours.
two million neurons perish.
The IPD showed us the fading window.
Act quickly, or the window closes forever."
You open the files with hope.
And then you see it: empty cells.
Age: 67. Sex: Male. Smoking status: missing.
Outcome at 1 year: missing.
What now?
Missing Completely at Random (MCAR)
Lab machine broke randomly. No relation to patient characteristics. Safe to ignore (but wasteful).
Missing at Random (MAR)
Older patients more likely to miss follow-up. Missingness related to observed variables. Imputation can help.
Missing Not at Random (MNAR)
Patients with poor outcomes drop out. Missingness related to the missing value itself. Dangerous. Requires sensitivity analysis.
What To Do With Missing Values?
- MCAR: complete-case analysis may suffice
- MAR: multiple imputation
- MNAR: imputation plus sensitivity analyses
Multiple imputation (M=20-50 datasets) reflects uncertainty about what the missing value might have been.
This preserves valid standard errors and p-values.
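A stripped-down illustration of the combine step, with simulated numbers: impute M times, analyze each completed dataset, then summarize across imputations. A real analysis would use chained equations and pool within-imputation variances as well (Rubin's rules); here only the between-imputation component is shown.

```python
import math
import random

random.seed(1)
observed = [52.0, 61.0, 58.0, 49.0, 63.0]  # observed ages (simulated)
n_missing, M = 3, 20                        # 3 missing values, 20 imputations

mu = sum(observed) / len(observed)
sd = math.sqrt(sum((x - mu) ** 2 for x in observed) / (len(observed) - 1))

estimates = []
for _ in range(M):
    # Draw each missing value from the observed distribution (proper MI
    # would also propagate parameter uncertainty; omitted for brevity).
    completed = observed + [random.gauss(mu, sd) for _ in range(n_missing)]
    estimates.append(sum(completed) / len(completed))

qbar = sum(estimates) / M                               # pooled estimate
b = sum((q - qbar) ** 2 for q in estimates) / (M - 1)   # between-imputation var
print("pooled mean:", round(qbar, 1), " between-imputation var:", round(b, 3))
```

The spread of the M estimates (b) is what single imputation throws away, and it is exactly the term that keeps standard errors honest.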
Systematically Missing Variables
Trial A measured biomarker X. Trial B didn't. Can't impute what was never collected.
Multilevel Structure
Patients nested within trials. Imputation model must account for clustering.
Different Follow-up Durations
Trial A followed for 2 years. Trial B for 5 years. Survival analysis needs care.
Trial Didn't Collect Your Key Covariate
- Restrict the analysis to trials with the variable
- Sensitivity analyses with varying assumptions about the unmeasured covariate
These 3 trials tend to be older and smaller.
It is a question: Why is this unknown?
Answer that question before you fill the gap—
for the reason for absence shapes the solution."
who gathered data from willing trialists
and declared victory?
But the unwilling held secrets.
And those secrets changed everything.
Cochrane reviewers requested trial data to verify efficacy. Roche refused, citing confidentiality.
For five years, the BMJ campaigned for transparency. When full Clinical Study Reports were finally released in 2014, the picture changed: Tamiflu reduced symptom duration by less than a day and didn't prevent complications.
Billions spent on a drug whose full evidence was locked away.
The Tamiflu saga transformed expectations—today, clinical trial transparency is becoming the norm, not the exception.
Industry-sponsored trials: Less likely to share
Trials with negative results: Less likely to share
Older trials: Data often lost
If these trials systematically differ in effect size,
your IPD-MA is biased.
Assessing Availability Bias
Check for bias; a sensitivity analysis is needed.
Report IPD retrieval rate
"We obtained IPD from 12/15 trials (80%)"
Compare IPD vs. non-IPD trial characteristics
Sample size, funding, publication date, effect size from aggregate data
Sensitivity analysis including non-IPD trials
Two-stage analysis combining IPD + AD from non-sharing trials
Discuss reasons for non-sharing
Data lost? Refused? Never requested? Each has different implications.
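The "combine IPD + AD" sensitivity analysis above can be sketched by treating each re-analyzed IPD trial and each non-sharing trial's published estimate as a (log OR, SE) pair and pooling them together. All numbers are invented, and a fixed-effect inverse-variance pool is used for brevity.

```python
import math

# Re-analyzed estimates from trials that shared IPD...
ipd_trials = [(math.log(0.70), 0.10), (math.log(0.72), 0.12)]
# ...and published aggregate estimates from trials that did not.
ad_trials = [(math.log(0.88), 0.15)]

def inverse_variance_pool(estimates):
    """Fixed-effect pool of (log estimate, SE) pairs."""
    w = [1 / se ** 2 for _, se in estimates]
    pooled = sum(wi * yi for wi, (yi, _) in zip(w, estimates)) / sum(w)
    return pooled, math.sqrt(1 / sum(w))

ipd_only, _ = inverse_variance_pool(ipd_trials)
combined, _ = inverse_variance_pool(ipd_trials + ad_trials)
print("IPD-only OR:", round(math.exp(ipd_only), 2))
print("IPD+AD   OR:", round(math.exp(combined), 2))
```

If the combined estimate drifts away from the IPD-only one, the trials that refused to share probably differ systematically, and the report should say so.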
Your IPD analysis shows OR 0.70.
Use IPD where you have it (for interaction testing).
Supplement with AD for overall effect estimation.
Transparency about what came from where.
The open doors may hide the truth behind locked ones.
Always ask: Who refused to share?
And what might they be hiding?"
Key Sources Cited in This Course
- Riley RD, et al. Individual Participant Data Meta-Analysis: A Handbook for Healthcare Research. Wiley, 2021.
- Stewart LA, et al. PRISMA-IPD: Preferred reporting items for systematic reviews and meta-analyses of individual participant data. JAMA 2015;313:1657-65.
- Early Breast Cancer Trialists' Collaborative Group. Tamoxifen for early breast cancer. Cochrane Database Syst Rev 2001.
- Cholesterol Treatment Trialists' Collaboration. Efficacy and safety of LDL-lowering therapy. Lancet 2010;376:1670-81.
- Roberts D, et al. Antenatal corticosteroids for accelerating fetal lung maturation. Cochrane Database Syst Rev 2017.
- IMPACT Study Group. Predicting outcome after traumatic brain injury. PLoS Med 2008;5:e165.
- Debray TPA, et al. Get real in individual participant data meta-analysis. Int J Epidemiol 2015;44:1287-97.
- Burke DL, et al. Meta-analysis using individual participant data. Stat Med 2017;36:320-38.
But you have learned to find them.
You have learned to ask: Who benefits? Who is harmed?
Now go—and let no patient vanish in the average."
The Hidden Patient — Now You See Them.