Prediction Model Validation Meta-Analysis — c-statistics, O:E ratios, calibration, PROBAST
Tabs: Data Input, C-statistic MA, O:E Ratio MA, PROBAST, Calibration Slope, Forest Plots, Advanced, Guide
Model Information
Validation Study Data
One row per validation study.
Format: Study, N, Events, C-statistic, C_lower, C_upper, O:E_ratio, OE_lower, OE_upper, CalSlope, PROBAST_overall (L/H/U).
Optional extra columns (after PROBAST): CalSlope_SE, NRI_events, NRI_nonevents, NRI_event_SE, NRI_nonevent_SE, Applicability (L/H/U), IDI, IDI_SE, CITL, CITL_SE, TP, FP, Threshold(%).
Pooled C-statistic
Study-level Discrimination
Columns: Study, N, Events, C-statistic, 95% CI, Weight
Heterogeneity
Pooled O:E Ratio
Study-level Calibration
Columns: Study, O:E Ratio, 95% CI, Interpretation, Weight
Heterogeneity
PROBAST Risk of Bias Summary
Traffic Light Table
Columns: Study, Participants, Predictors, Outcome, Analysis, Overall
Legend: Low risk, High risk, Unclear
PROBAST Bar Chart
Pooled Calibration Slope
Study-level Calibration Slopes
Columns: Study, N, Cal. Slope, Interpretation, Weight
Heterogeneity
Forest Plot — Calibration Slopes
C-statistic Transformation Comparison (Debray et al. 2017)
Compare three transformations for pooling c-statistics. Logit is standard; log(-log) may be better when c is near 1; arcsine-sqrt is variance-stabilizing.
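As a rough illustration of the three transformations named above, the sketch below defines each forward transform, its inverse, and the derivative used to carry SE(c) onto the transformed scale by the delta method. The dictionary layout and function names are hypothetical and are not taken from the tool.

```python
import math

# Each entry: (forward transform, inverse transform, |d transform / d c|).
# The derivative is what the delta method multiplies SE(c) by.
TRANSFORMS = {
    "logit": (
        lambda c: math.log(c / (1 - c)),
        lambda x: 1 / (1 + math.exp(-x)),
        lambda c: 1 / (c * (1 - c)),
    ),
    "loglog": (
        lambda c: math.log(-math.log(c)),
        lambda x: math.exp(-math.exp(x)),
        lambda c: 1 / (c * -math.log(c)),
    ),
    "arcsine_sqrt": (
        lambda c: math.asin(math.sqrt(c)),
        lambda x: math.sin(x) ** 2,
        lambda c: 1 / (2 * math.sqrt(c * (1 - c))),
    ),
}


def transform_study(c, se_c, name):
    """Return (transformed c, delta-method SE) for one study."""
    forward, _, deriv = TRANSFORMS[name]
    return forward(c), se_c * deriv(c)


def back_transform(x, name):
    """Map a value on the transformed scale back to the c scale."""
    return TRANSFORMS[name][1](x)
```

Pooling would then run on the transformed values with any inverse-variance method, and the pooled estimate would be passed through `back_transform` for comparison across the three scales.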
Prediction Interval for Pooled C-statistic (IntHout et al. 2016)
The prediction interval shows where the next validation study's c-statistic might fall. It is much wider than the CI when heterogeneity is present. It uses a t-distribution with k-2 df, so it requires at least 3 studies.
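A minimal sketch of this interval, following the guide's formula PI = expit(logit(c) +/- t_{k-2} * sqrt(SE^2 + tau^2)). To stay within the standard library, the t critical value is passed in by the caller (for example 3.182 for k = 5, i.e. 3 df, at the 95% level). The function name and argument layout are illustrative assumptions.

```python
import math


def prediction_interval(logit_mu, se_mu, tau2, k, t_crit):
    """Prediction interval for the c-statistic of a new validation study.

    logit_mu : pooled estimate on the logit scale
    se_mu    : standard error of the pooled estimate (logit scale)
    tau2     : between-study variance (logit scale)
    k        : number of studies pooled
    t_crit   : caller-supplied quantile of the t-distribution with k-2 df
    """
    if k < 3:
        raise ValueError("prediction interval is undefined for k < 3")
    half_width = t_crit * math.sqrt(se_mu ** 2 + tau2)

    def expit(x):
        return 1 / (1 + math.exp(-x))

    return expit(logit_mu - half_width), expit(logit_mu + half_width)
```

With tau2 = 0 and large k the interval approaches the ordinary CI. Any heterogeneity widens it, which is the point of reporting both.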
O:E vs C Bubble Plot (Calibration vs Discrimination)
X = c-statistic (discrimination), Y = O:E ratio (calibration). Bubble size = sample size. Color = PROBAST RoB. Quadrants: upper-left = poor discrimination + underprediction; lower-right = good discrimination + overprediction.
Net Reclassification Improvement (NRI) Pooling
Pools event NRI and non-event NRI separately (FE/RE). Overall NRI = NRI_events + NRI_nonevents. Provide NRI data in optional columns.
PROBAST Applicability Assessment
Distinct from risk of bias. A study can have low RoB but high applicability concern (e.g., different population from the target). 3 domains: Participants, Predictors, Outcome.
Columns: Study, Participants, Predictors, Outcome, Overall Applicability
Subgroup Analysis by PROBAST Risk of Bias
Pools c-statistics separately for Low-RoB vs High/Unclear-RoB studies and tests for a subgroup difference. If a subgroup has few studies (k < 3), its pooled estimate may be unreliable.
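One way to sketch the test for subgroup difference, assuming each subgroup has already been pooled on the logit scale: with two subgroups, the between-group Q statistic has 1 df and its chi-squared p-value can be written with the complementary error function. The function and argument names are hypothetical.

```python
import math


def subgroup_difference_test(mu_low, se_low, mu_high, se_high):
    """Chi-squared test (1 df) comparing two pooled logit(c) estimates.

    Returns (Q_between, p_value). Inputs are pooled estimates and their
    standard errors on the logit scale, one pair per subgroup.
    """
    q_between = (mu_low - mu_high) ** 2 / (se_low ** 2 + se_high ** 2)
    # For 1 df, P(chi2 > q) = P(|Z| > sqrt(q)) = erfc(sqrt(q / 2)).
    p_value = math.erfc(math.sqrt(q_between / 2))
    return q_between, p_value
```

A small p-value suggests that pooled discrimination differs between Low-RoB and High/Unclear-RoB studies. The k < 3 caveat above still applies.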
Newcombe Variance Approximation for C-statistic (Newcombe 2006)
When SE is not reported and only N + events are available (no CI), the Newcombe formula derives Var(c) from the concordance structure. Applied automatically when CI columns are missing or empty.
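The sketch below shows one commonly cited form of this approximation (often labelled Newcombe's method 4 in the prediction-model meta-analysis literature). It is written from memory as an assumption and should be checked against the tool's own implementation before relying on it. Here m is the number of events and n the number of non-events.

```python
import math


def newcombe_var_c(c, n_total, events):
    """Approximate Var(c) from the c-statistic, total N and number of events.

    Assumed form: Var(c) = c(1-c) [1 + n*(1-c)/(2-c) + m* c/(1+c)] / (m n),
    with n* = m* = (m + n)/2 - 1.
    """
    m = events
    n = n_total - events
    if m <= 0 or n <= 0:
        raise ValueError("need at least one event and one non-event")
    star = (m + n) / 2 - 1
    bracket = 1 + star * (1 - c) / (2 - c) + star * c / (1 + c)
    return c * (1 - c) * bracket / (m * n)


# Example: SE(c) for c = 0.75 in a study of 200 participants with 50 events
se_c = math.sqrt(newcombe_var_c(0.75, 200, 50))
```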
Integrated Discrimination Improvement (IDI): IDI = mean increase in predicted probability among events minus mean increase in predicted probability among non-events. Pooled on the raw scale (approximately normal for large N). Provide IDI and IDI_SE as optional columns 18-19.
Net benefit = TP/N - (FP/N) × pt/(1 - pt), where pt is the threshold probability. Provide TP, FP, and threshold (%) as optional columns 22-24, or set a common threshold below to compute from events/N.
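A short worked example of the net-benefit formula, taking the threshold as a percentage to match the Threshold(%) column. The function name is illustrative.

```python
def net_benefit(tp, fp, n, threshold_pct):
    """Net benefit = TP/N - (FP/N) * pt / (1 - pt), with pt given in percent."""
    pt = threshold_pct / 100.0
    if not 0 < pt < 1:
        raise ValueError("threshold must be strictly between 0 and 100 percent")
    return tp / n - (fp / n) * (pt / (1 - pt))


# 80 true positives and 100 false positives among 1000 patients at a 20% threshold:
# 0.08 - 0.10 * (0.2 / 0.8) = 0.055
nb = net_benefit(80, 100, 1000, 20)
```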
Calibration-in-the-Large (CITL) Meta-Analysis
CITL is the difference between the observed and mean predicted risk on the logit scale. Perfect calibration: CITL = 0. When CITL is reported directly, it is pooled on the raw scale. Otherwise it is approximated from O:E as log(O/E). Provide CITL and CITL_SE as optional columns 20-21.
Discrimination Improvement Funnel Plot
X-axis: c-statistic per study. Y-axis: SE of c-statistic. Pseudo-95% CI lines around pooled c. Points colored by PROBAST rating. Egger-style regression test for small-study effects.
Leave-One-Out Sensitivity for Pooled C-statistic
For each study, re-pool c without that study. Reports change in pooled c and flags any single study that shifts it materially (> 0.01 on c scale).
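The leave-one-out loop can be sketched as follows. The pooling step here is a DerSimonian-Laird random-effects mean on the logit scale, which is an assumption: the text does not say which estimator the tool uses. All names are illustrative.

```python
import math


def _pool_logit(y, v):
    """DerSimonian-Laird random-effects mean of estimates y with variances v."""
    w = [1 / vi for vi in v]
    sw = sum(w)
    mu_fe = sum(wi * yi for wi, yi in zip(w, y)) / sw
    tau2 = 0.0
    if len(y) > 1:
        q = sum(wi * (yi - mu_fe) ** 2 for wi, yi in zip(w, y))
        denom = sw - sum(wi ** 2 for wi in w) / sw
        tau2 = max(0.0, (q - (len(y) - 1)) / denom)
    w_re = [1 / (vi + tau2) for vi in v]
    return sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)


def leave_one_out(names, c_values, se_values, flag_at=0.01):
    """Re-pool c omitting each study in turn; flag shifts larger than flag_at."""
    y = [math.log(c / (1 - c)) for c in c_values]
    v = [(s / (c * (1 - c))) ** 2 for c, s in zip(c_values, se_values)]

    def expit(x):
        return 1 / (1 + math.exp(-x))

    full = expit(_pool_logit(y, v))
    rows = []
    for i, name in enumerate(names):
        c_loo = expit(_pool_logit(y[:i] + y[i + 1:], v[:i] + v[i + 1:]))
        change = c_loo - full
        rows.append({"omitted": name, "pooled_c": c_loo,
                     "change": change, "flag": abs(change) > flag_at})
    return full, rows
```

For example, `leave_one_out(["A", "B", "C"], [0.70, 0.71, 0.90], [0.02, 0.02, 0.02])` flags study C, whose removal pulls the pooled c down toward the other two studies.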
Heterogeneity Decomposition (Meta-Regression on Sample Size)
Meta-regression of logit(c) on log(N). Reports R-squared = proportion of tau-squared explained by sample size. An R-squared above 50% suggests that c-statistics vary systematically with study size (spectrum effect).
Trim-and-Fill for C-statistics (Sensitivity Analysis)
Sensitivity analysis only — not a primary correction. Applied on logit(c) scale. Uses the R0 rank-based estimator to estimate the number of missing studies, imputes symmetrically around the pooled estimate, and re-pools. Imputed studies shown as filled triangles on the funnel plot above.
Doi Plot + LFK Index for C-statistics
Alternative to the funnel plot for detecting publication bias. The Doi plot graphs Z-scores (effect / SE) against |Z-scores|. The LFK index quantifies asymmetry: |LFK| ≤ 1 = no asymmetry, |LFK| 1–2 = minor, |LFK| > 2 = major asymmetry (Furuya-Kanamori, Barendregt & Doi 2018).
Events Per Variable (EPV) Assessment
EPV = min(events, non-events) / number_of_predictors. Studies with EPV < 10 may have optimistically biased c-statistics (overfitting). EPV ≥ 20 is recommended for stable validation. Set the number of predictors in the Data Input tab.
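A small sketch of the EPV calculation and the traffic-light cut-offs given in the guide (green at 20 or more, amber from 10 up to 20, red below 10). Function names are illustrative.

```python
def events_per_variable(n, events, n_predictors):
    """EPV = min(events, non-events) / number of predictors."""
    if n_predictors <= 0:
        raise ValueError("number of predictors must be positive")
    return min(events, n - events) / n_predictors


def epv_rating(epv):
    """Map an EPV value to the guide's green / amber / red bands."""
    if epv >= 20:
        return "green"
    if epv >= 10:
        return "amber"
    return "red"


# 1000 participants, 120 events, 8 predictors: EPV = 120 / 8 = 15 (amber)
rating = epv_rating(events_per_variable(1000, 120, 8))
```

Taking the minimum of events and non-events matters when the outcome is common: a study with 450 events out of 500 has only 50 non-events to inform the model.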
Harrell's Concordance Decomposition by Predictor
Decomposes the c-statistic by predictor contribution using an approximate beta-weight approach. When partial c data are not directly available, partial c for predictor j is approximated as c_partial_j = 0.5 + (c - 0.5) * |beta_j| / sum(|beta_k|). Enter predictor names and standardized coefficients below.
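The approximation c_partial_j = 0.5 + (c - 0.5) * |beta_j| / sum(|beta_k|) can be computed directly. The predictor names and coefficients below are made-up illustration values.

```python
def partial_c(c, betas):
    """Approximate partial c per predictor from standardized coefficients.

    betas : dict mapping predictor name -> standardized coefficient
    """
    total = sum(abs(b) for b in betas.values())
    if total == 0:
        raise ValueError("at least one coefficient must be non-zero")
    return {name: 0.5 + (c - 0.5) * abs(b) / total for name, b in betas.items()}


# Hypothetical model with c = 0.80 and three standardized coefficients
shares = partial_c(0.80, {"age": 0.6, "sbp": -0.3, "smoker": 0.1})
# age is about 0.68, sbp about 0.59, smoker about 0.53
```

By construction, the excesses over 0.5 sum to c - 0.5, so the output is a share of the discrimination above chance, not an independently estimated c for each predictor.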
Meta-Regression on Study Characteristics
Regresses logit(c) on a user-selected covariate (log(N), publication year, or prevalence). Reports slope, CI, R-squared (% of tau-squared explained), and residual tau-squared. Select the covariate in the Data Input tab. Bubble plot: covariate on X, c on Y, size = N.
Bayesian Pooling of C-statistics
Bayesian random-effects model on logit(c) scale. Prior: Normal(0, sigma_prior^2) for mu. Posterior via 200x200 grid approximation over (mu, tau). Reports posterior median, 95% CrI, and P(c > threshold | data). Configure prior SD and clinical threshold in the Data Input tab.
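A minimal grid-approximation sketch under stated assumptions. The text specifies the Normal(0, sigma_prior^2) prior on mu and the 200x200 grid, but not the prior on tau or the grid limits. Here tau gets a flat prior on [0, tau_max] and the mu grid spans the observed logits plus a margin of 2. Those choices and all names are hypothetical.

```python
import math


def bayes_pool_c(c_values, se_values, prior_sd=2.0, threshold=0.7,
                 grid=200, tau_max=2.0):
    """Grid-approximate posterior for the pooled c-statistic (logit scale)."""
    y = [math.log(c / (1 - c)) for c in c_values]
    v = [(s / (c * (1 - c))) ** 2 for c, s in zip(c_values, se_values)]
    lo, hi = min(y) - 2.0, max(y) + 2.0            # assumed mu grid limits
    mus = [lo + (hi - lo) * i / (grid - 1) for i in range(grid)]
    taus = [tau_max * j / (grid - 1) for j in range(grid)]

    log_post = []
    for mu in mus:
        log_prior = -0.5 * (mu / prior_sd) ** 2    # Normal(0, prior_sd^2) on mu
        row = []
        for tau in taus:
            ll = 0.0
            for yi, vi in zip(y, v):
                total = vi + tau * tau             # marginal variance of y_i
                ll += -0.5 * math.log(total) - 0.5 * (yi - mu) ** 2 / total
            row.append(log_prior + ll)             # flat prior on tau adds 0
        log_post.append(row)

    peak = max(max(row) for row in log_post)
    marg = [sum(math.exp(lp - peak) for lp in row) for row in log_post]
    z = sum(marg)
    marg = [p / z for p in marg]                   # marginal posterior of mu

    def quantile(q):
        acc = 0.0
        for mu, p in zip(mus, marg):
            acc += p
            if acc >= q:
                return mu
        return mus[-1]

    def expit(x):
        return 1 / (1 + math.exp(-x))

    t_logit = math.log(threshold / (1 - threshold))
    return {
        "median": expit(quantile(0.5)),
        "cri": (expit(quantile(0.025)), expit(quantile(0.975))),
        "p_above": sum(p for mu, p in zip(mus, marg) if mu > t_logit),
    }
```

Quantiles are read off the grid, so their resolution is limited by the grid spacing. A finer grid or a narrower mu range tightens them at the cost of run time.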
Forest Plot — C-statistics
Forest Plot — O:E Ratios
Prediction Model MA — Method Guide
Why prediction model MA is different: Unlike treatment-effect MA, we pool performance measures (discrimination + calibration) rather than effect sizes. C-statistics are bounded in [0, 1] (0.5 = chance-level discrimination) and O:E ratios are bounded in [0, ∞), so both require transformation for valid pooling.
C-statistic pooling
Transform to logit(c) = log(c/(1-c)). Pool on logit scale. Back-transform pooled estimate. SE via delta method: SE(logit) = SE(c)/(c(1-c)).
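The steps above can be sketched as follows. The random-effects estimator is DerSimonian-Laird, which is an assumption, since the guide does not name the estimator the tool uses. Function names are illustrative.

```python
import math


def logit(p):
    return math.log(p / (1 - p))


def expit(x):
    return 1 / (1 + math.exp(-x))


def pool_c_statistics(c_values, se_values):
    """Random-effects pooling of c-statistics on the logit scale."""
    y = [logit(c) for c in c_values]
    # Delta method: SE(logit c) = SE(c) / (c (1 - c))
    se = [s / (c * (1 - c)) for c, s in zip(c_values, se_values)]
    w = [1 / s ** 2 for s in se]
    sw = sum(w)
    k = len(y)

    # Fixed-effect mean and Cochran's Q feed the DL estimate of tau^2
    mu_fe = sum(wi * yi for wi, yi in zip(w, y)) / sw
    q = sum(wi * (yi - mu_fe) ** 2 for wi, yi in zip(w, y))
    tau2 = 0.0
    if k > 1:
        denom = sw - sum(wi ** 2 for wi in w) / sw
        tau2 = max(0.0, (q - (k - 1)) / denom)

    w_re = [1 / (s ** 2 + tau2) for s in se]
    mu = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se_mu = math.sqrt(1 / sum(w_re))
    return {
        "pooled_c": expit(mu),                      # back-transformed estimate
        "ci": (expit(mu - 1.96 * se_mu), expit(mu + 1.96 * se_mu)),
        "tau2": tau2,
        "q": q,
    }
```

Back-transforming the interval endpoints, not just the point estimate, keeps the CI inside (0, 1) and makes it asymmetric around the pooled c, as expected for a bounded measure.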
O:E ratio pooling
Pool on log(O:E) scale. O:E = 1 means perfect calibration. O:E > 1 = underprediction. O:E < 1 = overprediction.
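A compact sketch of log-scale pooling for O:E ratios, recovering each study's SE from its 95% CI as (log(upper) - log(lower)) / (2 × 1.96). For brevity it uses fixed-effect inverse-variance weights. That is a simplification for illustration, and the names are hypothetical.

```python
import math


def log_oe_with_se(oe, lower, upper):
    """Return (log O:E, SE of log O:E) from a ratio and its 95% CI."""
    return math.log(oe), (math.log(upper) - math.log(lower)) / (2 * 1.96)


def pool_oe_fixed(studies):
    """Fixed-effect pooling; `studies` is a list of (oe, lower, upper) tuples."""
    pairs = [log_oe_with_se(*s) for s in studies]
    w = [1 / se ** 2 for _, se in pairs]
    mu = sum(wi * y for wi, (y, _) in zip(w, pairs)) / sum(w)
    se_mu = math.sqrt(1 / sum(w))
    return (math.exp(mu),
            math.exp(mu - 1.96 * se_mu),
            math.exp(mu + 1.96 * se_mu))


def interpret_oe(oe):
    """O:E > 1 means underprediction, O:E < 1 overprediction."""
    if oe > 1:
        return "underprediction"
    if oe < 1:
        return "overprediction"
    return "perfect calibration"
```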
Calibration slope
Slope = 1 means perfect calibration. Slope < 1 = overfitting in development. Can pool directly (unbounded).
PROBAST
4 domains: Participants, Predictors, Outcome, Analysis. Each rated Low/High/Unclear. Overall = High if any domain is High.
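The overall-rating rule can be written as a small function. Only the "High if any domain is High" rule is stated above. The remaining branches (Low only when every domain is Low, otherwise Unclear) are an assumption based on the usual PROBAST convention.

```python
def probast_overall(participants, predictors, outcome, analysis):
    """Combine four PROBAST domain ratings ('L', 'H', 'U') into an overall rating."""
    ratings = [r.strip().upper() for r in (participants, predictors, outcome, analysis)]
    if any(r not in ("L", "H", "U") for r in ratings):
        raise ValueError("ratings must be L, H or U")
    if "H" in ratings:
        return "H"          # stated rule: any High domain makes the overall High
    if all(r == "L" for r in ratings):
        return "L"          # assumed: Low only when all four domains are Low
    return "U"              # assumed: otherwise Unclear
```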
Advanced Methods
C-stat transforms
Logit(c) is standard. Log(-log(c)) (complementary log-log) may perform better when c is near 1. Arcsine-sqrt(c) is variance-stabilizing. Compare results across transforms (Debray 2017 Stat Med).
Prediction interval
PI = expit(logit(c) +/- t_{k-2} * sqrt(SE^2 + tau^2)). Uses t-distribution, not normal. Undefined for k < 3. Shows range of plausible c values for next study (IntHout 2016).
Calibration slope MA
Pool slopes directly (unbounded). Slope = 1 = perfect. Slope < 1 = overfitting. Slope > 1 = underfitting. SE derived from CI width / (2*1.96).
Bubble plot
O:E vs c-statistic. Size = N. Color = PROBAST. Quadrants show discrimination vs calibration trade-offs.
NRI pooling
Event NRI and non-event NRI pooled separately. Overall NRI = sum. Requires NRI data in optional columns.
Applicability
PROBAST applicability is distinct from risk of bias: low RoB but high concern if population/predictors/outcome differ from target use.
Subgroup by RoB
Compare pooled c between Low-RoB and High/Unclear-RoB subgroups. Chi-squared test for subgroup difference.
Newcombe SE
When CI is not reported, derive SE from N and events using Newcombe 2006 approximation: Var(c) accounts for concordance structure.
IDI pooling
Integrated Discrimination Improvement pooled on raw scale. IDI > 0 = new model improves discrimination. Provide IDI + IDI_SE.
Net benefit
Clinical utility via Vickers decision curve. Net benefit = TP/N - FP/N x pt/(1-pt). Pooled across studies at a specified threshold.
CITL
Calibration-in-the-large. CITL = 0 = perfect. Derived from O:E as log(O/E) if not reported directly. Pooled on raw scale.
Funnel plot
Discrimination funnel: c-stat vs SE(c). Pseudo-CI lines. Egger regression for small-study effects. Colored by PROBAST.
Leave-one-out
Sensitivity: re-pool c without each study. Flags studies shifting pooled c by > 0.01. Forest plot of LOO estimates.
Het. decomposition
Meta-regression of logit(c) on log(N). R-squared = proportion of tau-squared explained. R-squared > 50% = spectrum effect.
Knapp-Hartung
Replace z-based CI with t_{k-1}. HKSJ variance inflation q_KH = max(1, Q/(k-1)). Wider CI with proper small-sample correction. Shown alongside standard CI.
Trim-and-fill
Sensitivity analysis for publication bias in c-statistics. R0 rank-based estimator on logit(c) scale. Imputes missing studies and re-pools. Not a primary correction.
Doi + LFK
Alternative to funnel: Z-score vs |Z-score|. LFK index: |LFK| ≤ 1 none, 1-2 minor, >2 major asymmetry.
EPV
Events per variable = min(events, non-events) / predictors. Green ≥ 20, Amber 10-20, Red < 10. Low EPV = optimistic c.
Concordance decomp.
Approximate partial c per predictor: c_partial_j = 0.5 + (c-0.5) * |beta_j| / sum(|beta_k|). Shows which predictors drive discrimination.
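For the Knapp-Hartung entry above, a minimal sketch of the adjusted interval. It assumes Q is computed with random-effects weights 1/(v_i + tau^2) and that tau^2 has already been estimated. The t critical value with k-1 df is supplied by the caller (for example 4.303 for k = 3). Names are illustrative.

```python
import math


def knapp_hartung_ci(y, v, tau2, t_crit):
    """HKSJ-adjusted CI for a random-effects mean.

    y, v   : study estimates and within-study variances (e.g. on the logit(c) scale)
    tau2   : between-study variance estimate
    t_crit : caller-supplied quantile of the t-distribution with k-1 df
    """
    k = len(y)
    if k < 2:
        raise ValueError("need at least two studies")
    w = [1 / (vi + tau2) for vi in v]
    sw = sum(w)
    mu = sum(wi * yi for wi, yi in zip(w, y)) / sw
    # Variance inflation factor, truncated at 1 so the CI is never narrowed
    q_kh = max(1.0, sum(wi * (yi - mu) ** 2 for wi, yi in zip(w, y)) / (k - 1))
    se_hk = math.sqrt(q_kh / sw)
    return mu, mu - t_crit * se_hk, mu + t_crit * se_hk
```

Because q_KH is at least 1 and the t quantile exceeds 1.96, the resulting interval is never narrower than the standard z-based CI computed from the same weights.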