PredModelMA

Prediction Model Validation Meta-Analysis — c-statistics, O:E ratios, calibration, PROBAST

Data Input
C-statistic MA
O:E Ratio MA
PROBAST
Calibration Slope
Forest Plots
Advanced
Guide

Model Information

Validation Study Data

One row per validation study. Format: Study, N, Events, C-statistic, C_lower, C_upper, O:E_ratio, OE_lower, OE_upper, CalSlope, PROBAST_overall (L/H/U)
Optional extra columns (after PROBAST): CalSlope_SE, NRI_events, NRI_nonevents, NRI_event_SE, NRI_nonevent_SE, Applicability (L/H/U), IDI, IDI_SE, CITL, CITL_SE, TP, FP, Threshold(%)

Pooled C-statistic

Study-level Discrimination

Study | N | Events | C-statistic | 95% CI | Weight

Heterogeneity

Pooled O:E Ratio

Study-level Calibration

Study | O:E Ratio | 95% CI | Interpretation | Weight

Heterogeneity

PROBAST Risk of Bias Summary

Traffic Light Table

Study | Participants | Predictors | Outcome | Analysis | Overall

Legend: Low risk, High risk, Unclear

PROBAST Bar Chart

Pooled Calibration Slope

Study-level Calibration Slopes

Study | N | Cal. Slope | Interpretation | Weight

Heterogeneity

Forest Plot — Calibration Slopes

C-statistic Transformation Comparison (Debray et al. 2017)

Compare three transformations for pooling c-statistics. The logit is standard; log(-log(c)) may behave better when c is near 1; the arcsine square-root transform is variance-stabilizing.
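The three transformations and their back-transformations can be sketched as follows. This is a minimal illustration, not the tool's own code; the function names and the TRANSFORMS dictionary are hypothetical.

```python
import math

def logit(c):
    # Standard transform: maps (0, 1) to the whole real line.
    return math.log(c / (1.0 - c))

def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))

def loglog(c):
    # log(-log(c)); note that it decreases as c increases.
    return math.log(-math.log(c))

def inv_loglog(x):
    return math.exp(-math.exp(x))

def arcsine_sqrt(c):
    # Variance-stabilizing transform for proportions-like quantities.
    return math.asin(math.sqrt(c))

def inv_arcsine_sqrt(x):
    # Valid for x in [0, pi/2].
    return math.sin(x) ** 2

# Hypothetical registry pairing each transform with its inverse.
TRANSFORMS = {
    "logit": (logit, inv_logit),
    "log(-log)": (loglog, inv_loglog),
    "arcsine-sqrt": (arcsine_sqrt, inv_arcsine_sqrt),
}

def round_trip(c):
    # Transform and back-transform c under each option.
    return {name: inv(fwd(c)) for name, (fwd, inv) in TRANSFORMS.items()}
```

Pooling would happen on the transformed scale, with the pooled value passed through the matching inverse.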

Prediction Interval for Pooled C-statistic (IntHout et al. 2016)

The prediction interval shows where the c-statistic of a new validation study might fall. It is much wider than the confidence interval when heterogeneity is present. It uses a t-distribution with k-2 degrees of freedom, so it is undefined for k < 3.
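A minimal sketch of the prediction interval on the logit scale, assuming the pooled logit(c), its standard error, and tau-squared are already available. The critical value t_crit is passed in by the caller (an assumption made here to stay within the standard library); for k = 5 studies, the 97.5% quantile of t with 3 df is about 3.182.

```python
import math

def expit(x):
    return 1.0 / (1.0 + math.exp(-x))

def prediction_interval(mu_logit, se_mu, tau2, t_crit):
    # Half-width combines uncertainty in the pooled mean (se_mu^2)
    # with between-study variance (tau2).
    half = t_crit * math.sqrt(se_mu ** 2 + tau2)
    return expit(mu_logit - half), expit(mu_logit + half)

# Illustrative inputs (not real data): pooled c = 0.75, k = 5 studies.
mu = math.log(0.75 / 0.25)
pi_low, pi_high = prediction_interval(mu, se_mu=0.10, tau2=0.04, t_crit=3.182)
```

Because the interval is computed on the logit scale and back-transformed, it is asymmetric around the pooled c and always stays inside (0, 1).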

O:E vs C Bubble Plot (Calibration vs Discrimination)

X = c-statistic (discrimination), Y = O:E ratio (calibration). Bubble size = sample size. Color = PROBAST RoB. Quadrants: upper-left = poor discrimination + underprediction; lower-right = good discrimination + overprediction.

Net Reclassification Improvement (NRI) Pooling

Pools event NRI and non-event NRI separately under fixed-effect and random-effects models. Overall NRI = NRI_events + NRI_nonevents. Provide NRI estimates and standard errors as optional columns 13-16.

PROBAST Applicability Assessment

Distinct from risk of bias. A study can have low RoB but high applicability concern (e.g., different population from the target). 3 domains: Participants, Predictors, Outcome.
Study | Participants | Predictors | Outcome | Overall Applicability

Subgroup Analysis by PROBAST Risk of Bias

Pools c-statistics separately for Low-RoB vs High/Unclear-RoB studies and tests for a subgroup difference. If a subgroup contains few studies (k < 3), its pooled estimate may be unreliable.
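The subgroup comparison can be illustrated with a two-group between-subgroup Q test on the logit scale. This sketch assumes inverse-variance (fixed-effect) pooling within each subgroup for brevity; the tool may pool with random effects instead. The function names are hypothetical.

```python
import math

def pool_iv(y, se):
    # Inverse-variance pooled mean and its variance.
    w = [1.0 / s ** 2 for s in se]
    sw = sum(w)
    mu = sum(wi * yi for wi, yi in zip(w, y)) / sw
    return mu, 1.0 / sw

def subgroup_difference(low_rob, high_rob):
    # Each argument is a (logit_c_values, standard_errors) pair.
    mu_l, var_l = pool_iv(*low_rob)
    mu_h, var_h = pool_iv(*high_rob)
    # With two subgroups, Q_between has 1 df.
    q = (mu_l - mu_h) ** 2 / (var_l + var_h)
    # Exact upper-tail probability of chi-squared with 1 df.
    p = math.erfc(math.sqrt(q / 2.0))
    return q, p
```

A small p-value suggests that pooled discrimination differs between the low-RoB and high/unclear-RoB studies.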

Newcombe Variance Approximation for C-statistic (Newcombe 2006)

When SE is not reported and only N + events are available (no CI), the Newcombe formula derives Var(c) from the concordance structure. Applied automatically when CI columns are missing or empty.
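As a hedged illustration, one commonly quoted form of the Newcombe-based approximation is sketched below. The exact expression is an assumption here (it is the version usually given in prediction-model meta-analysis guidance), and the tool's implementation may differ. The function name is hypothetical.

```python
import math

def newcombe_se(c, n_total, events):
    # n = events, m = non-events.
    n = events
    m = n_total - events
    # Assumed form: n* = m* = (m + n) / 2 - 1.
    star = (m + n) / 2.0 - 1.0
    var = (c * (1.0 - c)
           * (1.0 + star * (1.0 - c) / (2.0 - c) + star * c / (1.0 + c))
           / (m * n))
    return math.sqrt(var)

# Illustrative inputs (not real data): c = 0.75, N = 1000, 100 events.
se_c = newcombe_se(0.75, 1000, 100)
```

The resulting SE(c) is on the c scale; it would still need the delta method to move to the logit scale before pooling.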

Integrated Discrimination Improvement (IDI) Pooling

IDI = mean predicted probability increase in events minus mean predicted probability increase in non-events. Pooled on raw scale (approximately normal for large N). Provide IDI and IDI_SE as optional columns 18-19.

Clinical Utility Meta-Analysis (Net Benefit — Vickers Decision Curve)

Net benefit = TP/N - FP/N x (pt/(1-pt)) where pt = threshold probability. Provide TP, FP, and threshold (%) as optional columns 22-24. Or set a common threshold below to compute from events/N.
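The net benefit formula above translates directly into code. The treat-all helper is an assumption about what "compute from events/N" could mean (every patient treated, so TP = events and FP = non-events); both function names are hypothetical.

```python
def net_benefit(tp, fp, n, threshold_pct):
    # threshold_pct is the threshold probability in percent, as in the data columns.
    pt = threshold_pct / 100.0
    return tp / n - (fp / n) * (pt / (1.0 - pt))

def net_benefit_treat_all(events, n, threshold_pct):
    # Assumed reference strategy: treat everyone.
    return net_benefit(events, n - events, n, threshold_pct)

# Illustrative counts (not real data).
nb_model = net_benefit(tp=80, fp=100, n=1000, threshold_pct=20)
nb_all = net_benefit_treat_all(events=100, n=1000, threshold_pct=20)
```

A model is clinically useful at a given threshold only if its net benefit exceeds both the treat-all value and zero (treat none).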

Calibration-in-the-Large (CITL) Meta-Analysis

CITL is the difference between the observed and predicted event rates on the logit scale. Perfect calibration: CITL = 0. When CITL is reported directly, it is pooled on the raw scale. Otherwise it is derived from O:E as log(O/E). Provide CITL and CITL_SE as optional columns 20-21.

Discrimination Improvement Funnel Plot

X-axis: c-statistic per study. Y-axis: SE of c-statistic. Pseudo-95% CI lines around pooled c. Points colored by PROBAST rating. Egger-style regression test for small-study effects.
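An Egger-style regression can be sketched as an ordinary least-squares fit of the standardized effect (y / SE) on precision (1 / SE), where the intercept captures small-study effects. Working with logit(c) as the effect y is an assumption here, and the function name is hypothetical.

```python
import math

def egger_intercept(y, se):
    # Regress z_i = y_i / se_i on x_i = 1 / se_i.
    z = [yi / si for yi, si in zip(y, se)]
    x = [1.0 / si for si in se]
    n = len(y)
    xbar = sum(x) / n
    zbar = sum(z) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (zi - zbar) for xi, zi in zip(x, z)) / sxx
    intercept = zbar - slope * xbar
    # Residual variance with n - 2 degrees of freedom.
    resid = [zi - intercept - slope * xi for xi, zi in zip(x, z)]
    s2 = sum(r * r for r in resid) / (n - 2)
    se_intercept = math.sqrt(s2 * (1.0 / n + xbar ** 2 / sxx))
    return intercept, se_intercept
```

An intercept far from zero relative to its standard error (compared against a t-distribution with n - 2 df) points to funnel asymmetry.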

Leave-One-Out Sensitivity for Pooled C-statistic

For each study, re-pool c without that study. Reports change in pooled c and flags any single study that shifts it materially (> 0.01 on c scale).
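The leave-one-out loop can be sketched as follows. For brevity this example pools logit(c) with plain inverse-variance weights; a random-effects pooler could be swapped in. The function names and the dictionary keys are hypothetical.

```python
import math

def expit(x):
    return 1.0 / (1.0 + math.exp(-x))

def pool_c(c_values, se_logit):
    # Inverse-variance pooling on the logit scale, back-transformed to c.
    y = [math.log(c / (1.0 - c)) for c in c_values]
    w = [1.0 / s ** 2 for s in se_logit]
    return expit(sum(wi * yi for wi, yi in zip(w, y)) / sum(w))

def leave_one_out(c_values, se_logit, flag=0.01):
    full = pool_c(c_values, se_logit)
    results = []
    for i in range(len(c_values)):
        # Drop study i and re-pool the rest.
        c_rest = c_values[:i] + c_values[i + 1:]
        se_rest = se_logit[:i] + se_logit[i + 1:]
        pooled = pool_c(c_rest, se_rest)
        shift = pooled - full
        results.append({"omitted": i, "pooled_c": pooled,
                        "shift": shift, "flagged": abs(shift) > flag})
    return full, results
```

Each entry reports the pooled c without one study and whether the shift exceeds the 0.01 flagging rule described above.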

Heterogeneity Decomposition (Meta-Regression on Sample Size)

Meta-regression of logit(c) on log(N). Reports R-squared = proportion of tau-squared explained by sample size. If R-squared > 50%, smaller studies tend to show systematically different c-statistics (spectrum effect).

Trim-and-Fill for C-statistics (Sensitivity Analysis)

Sensitivity analysis only — not a primary correction. Applied on logit(c) scale. Uses the R0 rank-based estimator to estimate the number of missing studies, imputes symmetrically around the pooled estimate, and re-pools. Imputed studies shown as filled triangles on the funnel plot above.

Doi Plot + LFK Index for C-statistics

Alternative to the funnel plot for detecting publication bias. The Doi plot graphs Z-scores (effect / SE) against |Z-scores|. The LFK index quantifies asymmetry: |LFK| ≤ 1 = no asymmetry, |LFK| 1–2 = minor, |LFK| > 2 = major asymmetry (Furuya-Kanamori, Barendregt & Doi 2018).

Events Per Variable (EPV) Assessment

EPV = min(events, non-events) / number_of_predictors. Studies with EPV < 10 may have optimistically biased c-statistics (overfitting). EPV ≥ 20 is recommended for stable validation. Set the number of predictors in the Data Input tab.
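The EPV rule is simple enough to show directly. The colour bands follow the thresholds stated in this guide (green at 20 or more, amber from 10 up to 20, red below 10); the function names are hypothetical.

```python
def epv(n, events, n_predictors):
    # Use the smaller outcome class: min(events, non-events).
    return min(events, n - events) / n_predictors

def epv_flag(value):
    if value >= 20:
        return "green"
    if value >= 10:
        return "amber"
    return "red"

# Illustrative study (not real data): N = 1000, 100 events, 8 predictors.
example_epv = epv(1000, 100, 8)
example_flag = epv_flag(example_epv)
```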

Harrell's Concordance Decomposition by Predictor

Decomposes the c-statistic by predictor contribution using an approximate beta-weight approach. When partial c data are not directly available, partial c for predictor j is approximated as c_partial_j = 0.5 + (c - 0.5) * |beta_j| / sum(|beta_k|). Enter predictor names and standardized coefficients below.
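The beta-weight approximation can be written as a short function. The predictor names and coefficients below are made up for illustration, and the function name is hypothetical.

```python
def partial_c(c, betas):
    # betas maps predictor name -> standardized coefficient.
    total = sum(abs(b) for b in betas.values())
    # Each predictor gets a share of (c - 0.5) proportional to |beta|.
    return {name: 0.5 + (c - 0.5) * abs(b) / total
            for name, b in betas.items()}

# Illustrative coefficients (not real data).
parts = partial_c(0.80, {"age": 0.6, "sbp": -0.3, "smoker": 0.1})
```

By construction the excess over 0.5 across predictors adds up to c - 0.5, which is a property of this approximation rather than of a true concordance decomposition.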

Meta-Regression on Study Characteristics

Regresses logit(c) on a user-selected covariate (log(N), publication year, or prevalence). Reports slope, CI, R-squared (% of tau-squared explained), and residual tau-squared. Select the covariate in the Data Input tab. Bubble plot: covariate on X, c on Y, size = N.

Bayesian Pooling of C-statistics

Bayesian random-effects model on logit(c) scale. Prior: Normal(0, sigma_prior^2) for mu. Posterior via 200x200 grid approximation over (mu, tau). Reports posterior median, 95% CrI, and P(c > threshold | data). Configure prior SD and clinical threshold in the Data Input tab.
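A grid approximation of this kind might look like the sketch below. Several choices are assumptions, since the description does not specify them: a flat prior on tau over [0, tau_max], a mu grid spanning the observed logits plus or minus 1, and the function name itself.

```python
import math

def expit(x):
    return 1.0 / (1.0 + math.exp(-x))

def bayes_pool_grid(y, v, prior_sd=2.0, threshold_c=0.7, n_grid=200, tau_max=1.0):
    # y: logit(c) per study; v: within-study variances on the logit scale.
    lo, hi = min(y) - 1.0, max(y) + 1.0          # assumed mu grid range
    mus = [lo + (hi - lo) * i / (n_grid - 1) for i in range(n_grid)]
    taus = [tau_max * j / (n_grid - 1) for j in range(n_grid)]  # assumed flat prior
    log_post = []
    for tau in taus:
        total_var = [vi + tau * tau for vi in v]
        const = -0.5 * sum(math.log(t) for t in total_var)
        row = []
        for mu in mus:
            loglik = const - 0.5 * sum((yi - mu) ** 2 / t
                                       for yi, t in zip(y, total_var))
            logprior = -0.5 * (mu / prior_sd) ** 2   # Normal(0, prior_sd^2) on mu
            row.append(loglik + logprior)
        log_post.append(row)
    peak = max(max(row) for row in log_post)      # subtract for numerical stability
    # Marginal posterior of mu: sum over the tau axis, then normalize.
    marg = [sum(math.exp(log_post[j][i] - peak) for j in range(n_grid))
            for i in range(n_grid)]
    norm = sum(marg)
    marg = [m / norm for m in marg]

    def quantile(q):
        acc = 0.0
        for mu, p in zip(mus, marg):
            acc += p
            if acc >= q:
                return mu
        return mus[-1]

    thr = math.log(threshold_c / (1.0 - threshold_c))
    p_exceed = sum(p for mu, p in zip(mus, marg) if mu > thr)
    return {"median_c": expit(quantile(0.5)),
            "cri": (expit(quantile(0.025)), expit(quantile(0.975))),
            "p_exceed": p_exceed}
```

The grid resolution limits the precision of the reported quantiles, so a finer grid or a narrower mu range may be preferable when studies are tightly clustered.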

Forest Plot — C-statistics

Forest Plot — O:E Ratios

Prediction Model MA — Method Guide

Why prediction model MA is different: Unlike treatment-effect MA, we pool performance measures (discrimination + calibration) rather than effect sizes. C-statistics lie in [0, 1], with 0.5 indicating no discrimination, and O:E ratios lie in (0, ∞), so both are pooled on a transformed scale.
C-statistic pooling: Transform to logit(c) = log(c/(1-c)). Pool on the logit scale. Back-transform the pooled estimate. SE via delta method: SE(logit(c)) = SE(c)/(c(1-c)).
O:E ratio pooling: Pool on the log(O:E) scale. O:E = 1 means perfect calibration. O:E > 1 = underprediction. O:E < 1 = overprediction.
Calibration slope: Slope = 1 means perfect calibration. Slope < 1 = overfitting in development. Can be pooled directly (unbounded).
PROBAST: Four domains (Participants, Predictors, Outcome, Analysis), each rated Low/High/Unclear. Overall = High if any domain is High.
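The c-statistic pooling steps listed above can be sketched end to end. The DerSimonian-Laird estimator for tau-squared is an assumption here (the guide does not name the estimator), and the function names are hypothetical.

```python
import math

def logit_se(c, se_c):
    # Delta method: SE(logit(c)) = SE(c) / (c * (1 - c)).
    return se_c / (c * (1.0 - c))

def pool_c_random_effects(c_values, se_c_values):
    y = [math.log(c / (1.0 - c)) for c in c_values]
    v = [logit_se(c, s) ** 2 for c, s in zip(c_values, se_c_values)]
    k = len(y)
    # Fixed-effect step, needed for Cochran's Q.
    w = [1.0 / vi for vi in v]
    sw = sum(w)
    mu_fe = sum(wi * yi for wi, yi in zip(w, y)) / sw
    q = sum(wi * (yi - mu_fe) ** 2 for wi, yi in zip(w, y))
    # DerSimonian-Laird tau-squared (assumed estimator), truncated at 0.
    denom = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / denom) if denom > 0 else 0.0
    # Random-effects weights and pooled logit.
    ws = [1.0 / (vi + tau2) for vi in v]
    mu = sum(wi * yi for wi, yi in zip(ws, y)) / sum(ws)
    se = math.sqrt(1.0 / sum(ws))
    expit = lambda x: 1.0 / (1.0 + math.exp(-x))
    # Back-transform the estimate and the z-based 95% CI.
    return {"pooled_c": expit(mu),
            "ci": (expit(mu - 1.96 * se), expit(mu + 1.96 * se)),
            "tau2": tau2}
```

O:E ratios would follow the same pattern with log(O:E) in place of logit(c) and exp() as the back-transformation.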

Advanced Methods

C-stat transforms: Logit(c) is standard. Log(-log(c)) (the log-log transform) may perform better when c is near 1. Arcsine-sqrt(c) is variance-stabilizing. Compare results across transforms (Debray 2017 Stat Med).
Prediction interval: PI = expit(logit(c) +/- t_{k-2} * sqrt(SE^2 + tau^2)). Uses t-distribution, not normal. Undefined for k < 3. Shows range of plausible c values for next study (IntHout 2016).
Calibration slope MA: Pool slopes directly (unbounded). Slope = 1 = perfect. Slope < 1 = overfitting. Slope > 1 = underfitting. SE derived from CI width / (2*1.96).
Bubble plot: O:E vs c-statistic. Size = N. Color = PROBAST. Quadrants show discrimination vs calibration trade-offs.
NRI pooling: Event NRI and non-event NRI pooled separately. Overall NRI = sum. Requires NRI data in optional columns.
Applicability: PROBAST applicability is distinct from risk of bias: low RoB but high concern if population/predictors/outcome differ from target use.
Subgroup by RoB: Compare pooled c between Low-RoB and High/Unclear-RoB subgroups. Chi-squared test for subgroup difference.
Newcombe SE: When CI is not reported, derive SE from N and events using Newcombe 2006 approximation: Var(c) accounts for concordance structure.
IDI pooling: Integrated Discrimination Improvement pooled on raw scale. IDI > 0 = new model improves discrimination. Provide IDI + IDI_SE.
Net benefit: Clinical utility via Vickers decision curve. Net benefit = TP/N - FP/N x pt/(1-pt). Pooled across studies at a specified threshold.
CITL: Calibration-in-the-large. CITL = 0 = perfect. Derived from O:E as log(O/E) if not reported directly. Pooled on raw scale.
Funnel plot: Discrimination funnel: c-stat vs SE(c). Pseudo-CI lines. Egger regression for small-study effects. Colored by PROBAST.
Leave-one-out: Sensitivity: re-pool c without each study. Flags studies shifting pooled c by > 0.01. Forest plot of LOO estimates.
Het. decomposition: Meta-regression of logit(c) on log(N). R-squared = proportion of tau-squared explained. R-squared > 50% = spectrum effect.
Knapp-Hartung: Replace z-based CI with t_{k-1}. HKSJ variance inflation q_KH = max(1, Q/(k-1)). Wider CI with proper small-sample correction. Shown alongside standard CI.
Trim-and-fill: Sensitivity analysis for publication bias in c-statistics. R0 rank-based estimator on logit(c) scale. Imputes missing studies and re-pools. Not a primary correction.
Doi + LFK: Alternative to funnel: Z-score vs |Z-score|. LFK index: |LFK| ≤ 1 none, 1-2 minor, >2 major asymmetry.
EPV: Events per variable = min(events, non-events) / predictors. Green ≥ 20, Amber 10-20, Red < 10. Low EPV = optimistic c.
Concordance decomp.: Approximate partial c per predictor: c_partial_j = 0.5 + (c-0.5) * |beta_j| / sum(|beta_k|). Shows which predictors drive discrimination.
Meta-regression: Regress logit(c) on covariate (log(N), year, prevalence). Reports slope, CI, R-squared, residual tau-squared. Bubble plot.
Bayesian pooling: Grid approximation (200x200) for mu, tau on logit(c). Weakly informative Normal prior. Posterior median, 95% CrI, P(c > threshold | data).
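The Knapp-Hartung adjustment listed among the methods above can be sketched as follows, assuming tau-squared has already been estimated and that Q is computed with random-effects weights. The critical value t_crit (97.5% quantile of t with k-1 df) is supplied by the caller; for k = 5 it is about 2.776. The function name is hypothetical.

```python
import math

def hksj_ci(y, v, tau2, t_crit):
    # y: logit(c) per study; v: within-study variances; tau2: between-study variance.
    k = len(y)
    w = [1.0 / (vi + tau2) for vi in v]
    sw = sum(w)
    mu = sum(wi * yi for wi, yi in zip(w, y)) / sw
    # Scaling factor Q / (k - 1), truncated at 1 so the CI never narrows.
    q_kh = max(1.0, sum(wi * (yi - mu) ** 2 for wi, yi in zip(w, y)) / (k - 1))
    se_adj = math.sqrt(q_kh / sw)
    return mu, mu - t_crit * se_adj, mu + t_crit * se_adj

# Illustrative inputs (not real data), k = 5 so t has 4 df.
est, lower, upper = hksj_ci([0.85, 1.10, 1.39, 1.05, 0.95],
                            [0.01] * 5, tau2=0.03, t_crit=2.776)
```

The bounds are on the logit scale and would be back-transformed with expit before display next to the standard CI.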

PROBAST Domain Details

D1 (Participants): Appropriate study design? Inclusions/exclusions? Sufficient sample size?
D2 (Predictors): Defined consistently? Assessed blinded to outcome? Available at intended use time?
D3 (Outcome): Pre-specified? Assessed blinded? Appropriate time horizon?
D4 (Analysis): Events per variable ≥10? Continuous predictors handled appropriately? Missing data handled? Appropriate performance measures?