RetractionImpact

Retraction Fragility Quantifier — How robust is your meta-analysis to study retractions?


Meta-Analysis Data

One study per line: study name, effect estimate, standard error. Ratio measures are assumed to be on the log scale (log OR, log RR, log HR); SMD is entered untransformed.
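As a rough illustration of this input format, the sketch below parses such lines in Python. The comma delimiter, the `Study` tuple, and the `parse_studies` function are assumptions made for the example, not the tool's actual parser.

```python
from typing import NamedTuple


class Study(NamedTuple):
    name: str
    effect: float  # effect estimate on the analysis scale (e.g. log OR)
    se: float      # standard error of the effect


def parse_studies(text: str) -> list[Study]:
    """Parse 'name, effect, se' lines; blank lines are skipped."""
    studies = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue
        # Split from the right so that study names may contain commas.
        parts = [p.strip() for p in line.rsplit(",", 2)]
        if len(parts) != 3:
            raise ValueError(f"line {lineno}: expected 'name, effect, se'")
        name, effect, se = parts[0], float(parts[1]), float(parts[2])
        if se <= 0:
            raise ValueError(f"line {lineno}: standard error must be positive")
        studies.append(Study(name, effect, se))
    return studies


# Made-up example rows, for illustration only.
example = """Smith 2010, -0.35, 0.12
Lee, Park 2014, -0.10, 0.20
"""
studies = parse_studies(example)
```

Splitting from the right keeps a name such as "Lee, Park 2014" intact, at the cost of requiring that the two numeric fields always come last.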

Null value: 0 for log scales, 0 for SMD

Max # studies to retract combinatorially

Retraction Fragility Summary

Original Meta-Analysis

Most Impactful Single Retraction

Retraction Fragility Index (RFI)

Leave-One-Out Impact Table

Retracted Study | Pooled Est. | Change | % Change | Reversal? | New P

Impact Waterfall — Effect of Retracting Each Study

Bars show the change in pooled estimate when each study is removed. Sorted by absolute impact.
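A minimal sketch of how the bar heights could be computed. For brevity it assumes plain inverse-variance (fixed-effect) pooling; the pooling model the tool actually uses is not restated here, and `iv_pool` and `loo_changes` are hypothetical names.

```python
def iv_pool(y, se):
    """Inverse-variance weighted mean of the effects."""
    w = [1.0 / (s * s) for s in se]
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)


def loo_changes(names, y, se):
    """Change in the pooled estimate when each study is removed,
    sorted by absolute impact (largest first)."""
    full = iv_pool(y, se)
    rows = []
    for i, name in enumerate(names):
        rest_y = y[:i] + y[i + 1:]
        rest_se = se[:i] + se[i + 1:]
        rows.append((name, iv_pool(rest_y, rest_se) - full))
    rows.sort(key=lambda row: abs(row[1]), reverse=True)
    return rows


# Made-up data: study C pulls the pooled estimate upward.
rows = loo_changes(["A", "B", "C"], [0.0, 0.0, 1.0], [1.0, 1.0, 1.0])
for name, change in rows:
    print(f"{name}: {change:+.3f}")
```

With these made-up inputs, removing C moves the estimate the most, so it is listed first.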

Fragility Curve — P-value After Sequential Retraction

Studies removed in order of most-to-least impactful. Shows how quickly significance erodes.

Leave-One-Out Forest Plot

Prediction Interval

Influence Diagnostics Table

Cook's D > 4/k, |DFBETAS| > 2/√k, |Studentized Residual| > 2 flag influential/outlier studies. High leverage = disproportionate weight.
Study | Leverage (h_i) | Stud. Resid (r_i*) | Cook's D | DFBETAS | Flags

Baujat Plot

X = contribution to Q (heterogeneity), Y = influence on pooled estimate. Upper-right = influential + heterogeneous.
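The two coordinates can be sketched as follows, assuming a fixed-effect model: X is each study's term in Cochran's Q, and Y is the squared shift in the pooled estimate when the study is left out, divided by the variance of the leave-one-out estimate. `baujat_points` is a hypothetical helper and needs at least two studies.

```python
def baujat_points(y, se):
    """Return (contribution to Q, influence on pooled estimate) per study."""
    w = [1.0 / (s * s) for s in se]
    sw = sum(w)
    beta = sum(wi * yi for wi, yi in zip(w, y)) / sw
    points = []
    for wi, yi in zip(w, y):
        q_contrib = wi * (yi - beta) ** 2
        sw_rest = sw - wi                       # total weight without study i
        beta_rest = (sw * beta - wi * yi) / sw_rest
        # Var(beta_rest) = 1 / sw_rest, so dividing by it multiplies by sw_rest.
        influence = (beta_rest - beta) ** 2 * sw_rest
        points.append((q_contrib, influence))
    return points


# Made-up data: the third study is both heterogeneous and influential.
pts = baujat_points([0.0, 0.0, 1.0], [1.0, 1.0, 1.0])
```

The X values sum to Q by construction, which gives a quick sanity check on the computation.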

GOSH Density Plot

Distribution of pooled estimates across random study subsets. Bimodality suggests important subgroups.
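A sketch of how the subset estimates behind such a plot might be generated, assuming fixed-effect pooling within each subset. The cut-off of 15 studies and the cap of 1000 random subsets follow the method guide; `gosh_estimates`, its parameter names, and the seed are illustrative.

```python
import random
from itertools import combinations


def gosh_estimates(y, se, max_subsets=1000, exhaustive_up_to=15, seed=1):
    """Pooled estimate for every non-empty subset (small k) or for
    max_subsets random non-empty subsets (large k)."""
    k = len(y)
    w = [1.0 / (s * s) for s in se]

    def pool(idx):
        return sum(w[i] * y[i] for i in idx) / sum(w[i] for i in idx)

    if k <= exhaustive_up_to:
        return [pool(c) for r in range(1, k + 1)
                for c in combinations(range(k), r)]

    rng = random.Random(seed)  # seeded so repeated runs give the same plot
    out = []
    while len(out) < max_subsets:
        idx = [i for i in range(k) if rng.random() < 0.5]
        if idx:  # skip the empty subset
            out.append(pool(idx))
    return out


small = gosh_estimates([0.1, 0.2, 0.3, 0.4], [0.1] * 4)
large = gosh_estimates([0.01 * i for i in range(16)], [0.1] * 16)
```

The random branch samples subsets with replacement, so duplicates are possible; a histogram or kernel density of the returned values is what would reveal bimodality.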

Credibility Ceiling

The credibility ceiling (Ioannidis 2017) is the maximum bias probability u at which the pooled conclusion still holds. Higher ceiling = more robust to potential bias.

Doi Plot & LFK Index

Alternative to the funnel plot for detecting asymmetry. X-axis = Z-score (effect/SE), Y-axis = |Z|. The LFK index quantifies asymmetry: |LFK| < 1 = no asymmetry, 1–2 = minor, > 2 = major. (Furuya-Kanamori et al. 2018, Int J Evid Based Healthc)

E-value for Unmeasured Confounding

The E-value (VanderWeele & Ding 2017, Ann Intern Med) is the minimum strength of association that an unmeasured confounder would need with both exposure and outcome to explain away the observed effect. Higher E-values = more robust.
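For a point estimate on the risk-ratio scale, the E-value is RR + √(RR × (RR − 1)), with protective effects (RR < 1) inverted first. A short sketch, where `e_value` is a name chosen for the example:

```python
import math


def e_value(rr: float) -> float:
    """E-value for a risk ratio; RR < 1 is inverted before applying the formula."""
    if rr <= 0:
        raise ValueError("risk ratio must be positive")
    if rr < 1:
        rr = 1.0 / rr
    return rr + math.sqrt(rr * (rr - 1.0))


# A pooled log-RR would be exponentiated first, e.g. e_value(math.exp(log_rr)).
print(round(e_value(2.0), 3))  # 3.414
```

An RR of exactly 1 gives an E-value of 1, meaning no confounding is needed to explain a null effect.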

Henmi-Copas Adjusted CI

Publication bias-adjusted confidence interval (Henmi & Copas 2010, Stat Med). If the adjusted CI is much wider than the standard CI, the pooled result is fragile to selective reporting.

Power Analysis

Post-hoc power and prospective sample size for the meta-analysis. How much power does this MA have, and how many more studies would be needed?
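One common way to frame these quantities is a two-sided z-test on the pooled estimate. The sketch below assumes that framing; the tool's exact calculation may differ, and `meta_power` and `detectable_effect` are illustrative names.

```python
from statistics import NormalDist


def meta_power(delta, se_pooled, alpha=0.05):
    """Power of a two-sided z-test to detect a true effect delta,
    given the standard error of the pooled estimate."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1.0 - alpha / 2.0)
    shift = abs(delta) / se_pooled
    return nd.cdf(shift - z_crit) + nd.cdf(-shift - z_crit)


def detectable_effect(se_pooled, power=0.80, alpha=0.05):
    """Smallest |effect| detectable at the requested power
    (ignores the negligible far-tail rejection probability)."""
    nd = NormalDist()
    return (nd.inv_cdf(1.0 - alpha / 2.0) + nd.inv_cdf(power)) * se_pooled


print(round(detectable_effect(0.1), 3))  # 0.28
```

When the true effect is zero, the power equals alpha, which is a convenient check on the formula.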

Robust Meta-Analysis (Huber M-Estimator)

Huber's M-estimation downweights outlying studies instead of removing them. The tuning constant k=1.345 gives 95% efficiency at the normal distribution. Compare with standard DL. (Huber 1964, Ann Math Stat)
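A minimal sketch of the downweighting idea using iteratively reweighted pooling. It assumes τ² is supplied rather than re-estimated, and it names the tuning constant `c` to avoid a clash with k as the number of studies; `huber_pool` is a hypothetical function, not the tool's code.

```python
import math


def huber_pool(y, se, tau2=0.0, c=1.345, tol=1e-10, max_iter=200):
    """Huber M-estimate of the pooled effect via iterative reweighting."""
    base = [1.0 / (s * s + tau2) for s in se]  # usual inverse-variance weights
    beta = sum(b * yi for b, yi in zip(base, y)) / sum(base)
    for _ in range(max_iter):
        weights = []
        for b, yi in zip(base, y):
            r = abs(yi - beta) * math.sqrt(b)  # standardized residual
            # Full weight inside the threshold, shrunk by c/|r| outside it.
            weights.append(b if r <= c else b * c / r)
        new_beta = sum(wt * yi for wt, yi in zip(weights, y)) / sum(weights)
        if abs(new_beta - beta) < tol:
            return new_beta
        beta = new_beta
    return beta


# Made-up data: four studies at 0 and one outlier at 5, all with SE = 1.
robust = huber_pool([0.0, 0.0, 0.0, 0.0, 5.0], [1.0] * 5)
```

With these inputs the ordinary weighted mean is 1.0, while the Huber estimate settles at c/4 ≈ 0.336, showing how the outlier's pull is limited rather than removed.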

Sensitivity to Effect Measure

If the input is log-OR, also interpret as log-RR or SMD. Shows how the conclusion changes if the effect measure is reinterpreted. The conversion uses the constant √3/π ≈ 0.5513 for OR↔SMD.
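The OR↔SMD conversion is a rescaling by √3/π, applied to the estimate and its standard error alike. A small sketch with illustrative function names:

```python
import math

FACTOR = math.sqrt(3.0) / math.pi  # approximately 0.5513


def log_or_to_smd(log_or: float, se: float) -> tuple[float, float]:
    """Convert a log odds ratio and its SE to the SMD scale."""
    return log_or * FACTOR, se * FACTOR


def smd_to_log_or(smd: float, se: float) -> tuple[float, float]:
    """Convert an SMD and its SE to the log odds ratio scale."""
    return smd / FACTOR, se / FACTOR


smd, smd_se = log_or_to_smd(0.8, 0.2)
```

Because the estimate and its standard error are multiplied by the same constant, the ratio estimate/SE is unchanged by this particular conversion; what changes is the magnitude being interpreted.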

Permutation Test for Heterogeneity

Permutes the signs of effect estimates to build a null distribution of Q. More accurate than the χ² approximation for small k. Uses seeded PRNG for reproducibility. (Higgins & Thompson 2004, Stat Med)
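A sketch of the sign-flipping scheme described above. The switch to 1000 random flips above k = 20 follows the method guide; the function names, the seed, and the small tolerance used for ties are assumptions.

```python
import random
from itertools import product


def cochran_q(y, w):
    """Cochran's Q for effects y with weights w."""
    mean = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    return sum(wi * (yi - mean) ** 2 for wi, yi in zip(w, y))


def sign_flip_q_test(y, se, n_random=1000, exhaustive_up_to=20, seed=42):
    """P-value: share of sign patterns whose Q is at least the observed Q."""
    k = len(y)
    w = [1.0 / (s * s) for s in se]
    q_obs = cochran_q(y, w)
    if k <= exhaustive_up_to:
        sign_sets = product((1, -1), repeat=k)  # all 2^k patterns
    else:
        rng = random.Random(seed)
        sign_sets = ([rng.choice((1, -1)) for _ in range(k)]
                     for _ in range(n_random))
    hits = total = 0
    for signs in sign_sets:
        q = cochran_q([s * yi for s, yi in zip(signs, y)], w)
        hits += q >= q_obs - 1e-12  # tolerance so exact ties count
        total += 1
    return hits / total


p_value = sign_flip_q_test([1.0, -1.0, 1.0, -1.0], [1.0] * 4)
```

In this made-up example, 6 of the 16 sign patterns reach the observed Q of 4, giving a p-value of 0.375.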

REML vs DL Heterogeneity Estimation

Restricted Maximum Likelihood (REML) via Fisher scoring provides a less biased tau² estimate than DerSimonian-Laird, especially for small k. The Q-profile method gives a confidence interval for tau². (Viechtbauer 2005, Stat Med)
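A compact sketch of REML estimation of τ² by Fisher scoring, written from the standard score and expected-information expressions for the random-effects model. It omits the Q-profile interval, truncates τ² at zero, and uses a hypothetical function name.

```python
def reml_tau2(y, se, tol=1e-10, max_iter=100):
    """REML estimate of between-study variance via Fisher scoring."""
    if len(y) < 2:
        raise ValueError("need at least two studies")
    v = [s * s for s in se]
    tau2 = 0.0
    for _ in range(max_iter):
        w = [1.0 / (vi + tau2) for vi in v]
        sw = sum(w)
        sw2 = sum(wi ** 2 for wi in w)
        sw3 = sum(wi ** 3 for wi in w)
        mu = sum(wi * yi for wi, yi in zip(w, y)) / sw
        # With P = W - W 1 1' W / (1' W 1):
        tr_p = sw - sw2 / sw                                    # tr(P)
        y_pp_y = sum(wi ** 2 * (yi - mu) ** 2 for wi, yi in zip(w, y))
        tr_pp = sw2 - 2.0 * sw3 / sw + (sw2 / sw) ** 2          # tr(PP)
        new_tau2 = max(0.0, tau2 + (y_pp_y - tr_p) / tr_pp)
        if abs(new_tau2 - tau2) < tol:
            return new_tau2
        tau2 = new_tau2
    return tau2


# Made-up data with equal standard errors.
tau2_hat = reml_tau2([0.0, 1.0, 2.0, 3.0], [0.5] * 4)
```

When all standard errors are equal, the estimate reduces to the sample variance of the effects minus the common sampling variance (here 5/3 − 0.25), which makes the routine easy to check by hand.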

Knapp-Hartung Adjustment

The Knapp-Hartung (HKSJ) adjustment replaces the z-based CI with a t-based CI using an adjusted variance estimator: se_KH = se × √max(1, Q/(k−1)). It uses the t_{k−1} quantile, often giving substantially wider (and more honest) intervals. (Knapp & Hartung 2003, Stat Med)
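The adjustment itself is a one-line rescaling, sketched below. Q and the critical value are taken as inputs here: the t quantile is passed in as `t_crit` (it could come from a statistics library or a table), and both function names are illustrative.

```python
import math


def kh_se(se: float, q: float, k: int) -> float:
    """Knapp-Hartung standard error, truncated so it never shrinks se."""
    if k < 2:
        raise ValueError("need at least two studies")
    return se * math.sqrt(max(1.0, q / (k - 1)))


def kh_ci(beta: float, se: float, q: float, k: int, t_crit: float):
    """CI using the adjusted SE and a caller-supplied t quantile (k-1 df)."""
    half = t_crit * kh_se(se, q, k)
    return beta - half, beta + half


# Made-up inputs: k = 5 studies, Q = 8, so the SE is inflated by sqrt(2).
# 2.776 is the 97.5% quantile of the t distribution with 4 df.
low, high = kh_ci(0.30, 0.10, 8.0, 5, 2.776)
```

The `max(1, ...)` truncation means the interval is never narrower than a t-based interval built on the unadjusted standard error.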

Profile Likelihood CI

The profile likelihood CI (Hardy & Thompson 1996) evaluates the log-likelihood at a grid of 200 beta values and finds the bounds where −2LL rises above its minimum by χ²_{1,α} = 3.841 (α = 0.05). Typically asymmetric. Compared with Wald CI and KH CI.

HKSJ Prediction Interval

The most conservative prediction interval: PI = β ± t_{k−2} × √(se_KH² + τ²_REML). Combines REML heterogeneity with Knapp-Hartung variance for small-k honesty.

Fragility Direction Analysis

When a retraction reverses significance: is it because the study was (a) the largest (weight-driven), (b) the most extreme effect (effect-driven), or (c) both? Classifies each LOO reversal to explain WHY the MA is fragile.

Temporal Retraction Analysis

Studies are ordered by input order (proxy for publication chronology). Cumulative meta-analysis adds studies one-by-one, tracking how the pooled estimate and RFI evolve as evidence accumulates. Shows whether fragility was always present or developed with specific studies.
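The cumulative part can be sketched as a running inverse-variance pool over the studies in input order. This example tracks only the pooled estimate and its standard error (not the RFI) and assumes fixed-effect weights; `cumulative_pool` is a hypothetical name.

```python
import math


def cumulative_pool(y, se):
    """Pooled estimate and SE after each study is added, in input order."""
    out = []
    sum_w = sum_wy = 0.0
    for yi, s in zip(y, se):
        w = 1.0 / (s * s)
        sum_w += w
        sum_wy += w * yi
        out.append((sum_wy / sum_w, math.sqrt(1.0 / sum_w)))
    return out


# Made-up data: the estimate drifts toward zero as later studies arrive.
steps = cumulative_pool([1.0, 0.0, 0.2], [1.0, 1.0, 0.5])
for n, (est, se_est) in enumerate(steps, start=1):
    print(f"after {n} studies: {est:.3f} (SE {se_est:.3f})")
```

Re-running a fragility calculation on each prefix of the study list would give the corresponding RFI trajectory.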

Galbraith (Radial) Plot

X = precision (1/SE), Y = Z-score (effect/SE). The regression line through the origin has slope = pooled estimate. Studies outside the 95% band (±1.96) are heterogeneity-driving outliers. (Galbraith 1988)
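The coordinates and the through-origin slope can be computed directly, as in this sketch. The slope coincides with the inverse-variance pooled estimate, and studies are flagged when their vertical distance from the line exceeds 1.96. `galbraith` is an illustrative name.

```python
def galbraith(y, se):
    """Return precisions, z-scores, the through-origin slope, and outlier flags."""
    x = [1.0 / s for s in se]
    z = [yi / s for yi, s in zip(y, se)]
    # Least-squares slope of a line forced through the origin.
    slope = sum(xi * zi for xi, zi in zip(x, z)) / sum(xi * xi for xi in x)
    outside = [abs(zi - slope * xi) > 1.96 for xi, zi in zip(x, z)]
    return x, z, slope, outside


# Made-up data: the last study sits well above the band.
x, z, slope, outside = galbraith([0.0, 0.0, 0.0, 3.0], [1.0, 1.0, 1.0, 1.0])
```

Here the slope is 0.75, equal to the inverse-variance pooled estimate, and only the fourth study is flagged.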

Retraction Impact — Method Guide

Motivation: Retractions in meta-analyses can invalidate pooled conclusions. The Retraction Fragility Index (RFI) quantifies how many studies would need to be retracted to reverse the statistical conclusion (significant → non-significant or vice versa).
RFI = 1: Extremely fragile. Removing a single study is enough to reverse the conclusion.
RFI = 2–3: Fragile. A small cluster of retractions can change the conclusion.
RFI ≥ 4: Moderately robust. Multiple retractions are needed.
RFI = k: Maximally robust. Even removing all but 2 studies doesn't change the conclusion.

How It Works

1. Compute the full pooled estimate and its P-value.
2. For each subset of 1, 2, ..., d studies (up to max depth), re-pool without those studies.
3. The RFI is the smallest number of retractions that reverses significance (crosses P = 0.05).
4. The fragility curve shows P-value degradation as studies are greedily removed (most impactful first).
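Steps 1 to 3 can be sketched as an exhaustive search, shown here with DerSimonian-Laird pooling and a z-based P-value. Keeping at least two studies in every re-pooled analysis is an assumption, and `dl_pool` and `retraction_fragility_index` are hypothetical names rather than the tool's own functions.

```python
import math
from itertools import combinations
from statistics import NormalDist


def dl_pool(y, se):
    """DerSimonian-Laird random-effects pooling; returns (estimate, SE, p)."""
    k = len(y)
    v = [s * s for s in se]
    w = [1.0 / vi for vi in v]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))
    c = sw - sum(wi * wi for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c) if k > 1 and c > 0 else 0.0
    ws = [1.0 / (vi + tau2) for vi in v]
    est = sum(wi * yi for wi, yi in zip(ws, y)) / sum(ws)
    se_est = math.sqrt(1.0 / sum(ws))
    p = 2.0 * (1.0 - NormalDist().cdf(abs(est / se_est)))
    return est, se_est, p


def retraction_fragility_index(y, se, max_depth=3, alpha=0.05):
    """Smallest number of removed studies that flips significance, else None."""
    k = len(y)
    significant = dl_pool(y, se)[2] < alpha
    for d in range(1, max_depth + 1):
        if k - d < 2:  # assumption: never pool fewer than two studies
            break
        for drop in combinations(range(k), d):
            keep = [i for i in range(k) if i not in drop]
            p = dl_pool([y[i] for i in keep], [se[i] for i in keep])[2]
            if (p < alpha) != significant:
                return d
    return None


# Made-up data: one precise study carries the significance.
rfi = retraction_fragility_index([0.5, 0.5, 0.5], [0.2, 0.4, 0.4])
```

The number of subsets grows combinatorially with the depth, which is why a maximum depth is needed; returning `None` signals that no reversal was found within that depth.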

Advanced Diagnostics

Cook's D: D_i = (β_full − β_{−i})² / Var(β). Threshold: 4/k. (Viechtbauer & Cheung 2010, Res Synth Methods)
DFBETAS: (β_full − β_{−i}) / SE_{−i}. Threshold: |DFBETAS| > 2/√k. Standardized influence on the pooled estimate.
Leverage (h_i): h_i = w_i* / Σ_j w_j*, where w_i* = 1/(v_i + τ²). High leverage = study dominates the pooled weight.
Studentized Residual: r_i* = (y_i − β) / √((v_i + τ²)(1 − h_i)). |r_i*| > 2 suggests an outlier.
Baujat Plot: X = study's contribution to Q, Y = influence on pooled estimate. Upper-right studies are both heterogeneous and influential. (Baujat et al. 2002, Stat Med)
GOSH: Graphical Overview of Study Heterogeneity. Pools all 2^k − 1 subsets (capped at 1000 random subsets for k > 15). Bimodality suggests subgroups.
Credibility Ceiling: Max bias probability u where the conclusion holds with inflated variance v_i + u²·θ_i². (Ioannidis 2017, J Clin Epidemiol)
Prediction Interval: β ± t_{k−2} · √(SE² + τ²). Range of true effects expected in a new study setting.
Galbraith (Radial) Plot: X = 1/SE (precision), Y = effect/SE (Z-score). Slope of regression through origin = pooled estimate. Studies outside ±1.96 band are outliers. (Galbraith 1988)
Doi Plot & LFK Index: Alternative to funnel plot. X = Z-score, Y = |Z|. LFK index: |LFK| < 1 none, 1–2 minor, > 2 major asymmetry. (Furuya-Kanamori 2018)
E-value: Minimum confounding strength to explain away the effect. E = RR + √(RR × (RR − 1)). Higher = more robust. (VanderWeele & Ding 2017)
Henmi-Copas CI: Publication bias-adjusted CI. Wider CI = fragile to selective reporting. (Henmi & Copas 2010, Stat Med)
Power Analysis: Post-hoc power and prospective sample size. Reports detectable effect at 80% power and studies needed for a target effect.
Robust MA (Huber): Huber M-estimator with k=1.345. Iteratively downweights outliers. Compares with standard DL estimate.
Sensitivity to Measure: Reinterprets log-OR as log-RR or SMD (and vice versa). Shows conclusion stability across effect measures. Uses √3/π ≈ 0.5513.
Permutation Test: Permutes signs of effect estimates to test heterogeneity. More accurate than χ² for small k. 2^k exhaustive (k ≤ 20) or 1000 random with seeded PRNG. (Higgins & Thompson 2004)

Sensitivity Methods

REML Estimation: Restricted Maximum Likelihood via Fisher scoring. Less biased than DL for small k. Q-profile CI for τ². (Viechtbauer 2005, Stat Med)
Knapp-Hartung: Replaces z-based CI with t_{k−1}-based CI using adjusted variance: se_KH = se × √max(1, Q/(k−1)). Often substantially wider. (Knapp & Hartung 2003)
Profile Likelihood CI: Grid search (200 points) over beta. Bounds where −2LL rises above its minimum by χ²_1 = 3.841. Typically asymmetric. (Hardy & Thompson 1996)
HKSJ Prediction Interval: Most conservative PI: β ± t_{k−2} × √(se_KH² + τ²_REML). Combines Knapp-Hartung variance with REML heterogeneity.
Fragility Direction: Classifies each LOO reversal as weight-driven (large study), effect-driven (extreme outlier), or both. Explains WHY the MA is fragile.
Temporal Analysis: Cumulative MA in input order. Tracks pooled estimate and RFI evolution as evidence accumulates. Shows whether fragility was always present or developed with specific studies.