RetractionImpact

Retraction Fragility Quantifier — How robust is your meta-analysis to study retractions?

Data Input

Results

Impact Waterfall

Fragility Curve

Leave-One-Out

Advanced Diagnostics

Sensitivity

Temporal

Radial Plot

Guide

Meta-Analysis Data

One study per line: study name, effect estimate, standard error. Ratio measures must be entered on the log scale (log OR, log RR, or log HR); an SMD is entered as is, without log transformation.
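As a rough illustration of the input format, the sketch below parses one study per line. It assumes comma-separated fields (the separator is not stated above), and the names `Study` and `parse_studies` are hypothetical, not part of the tool.

```python
from typing import NamedTuple


class Study(NamedTuple):
    name: str
    effect: float  # log OR / log RR / log HR, or an SMD
    se: float      # standard error of the effect


def parse_studies(text: str) -> list[Study]:
    """Parse 'study, effect, SE' lines; blank lines are skipped."""
    studies = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue
        # Split from the right so that study names may contain commas.
        parts = [p.strip() for p in line.rsplit(",", 2)]
        if len(parts) != 3:
            raise ValueError(f"line {lineno}: expected 'study, effect, SE'")
        name, effect, se = parts[0], float(parts[1]), float(parts[2])
        if se <= 0:
            raise ValueError(f"line {lineno}: SE must be positive")
        studies.append(Study(name, effect, se))
    return studies
```

Splitting from the right is a design choice for this sketch: the last two fields are always numeric, so anything before them can be treated as the study label.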

Effect measure

Null value

0 for every supported measure (log OR, log RR, log HR, and SMD)

Study data (study, effect, SE)

Pooling method

Max retraction depth

Max # studies to retract combinatorially

Retraction Fragility Summary

Original Meta-Analysis

Most Impactful Single Retraction

Retraction Fragility Index (RFI)

Leave-One-Out Impact Table

Retracted Study	Pooled Est.	Change	% Change	Reversal?	New P

Impact Waterfall — Effect of Retracting Each Study

Bars show the change in pooled estimate when each study is removed. Sorted by absolute impact.
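The leave-one-out changes behind such bars can be sketched as follows. For brevity this uses fixed-effect (inverse-variance) pooling, which is a simplification; the tool's selected pooling method may differ. The function names and the three-study dataset are hypothetical.

```python
def pool_fixed(effects, ses):
    """Inverse-variance (fixed-effect) pooled estimate."""
    weights = [1.0 / (s * s) for s in ses]
    return sum(w * y for w, y in zip(weights, effects)) / sum(weights)


def leave_one_out(names, effects, ses):
    """Return (name, pooled estimate without that study, change),
    sorted by absolute change, largest first."""
    full = pool_fixed(effects, ses)
    rows = []
    for i, name in enumerate(names):
        est = pool_fixed(effects[:i] + effects[i + 1:], ses[:i] + ses[i + 1:])
        rows.append((name, est, est - full))
    rows.sort(key=lambda r: abs(r[2]), reverse=True)
    return rows


# Hypothetical data: study C pulls the pooled estimate upward.
for name, est, change in leave_one_out(
    ["A", "B", "C"], [0.3, 0.3, 0.9], [0.1, 0.1, 0.1]
):
    print(f"{name}: pooled={est:.3f} change={change:+.3f}")
```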

Fragility Curve — P-value After Sequential Retraction

Studies removed in order of most-to-least impactful. Shows how quickly significance erodes.

Leave-One-Out Forest Plot

Prediction Interval

Influence Diagnostics Table

Cook's D > 4/k, |DFBETAS| > 2/√k, |Studentized Residual| > 2 flag influential/outlier studies. High leverage = disproportionate weight.

Study	Leverage (h_i)	Stud. Resid (r_i*)	Cook's D	DFBETAS	Flags
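The four diagnostics and their flagging thresholds can be sketched as below. This is a minimal sketch that assumes τ² is supplied by the caller and held fixed when a study is left out; the tool may re-estimate it. The function name `influence` and the flag labels are hypothetical.

```python
import math


def influence(effects, ses, tau2=0.0):
    """Leverage, studentized residual, Cook's D and DFBETAS per study."""
    k = len(effects)
    w = [1.0 / (s * s + tau2) for s in ses]      # random-effects weights
    sw = sum(w)
    beta = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    var_beta = 1.0 / sw
    rows = []
    for i in range(k):
        h = w[i] / sw                             # leverage h_i
        resid = (effects[i] - beta) / math.sqrt((ses[i] ** 2 + tau2) * (1 - h))
        sw_i = sw - w[i]                          # weights without study i
        beta_i = (beta * sw - w[i] * effects[i]) / sw_i
        cook = (beta - beta_i) ** 2 / var_beta
        dfbetas = (beta - beta_i) / math.sqrt(1.0 / sw_i)
        flags = []
        if cook > 4 / k:
            flags.append("cook")
        if abs(dfbetas) > 2 / math.sqrt(k):
            flags.append("dfbetas")
        if abs(resid) > 2:
            flags.append("resid")
        rows.append({"h": h, "resid": resid, "cook": cook,
                     "dfbetas": dfbetas, "flags": flags})
    return rows
```

With nine equally precise studies, eight at 0 and one at 4, only the last study exceeds all three thresholds.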

Baujat Plot

X = contribution to Q (heterogeneity), Y = influence on pooled estimate. Upper-right = influential + heterogeneous.

GOSH Density Plot

Distribution of pooled estimates across random study subsets. Bimodality suggests important subgroups.

Credibility Ceiling

The credibility ceiling (Ioannidis 2017) is the maximum bias probability u at which the pooled conclusion still holds. Higher ceiling = more robust to potential bias.

Doi Plot & LFK Index

Alternative to the funnel plot for detecting asymmetry. X-axis = Z-score (effect/SE), Y-axis = |Z|. The LFK index quantifies asymmetry: |LFK| < 1 = no asymmetry, 1–2 = minor, > 2 = major. (Furuya-Kanamori et al. 2018, Int J Evid Based Healthc)

E-value for Unmeasured Confounding

The E-value (VanderWeele & Ding 2017, Ann Intern Med) is the minimum strength of association, on the risk-ratio scale, that an unmeasured confounder would need with both exposure and outcome to explain away the observed effect. Higher E-values = more robust.
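For a point estimate expressed as a risk ratio, the E-value is RR + √(RR × (RR − 1)), with protective estimates (RR < 1) inverted first. A minimal sketch, assuming the input has already been exponentiated from the log scale; the function name `e_value` is hypothetical.

```python
import math


def e_value(rr: float) -> float:
    """E-value for a risk ratio point estimate."""
    if rr <= 0:
        raise ValueError("risk ratio must be positive")
    if rr < 1:
        rr = 1.0 / rr  # protective effects: work with the reciprocal
    return rr + math.sqrt(rr * (rr - 1.0))


# A pooled log RR of log(2) corresponds to RR = 2.
print(round(e_value(math.exp(math.log(2.0))), 3))
```

An RR of exactly 1 yields an E-value of 1, meaning no confounding is needed to explain a null result.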

Henmi-Copas Adjusted CI

Publication bias-adjusted confidence interval (Henmi & Copas 2010, Stat Med). If the adjusted CI is much wider than the standard CI, the pooled result is fragile to selective reporting.

Power Analysis

Post-hoc power and prospective sample size for the meta-analysis. How much power does this MA have, and how many more studies would be needed?

Robust Meta-Analysis (Huber M-Estimator)

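Huber's M-estimation downweights outlying studies instead of removing them. The tuning constant k = 1.345 (not to be confused with k, the number of studies, used elsewhere) gives 95% efficiency at the normal distribution. The result is compared with the standard DerSimonian-Laird (DL) estimate. (Huber 1964, Ann Math Stat)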
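One common way to compute a Huber M-estimate is iteratively reweighted averaging, sketched below. This is an assumption about the approach, not a description of the tool's code: τ² is taken as given, the tuning constant is named `c` to avoid clashing with the study count, and `huber_pool` is a hypothetical name.

```python
import math


def huber_pool(effects, ses, tau2=0.0, c=1.345, tol=1e-10, max_iter=200):
    """Huber M-estimate of the pooled effect via iterative reweighting."""
    base = [1.0 / (s * s + tau2) for s in ses]   # inverse-variance weights
    beta = sum(b * y for b, y in zip(base, effects)) / sum(base)
    for _ in range(max_iter):
        weights = []
        for b, y in zip(base, effects):
            r = abs(y - beta) * math.sqrt(b)      # standardized residual
            # Full weight inside [-c, c]; shrink by c/|r| outside.
            weights.append(b if r <= c else b * c / r)
        new_beta = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
        if abs(new_beta - beta) < tol:
            return new_beta
        beta = new_beta
    return beta


# Hypothetical data: four studies at 0 and one outlier at 10 (all SE = 1).
print(round(huber_pool([0, 0, 0, 0, 10], [1, 1, 1, 1, 1]), 5))
```

In this example the ordinary inverse-variance mean is 2.0, while the Huber estimate settles near 1.345/4, because the outlier's pull is capped at `c`.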

Sensitivity to Effect Measure

Reinterprets the input under a different effect measure: if the input is log-OR, it is also read as log-RR or SMD, to show how the conclusion changes. The OR↔SMD conversion uses SMD = log-OR × √3/π, where √3/π ≈ 0.5513.
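The OR↔SMD conversion is a rescaling by √3/π, applied to both the estimate and its standard error. A minimal sketch with hypothetical function names:

```python
import math

K = math.sqrt(3) / math.pi  # about 0.5513


def log_or_to_smd(log_or, se):
    """Convert a log odds ratio and its SE to the SMD scale."""
    return log_or * K, se * K


def smd_to_log_or(smd, se):
    """Convert an SMD and its SE to the log odds ratio scale."""
    return smd / K, se / K


d, se_d = log_or_to_smd(1.0, 0.2)
print(round(d, 4), round(se_d, 4), round(d / se_d, 2))
```

Because estimate and standard error are scaled by the same constant, the Z-statistic is unchanged: this conversion alters the magnitude of the effect, not its significance.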

Permutation Test for Heterogeneity

Permutes the signs of effect estimates to build a null distribution of Q. More accurate than the χ² approximation for small k. Uses seeded PRNG for reproducibility. (Higgins & Thompson 2004, Stat Med)
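Read literally, the procedure flips the sign of each effect estimate, recomputes Cochran's Q, and reports the share of sign patterns with Q at least as large as observed. The sketch below follows that reading; the function names, the default seed, and the small tolerance in the comparison are assumptions.

```python
import random
from itertools import product


def cochran_q(effects, ses):
    """Cochran's Q around the fixed-effect pooled estimate."""
    w = [1.0 / (s * s) for s in ses]
    fe = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    return sum(wi * (yi - fe) ** 2 for wi, yi in zip(w, effects))


def sign_permutation_p(effects, ses, n_random=1000, seed=12345,
                       exhaustive_max_k=20):
    """P-value for Q from sign-flipped copies of the data."""
    k = len(effects)
    q_obs = cochran_q(effects, ses)
    if k <= exhaustive_max_k:
        sign_sets = product((1, -1), repeat=k)      # all 2^k patterns
    else:
        rng = random.Random(seed)                    # seeded for reproducibility
        sign_sets = ([rng.choice((1, -1)) for _ in range(k)]
                     for _ in range(n_random))
    hits = total = 0
    for signs in sign_sets:
        q = cochran_q([s * y for s, y in zip(signs, effects)], ses)
        hits += q >= q_obs - 1e-12
        total += 1
    return hits / total
```

Exhaustive enumeration grows as 2^k, so in pure Python it becomes slow well before k = 20; a lower `exhaustive_max_k` may be more practical.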

REML vs DL Heterogeneity Estimation

Restricted Maximum Likelihood (REML) via Fisher scoring provides a less biased tau² estimate than DerSimonian-Laird, especially for small k. The Q-profile method gives a confidence interval for tau². (Viechtbauer 2005, Stat Med)

Knapp-Hartung Adjustment

The Knapp-Hartung (HKSJ) adjustment replaces the z-based CI with a t-based CI using an adjusted standard error: se_KH = se × √max(1, Q/(k−1)). The interval uses the t quantile with k−1 degrees of freedom, which often gives substantially wider (and more honest) intervals. (Knapp & Hartung 2003, Stat Med)
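The adjusted standard error can be sketched as follows. This assumes Q is the weighted sum of squared deviations computed with random-effects weights 1/(v_i + τ²), as in the usual HKSJ formulation, and that τ² is supplied by the caller. The t quantile is left to the caller because the Python standard library has no t distribution; `knapp_hartung_se` is a hypothetical name.

```python
import math


def knapp_hartung_se(effects, ses, tau2, truncate=True):
    """Return (pooled estimate, se_KH, degrees of freedom)."""
    k = len(effects)
    w = [1.0 / (s * s + tau2) for s in ses]
    sw = sum(w)
    beta = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    se = math.sqrt(1.0 / sw)
    q_ratio = sum(wi * (yi - beta) ** 2 for wi, yi in zip(w, effects)) / (k - 1)
    # max(1, .) prevents the adjusted SE from falling below the usual one.
    factor = max(1.0, q_ratio) if truncate else q_ratio
    return beta, se * math.sqrt(factor), k - 1


beta, se_kh, df = knapp_hartung_se([0.0, 2.0], [1.0, 1.0], tau2=0.0)
t_crit = 12.706  # t quantile at 0.975 with 1 degree of freedom
print(beta - t_crit * se_kh, beta + t_crit * se_kh)
```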

Profile Likelihood CI

The profile likelihood CI (Hardy & Thompson 1996) evaluates the log-likelihood at a grid of 200 beta values and finds the bounds where −2LL rises above its minimum by the χ²₁ critical value at α = 0.05, 3.841. The interval is typically asymmetric. It is shown alongside the Wald and Knapp-Hartung CIs.

HKSJ Prediction Interval

The most conservative prediction interval: PI = β ± t_k−2 × √(se_KH² + τ²_REML). Combines REML heterogeneity with Knapp-Hartung variance for small-k honesty.

Fragility Direction Analysis

When a retraction reverses significance: is it because the study was (a) the largest (weight-driven), (b) the most extreme effect (effect-driven), or (c) both? Classifies each LOO reversal to explain WHY the MA is fragile.

Temporal Retraction Analysis

Studies are ordered by input order (proxy for publication chronology). Cumulative meta-analysis adds studies one-by-one, tracking how the pooled estimate and RFI evolve as evidence accumulates. Shows whether fragility was always present or developed with specific studies.

Galbraith (Radial) Plot

X = precision (1/SE), Y = Z-score (effect/SE). The regression line through the origin has slope equal to the inverse-variance (fixed-effect) pooled estimate. Studies lying more than 1.96 units above or below that line, i.e. outside the 95% band, are heterogeneity-driving outliers. (Galbraith 1988)
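The link between the slope and the pooled estimate can be checked numerically: the least-squares slope through the origin of Z on 1/SE works out to Σ(y_i/v_i) / Σ(1/v_i), the inverse-variance weighted mean. A small sketch with a hypothetical function name and dataset:

```python
def galbraith(effects, ses, band=1.96):
    """Return (slope through the origin, per-study outside-band flags)."""
    x = [1.0 / s for s in ses]                    # precision
    z = [y / s for y, s in zip(effects, ses)]     # Z-score
    slope = sum(xi * zi for xi, zi in zip(x, z)) / sum(xi * xi for xi in x)
    outside = [abs(zi - slope * xi) > band for xi, zi in zip(x, z)]
    return slope, outside


# Hypothetical data: the fourth study sits far above the line.
slope, outside = galbraith([0.0, 0.1, 0.2, 1.5], [0.5, 0.5, 0.5, 0.5])
print(round(slope, 3), outside)
```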

Retraction Impact — Method Guide

Motivation: Retractions in meta-analyses can invalidate pooled conclusions. The Retraction Fragility Index (RFI) quantifies how many studies would need to be retracted to reverse the statistical conclusion (significant → non-significant or vice versa).

RFI = 1	Extremely fragile: retracting a single study is enough to reverse the conclusion.
RFI = 2–3	Fragile: a small cluster of retractions can change the conclusion.
RFI ≥ 4	Moderately robust: multiple retractions are needed.
RFI = k	Maximally robust: even removing all but 2 studies does not change the conclusion.

How It Works

1. Compute the full pooled estimate and its P-value.
2. For each subset of 1, 2, ..., d studies (up to max depth), re-pool without those studies.
3. The RFI is the smallest number of retractions that reverses significance (crosses P = 0.05).
4. The fragility curve shows P-value degradation as studies are greedily removed (most impactful first).
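Steps 1 to 3 can be sketched as an exhaustive search over subsets of increasing size. This is a minimal sketch under stated assumptions: DerSimonian-Laird pooling, a null value of 0, a z-based two-sided P-value, and at least two studies always retained. The function names are hypothetical.

```python
import math
from itertools import combinations
from statistics import NormalDist


def pool_dl(effects, ses):
    """DerSimonian-Laird pooling; returns (estimate, se, two-sided p)."""
    k = len(effects)
    v = [s * s for s in ses]
    w = [1.0 / vi for vi in v]
    sw = sum(w)
    fe = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fe) ** 2 for wi, yi in zip(w, effects))
    c = sw - sum(wi * wi for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c) if k > 1 and c > 0 else 0.0
    ws = [1.0 / (vi + tau2) for vi in v]
    est = sum(wi * yi for wi, yi in zip(ws, effects)) / sum(ws)
    se = math.sqrt(1.0 / sum(ws))
    p = 2.0 * (1.0 - NormalDist().cdf(abs(est / se)))
    return est, se, p


def retraction_fragility_index(effects, ses, max_depth=3, alpha=0.05):
    """Smallest number of removals that flips significance, with the
    first subset found; (None, ()) if none exists up to max_depth."""
    k = len(effects)
    significant = pool_dl(effects, ses)[2] < alpha
    for depth in range(1, min(max_depth, k - 2) + 1):
        for removed in combinations(range(k), depth):
            keep = [i for i in range(k) if i not in removed]
            p = pool_dl([effects[i] for i in keep],
                        [ses[i] for i in keep])[2]
            if (p < alpha) != significant:
                return depth, removed
    return None, ()


# Hypothetical data: one precise study carries the result.
print(retraction_fragility_index([0.3, 0.3, 0.3, 0.3], [0.1, 0.5, 0.5, 0.5]))
```

The number of subsets grows combinatorially with depth, which is presumably why the tool exposes a maximum retraction depth.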

Advanced Diagnostics

Cook's D	D_i = (β_full − β_−i)² / Var(β). Threshold: 4/k. (Viechtbauer & Cheung 2010, Res Synth Methods)
DFBETAS	(β_full − β_−i) / SE_−i. Threshold: |DFBETAS| > 2/√k. Standardized influence on the pooled estimate.
Leverage (h_i)	w_i* / Σw_j* where w* = 1/(v_i + τ²). High leverage = study dominates the pooled weight.
Studentized Residual	r_i* = (y_i − β) / √((v_i + τ²)(1 − h_i)). |r*| > 2 suggests outlier.
Baujat Plot	X = study's contribution to Q, Y = influence on pooled estimate. Upper-right studies are both heterogeneous and influential. (Baujat et al. 2002, Stat Med)
GOSH	Graphical Overview of Study Heterogeneity. Pools all 2^k − 1 non-empty subsets (capped at 1000 random subsets for k > 15). Bimodality suggests subgroups.
Credibility Ceiling	Max bias probability u where conclusion holds with inflated variance v_i + u²·θ_i². (Ioannidis 2017, J Clin Epidemiol)
Prediction Interval	β ± t_k−2 · √(SE² + τ²). Range of true effects expected in a new study setting.
Galbraith (Radial) Plot	X = 1/SE (precision), Y = effect/SE (Z-score). Slope of regression through origin = pooled estimate. Studies outside ±1.96 band are outliers. (Galbraith 1988)
Doi Plot & LFK Index	Alternative to funnel plot. X = Z-score, Y = |Z|. LFK index: |LFK| < 1 none, 1–2 minor, > 2 major asymmetry. (Furuya-Kanamori 2018)
E-value	Minimum confounding strength to explain away the effect. E = RR + √(RR × (RR − 1)) for RR ≥ 1; for RR < 1, apply the formula to 1/RR. Higher = more robust. (VanderWeele & Ding 2017)
Henmi-Copas CI	Publication bias-adjusted CI. Wider CI = fragile to selective reporting. (Henmi & Copas 2010, Stat Med)
Power Analysis	Post-hoc power and prospective sample size. Reports detectable effect at 80% power and studies needed for a target effect.
Robust MA (Huber)	Huber M-estimator with k=1.345. Iteratively downweights outliers. Compares with standard DL estimate.
Sensitivity to Measure	Reinterprets log-OR as log-RR or SMD (and vice versa). Shows conclusion stability across effect measures. Uses √3/π ≈ 0.5513.
Permutation Test	Permutes signs of effect estimates to test heterogeneity. More accurate than χ² for small k. 2^k exhaustive (k≤20) or 1000 random with seeded PRNG. (Higgins & Thompson 2004)

Sensitivity Methods

REML Estimation	Restricted Maximum Likelihood via Fisher scoring. Less biased than DL for small k. Q-profile CI for τ². (Viechtbauer 2005, Stat Med)
Knapp-Hartung	Replaces z-based CI with t_k−1-based CI using adjusted variance: se_KH = se × √max(1, Q/(k−1)). Often substantially wider. (Knapp & Hartung 2003)
Profile Likelihood CI	Grid search (200 points) over beta. Bounds where −2LL rises above its minimum by χ²₁ = 3.841. Typically asymmetric. (Hardy & Thompson 1996)
HKSJ Prediction Interval	Most conservative PI: β ± t_k−2 × √(se_KH² + τ²_REML). Combines Knapp-Hartung variance with REML heterogeneity.
Fragility Direction	Classifies each LOO reversal as weight-driven (large study), effect-driven (extreme outlier), or both. Explains WHY the MA is fragile.
Temporal Analysis	Cumulative MA in input order. Tracks pooled estimate and RFI evolution as evidence accumulates. Shows whether fragility was always present or developed with specific studies.