Advanced Mathematical Analysis · Pure Python

15 Statistical Methods
Applied to Africa RCT Data

Bootstrap inference, information theory, non-parametric tests, power-law estimation, Bayesian posteriors, and health equity measures — all from 22,110 trials across 54 nations.

The Gini coefficient of 0.809 (95% bootstrap CI: 0.606–0.897) indicates extreme inequality. Africa's trial distribution diverges 2.93 bits from uniform (KL divergence) and 0.361 bits from population-proportional (Jensen-Shannon divergence), with a power-law exponent α = 1.40 indicating a heavy tail in which a few countries hold most of the trials. The Bayesian posterior places Africa's share of global trials at 5.5% (95% CrI: 5.4%–5.5%).

1. Bootstrap 95% Confidence Intervals (B = 10,000)

Gini = 0.8091 [0.6062, 0.8969]  |  Shannon = 3.281 [2.327, 4.800] bits  |  HHI = 0.2309 [0.0484, 0.4639]

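Non-parametric bootstrap resampling (10,000 iterations) provides distribution-free confidence intervals. The Gini interval is wide and includes 0.80, but its lower bound of 0.606 still lies in the range of high inequality, so the finding is unlikely to be a sampling artefact. The HHI interval (0.048 to 0.464) is also wide, so the degree of market-like concentration is estimated imprecisely.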
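The resampling procedure can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the report's own code: the function names, the percentile method for the interval, and the `toy_counts` list are all assumptions made for the example.

```python
import random

def gini(values):
    """Gini coefficient of non-negative values (0 = equal, near 1 = concentrated)."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted form: G = 2*sum(i*x_i)/(n*total) - (n+1)/n, with ranks i = 1..n
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n

def bootstrap_ci(values, stat, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for stat(values)."""
    rng = random.Random(seed)
    n = len(values)
    reps = sorted(
        stat([values[rng.randrange(n)] for _ in range(n)])  # resample with replacement
        for _ in range(n_boot)
    )
    lo_idx = max(0, int((alpha / 2) * n_boot))
    hi_idx = min(n_boot - 1, int((1 - alpha / 2) * n_boot) - 1)
    return reps[lo_idx], reps[hi_idx]

# Hypothetical per-country trial counts, for illustration only
toy_counts = [0, 0, 1, 2, 3, 5, 8, 40, 120, 900]
point = gini(toy_counts)
low, high = bootstrap_ci(toy_counts, gini, n_boot=2000)
```

Any other statistic (Shannon entropy, HHI) can be passed as `stat` in the same way.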

2. Theil Indices (Generalised Entropy)

Theil T = GE(1) = 2.0283  |  Theil L = GE(0) = 2.069

Unlike Gini, Theil indices are additively decomposable into between-group and within-group components. Theil T is sensitive to changes at the top of the distribution (Egypt), while Theil L is sensitive to changes at the bottom (zero-trial countries). Both confirm extreme inequality from different perspectives.
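A short sketch of both indices and of the additive decomposition of Theil T, assuming the standard textbook definitions. The function names and the two toy groups are hypothetical. Theil L is undefined when any value is zero, and how the report handled zero-trial countries is not stated, so this sketch simply raises an error in that case.

```python
import math

def theil_t(values):
    """Theil T = GE(1): mean of (x/mu)*ln(x/mu), taking 0*ln(0) as 0."""
    n = len(values)
    mu = sum(values) / n
    return sum((x / mu) * math.log(x / mu) for x in values if x > 0) / n

def theil_l(values):
    """Theil L = GE(0): mean of ln(mu/x); defined only for strictly positive x."""
    if any(x <= 0 for x in values):
        raise ValueError("Theil L is undefined when any value is zero")
    n = len(values)
    mu = sum(values) / n
    return sum(math.log(mu / x) for x in values) / n

def theil_t_decomposition(groups):
    """Split Theil T into (between, within) for a dict of group -> values."""
    all_values = [x for vals in groups.values() for x in vals]
    total = sum(all_values)
    mu = total / len(all_values)
    between = within = 0.0
    for vals in groups.values():
        share = sum(vals) / total            # group's share of all trials
        if share == 0:
            continue
        mu_g = sum(vals) / len(vals)
        between += share * math.log(mu_g / mu)
        within += share * theil_t(vals)
    return between, within

# Hypothetical groups, for illustration only
toy_groups = {"region_a": [1, 2, 3], "region_b": [10, 50, 100]}
between, within = theil_t_decomposition(toy_groups)
overall = theil_t([x for vals in toy_groups.values() for x in vals])
```

The between and within components sum to the overall Theil T, which is the property the Gini coefficient lacks.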

3. Atkinson Index (Inequality Aversion)

A(ε=0.5) = 0.6644  |  A(ε=1.0) = 1.0  |  A(ε=2.0) = 1.0

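The Atkinson index incorporates a normative parameter ε reflecting society's aversion to inequality. A value of A means that the same level of social welfare could be reached with a fraction 1 - A of the total trials if they were distributed equally. At ε = 1.0 and ε = 2.0 the index reaches its maximum of 1.0, which is what the formula returns whenever at least one country has zero trials, because the equally-distributed-equivalent level collapses to zero. These two values therefore signal the presence of zero-trial countries rather than a literal finding that all 22,110 trials are "wasted"; A(ε=0.5) = 0.6644 is the more informative figure.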
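The behaviour of the index at different ε values can be checked with a small sketch. It assumes the standard definition A = 1 - EDE/μ, where EDE is the equally-distributed-equivalent level. The function name and the toy inputs are hypothetical.

```python
import math

def atkinson(values, epsilon):
    """Atkinson index A = 1 - EDE/mean for inequality-aversion parameter epsilon."""
    n = len(values)
    mu = sum(values) / n
    if mu == 0:
        return 0.0
    if epsilon >= 1 and any(x == 0 for x in values):
        # With epsilon >= 1, a single zero drives the EDE level to zero, so A = 1
        return 1.0
    if epsilon == 1:
        ede = math.exp(sum(math.log(x) for x in values) / n)  # geometric mean
    else:
        ede = (sum(x ** (1 - epsilon) for x in values) / n) ** (1 / (1 - epsilon))
    return 1 - ede / mu

# Hypothetical counts: one zero is enough to push A to 1 for epsilon >= 1
a_half = atkinson([0, 1, 4], 0.5)
a_one = atkinson([0, 1, 4], 1.0)
a_two = atkinson([1, 1, 4, 4], 2.0)
```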

4-5. Information Divergence (KL and Jensen-Shannon)

KL(obs || uniform) = 2.9263 bits  |  JS(obs, uniform) = 0.5189 bits  |  JS(obs, population) = 0.361 bits

KL divergence measures how many extra bits are needed to encode the observed distribution using an optimal code for the uniform distribution — 2.9 bits of "surprise." Jensen-Shannon divergence (symmetric, bounded) of 0.361 between trial distribution and population distribution confirms that trials are not allocated proportionally to population need.
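Both divergences follow directly from their definitions. The sketch below uses base-2 logarithms so the results are in bits. The function names and the toy distributions are assumptions for illustration.

```python
import math

def normalise(counts):
    """Turn non-negative counts into a probability distribution."""
    total = sum(counts)
    return [c / total for c in counts]

def kl_bits(p, q):
    """KL(p || q) in bits; requires q > 0 wherever p > 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_bits(p, q):
    """Jensen-Shannon divergence in bits: symmetric and bounded by 1."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_bits(p, m) + 0.5 * kl_bits(q, m)

# Hypothetical counts: all mass in two of four categories
observed = normalise([50, 50, 0, 0])
uniform = [0.25] * 4
kl_to_uniform = kl_bits(observed, uniform)   # log2(4) - H(observed) = 1 bit
js_to_uniform = js_bits(observed, uniform)
```

Against a uniform reference over n categories, KL reduces to log2(n) minus the Shannon entropy of the observed distribution.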

6-7. Rank Correlations

Spearman ρ(pop, trials) = 0.7235  |  Spearman ρ(pop, trials/M) = -0.0112  |  Kendall τ(pop, trials) = 0.5332

Spearman ρ = 0.7235 indicates strong monotonic association between population and trials. But ρ = -0.0112 for per-capita rates suggests no relationship — large countries do not necessarily have higher per-capita trial access.
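A minimal sketch of both rank correlations, assuming average ranks for ties in Spearman and the simple tau-a variant of Kendall (no tie correction). Which variants the report used is not stated, and the function names are hypothetical.

```python
import math

def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1              # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    """Spearman rho: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))

def kendall_tau_a(x, y):
    """Kendall tau-a: (concordant - discordant) / number of pairs."""
    n = len(x)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            s += (prod > 0) - (prod < 0)
    return s / (n * (n - 1) / 2)
```

Applying `spearman` once to (population, trials) and once to (population, trials per million) gives the two contrasting figures of the kind reported above.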

8. Mann-Whitney U Test

U = 0.0  |  z = -4.899  |  p < 0.05 (significant)

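Non-parametric comparison of the top-10 versus bottom-44 African nations. Because the two groups are defined by ranking countries on trial volume, they cannot overlap, so U = 0 and z = -4.899 follow from the group sizes alone. The test therefore describes complete separation between a small cluster of research-active nations and a large cluster of research-desert nations rather than providing independent evidence of a difference.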
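The reported statistics can be reproduced from group sizes with a short sketch. It assumes the normal approximation without tie or continuity correction. The function name and the toy values are hypothetical. Any two fully separated groups of sizes 10 and 44 give U = 0 and z ≈ -4.899.

```python
import math

def mann_whitney_u(a, b):
    """Return (U, z, two-sided p) using the normal approximation."""
    n1, n2 = len(a), len(b)
    # Count pairs where a beats b; ties count as one half
    u_a = sum((x > y) + 0.5 * (x == y) for x in a for y in b)
    u = min(u_a, n1 * n2 - u_a)
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))   # two-sided tail of the standard normal
    return u, z, p

# Hypothetical, fully separated groups of sizes 10 and 44
top = list(range(45, 55))
bottom = list(range(1, 45))
u, z, p = mann_whitney_u(top, bottom)
```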

9. Kolmogorov-Smirnov Test (vs Log-Normal)

D = 0.0623  |  D_crit = 0.1868 (alpha=0.05)  |  FAIL TO REJECT: consistent with log-normal

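Failing to reject at α = 0.05 means the distribution is consistent with a log-normal model, as expected for hierarchical socioeconomic phenomena. It does not show that the log-normal is the only adequate model.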
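A one-sample KS statistic against a fitted log-normal can be sketched as follows. The assumptions here are that zero counts are dropped before taking logs, that the parameters are estimated from the data, and that the critical value uses the large-sample approximation 1.36/√n, which would give the reported 0.1868 at n = 53. Whether the report did the same is not stated. Estimating parameters from the same data makes this critical value conservative.

```python
import math
import random

def ks_lognormal(values):
    """KS distance between positive values and a fitted log-normal; returns (D, D_crit)."""
    logs = sorted(math.log(x) for x in values if x > 0)
    n = len(logs)
    mu = sum(logs) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in logs) / (n - 1))
    d = 0.0
    for i, v in enumerate(logs):
        cdf = 0.5 * (1 + math.erf((v - mu) / (sigma * math.sqrt(2))))
        # Compare the model CDF with the empirical CDF just before and at this point
        d = max(d, abs(cdf - i / n), abs((i + 1) / n - cdf))
    d_crit = 1.36 / math.sqrt(n)           # approximate 5% critical value
    return d, d_crit

# Hypothetical sample of 53 log-normal draws, for illustration only
rng = random.Random(1)
toy = [rng.lognormvariate(3.0, 2.0) for _ in range(53)]
d_stat, d_crit = ks_lognormal(toy)
consistent = d_stat < d_crit
```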

10. Power-Law Exponent (Maximum Likelihood, Clauset 2009)

α = 1.402 ± 0.064 (x_min = 10)  |  Heavy tail: α < 2 indicates extreme tail concentration

A power-law exponent α = 1.40 estimated via maximum likelihood (Clauset et al., 2009) indicates an extremely heavy-tailed distribution in which the largest values dominate. Smaller exponents mean slower tail decay, so this tail is heavier than the comparison values cited for natural phenomena (earthquakes α ~ 2.0, city sizes α ~ 2.1).
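The maximum-likelihood estimator has a closed form for continuous data: α = 1 + n / Σ ln(x_i / x_min), with standard error (α - 1)/√n. The sketch below assumes this continuous form and a fixed, user-supplied `x_min`. The function name and toy data are hypothetical, and whether the report used the continuous or a discrete variant is not stated.

```python
import math

def power_law_alpha(values, x_min):
    """Continuous power-law MLE on the tail x >= x_min; returns (alpha, se, n_tail)."""
    tail = [x for x in values if x >= x_min]
    n = len(tail)
    log_sum = sum(math.log(x / x_min) for x in tail)
    if n == 0 or log_sum == 0:
        raise ValueError("need at least one tail value strictly above x_min")
    alpha = 1 + n / log_sum
    se = (alpha - 1) / math.sqrt(n)
    return alpha, se, n

# Hypothetical data: values below x_min are ignored by the estimator
toy = [1, 3, 10 * math.e, 10 * math.e, 10 * math.e, 10 * math.e]
alpha, se, n_tail = power_law_alpha(toy, x_min=10)
```

For integer-valued counts, Clauset et al. suggest replacing `x_min` with `x_min - 0.5` inside the logarithm as an approximation to the discrete estimator.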

11. Bayesian Posterior (Beta-Binomial Model)

P(trial in Africa) ~ Beta(22110+1, 382533+1)  |  posterior mean = 0.0546  |  95% CrI [0.0539, 0.0553]

Using a Beta-Binomial model with uniform prior, Africa's posterior probability of hosting a randomly selected global trial is 5.5%. The 95% credible interval [5.4%, 5.5%] is extremely narrow, confirming high precision in this estimate of Africa's marginal global share.
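The posterior summary can be approximated without any special functions. The sketch below uses the exact Beta mean and variance and a normal approximation for the interval, which is reasonable only because the counts are very large. Reading 382,533 as the number of trials outside Africa is an assumption based on the Beta parameters shown above, and the function name is hypothetical.

```python
import math
from statistics import NormalDist

def beta_posterior_summary(successes, failures, prior_a=1.0, prior_b=1.0, level=0.95):
    """Posterior mean and approximate credible interval for a Beta(a, b) posterior."""
    a = successes + prior_a               # uniform prior is Beta(1, 1)
    b = failures + prior_b
    mean = a / (a + b)
    sd = math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
    z = NormalDist().inv_cdf(0.5 + level / 2)
    # Normal approximation to the Beta quantiles; adequate for very large a and b
    return mean, mean - z * sd, mean + z * sd

post_mean, cri_low, cri_high = beta_posterior_summary(22110, 382533)
```

With these counts the sketch gives a mean of about 0.0546 and an interval of about [0.0539, 0.0553], matching the figures above to four decimal places.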

12. Jackknife Standard Errors

Gini SE = 0.1097 (jackknife) vs 0.0742 (bootstrap)  |  Shannon SE = 1.1309

The jackknife standard error for the Gini (0.1097) is roughly 50% larger than the bootstrap estimate (0.0742), so the two methods agree only in order of magnitude. Both imply moderate rather than high precision. Even so, two jackknife standard errors below the point estimate the Gini would still be about 0.59, so the conclusion of high inequality is robust to this uncertainty.
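The leave-one-out procedure is short enough to show in full. This sketch assumes the standard delete-one jackknife, and the function name is hypothetical. For the sample mean the jackknife standard error equals s/√n exactly, which gives a convenient sanity check.

```python
import math
from statistics import mean, stdev

def jackknife_se(values, stat):
    """Delete-one jackknife standard error of stat(values)."""
    n = len(values)
    # Recompute the statistic n times, leaving out one observation each time
    loo = [stat(values[:i] + values[i + 1:]) for i in range(n)]
    centre = sum(loo) / n
    return math.sqrt((n - 1) / n * sum((t - centre) ** 2 for t in loo))

# Sanity check on toy data: for the mean, jackknife SE equals s / sqrt(n)
toy = [1.0, 2.0, 3.0, 4.0, 5.0]
se_jack = jackknife_se(toy, mean)
se_classic = stdev(toy) / math.sqrt(len(toy))
```

Passing a Gini or entropy function as `stat` gives the corresponding jackknife standard error.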

13. Permutation Test: North Africa vs Rest

Mean difference = 1908 trials  |  p = 0.0372 (5,000 permutations)

A permutation test asks whether the North-South African divide is a genuine structural pattern or could arise by chance. With p = 0.0372 from 5,000 random relabellings, the difference in mean trial volume between North African nations and the rest of the continent is statistically significant at the 5% level.
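A Monte Carlo permutation test on the difference in means can be sketched as below. The two-sided comparison and the add-one correction to the p-value are choices made for this example, and the function name and toy groups are hypothetical.

```python
import random

def permutation_test(group_a, group_b, n_perm=5000, seed=0):
    """Two-sided Monte Carlo permutation test for a difference in means."""
    rng = random.Random(seed)
    n_a = len(group_a)

    def mean_diff(pool):
        return sum(pool[:n_a]) / n_a - sum(pool[n_a:]) / (len(pool) - n_a)

    pooled = list(group_a) + list(group_b)
    observed = mean_diff(pooled)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)                # random relabelling of group membership
        if abs(mean_diff(pooled)) >= abs(observed) - 1e-12:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical groups, for illustration only
diff, p_value = permutation_test([100, 110, 120, 130, 140, 150], [1, 2, 3, 4, 5, 6])
_, p_null = permutation_test([1, 2, 3], [1, 2, 3])
```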

14. Time-Series Trend Decomposition

Africa: y = -810.2 + 2728.9 · epoch (R²=0.8892)  |  Quadratic R²=0.9935

Africa's trial growth is better described by a quadratic (accelerating) model (R² = 0.9935, versus 0.8892 for the linear fit). The linear slope of 2729 additional trials per epoch indicates strong absolute growth, but the US slope of 7838 means the absolute gap widens each epoch.
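The linear-versus-quadratic comparison can be reproduced with a small least-squares routine. This sketch solves the normal equations by Gauss-Jordan elimination, which is adequate for low degrees and a handful of epochs. The function names and the toy series are hypothetical.

```python
def polyfit(xs, ys, degree):
    """Least-squares polynomial coefficients [c0, c1, ...] via the normal equations."""
    m = degree + 1
    # Augmented matrix [X^T X | X^T y]
    a = [
        [sum(x ** (i + j) for x in xs) for j in range(m)]
        + [sum(y * x ** i for x, y in zip(xs, ys))]
        for i in range(m)
    ]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(m):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * w for v, w in zip(a[r], a[col])]
    return [a[i][m] / a[i][i] for i in range(m)]

def r_squared(xs, ys, coeffs):
    """Coefficient of determination for a fitted polynomial."""
    mean_y = sum(ys) / len(ys)
    fitted = [sum(c * x ** k for k, c in enumerate(coeffs)) for x in xs]
    ss_res = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Hypothetical accelerating series: y = 1 + 2x + 3x^2
epochs = [0, 1, 2, 3, 4]
counts = [1, 6, 17, 34, 57]
linear = polyfit(epochs, counts, 1)
quadratic = polyfit(epochs, counts, 2)
r2_linear = r_squared(epochs, counts, linear)
r2_quadratic = r_squared(epochs, counts, quadratic)
```

A higher R² is expected whenever a term is added, so the gap between the two fits matters more than the ordering itself.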

15. Concentration Index (Health Equity)

CI = 0.7405  |  Range: [-1, +1]  |  Positive: concentrated among larger-population countries

The concentration index of 0.741, borrowed from health economics (Wagstaff et al.), measures whether trials are concentrated among high-population countries. A positive CI means that large-population nations host disproportionately more trials, but this does not imply equitable per-capita access.
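A minimal sketch of an unweighted concentration index, assuming countries are ranked by population and given fractional ranks (2i - 1)/(2n). Whether the report weighted countries or used a different rank convention is not stated. The function name and toy inputs are hypothetical.

```python
def concentration_index(rank_by, outcome):
    """Unweighted concentration index of `outcome` across units ranked by `rank_by`."""
    n = len(outcome)
    mu = sum(outcome) / n
    order = sorted(range(n), key=lambda i: rank_by[i])   # ties keep input order
    total = 0.0
    for position, idx in enumerate(order, start=1):
        frac_rank = (2 * position - 1) / (2 * n)         # midpoint fractional rank
        total += outcome[idx] * frac_rank
    # C = (2 / (n * mu)) * sum(h_i * R_i) - 1
    return 2 * total / (n * mu) - 1

# Hypothetical populations and trial counts, for illustration only
populations = [1, 2, 3, 4]
ci_top = concentration_index(populations, [0, 0, 0, 8])      # all trials in largest country
ci_bottom = concentration_index(populations, [8, 0, 0, 0])   # all trials in smallest country
ci_equal = concentration_index(populations, [5, 5, 5, 5])
```

With n units the index ranges from -(1 - 1/n) to +(1 - 1/n), approaching the stated [-1, +1] range as n grows.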

Methods Summary

#  |  Method  |  Family  |  Key Result
1  |  Bootstrap CI  |  Resampling  |  Gini 0.809 [0.606, 0.897]
2  |  Theil T / L  |  Generalised Entropy  |  T=2.0283, L=2.069
3  |  Atkinson  |  Welfare Economics  |  A(2.0)=1.0
4  |  KL Divergence  |  Information Theory  |  2.93 bits from uniform
5  |  JS Divergence  |  Information Theory  |  0.361 bits from population
6  |  Spearman ρ  |  Rank Correlation  |  ρ=0.7235
7  |  Kendall τ  |  Rank Correlation  |  τ=0.5332
8  |  Mann-Whitney U  |  Non-parametric Test  |  z=-4.899
9  |  KS Test  |  Distribution Fit  |  D=0.0623 vs crit=0.1868
10  |  Power-Law α  |  MLE  |  α=1.402
11  |  Bayesian Posterior  |  Beta-Binomial  |  P=0.0546
12  |  Jackknife SE  |  Resampling  |  Gini SE=0.1097
13  |  Permutation Test  |  Exact Test  |  p=0.0372
14  |  Trend Decomposition  |  Time Series  |  R²=0.8892
15  |  Concentration Index  |  Health Equity  |  CI=0.7405