Mathematical Analysis · ClinicalTrials.gov

Statistical Deep Dive
Africa RCT Distribution

Ten mathematical lenses applied to the distribution of 22,110 clinical trials across 54 African nations.

Gini Coefficient

0.857

Shannon Entropy

2.83 bits

Normalised Entropy

0.492

HHI

0.3152

Zipf Slope

-2.11

Benford MAD

0.0297

Pop→Trial R²

0.485

Pareto Countries

6/54

Africa's clinical trial distribution has a Gini coefficient of 0.857, higher than the income Gini of highly unequal countries such as South Africa (~0.63). Just 6 of the continent's 54 countries (11.1%) account for just over 80% of all trials, and Egypt alone hosts 53% of the total.

1. Inequality: Gini Coefficient & Lorenz Curve

Gini = (2 · Σ(i · x_i)) / (n · Σx_i) − (n+1)/n = 0.8570, where x_i are the country trial counts sorted in ascending order, i = 1 … n, and n = 54

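The Gini coefficient measures inequality on a 0–1 scale: 0 means every country hosts the same number of trials, and values near 1 mean a single country hosts almost all of them. A value of 0.857 indicates extreme inequality in trial distribution. For comparison, South Africa's income Gini is ~0.63, so Africa's trial Gini is higher still.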

The Lorenz curve plots the cumulative share of trials held by the lowest-ranked x% of countries. The Gini coefficient equals twice the red area between the curve and the equality line.
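The rank-weighted formula above can be sketched in a few lines of Python. This is a minimal illustration, not the code behind the page: the function names `gini` and `lorenz_points` are hypothetical, and the example inputs are toy data rather than the real trial counts.

```python
def gini(counts):
    """Gini coefficient via the rank-weighted formula (counts sorted ascending)."""
    xs = sorted(counts)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        raise ValueError("need at least one non-zero count")
    # Ranks i run from 1 (smallest count) to n (largest count).
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


def lorenz_points(counts):
    """(cumulative share of countries, cumulative share of trials) pairs."""
    xs = sorted(counts)
    n, total = len(xs), sum(xs)
    points = [(0.0, 0.0)]
    running = 0
    for i, x in enumerate(xs, start=1):
        running += x
        points.append((i / n, running / total))
    return points


# Toy data, not the real trial counts.
print(gini([5, 5, 5, 5]))    # 0.0  (perfect equality)
print(gini([0, 0, 0, 20]))   # 0.75 (one holder among four)
```

With this formula, a single holder among n units gives (n − 1)/n, so the largest value 54 countries could produce is 53/54 ≈ 0.981.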

2. Diversity: Shannon Entropy

H = −Σ p_i · log₂(p_i) = 2.829 bits | H_max = log₂(54) = 5.755 bits | H/H_max = 0.492

Shannon entropy measures how evenly trials are spread across countries. If trials were spread equally across all 54 countries, entropy would reach its maximum of 5.755 bits. The actual value of 2.83 bits is only 49% of that maximum and corresponds to an even spread over about 7 countries (2^2.83 ≈ 7.1), reflecting heavy concentration in a few nations.
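As a minimal sketch of the entropy calculation, assuming the inputs are per-country trial counts (the function names are illustrative, not from the original analysis):

```python
import math


def shannon_entropy_bits(counts):
    """Shannon entropy (base 2) of the shares implied by non-negative counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    # Zero counts are skipped: the term 0 * log2(0) is taken as 0.
    shares = [c / total for c in counts if c > 0]
    return -sum(p * math.log2(p) for p in shares)


def normalised_entropy(counts):
    """Entropy divided by its maximum, log2(number of categories)."""
    return shannon_entropy_bits(counts) / math.log2(len(counts))


h_max = math.log2(54)            # equal spread over 54 countries
print(round(h_max, 3))           # 5.755
print(round(2.829 / h_max, 3))   # 0.492, using the entropy reported above
```

The last two lines only reproduce the maximum and the ratio from the figures quoted above; they do not recompute the 2.829-bit entropy from raw data.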

3. Market Concentration: Herfindahl-Hirschman Index

HHI = Σ(s_i)² = 0.3152 | Equivalent firms = 1/HHI = 3.2

The HHI, borrowed from antitrust economics, treats each country as a "market participant" and sums the squared shares. An HHI of 0.3152 means the 54 African countries are, in terms of concentration, equivalent to only 3.2 equal-sized countries. In antitrust terms this is a highly concentrated market: the 2010 US DOJ/FTC merger guidelines set that threshold at 0.25 (2,500 on the 10,000-point scale), and the 2023 guidelines lowered it to 0.18 (1,800).
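A short sketch of the HHI and its "equivalent number" follows. The function names are hypothetical; the final lines use only the HHI value and the Egypt and total counts quoted on this page.

```python
def hhi(counts):
    """Herfindahl-Hirschman Index on a 0-1 scale: the sum of squared shares."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    return sum((c / total) ** 2 for c in counts)


def effective_number(hhi_value):
    """Number of equal-sized participants that would produce the same HHI."""
    return 1 / hhi_value


print(hhi([1, 1, 1, 1]))                     # 0.25: four equal participants
print(round(effective_number(0.3152), 2))    # 3.17, reported above as 3.2
print(round((11752 / 22110) ** 2, 2))        # 0.28: Egypt's squared share
```

The last line shows why the index is so high: Egypt's squared share alone contributes about 0.28 of the 0.3152 total.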

4. Pareto Analysis (80/20 Rule)

6 countries (11.1% of 54) are enough to pass the 80% mark, together accounting for 81.4% of all 22,110 African clinical trials:

Country	Trials	% of Total	Cumulative %
Egypt	11,752	53.2%	53.2%
South Africa	3,654	16.5%	69.7%
Uganda	809	3.7%	73.3%
Kenya	788	3.6%	76.9%
Tunisia	540	2.4%	79.3%
Tanzania	460	2.1%	81.4%
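The cut-off in the table can be reproduced by accumulating counts in rank order until the chosen share is reached. The sketch below assumes the six counts and the 22,110 total shown above; `pareto_count` is an illustrative name.

```python
def pareto_count(counts, total=None, threshold=0.80):
    """Smallest number of top-ranked entries whose combined share reaches threshold."""
    ranked = sorted(counts, reverse=True)
    total = sum(ranked) if total is None else total
    running = 0
    for k, c in enumerate(ranked, start=1):
        running += c
        if running / total >= threshold:
            return k
    return None  # threshold not reached with the counts supplied


top_six = [11752, 3654, 809, 788, 540, 460]   # from the table above
print(pareto_count(top_six, total=22110))     # 6
```

Five countries reach only 79.3%, so Tanzania is the country that tips the cumulative share over 80%.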

5. Zipf's Law (Rank-Size Distribution)

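log(trials) = −2.112 · log(rank) + C | R² = 0.842 | Ideal Zipf slope = −1.0

Zipf's law, first observed for word frequencies and city sizes, predicts that the second-ranked item is half the size of the largest, the third a third, and so on, which corresponds to a slope of −1.0 on a log-log plot. The fitted slope of −2.11 is roughly twice as steep: trial counts fall away with rank much faster than Zipf predicts, so the top-ranked countries, Egypt above all, dominate more than a Zipf distribution would imply. R² = 0.842 indicates that a straight line describes the log-log relationship reasonably well, though not perfectly.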
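A rank-size fit of this kind is an ordinary least-squares line through (log rank, log count). The sketch below is one plausible way to compute it using only the standard library. It assumes zero counts are dropped and ties are ranked in sorted order, neither of which the page specifies, and it is checked against synthetic data rather than the trial counts.

```python
import math


def zipf_fit(counts):
    """OLS fit of log(count) on log(rank); returns (slope, intercept, r_squared)."""
    ranked = sorted((c for c in counts if c > 0), reverse=True)
    xs = [math.log(r) for r in range(1, len(ranked) + 1)]
    ys = [math.log(c) for c in ranked]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # For simple linear regression, R^2 is the squared correlation.
    r_squared = sxy ** 2 / (sxx * syy) if syy > 0 else 1.0
    return slope, intercept, r_squared


# Synthetic ideal-Zipf data: count proportional to 1 / rank.
ideal = [1200 / r for r in range(1, 11)]
slope, intercept, r2 = zipf_fit(ideal)
print(round(slope, 3), round(r2, 3))   # -1.0 1.0
```

Read this way, a slope of −2.112 means each doubling of rank divides the fitted count by about 2^2.112 ≈ 4.3, against a factor of 2 under ideal Zipf behaviour.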

6. Benford's Law (First-Digit Test)

MAD = 0.0297 | χ² = 6.35 (df=8, critical=15.51 at α=0.05) → CONFORMS

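Benford's Law predicts that in many naturally occurring datasets spanning several orders of magnitude, the leading digit is 1 about 30.1% of the time, 2 about 17.6%, and so on down to 4.6% for 9. The χ² test does not reject conformity at the 5% significance level (6.35 < 15.51), which is consistent with natural (non-fabricated) data but, with only about fifty counts, is weak evidence on its own. The MAD of 0.0297 is above the 0.015 threshold, so by that stricter criterion the fit is not close. The two measures therefore point in different directions, and MAD cutoffs of this kind are generally intended for much larger samples.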
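Both statistics compare observed first-digit frequencies with the Benford probabilities log10(1 + 1/d). A minimal sketch, assuming positive integer counts and skipping zeros (which have no leading digit); `benford_stats` is an illustrative name and the example input is deliberately artificial:

```python
import math

# Expected first-digit probabilities under Benford's Law.
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def benford_stats(counts):
    """Return (MAD, chi-square) for the first digits of the positive counts."""
    digits = [int(str(c)[0]) for c in counts if c > 0]
    n = len(digits)
    if n == 0:
        raise ValueError("need at least one positive count")
    observed = {d: digits.count(d) / n for d in range(1, 10)}
    # Mean absolute deviation between observed and expected proportions.
    mad = sum(abs(observed[d] - BENFORD[d]) for d in range(1, 10)) / 9
    # Pearson chi-square on digit counts, 8 degrees of freedom.
    chi2 = n * sum((observed[d] - BENFORD[d]) ** 2 / BENFORD[d]
                   for d in range(1, 10))
    return mad, chi2


print(round(BENFORD[1], 3), round(BENFORD[2], 3))   # 0.301 0.176
mad, chi2 = benford_stats([1, 10, 100])             # every leading digit is 1
print(round(mad, 3))                                # 0.155
```

The chi-square value is compared with the critical value for 8 degrees of freedom (15.51 at α = 0.05, as quoted above). The artificial three-number input shows how far MAD can move with a small sample.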

7. Log-Log Regression: Population → Trials

ln(trials) = 0.925 × ln(population in millions) + 1.669 | R² = 0.485

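A log-log regression tests how trial counts scale with population; a slope of exactly 1 would mean trials grow in proportion to population. The fitted slope of 0.925 means that doubling a country's population is associated with a 1.9-fold increase in trials, slightly less than proportional. With R² = 0.485, population explains just under half of the variance in log trial counts, and the remainder reflects other factors.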
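The doubling figure follows directly from the slope. As a worked step, writing T for trials and P for population (symbols introduced here for illustration) and taking the fitted coefficients at face value:

```latex
\[
\log T = 0.925\,\log P + 1.669
\;\Longrightarrow\;
T \propto P^{0.925},
\qquad
\frac{T(2P)}{T(P)} = 2^{0.925} \approx 1.90 .
\]
```

The intercept cancels in the ratio, so the result does not depend on the units of population or on the base of the logarithm.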

In the scatter plot, green dots are overperformers (more trials than population predicts) and red dots are underperformers. Countries above the regression line host more trials than their size predicts, which may reflect stronger research infrastructure, among other factors.

8. Regression Residuals (Over/Underperformers)

Country	Trials	Population	Residual (ln units)
Egypt	11,752	104.5M	+3.40
South Africa	3,654	60.4M	+2.74
Tunisia	540	12.5M	+2.29
Botswana	123	2.6M	+2.26
Mauritius	48	1.3M	+1.96
Gambia	82	2.7M	+1.82
Guinea-Bissau	57	2.1M	+1.69
Gabon	64	2.4M	+1.68
Eswatini	30	1.2M	+1.56
Uganda	809	48.6M	+1.43
Malawi	344	20.4M	+1.38
Kenya	788	55.1M	+1.29
Zambia	307	20.6M	+1.26
Lesotho	38	2.3M	+1.20
Zimbabwe	201	16.7M	+1.03
Burkina Faso	215	22.7M	+0.81
Rwanda	138	14.1M	+0.81
Mali	183	22.6M	+0.66
Ghana	261	33.5M	+0.65
Tanzania	460	65.5M	+0.59
Seychelles	1	0.1M	+0.46
Senegal	113	17.9M	+0.39
Sierra Leone	54	8.6M	+0.33
Liberia	31	5.4M	+0.21
Cameroon	133	28.6M	+0.12
Morocco	162	37.5M	+0.07
Mozambique	147	33.9M	+0.06
Benin	55	13.4M	-0.06
Comoros	4	0.9M	-0.19
Cote d'Ivoire	93	28.9M	-0.25
Ethiopia	302	126.5M	-0.44
Algeria	114	45.6M	-0.47
Guinea	35	14.2M	-0.57
Namibia	7	2.6M	-0.61
Nigeria	379	223.8M	-0.74
Democratic Republic of Congo	160	102.3M	-0.87
Niger	44	26.2M	-0.91
Libya	12	7.0M	-0.98
Togo	14	9.0M	-1.06
Equatorial Guinea	3	1.7M	-1.06
Djibouti	2	1.1M	-1.06
Central African Republic	9	5.7M	-1.08
Cabo Verde	1	0.6M	-1.20
Sudan	55	48.1M	-1.24
Madagascar	33	30.3M	-1.33
Burundi	13	13.2M	-1.49
South Sudan	8	11.1M	-1.82
Congo (Brazzaville)	4	6.1M	-1.96
Chad	9	18.3M	-2.16
Mauritania	2	4.9M	-2.45
Somalia	6	18.1M	-2.56
Angola	10	36.7M	-2.70
Eritrea	1	3.7M	-2.88

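Positive residuals mark countries with more trials than their population predicts; negative residuals mark countries with fewer. The residuals are in natural-log units, so they convert to ratios by exponentiation: Egypt's +3.40 means about 30 times the predicted number of trials (e^3.40 ≈ 30), and Eritrea's −2.88 about one-eighteenth (e^−2.88 ≈ 0.056). The table lists 53 countries; a country with zero trials (the minimum in the summary statistics) has no defined logarithm and so cannot enter the fit. The residual absorbs everything population does not explain, which plausibly includes governance, infrastructure, language, colonial history, and funding, but the regression itself cannot separate these factors.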
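The residuals in the table can be reproduced from the fitted equation in section 7. The sketch below assumes natural logarithms and population expressed in millions. The page does not state either, but those choices reproduce the tabulated values. The function names are illustrative.

```python
import math

# Coefficients as reported in section 7. Natural logs and population in
# millions are assumptions that reproduce the residuals in the table.
SLOPE, INTERCEPT = 0.925, 1.669


def predicted_trials(population_millions):
    """Trial count the fitted line predicts for a given population."""
    return math.exp(INTERCEPT + SLOPE * math.log(population_millions))


def residual(trials, population_millions):
    """Observed minus predicted log trial count."""
    return math.log(trials) - (INTERCEPT + SLOPE * math.log(population_millions))


print(round(residual(11752, 104.5), 2))   # 3.4   (Egypt, tabulated as +3.40)
print(round(residual(1, 3.7), 2))         # -2.88 (Eritrea)
print(round(math.exp(3.40)))              # 30: Egypt's ratio to its prediction
```

Small differences in the third decimal place are expected, because the published slope and intercept are rounded.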

9. Regional Inequality Decomposition

Region	Trials	Population	Trials per million people	Internal Gini
North	12,635	255M	49.5	0.794
West	1,619	436M	3.7	0.521
East	3,104	444M	7.0	0.677
Central	382	165M	2.3	0.638
Southern	4,370	143M	30.5	0.790

Internal Gini measures inequality within each sub-region. High internal Gini means one country dominates the region (e.g., Egypt in North, South Africa in Southern). Low internal Gini means trials are more evenly distributed across the region's countries.

10. Descriptive Statistics Summary

Mean

409.4

Median

55.0

Std Dev

1638.7

CV

4.0

IQR

174

Min / Max

0 / 11,752

CV = σ/μ = 1638.7/409.4 = 4.0 | IQR = Q3 − Q1 = 183 − 9 = 174

A coefficient of variation of 4.0 (mean = 409.4, std = 1638.7) indicates extreme variability. The median (55.0) is far below the mean (409.4), confirming a heavily right-skewed distribution dominated by a few large values.
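The same summary values also yield the z-scores behind the outlier count in the Methods Summary. The sketch below uses the mean and standard deviation reported above. The cutoff of |z| > 3 is an assumption, a common convention that the page does not state, and the function names are illustrative.

```python
MEAN, STD = 409.4, 1638.7   # summary values reported above
Z_CUTOFF = 3.0              # assumed threshold; not stated on the page


def coefficient_of_variation(mean, std):
    """Standard deviation expressed as a multiple of the mean."""
    return std / mean


def z_score(value, mean, std):
    """Distance of a value from the mean, in standard deviations."""
    return (value - mean) / std


print(round(coefficient_of_variation(MEAN, STD), 1))   # 4.0
print(round(z_score(11752, MEAN, STD), 2))             # 6.92 (Egypt)
print(round(z_score(3654, MEAN, STD), 2))              # 1.98 (South Africa)
```

Under this assumed cutoff only Egypt qualifies as an outlier, since the second-largest count, South Africa's, lies just under two standard deviations above the mean.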

Methods Summary

Method	Origin	What It Measures	Result
Gini Coefficient	Economics (Corrado Gini, 1912)	Inequality of distribution	0.857
Shannon Entropy	Information Theory (Claude Shannon, 1948)	Diversity / evenness	2.83 bits (49.2% of max)
HHI	Antitrust Economics	Market concentration	0.3152 (equiv. 3.2 countries)
Pareto Analysis	Management Science (Vilfredo Pareto)	80/20 concentration	6 countries = 80% of trials
Zipf's Law	Linguistics / Complex Systems (George Zipf)	Rank-size regularity	slope = -2.11, R² = 0.842
Benford's Law	Number Theory (Frank Benford, 1938)	First-digit naturalness	MAD = 0.0297, χ² = 6.35
Log-Log Regression	Statistics (OLS)	Population → Trial scaling	β = 0.925, R² = 0.485
Coefficient of Variation	Descriptive Statistics	Relative variability	CV = 4.0
Regional Gini	Decomposition Analysis	Within-region inequality	5 sub-regions
Z-Score Outliers	Standardisation	Extreme values	1 outlier (Egypt)
