Mathematical Analysis · ClinicalTrials.gov

Statistical Deep Dive
Africa RCT Distribution

Ten mathematical lenses applied to the distribution of 22,110 clinical trials across 54 African nations.

Gini Coefficient: 0.857
Shannon Entropy: 2.83 bits
Normalised Entropy: 0.492
HHI: 0.3152
Zipf Slope: -2.11
Benford MAD: 0.0297
Pop→Trial R²: 0.485
Pareto Countries: 6/54

Africa's clinical trial distribution has a Gini coefficient of 0.857, higher than even the most unequal national income distributions. Just 6 of the 54 countries (11.1% of the continent) account for 80% of all trials, and Egypt alone hosts 53% of the total.

1. Inequality: Gini Coefficient & Lorenz Curve

Gini = (2 · Σ(i · x_i)) / (n · Σx_i) − (n+1)/n = 0.8570

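The Gini coefficient measures inequality on a 0–1 scale, where 0 is perfect equality and 1 is maximal inequality. In the formula, x_i is the trial count of the i-th country when countries are sorted from fewest to most trials, and n = 54. A value of 0.857 indicates extreme inequality in trial distribution. For comparison, South Africa's income Gini is ~0.63; Africa's trial Gini is considerably higher.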

The Lorenz curve shows the cumulative share of trials held by the bottom x% of countries. The Gini coefficient equals twice the red area between the curve and the line of equality.
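The formula and the Lorenz curve can be sketched in a few lines of Python. This is a minimal illustration, not the page's own code: the counts are transcribed from the residuals table in section 8, and the single zero for the 54th country is an assumption inferred from the reported minimum of 0.

```python
# Trial counts per country, transcribed from the section 8 table.
# Assumption: the one country missing from that table has 0 trials.
TRIALS = [11752, 3654, 809, 788, 540, 460, 379, 344, 307, 302, 261, 215,
          201, 183, 162, 160, 147, 138, 133, 123, 114, 113, 93, 82, 64, 57,
          55, 55, 54, 48, 44, 38, 35, 33, 31, 30, 14, 13, 12, 10, 9, 9, 8,
          7, 6, 4, 4, 3, 2, 2, 1, 1, 1, 0]


def gini(values):
    """Gini coefficient via the sorted-rank formula shown above."""
    xs = sorted(values)  # ascending, so rank i = 1 is the smallest count
    n, total = len(xs), sum(xs)
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


def lorenz_points(values):
    """(cumulative share of countries, cumulative share of trials) pairs."""
    xs = sorted(values)
    total = sum(xs)
    points, running = [(0.0, 0.0)], 0
    for i, x in enumerate(xs, start=1):
        running += x
        points.append((i / len(xs), running / total))
    return points


g = gini(TRIALS)
curve = lorenz_points(TRIALS)
print(f"Gini = {g:.4f}")
```

Plotting `curve` against the diagonal from (0, 0) to (1, 1) gives the Lorenz chart described above.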

[Figure: Lorenz curve, Gini = 0.857. x-axis: Cumulative % of Countries (ranked low→high); y-axis: Cumulative % of Trials; diagonal: line of equality.]

2. Diversity: Shannon Entropy

H = −Σ p_i · log₂(p_i) = 2.829 bits   |   H_max = log₂(54) = 5.755 bits   |   H/H_max = 0.492

Shannon entropy measures the diversity of trial distribution. With a normalised entropy of 0.492, the distribution uses only 49% of its maximum possible diversity. If trials were spread equally across all 54 countries, entropy would be 5.75 bits. The actual value of 2.83 bits reflects heavy concentration in a few nations.
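As a rough check of these figures, the entropy can be computed directly from the country counts. The sketch below assumes the counts listed in section 8 plus one country with zero trials (inferred from the reported minimum); zero counts are skipped because they contribute nothing to the sum.

```python
import math

# Trial counts per country, transcribed from the section 8 table.
# Assumption: the one country missing from that table has 0 trials.
TRIALS = [11752, 3654, 809, 788, 540, 460, 379, 344, 307, 302, 261, 215,
          201, 183, 162, 160, 147, 138, 133, 123, 114, 113, 93, 82, 64, 57,
          55, 55, 54, 48, 44, 38, 35, 33, 31, 30, 14, 13, 12, 10, 9, 9, 8,
          7, 6, 4, 4, 3, 2, 2, 1, 1, 1, 0]


def shannon_entropy_bits(counts):
    """H = -sum(p_i * log2(p_i)) over the shares p_i of the total."""
    total = sum(counts)
    # p * log2(p) tends to 0 as p tends to 0, so zero counts are skipped.
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


h = shannon_entropy_bits(TRIALS)
h_max = math.log2(len(TRIALS))  # entropy if all 54 countries had equal shares
normalised = h / h_max

print(f"H = {h:.3f} bits, H_max = {h_max:.3f} bits, H/H_max = {normalised:.3f}")
```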

3. Market Concentration: Herfindahl-Hirschman Index

HHI = Σ(s_i)² = 0.3152   |   Equivalent firms = 1/HHI = 3.2

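The HHI, borrowed from antitrust economics, treats each country as a "market participant" and sums the squared shares s_i, where s_i is a country's fraction of all trials. An HHI of 0.3152 means the 54 African countries are as concentrated as 3.2 equal-sized countries would be. In antitrust terms, this is a highly concentrated market: the US DOJ's 2010 Horizontal Merger Guidelines treated HHI > 0.25 (2,500 on the 0–10,000 scale) as "highly concentrated," and the 2023 Merger Guidelines lowered that threshold to 0.18 (1,800).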
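The calculation is short enough to sketch directly. The example below is illustrative only: it uses the counts from section 8, with one zero-trial country assumed for the 54th entry.

```python
# Trial counts per country, transcribed from the section 8 table.
# Assumption: the one country missing from that table has 0 trials.
TRIALS = [11752, 3654, 809, 788, 540, 460, 379, 344, 307, 302, 261, 215,
          201, 183, 162, 160, 147, 138, 133, 123, 114, 113, 93, 82, 64, 57,
          55, 55, 54, 48, 44, 38, 35, 33, 31, 30, 14, 13, 12, 10, 9, 9, 8,
          7, 6, 4, 4, 3, 2, 2, 1, 1, 1, 0]


def hhi(counts):
    """Herfindahl-Hirschman Index on the 0-1 scale: sum of squared shares."""
    total = sum(counts)
    return sum((c / total) ** 2 for c in counts)


index = hhi(TRIALS)
equivalent = 1 / index  # number of equal-sized countries with the same HHI

print(f"HHI = {index:.4f}, equivalent countries = {equivalent:.1f}")
```

Because shares are squared, Egypt's 53% share alone contributes about 0.28 of the total 0.3152.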

4. Pareto Analysis (80/20 Rule)

6 countries (11.1%) account for 80% of all 22,110 African clinical trials:

Country | Trials | % of Total | Cumulative %
Egypt | 11,752 | 53.2% | 53.2%
South Africa | 3,654 | 16.5% | 69.7%
Uganda | 809 | 3.7% | 73.3%
Kenya | 788 | 3.6% | 76.9%
Tunisia | 540 | 2.4% | 79.3%
Tanzania | 460 | 2.1% | 81.4%
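The cumulative column can be reproduced from the counts in the table and the overall total of 22,110. The sketch below uses a hypothetical helper, `countries_to_reach`, to find how many top-ranked countries are needed to cross a given share.

```python
TOTAL_TRIALS = 22110

# Top six countries by trial count, taken from the table above.
TOP_COUNTRIES = [
    ("Egypt", 11752), ("South Africa", 3654), ("Uganda", 809),
    ("Kenya", 788), ("Tunisia", 540), ("Tanzania", 460),
]


def countries_to_reach(share, ranked, total):
    """Number of top-ranked countries needed to reach `share` of `total`.

    Returns None if the listed countries never reach that share.
    """
    running = 0
    for k, (_, trials) in enumerate(ranked, start=1):
        running += trials
        if running / total >= share:
            return k
    return None


# Cumulative percentage after each country, rounded as in the table.
cumulative = []
running = 0
for _, trials in TOP_COUNTRIES:
    running += trials
    cumulative.append(round(100 * running / TOTAL_TRIALS, 1))

needed = countries_to_reach(0.80, TOP_COUNTRIES, TOTAL_TRIALS)
print(cumulative, needed)
```

Five countries reach only 79.3%, so the sixth (Tanzania) is what takes the cumulative share past 80%.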

5. Zipf's Law (Rank-Size Distribution)

log(trials) = -2.112 · log(rank) + C   |   R² = 0.842   |   Ideal Zipf slope = −1.0

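Zipf's law, in its classic city-size form, predicts that the second-largest city is half the size of the largest, the third is a third, and so on, which gives a slope of −1.0 on a log-log rank-size plot. The fitted slope of -2.11 is roughly twice as steep, so trial counts fall off much faster with rank than Zipf predicts and the distribution is far more top-heavy, with Egypt dominating. R² = 0.842 indicates that a power law describes the ranking reasonably well, just not with Zipf's exponent.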
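A rank-size fit of this kind is an ordinary least-squares regression of log(count) on log(rank). The sketch below is one plausible way to do it, not necessarily the procedure used here: it drops zero counts (which have no logarithm) and is demonstrated on synthetic data whose slope is known in advance. The slope does not depend on the base of the logarithm.

```python
import math


def loglog_fit(counts):
    """OLS fit of log(count) on log(rank).

    Returns (slope, intercept, r_squared). Zero counts are dropped
    because log(0) is undefined.
    """
    ranked = sorted((c for c in counts if c > 0), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(c) for c in ranked]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Synthetic examples: an exact Zipf series and a steeper power law.
ideal = [1200 / rank for rank in range(1, 21)]
steep = [1200 / rank ** 2 for rank in range(1, 21)]

ideal_slope, _, ideal_r2 = loglog_fit(ideal)
steep_slope, _, steep_r2 = loglog_fit(steep)
print(f"ideal: {ideal_slope:.2f}, steep: {steep_slope:.2f}")
```

The second series, with slope −2, is the closer analogue of the trial data: each step down the ranking loses ground about twice as fast (in log terms) as under Zipf's law.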

[Figure: Rank-Size Distribution (log-log). x-axis: log(Rank); y-axis: log(Trials). Zipf slope = -2.11, R² = 0.842.]

6. Benford's Law (First-Digit Test)

MAD = 0.0297   |   χ² = 6.35 (df=8, critical=15.51 at α=0.05)   →   CONFORMS

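Benford's Law predicts that in naturally occurring datasets spanning several orders of magnitude, the digit 1 appears as the first digit ~30.1% of the time, digit 2 ~17.6%, and so on. The χ² test does not reject conformity with Benford at the 5% significance level (6.35 < 15.51), which is consistent with natural (non-fabricated) data, although with only 53 counts the test has limited power. The MAD of 0.0297, however, exceeds the 0.015 conformity threshold, so by the MAD criterion the fit is not close; the largest deviations are at digit 2 (9.4% observed vs 17.6% expected) and digit 3 (20.8% vs 12.5%).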
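Both statistics follow from the first-digit tallies. In the sketch below the observed counts are reconstructed from the reported first-digit percentages multiplied by n = 53 (they also match the leading digits of the counts in section 8); the expected proportions are the standard Benford values log₁₀(1 + 1/d).

```python
import math

# Observed first-digit counts for digits 1..9.
# Reconstructed from the reported percentages times n = 53.
OBSERVED = [17, 5, 11, 5, 5, 2, 2, 3, 3]
CRITICAL_CHI2 = 15.51  # df = 8, alpha = 0.05, as quoted above

n = sum(OBSERVED)
expected_p = [math.log10(1 + 1 / d) for d in range(1, 10)]
observed_p = [count / n for count in OBSERVED]

# Mean absolute deviation between observed and expected proportions.
mad = sum(abs(o - e) for o, e in zip(observed_p, expected_p)) / 9

# Pearson chi-square on counts: sum of (O - E)^2 / E.
chi2 = sum((count - n * e) ** 2 / (n * e)
           for count, e in zip(OBSERVED, expected_p))

passes_chi2 = chi2 < CRITICAL_CHI2
print(f"MAD = {mad:.4f}, chi2 = {chi2:.2f}, passes chi2 test: {passes_chi2}")
```

Digits 2 and 3 together account for most of the χ² value, which matches the visible gaps in the first-digit chart.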

First-Digit Distribution: Expected (green) vs Observed (red), MAD = 0.0297, n = 53 countries

First Digit | Expected | Observed
1 | 30.1% | 32.1%
2 | 17.6% | 9.4%
3 | 12.5% | 20.8%
4 | 9.7% | 9.4%
5 | 7.9% | 9.4%
6 | 6.7% | 3.8%
7 | 5.8% | 3.8%
8 | 5.1% | 5.7%
9 | 4.6% | 5.7%

7. Log-Linear Regression: Population → Trials

log(trials) = 0.925 × log(population) + 1.669   |   R² = 0.485

A log-log regression tests how trial counts scale with population. A slope of 0.925 (≈ 0.93) means that doubling a country's population is associated with a 2^0.925 ≈ 1.9-fold increase in trials, slightly less than proportional. With R² = 0.485, population explains only about half of the variance in log trial counts; other factors account for the rest.
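The scaling interpretation follows from exponentiating the fitted line: multiplying population by a factor k multiplies the predicted trial count by k raised to the slope. The sketch below assumes natural logarithms and population measured in millions, which is the combination that reproduces the residuals reported in section 8.

```python
import math

SLOPE = 0.925      # fitted coefficient on log(population)
INTERCEPT = 1.669  # fitted intercept


def fold_change(population_ratio, slope=SLOPE):
    """Multiplicative change in predicted trials for a population ratio."""
    return population_ratio ** slope


def predicted_trials(population_millions):
    """Predicted trial count from the fitted log-log line.

    Assumes natural logs and population in millions.
    """
    return math.exp(INTERCEPT + SLOPE * math.log(population_millions))


doubling = fold_change(2)   # effect of doubling population
tenfold = fold_change(10)   # effect of a tenfold larger population
baseline = predicted_trials(1.0)  # prediction for a country of 1 million

print(f"x2 population -> x{doubling:.2f} trials")
print(f"x10 population -> x{tenfold:.2f} trials")
print(f"1M people -> about {baseline:.1f} trials")
```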

Green dots are overperformers (more trials than population predicts); red dots are underperformers. Countries above the regression line host more trials than their population alone predicts, which may reflect stronger research infrastructure relative to their size.

[Figure: Population vs Trials (log-log) scatter plot with labelled countries. x-axis: log(Population in millions); y-axis: log(Trial count). slope = 0.93, R² = 0.485. Green = overperforms, red = underperforms.]

8. Regression Residuals (Over/Underperformers)

Country | Trials | Population | Residual
Egypt | 11,752 | 104.5M | +3.40
South Africa | 3,654 | 60.4M | +2.74
Tunisia | 540 | 12.5M | +2.29
Botswana | 123 | 2.6M | +2.26
Mauritius | 48 | 1.3M | +1.96
Gambia | 82 | 2.7M | +1.82
Guinea-Bissau | 57 | 2.1M | +1.69
Gabon | 64 | 2.4M | +1.68
Eswatini | 30 | 1.2M | +1.56
Uganda | 809 | 48.6M | +1.43
Malawi | 344 | 20.4M | +1.38
Kenya | 788 | 55.1M | +1.29
Zambia | 307 | 20.6M | +1.26
Lesotho | 38 | 2.3M | +1.20
Zimbabwe | 201 | 16.7M | +1.03
Burkina Faso | 215 | 22.7M | +0.81
Rwanda | 138 | 14.1M | +0.81
Mali | 183 | 22.6M | +0.66
Ghana | 261 | 33.5M | +0.65
Tanzania | 460 | 65.5M | +0.59
Seychelles | 1 | 0.1M | +0.46
Senegal | 113 | 17.9M | +0.39
Sierra Leone | 54 | 8.6M | +0.33
Liberia | 31 | 5.4M | +0.21
Cameroon | 133 | 28.6M | +0.12
Morocco | 162 | 37.5M | +0.07
Mozambique | 147 | 33.9M | +0.06
Benin | 55 | 13.4M | -0.06
Comoros | 4 | 0.9M | -0.19
Cote d'Ivoire | 93 | 28.9M | -0.25
Ethiopia | 302 | 126.5M | -0.44
Algeria | 114 | 45.6M | -0.47
Guinea | 35 | 14.2M | -0.57
Namibia | 7 | 2.6M | -0.61
Nigeria | 379 | 223.8M | -0.74
Democratic Republic of Congo | 160 | 102.3M | -0.87
Niger | 44 | 26.2M | -0.91
Libya | 12 | 7.0M | -0.98
Togo | 14 | 9.0M | -1.06
Equatorial Guinea | 3 | 1.7M | -1.06
Djibouti | 2 | 1.1M | -1.06
Central African Republic | 9 | 5.7M | -1.08
Cabo Verde | 1 | 0.6M | -1.20
Sudan | 55 | 48.1M | -1.24
Madagascar | 33 | 30.3M | -1.33
Burundi | 13 | 13.2M | -1.49
South Sudan | 8 | 11.1M | -1.82
Congo (Brazzaville) | 4 | 6.1M | -1.96
Chad | 9 | 18.3M | -2.16
Mauritania | 2 | 4.9M | -2.45
Somalia | 6 | 18.1M | -2.56
Angola | 10 | 36.7M | -2.70
Eritrea | 1 | 3.7M | -2.88

Positive residuals indicate countries with more trials than their population would predict; negative residuals indicate fewer. Residuals are on the natural-log scale, so Egypt's +3.40 corresponds to roughly e^3.40 ≈ 30 times the predicted count. The residual absorbs everything that population does not explain, which plausibly includes governance, infrastructure, language, colonial history, and funding.
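Each residual is the observed log trial count minus the value predicted by the regression in section 7. The sketch below recomputes a few rows of the table under the assumption of natural logarithms and population in millions; small differences from the table are expected because the published coefficients are rounded.

```python
import math

SLOPE = 0.925      # from section 7
INTERCEPT = 1.669  # from section 7


def residual(trials, population_millions):
    """Observed minus predicted log(trials); natural logs assumed."""
    predicted = INTERCEPT + SLOPE * math.log(population_millions)
    return math.log(trials) - predicted


# (trials, population in millions) for a few rows of the table above.
SAMPLE = {
    "Egypt": (11752, 104.5),
    "Seychelles": (1, 0.1),
    "Nigeria": (379, 223.8),
    "Eritrea": (1, 3.7),
}

residuals = {name: residual(t, p) for name, (t, p) in SAMPLE.items()}

for name, value in residuals.items():
    # exp(residual) is the ratio of observed to predicted trial counts.
    print(f"{name}: residual {value:+.2f}, observed/predicted = {math.exp(value):.2f}")
```

Seychelles shows why the log scale matters: a single trial is still a positive residual for a country of about 0.1 million people.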

9. Regional Inequality Decomposition

Region | Trials | Population | Trials per million | Internal Gini
North | 12,635 | 255M | 49.5 | 0.794
West | 1,619 | 436M | 3.7 | 0.521
East | 3,104 | 444M | 7.0 | 0.677
Central | 382 | 165M | 2.3 | 0.638
Southern | 4,370 | 143M | 30.5 | 0.790

Internal Gini measures inequality within each sub-region. High internal Gini means one country dominates the region (e.g., Egypt in North, South Africa in Southern). Low internal Gini means trials are more evenly distributed across the region's countries.
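As an illustration, the North row can be approximately reproduced from the country-level figures in section 8. The regional membership below is an assumption: the page does not list which countries belong to each sub-region, but these six are the set whose trial counts sum to the reported 12,635.

```python
# Assumed membership of the North sub-region: (trials, population in millions).
# Inferred by matching the regional total; not stated in the source.
NORTH = {
    "Egypt": (11752, 104.5),
    "Tunisia": (540, 12.5),
    "Morocco": (162, 37.5),
    "Algeria": (114, 45.6),
    "Sudan": (55, 48.1),
    "Libya": (12, 7.0),
}


def gini(values):
    """Gini coefficient via the sorted-rank formula from section 1."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


north_trials = sum(trials for trials, _ in NORTH.values())
north_population = sum(pop for _, pop in NORTH.values())
trials_per_million = north_trials / north_population
internal_gini = gini([trials for trials, _ in NORTH.values()])

print(f"{north_trials} trials, {north_population:.0f}M people, "
      f"{trials_per_million:.1f} per million, Gini {internal_gini:.3f}")
```

With Egypt holding about 93% of the region's trials, the internal Gini is high even though only six countries are involved.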

10. Descriptive Statistics Summary

Mean: 409.4
Median: 55.0
Std Dev: 1638.7
CV: 4.0
IQR: 174
Min / Max: 0 / 11,752

CV = σ/μ = 1638.7/409.4 = 4.0   |   IQR = Q3 − Q1 = 183 − 9 = 174

A coefficient of variation of 4.0 (mean = 409.4, std = 1638.7) indicates extreme variability. The median (55.0) is far below the mean (409.4), confirming a heavily right-skewed distribution dominated by a few large values.
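These summary figures can be reproduced with Python's standard `statistics` module. Two assumptions are made in the sketch below: the standard deviation is the population form (dividing by n), and the quartiles are read off the sorted list by index, since those choices match the reported values; interpolating quantile methods give slightly different quartiles.

```python
import statistics

# Trial counts per country, transcribed from the section 8 table.
# Assumption: the one country missing from that table has 0 trials.
TRIALS = [11752, 3654, 809, 788, 540, 460, 379, 344, 307, 302, 261, 215,
          201, 183, 162, 160, 147, 138, 133, 123, 114, 113, 93, 82, 64, 57,
          55, 55, 54, 48, 44, 38, 35, 33, 31, 30, 14, 13, 12, 10, 9, 9, 8,
          7, 6, 4, 4, 3, 2, 2, 1, 1, 1, 0]

mean = statistics.fmean(TRIALS)
median = statistics.median(TRIALS)
std = statistics.pstdev(TRIALS)  # population standard deviation (divide by n)
cv = std / mean                  # coefficient of variation

# Index-based quartiles (assumed rule): elements at n/4 and 3n/4 of the sorted list.
ordered = sorted(TRIALS)
q1 = ordered[len(ordered) // 4]
q3 = ordered[3 * len(ordered) // 4]
iqr = q3 - q1

print(f"mean={mean:.1f} median={median} std={std:.1f} cv={cv:.1f} iqr={iqr}")
```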

Methods Summary

Method | Origin | What It Measures | Result
Gini Coefficient | Economics (Corrado Gini, 1912) | Inequality of distribution | 0.857
Shannon Entropy | Information Theory (Claude Shannon, 1948) | Diversity / evenness | 2.83 bits (49.2% of max)
HHI | Antitrust Economics | Market concentration | 0.3152 (equiv. 3.2 countries)
Pareto Analysis | Management Science (Vilfredo Pareto) | 80/20 concentration | 6 countries = 80% of trials
Zipf's Law | Linguistics / Complex Systems (George Zipf) | Rank-size regularity | slope = -2.11, R² = 0.842
Benford's Law | Number Theory (Frank Benford, 1938) | First-digit naturalness | MAD = 0.0297, χ² = 6.35
Log-Linear Regression | Statistics (OLS) | Population → Trial scaling | β = 0.925, R² = 0.485
Coefficient of Variation | Descriptive Statistics | Relative variability | CV = 4.0
Regional Gini | Decomposition Analysis | Within-region inequality | 5 sub-regions
Z-Score Outliers | Standardisation | Extreme values | 1 outlier