Ten mathematical lenses applied to the distribution of 22,110 clinical trials across 54 African nations.
The Gini coefficient measures inequality on a 0–1 scale: 0 means every country hosts the same number of trials, and values near 1 mean a single country hosts almost all of them. A value of 0.857 indicates extreme inequality in trial distribution. For comparison, South Africa's income Gini is ~0.63; Africa's trial Gini is higher still.
The Lorenz curve plots the cumulative share of trials against the cumulative share of countries, ordered from fewest trials to most. The red area between the curve and the equality line is proportional to the Gini coefficient: the Gini equals twice that area.
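The relationship between the Lorenz curve and the Gini coefficient can be sketched in a few lines. This is a minimal illustration using the trapezoid rule, not necessarily the estimator used for the 0.857 figure; the function name `gini` and the toy inputs are assumptions made for the example.

```python
def gini(values):
    """Gini coefficient as 1 minus twice the area under the Lorenz curve."""
    xs = sorted(values)              # Lorenz curve orders units from smallest to largest
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        raise ValueError("need at least one positive value")
    area = 0.0                       # area under the Lorenz curve (trapezoid rule)
    cum_prev = 0.0                   # cumulative share before the current unit
    for x in xs:
        cum = cum_prev + x / total
        area += (cum_prev + cum) / 2 / n   # each unit spans 1/n of the x-axis
        cum_prev = cum
    return 1 - 2 * area


even = gini([10, 10, 10, 10])    # every unit holds the same amount
extreme = gini([0, 0, 0, 40])    # one unit holds everything
print(even, extreme)
```

With this formulation the maximum for n units is 1 − 1/n, so four units cap at 0.75 and 54 countries cap at about 0.98.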
Shannon entropy measures the diversity of trial distribution. With a normalised entropy of 0.492, the distribution uses only 49% of its maximum possible diversity. If trials were spread equally across all 54 countries, entropy would be 5.75 bits. The actual value of 2.83 bits reflects heavy concentration in a few nations.
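The maximum and normalised entropy quoted above follow directly from the definition. The sketch below assumes entropy is measured in bits (base-2 logarithm), as the text states; the function name `shannon_entropy_bits` is illustrative.

```python
import math


def shannon_entropy_bits(counts):
    """Shannon entropy (in bits) of the shares implied by a list of counts."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]   # 0 * log(0) is treated as 0
    return -sum(p * math.log2(p) for p in probs)


n_countries = 54
max_bits = math.log2(n_countries)                        # entropy of a perfectly even split
uniform_bits = shannon_entropy_bits([1] * n_countries)   # same value, computed from counts
normalised = 2.83 / max_bits                             # reported entropy over the maximum

print(f"max = {max_bits:.2f} bits, normalised = {normalised:.3f}")
```

Dividing by the maximum makes the figure comparable across datasets with different numbers of categories.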
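The Herfindahl-Hirschman Index (HHI), borrowed from antitrust economics, treats each country as a "market participant" and sums the squared shares. An HHI of 0.3152 means the 54 African countries behave, in terms of concentration, like only 3.2 equal-sized countries (the effective number, 1/HHI). In antitrust terms, this would be a highly concentrated market: the 2010 US DOJ/FTC Horizontal Merger Guidelines applied that label to any HHI above 0.25 (2,500 points), and the 2023 guidelines lowered the threshold to 0.18.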
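As a worked illustration of the index, the sketch below computes the HHI from counts and converts it to an effective number of equal-sized participants. The function names are illustrative; the Egypt figures come from the tables in this report.

```python
def hhi(counts):
    """Herfindahl-Hirschman Index: sum of squared shares, on a 0-1 scale."""
    total = sum(counts)
    return sum((c / total) ** 2 for c in counts)


def effective_number(h):
    """Number of equal-sized participants that would give the same HHI."""
    return 1 / h


four_equal = hhi([25, 25, 25, 25])         # four equal participants
equivalent = effective_number(0.3152)      # reported HHI for the 54 countries
egypt_term = (11_752 / 22_110) ** 2        # Egypt's squared share alone

print(four_equal, round(equivalent, 1), round(egypt_term, 4))
```

Because shares are squared, the largest participant dominates the index: Egypt's term alone contributes roughly 0.28 of the 0.3152 total.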
Six countries (11.1% of the 54) account for just over 80% (81.4%) of all 22,110 African clinical trials:
| Country | Trials | % of Total | Cumulative % |
|---|---|---|---|
| Egypt | 11,752 | 53.2% | 53.2% |
| South Africa | 3,654 | 16.5% | 69.7% |
| Uganda | 809 | 3.7% | 73.3% |
| Kenya | 788 | 3.6% | 76.9% |
| Tunisia | 540 | 2.4% | 79.3% |
| Tanzania | 460 | 2.1% | 81.4% |
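The cumulative column in the table above can be reproduced with a running sum. The sketch below uses the six counts and the 22,110 total from this report; the helper name `countries_to_reach` is an assumption made for the example.

```python
TOTAL_TRIALS = 22_110

# Top six countries by trial count, already ranked, as listed in the table.
top_countries = [
    ("Egypt", 11_752),
    ("South Africa", 3_654),
    ("Uganda", 809),
    ("Kenya", 788),
    ("Tunisia", 540),
    ("Tanzania", 460),
]


def countries_to_reach(threshold, ranked, total):
    """Return (k, share): the fewest top-ranked countries whose share meets the threshold."""
    running = 0
    for k, (_, trials) in enumerate(ranked, start=1):
        running += trials
        if running / total >= threshold:
            return k, running / total
    return None, running / total   # threshold not reached with the given list


k, share = countries_to_reach(0.80, top_countries, TOTAL_TRIALS)
print(k, f"{share:.1%}")
```

Five countries stop just short of the threshold at 79.3%, which is why the sixth is needed.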
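Zipf's law predicts that size is inversely proportional to rank: the second-largest city is half the size of the largest, the third is a third, and so on, which gives a slope of -1.0 on a log-log rank-size plot. Applied to trial counts, the fitted slope of -2.11 is about twice as steep. The distribution is more top-heavy than Zipf predicts, and Egypt dominates even more than expected. R² = 0.842 indicates that a power law describes the rank-size relationship reasonably well, even though its exponent is not the Zipf value.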
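A rank-size slope of this kind is usually obtained by ordinary least squares on log(rank) versus log(count). The sketch below shows that fit on synthetic data, since the exact fitting choices behind the -2.11 figure (log base, treatment of ties and zero counts) are not stated; `loglog_rank_fit` and both synthetic series are assumptions made for the example.

```python
import math


def loglog_rank_fit(counts):
    """OLS fit of log(count) on log(rank); returns (slope, r_squared)."""
    ranked = sorted((c for c in counts if c > 0), reverse=True)  # log(0) is undefined
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(c) for c in ranked]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx, sxy ** 2 / (sxx * syy)


# Synthetic rank-size series for 54 units (not the trial data).
ideal_zipf = [1000 / rank for rank in range(1, 55)]        # size proportional to 1/rank
top_heavy = [1000 / rank ** 2 for rank in range(1, 55)]    # size proportional to 1/rank^2

zipf_slope, zipf_r2 = loglog_rank_fit(ideal_zipf)
steep_slope, steep_r2 = loglog_rank_fit(top_heavy)
print(round(zipf_slope, 3), round(steep_slope, 3))
```

A slope near -2 means the count falls with the square of the rank, so the tenth-ranked unit holds about a hundredth of the leader's total rather than a tenth.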
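Benford's Law predicts that in many naturally occurring datasets spanning several orders of magnitude, the digit 1 appears as the first digit ~30.1% of the time, digit 2 ~17.6%, and so on down to ~4.6% for digit 9. The evidence here is mixed. A chi-square test (χ² = 6.35, well below the critical value of 15.51 for 8 degrees of freedom) does not reject conformity at the 5% significance level, which is consistent with natural (non-fabricated) data. With only 53 non-zero counts, however, the test has little power, so this is weak evidence. The mean absolute deviation (MAD) of 0.0297 lies above the 0.015 conformity threshold, which points the other way.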
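Both Benford statistics can be recomputed from the trial counts listed in the residual table later in this report. The sketch below is one plausible way to do so: it assumes first digits are taken from the non-zero counts only and that MAD is the mean absolute difference between observed and expected proportions over the nine digits. The function names are illustrative.

```python
import math

# Trial counts for the 53 countries in the residual table, in table order.
trials = [
    11752, 3654, 540, 123, 48, 82, 57, 64, 30, 809, 344, 788, 307, 38,
    201, 215, 138, 183, 261, 460, 1, 113, 54, 31, 133, 162, 147, 55, 4,
    93, 302, 114, 35, 7, 379, 160, 44, 12, 14, 3, 2, 9, 1, 55, 33, 13,
    8, 4, 9, 2, 6, 10, 1,
]

# Expected first-digit probabilities under Benford's Law.
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit_counts(values):
    """Count how often each digit 1-9 leads the positive values."""
    tally = {d: 0 for d in range(1, 10)}
    for v in values:
        if v > 0:                      # zero has no leading digit
            tally[int(str(v)[0])] += 1
    return tally


def benford_mad(values):
    """Mean absolute deviation between observed and expected proportions."""
    tally = first_digit_counts(values)
    n = sum(tally.values())
    return sum(abs(tally[d] / n - BENFORD[d]) for d in BENFORD) / 9


def benford_chi2(values):
    """Pearson chi-square statistic against Benford's expected counts."""
    tally = first_digit_counts(values)
    n = sum(tally.values())
    return sum((tally[d] - n * BENFORD[d]) ** 2 / (n * BENFORD[d]) for d in BENFORD)


mad = benford_mad(trials)
chi2 = benford_chi2(trials)
print(f"MAD = {mad:.4f}, chi2 = {chi2:.2f}")
```

Under these assumptions the largest departures are at digit 2 (5 counts against about 9.3 expected) and digit 3 (11 against about 6.6).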
A log-log regression tests whether trial counts scale proportionally with population. A slope of 0.93 (β = 0.925) means that doubling a country's population is associated with a 1.9-fold increase in trials, slightly less than proportional scaling. With R² = 0.485, population explains only about half of the variance in log trial counts; other factors account for the rest.
Green dots, above the regression line, are overperformers (more trials than population predicts). Red dots, below it, are underperformers. A position above the line is consistent with stronger research infrastructure relative to size, although the regression alone cannot confirm the cause.
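The fold-change interpretation of the slope is a one-line calculation: under the fitted power law, multiplying population by a factor r multiplies predicted trials by r raised to β. The sketch below uses the reported β = 0.925; the function name `fold_change` is illustrative.

```python
BETA = 0.925   # slope of the log-log regression reported in this analysis


def fold_change(population_ratio, beta=BETA):
    """Predicted multiplicative change in trials for a given population ratio."""
    return population_ratio ** beta


doubling = fold_change(2)          # effect of doubling population
tenfold = fold_change(10)          # effect of a tenfold larger population
per_capita = fold_change(2) / 2    # change in trials per person when population doubles

print(round(doubling, 2), round(tenfold, 2), round(per_capita, 3))
```

Because β is slightly below 1, predicted trials per person fall by about 5% with each doubling of population.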
| Country | Trials | Population | Residual |
|---|---|---|---|
| Egypt | 11,752 | 104.5M | +3.40 |
| South Africa | 3,654 | 60.4M | +2.74 |
| Tunisia | 540 | 12.5M | +2.29 |
| Botswana | 123 | 2.6M | +2.26 |
| Mauritius | 48 | 1.3M | +1.96 |
| Gambia | 82 | 2.7M | +1.82 |
| Guinea-Bissau | 57 | 2.1M | +1.69 |
| Gabon | 64 | 2.4M | +1.68 |
| Eswatini | 30 | 1.2M | +1.56 |
| Uganda | 809 | 48.6M | +1.43 |
| Malawi | 344 | 20.4M | +1.38 |
| Kenya | 788 | 55.1M | +1.29 |
| Zambia | 307 | 20.6M | +1.26 |
| Lesotho | 38 | 2.3M | +1.20 |
| Zimbabwe | 201 | 16.7M | +1.03 |
| Burkina Faso | 215 | 22.7M | +0.81 |
| Rwanda | 138 | 14.1M | +0.81 |
| Mali | 183 | 22.6M | +0.66 |
| Ghana | 261 | 33.5M | +0.65 |
| Tanzania | 460 | 65.5M | +0.59 |
| Seychelles | 1 | 0.1M | +0.46 |
| Senegal | 113 | 17.9M | +0.39 |
| Sierra Leone | 54 | 8.6M | +0.33 |
| Liberia | 31 | 5.4M | +0.21 |
| Cameroon | 133 | 28.6M | +0.12 |
| Morocco | 162 | 37.5M | +0.07 |
| Mozambique | 147 | 33.9M | +0.06 |
| Benin | 55 | 13.4M | -0.06 |
| Comoros | 4 | 0.9M | -0.19 |
| Cote d'Ivoire | 93 | 28.9M | -0.25 |
| Ethiopia | 302 | 126.5M | -0.44 |
| Algeria | 114 | 45.6M | -0.47 |
| Guinea | 35 | 14.2M | -0.57 |
| Namibia | 7 | 2.6M | -0.61 |
| Nigeria | 379 | 223.8M | -0.74 |
| Democratic Republic of Congo | 160 | 102.3M | -0.87 |
| Niger | 44 | 26.2M | -0.91 |
| Libya | 12 | 7.0M | -0.98 |
| Togo | 14 | 9.0M | -1.06 |
| Equatorial Guinea | 3 | 1.7M | -1.06 |
| Djibouti | 2 | 1.1M | -1.06 |
| Central African Republic | 9 | 5.7M | -1.08 |
| Cabo Verde | 1 | 0.6M | -1.20 |
| Sudan | 55 | 48.1M | -1.24 |
| Madagascar | 33 | 30.3M | -1.33 |
| Burundi | 13 | 13.2M | -1.49 |
| South Sudan | 8 | 11.1M | -1.82 |
| Congo (Brazzaville) | 4 | 6.1M | -1.96 |
| Chad | 9 | 18.3M | -2.16 |
| Mauritania | 2 | 4.9M | -2.45 |
| Somalia | 6 | 18.1M | -2.56 |
| Angola | 10 | 36.7M | -2.70 |
| Eritrea | 1 | 3.7M | -2.88 |
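Positive residuals indicate countries with more trials than their population would predict; negative residuals indicate fewer. Residuals are in natural-log units, so Egypt's +3.40 corresponds to roughly 30 times the predicted count (e^3.40 ≈ 30) and Eritrea's -2.88 to roughly one-eighteenth of it. The table lists 53 countries; their counts sum to the full 22,110, so the remaining country has no recorded trials and cannot be log-transformed. The residual absorbs everything that population does not explain, which plausibly includes governance, infrastructure, language, colonial history, and funding.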
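The residuals in the table can be approximately reproduced from the reported slope. The sketch below assumes natural logarithms, population measured in millions, and an intercept of 1.67; the intercept is not reported in this analysis and has been back-calculated from the table, so treat it as a hypothetical value.

```python
import math

BETA = 0.925        # slope reported in this analysis
INTERCEPT = 1.67    # ASSUMPTION: back-calculated from the residual table, not reported


def log_residual(trials, population_millions):
    """Observed minus predicted log trial count under the fitted line."""
    predicted_log = INTERCEPT + BETA * math.log(population_millions)
    return math.log(trials) - predicted_log


# (country, trials, population in millions) taken from the table above.
examples = [
    ("Egypt", 11_752, 104.5),
    ("South Africa", 3_654, 60.4),
    ("Nigeria", 379, 223.8),
    ("Eritrea", 1, 3.7),
]

for name, trials, population in examples:
    residual = log_residual(trials, population)
    ratio = math.exp(residual)    # observed count as a multiple of the predicted count
    print(f"{name}: residual {residual:+.2f}, {ratio:.2f}x predicted")
```

Exponentiating the residual converts it to a ratio, which is often easier to read than a value in log units.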
| Region | Trials | Population | Trials/M | Internal Gini |
|---|---|---|---|---|
| North | 12,635 | 255M | 49.5 | 0.794 |
| West | 1,619 | 436M | 3.7 | 0.521 |
| East | 3,104 | 444M | 7.0 | 0.677 |
| Central | 382 | 165M | 2.3 | 0.638 |
| Southern | 4,370 | 143M | 30.5 | 0.790 |
Internal Gini measures inequality within each sub-region. High internal Gini means one country dominates the region (e.g., Egypt in North, South Africa in Southern). Low internal Gini means trials are more evenly distributed across the region's countries.
A coefficient of variation of 4.0 (mean = 409.4, std = 1638.7) indicates extreme variability. The median (55.0) is far below the mean (409.4), confirming a heavily right-skewed distribution dominated by a few large values.
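These summary statistics can be recomputed from the counts in the residual table. The sketch below assumes that the 54th country, absent from that table, has zero trials, that the standard deviation is the population form (dividing by n), and that an outlier is any |z| above 3; the threshold is not stated in this analysis.

```python
import statistics

# Trial counts for the 53 countries in the residual table, in table order.
# ASSUMPTION: the 54th country is missing from that table because it has zero trials.
trials = [
    11752, 3654, 540, 123, 48, 82, 57, 64, 30, 809, 344, 788, 307, 38,
    201, 215, 138, 183, 261, 460, 1, 113, 54, 31, 133, 162, 147, 55, 4,
    93, 302, 114, 35, 7, 379, 160, 44, 12, 14, 3, 2, 9, 1, 55, 33, 13,
    8, 4, 9, 2, 6, 10, 1,
    0,
]

mean = statistics.fmean(trials)
std = statistics.pstdev(trials)       # population standard deviation (divides by n)
median = statistics.median(trials)
cv = std / mean                       # coefficient of variation

# ASSUMPTION: |z| > 3 is used as the outlier threshold.
z_scores = [(t - mean) / std for t in trials]
outliers = [t for t, z in zip(trials, z_scores) if abs(z) > 3]

print(f"mean={mean:.1f} std={std:.1f} median={median} cv={cv:.1f} outliers={outliers}")
```

Under these assumptions the only count flagged is Egypt's 11,752, with a z-score of about 6.9; South Africa's is just under 2.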
| Method | Origin | What It Measures | Result |
|---|---|---|---|
| Gini Coefficient | Economics (Corrado Gini, 1912) | Inequality of distribution | 0.857 |
| Shannon Entropy | Information Theory (Claude Shannon, 1948) | Diversity / evenness | 2.83 bits (49.2% of max) |
| HHI | Antitrust Economics | Market concentration | 0.3152 (equiv. 3.2 countries) |
| Pareto Analysis | Management Science (Vilfredo Pareto) | 80/20 concentration | 6 countries = 80% of trials |
| Zipf's Law | Linguistics / Complex Systems (George Zipf) | Rank-size regularity | slope = -2.11, R² = 0.842 |
| Benford's Law | Number Theory (Frank Benford, 1938) | First-digit naturalness | MAD = 0.0297, χ² = 6.35 |
| Log-Linear Regression | Statistics (OLS) | Population → Trial scaling | β = 0.925, R² = 0.485 |
| Coefficient of Variation | Descriptive Statistics | Relative variability | CV = 4.0 |
| Regional Gini | Decomposition Analysis | Within-region inequality | 0.521–0.794 across 5 sub-regions |
| Z-Score Outliers | Standardisation | Extreme values | 1 outlier (Egypt) |