# Research Archetypes & Cluster Mapping

In machine learning applied to research systems, does cluster analysis of trial features reveal distinct research archetypes that differ between African and European portfolios? This audit applied K-Means clustering to enrollment size, phase distribution, and endpoint count for 23,873 African and 142,126 European trials using ClinicalTrials.gov metadata. Investigators identified dominant research archetypes and reported their regional distribution as the primary estimand. Eighty percent of African trials fell into a single high-volume late-phase validation archetype characterised by large enrollment and Phase 3 dominance, mirroring patterns in India and Brazil. European trials formed three balanced clusters: early-phase discovery (forty-two percent), mixed-phase development, and late-phase validation. The archetype homogeneity of African research may limit its capacity for the diversified scientific discovery that drives therapeutic innovation. These findings suggest that Africa's research portfolio is structurally optimised for confirming rather than creating medical knowledge. Interpretation is limited by feature selection and cluster count, both of which influence archetype identification.
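
The clustering step described above can be illustrated with a minimal sketch, which is not the linked analysis script. It assumes that phase is encoded as an ordinal number, that features are z-scored before clustering, and that centroids are seeded with a deterministic farthest-point rule; none of these choices is stated in the audit. The toy rows and the helper names (`standardise`, `kmeans`, `regional_shares`) are hypothetical and exist only to show how cluster labels become a per-region archetype distribution.

```python
from collections import Counter

# Hypothetical toy rows: (region, enrollment, phase as ordinal, endpoint count).
# These are illustrative values, not ClinicalTrials.gov records.
trials = [
    ("Africa", 1200, 3, 2), ("Africa", 950, 3, 3),
    ("Africa", 1500, 3, 2), ("Africa", 40, 1, 6),
    ("Europe", 30, 1, 7), ("Europe", 45, 1, 6),
    ("Europe", 1100, 3, 2), ("Europe", 1300, 3, 3),
]

def standardise(rows):
    """Z-score each column so enrollment does not dominate the distance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [(sum((v - m) ** 2 for v in c) / len(c)) ** 0.5 or 1.0
           for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(r, means, sds)] for r in rows]

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=100):
    """Lloyd's algorithm with farthest-point seeding (an assumed simplification)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:  # no reassignment means convergence
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels, centroids

def regional_shares(regions, labels):
    """Fraction of each region's trials that falls in each cluster."""
    totals = Counter(regions)
    counts = Counter(zip(regions, labels))
    return {key: n / totals[key[0]] for key, n in counts.items()}

regions = [t[0] for t in trials]
features = standardise([t[1:] for t in trials])
labels, centroids = kmeans(features, k=2)
shares = regional_shares(regions, labels)

for (region, cluster), share in sorted(shares.items()):
    print(f"{region} cluster {cluster}: {share:.0%}")
```

The regional shares are the quantity the audit reports as its primary estimand. Because both the feature set and `k` are inputs to this procedure, rerunning it with a different cluster count or feature encoding is a direct way to probe the sensitivity noted in the final sentence of the abstract.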

## References

1. Alemayehu C, et al. "Behind the mask of the African clinical trials landscape." Trials. 2018;19:519.
2. Drain PK, et al. "Global migration of clinical trials." Nat Rev Drug Discov. 2018;17:765-766.

## Note Block

- Type: research
- App: https://mahmood726-cyber.github.io/africa-e156-students/health-disease/dashboards/research-archetypes.html
- Code: https://github.com/mahmood726-cyber/africa-e156-students/blob/master/health-disease/code/research-archetypes.py
- Data: ClinicalTrials.gov API v2
- Date: 2026-04-05
