The past decade has seen swathes of biomedical data collected across Africa1,2. These include, for example, demographic, health and environmental data from the INDEPTH Network, which is revealing links between climate change and disease; and information about genetic diversity and disease risk factors in African populations from the Human Heredity and Health in Africa (H3Africa) initiative. But the promise that these and other data sets hold for improving health care and scientific innovation is at risk of being squandered, unless the data can be made more accessible.
As a group of African bioethics and data-science researchers with expertise in bioinformatics, genomics and decolonization of health research, we are well aware of these challenges in African data science. Consider the H3Africa initiative, which between 2010 and 2022 ran 51 projects aiming to improve health in Africa, funded by the US National Institutes of Health and the UK biomedical funder Wellcome. The projects ranged from studying the genomics of schizophrenia in the Xhosa people of South Africa to analysing how microorganisms in the nasal passages affect respiratory disease in African children. The H3Africa initiative overall has generated almost 1,000 whole-genome sequences and information about the traits they encode from at least 50 ethnolinguistic groups3. But in some cases, difficulties with logistics, costs and infrastructure led to delays in data collection, which in turn prevented groups from depositing their data before funding expired.
Many Africa-based researchers seem to prefer sharing data informally with trusted peers or as part of projects or collaborations on grant applications, rather than securing them formally from a repository4–7. For example, data from 23,421 biological samples collected across 16 H3Africa studies can be accessed through requests to a data and biospecimen access committee. But, according to one study4, the committee received just 28 data-access requests between December 2018 and June 2023, of which only 6 came from Africa. Of the remainder, 20 were from the United States, United Kingdom and Europe combined, one came from Asia and one from South America.
Here, we set out how such data hoarding can be combated by developing a ‘social contract’ for responsible data stewardship. This will require building up trust between researchers, commercial partners, governments, funders and the public by providing guarantees that data will be used in the best interests of African researchers, communities and research participants.
Hoarding habit
Data sharing is not an ingrained habit in most African countries. Many researchers are wary. For example, in a 2019 survey of 129 postgraduate students and researchers at South African universities, roughly three-quarters said that they would be happy to use external data sets for their research, yet only half would consider sharing their data5. Scientists in other African nations show similar attitudes6,7.
According to surveys, reasons include fears about research ideas being scooped and about inadequate reward systems for sharers6,8 — as is the case elsewhere. But, in Africa, much of this reluctance can be traced back to historical and structural factors.
African nations too often lack control over what happens to the data they generate. Research into HIV/AIDS, for instance, has been led by teams in a few countries, such as Kenya, Uganda, Cameroon and South Africa, that are supported by institutions in high-income nations. These collaborations, although valuable, expose a deep power imbalance — decisions about data storage, secondary use and long-term access frequently rest with the partners outside Africa.
Similarly, digital health platforms on mobile devices that aim to improve maternal health care, vaccine tracking and infectious-disease surveillance across Africa — often introduced by international non-governmental organizations and technology companies — typically rely on cloud-based data-storage infrastructure hosted by foreign firms (see go.nature.com/4g2tz2h). There are few safeguards to ensure that governments or communities in Africa retain meaningful oversight of the data.
Even flagship programmes that were designed to empower African science can fall short. Consortia such as H3Africa and the Malaria Genomic Epidemiology Network have built genomics capacities on the continent and have invested in the development of equitable data-sharing policies. But both store data in public repositories outside Africa — leading to the risk that Africa remains a data donor rather than a data power.
Technical limitations can also encourage data hoarding. Africa holds less than 1% of the global infrastructure capacity for digital data. Much of this infrastructure is concentrated in a few institutions in South Africa, Nigeria and Kenya, with others lacking the means to manage, store and share data in ways that comply with the requirements of research funders, or with evolving national and international regulatory standards. And there is often a lack of staff trained in data management and analytics across African scientific institutions.
These weaknesses can lead researchers in Africa to conclude that there are limited benefits to sharing data. Added to this are worries about legal compliance if researchers need to deposit data outside their own country, as well as the risks of their research being scooped by other scientists who have more resources — African or international — if the data are made public.
Data hoarding might be exacerbated by requirements for informed consent and evolving data-protection laws. In many African countries, such laws emphasize the principles of specifying a purpose for the data, limiting their storage, adhering to data minimization (collecting, processing and storing only those data that are necessary for the defined purpose) and giving participating individuals the right to restrict processing and have autonomy over their health and genetic data9. Although some of the laws include exceptions for scientific research, these are often defined ambiguously. This ethical and legal uncertainty can lead researchers to withhold data, even when research participants have given broad consent for sharing, for fear of breaching legal requirements. This is a particular problem in the absence of clear mechanisms for participants to withdraw consent or to provide consent for secondary uses.
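To make the data-minimization principle concrete, here is a minimal sketch in Python. It assumes a hypothetical study purpose and hypothetical field names (`malaria_incidence`, `age_group`, `district` and so on); none of these comes from any specific law or data set discussed here.

```python
# Hypothetical mapping from a declared research purpose to the fields it needs.
# Both the purpose name and the field names are illustrative assumptions.
PURPOSE_FIELDS = {
    "malaria_incidence": {"age_group", "district", "diagnosis_date"},
}


def minimize(record, purpose):
    """Return a copy of the record holding only fields needed for the purpose."""
    if purpose not in PURPOSE_FIELDS:
        # No declared purpose means nothing may be kept or shared.
        raise ValueError(f"no declared purpose: {purpose!r}")
    allowed = PURPOSE_FIELDS[purpose]
    return {key: value for key, value in record.items() if key in allowed}


raw = {"name": "A. Person", "phone": "000", "age_group": "5-9", "district": "D1"}
shared = minimize(raw, "malaria_incidence")
```

In this sketch, directly identifying fields such as `name` and `phone` are dropped before the record is stored or shared, because the declared purpose does not require them. Real compliance would depend on the applicable national law and the consent given.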
A social contract
There are signs that attitudes are shifting. A 2022 survey of 160 scientists engaged in data-intensive research in Africa showed that 88% would be willing to share data, provided there were robust governance mechanisms in place7. Respondents also said they would exchange data if there were opportunities for tangible benefits such as co-authorship (87%), shared royalties (53%) and collaboration on projects (78%).
What is needed now is a system-wide move towards responsible data stewardship that takes these expectations into account. In our view, trust can best be gained by establishing a ‘social contract’ in which data sharing becomes the collective responsibility of researchers, study communities, funders and research institutions. A social contract frames data sharing not simply as a mandate from funders, but as a moral and social commitment to equity, transparency, shared responsibility and shared benefit. Similar thinking was behind the Bermuda principles for DNA sequence data sharing10 and the Fort Lauderdale Agreement11, which laid the foundations for open data sharing in genomics.

Saliva samples in a tuberculosis research laboratory in Benin. Credit: Yanick Folly/AFP/Getty
The social contract we propose involves four strands that together define an ethical and operational framework for using data.
Clear commitments to sharing and rewards. All funders should demand plans for data sharing as a part of grant proposals. They should follow up to ensure that researchers can and do share data. When institutional infrastructure is lacking, funders could offer technical support or conditional infrastructure grants to enable compliance. They should also introduce clear accountability measures, including funding restrictions or ineligibility for future grants, for individuals or institutions that consistently fail to share data without adequate justification. Commercial and international partners should also commit to data sharing in collaboration agreements.
Universities and other research institutions should consider data-sharing practices as part of their promotion criteria, as they do with publication and teaching metrics. This could extend to criteria for world university rankings, such as those generated by Times Higher Education and QS, which could incorporate open-science and data-sharing metrics to encourage institutional leaders to take the practice seriously.
Benefit-sharing agreements. Academic, commercial and community partners in data-science projects should together agree on fair benefit-sharing mechanisms. To the best of our knowledge, such agreements are rare in commercial collaborations involving African research data, and formal mechanisms for equitable returns are lacking12. For instance, if academics in Africa collaborate with international partners, they might agree to technology-transfer mechanisms, such as capacity-building programmes or investments in setting up and maintaining secure data infrastructure in Africa.
Collaborations with commercial entities might involve a commitment that any diagnostics, treatments and tools developed using African data are made available to African communities at fair prices. Revenue-sharing agreements could ensure that African researchers (and, where appropriate, communities) receive a share of profits for any commercial products developed using their data. And social-investment clauses, such as contractual commitments that require reinvestment of a percentage of revenue into local scientific infrastructure or public health, could help to build long-term sustainability and public trust in science.
Interoperability. Data-science funders and governments should invest in making data systems interoperable, which would enhance collaboration across borders and between institutions. Some efforts to do this are already under way.
The Data Science for Health Discovery and Innovation in Africa initiative, for instance, is working to develop an interoperable open-data science platform called eLwazi. The initiative hosts training workshops and is developing a data catalogue to help ensure that clinical, climate and genomics data sets in research projects in Africa adhere to FAIR data principles (meaning that data should be findable, accessible, interoperable and reusable).
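As a rough illustration of what checking a catalogue entry against the FAIR principles could involve, here is a minimal Python sketch. The record fields and their mapping to each principle are simplifying assumptions made for this example; they are not the eLwazi schema or an official FAIR checklist.

```python
from dataclasses import dataclass


# Hypothetical catalogue record; field names are illustrative assumptions.
@dataclass
class CatalogueRecord:
    identifier: str = ""        # findable: persistent identifier
    title: str = ""             # findable: descriptive metadata
    access_procedure: str = ""  # accessible: how the data can be requested
    data_format: str = ""       # interoperable: standard, documented format
    licence: str = ""           # reusable: terms of reuse
    provenance: str = ""        # reusable: how the data were produced


# Assumed, simplified mapping from each FAIR principle to metadata fields.
FAIR_FIELDS = {
    "findable": ("identifier", "title"),
    "accessible": ("access_procedure",),
    "interoperable": ("data_format",),
    "reusable": ("licence", "provenance"),
}


def missing_fair_fields(record):
    """Return, per FAIR principle, the metadata fields that are still empty."""
    gaps = {}
    for principle, names in FAIR_FIELDS.items():
        empty = [name for name in names if not getattr(record, name).strip()]
        if empty:
            gaps[principle] = empty
    return gaps


record = CatalogueRecord(identifier="example-id-001", title="Example cohort",
                         data_format="CSV")
gaps = missing_fair_fields(record)
```

Here `gaps` reports that the example record lacks an access procedure, a licence and provenance information. A check of this kind covers only whether metadata are present, not whether the underlying data are of good quality or ethically shareable.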