The development of molecular biology has driven a revolution in our understanding of disease at the molecular level, greatly improving diagnosis, prognosis and therapeutic decision-making. The study of both data and physical specimens deposited in biobanks is crucial for this. However, the vast majority of biobanks are based in Europe and North America, and the samples and data will be a reflection of the local populations. As we know, what you get out depends on what you put in, so the data and knowledge obtained may not be accurate for underrepresented populations, such as Africans, who also have the oldest and therefore most diverse genetics in the world. CLI caught up with Delali Attipoe (Chief Operating Officer of 54gene) and Marilyn Matz (Chief Executive Officer of Paradigm4) to find out what they are doing to redress this imbalance.
What is precision medicine and why is it important?
Marilyn Matz (MM): Precision medicine is an emerging, data-driven approach for disease treatment and prevention that takes into account individual variability in genes, environment, and behaviour for each person. This approach will allow doctors and researchers to predict more accurately which treatment and prevention strategies for a particular disease will work in which groups of people, i.e. ‘the right treatment to the right patient at the right dose and the most beneficial time’. Of course, diagnosis and clinical practice have always been based on this broad goal but this was an empirical process until the advent of the deep genetic, biochemical and molecular data that can now be used in making medical decisions about an individual.
Delali Attipoe (DA): There are so many examples of conditions that have benefitted from a precision medicine approach. A universal example that stands out for me is blood transfusions and bone marrow treatments, with blood types and bone marrow being characterized and matched to the right patient to ensure safety and efficacy. And an early, and well-known ‘genetic data-driven’ example is breast cancer treatment with trastuzumab (Herceptin), where a molecular diagnostic test that assesses HER2 (human epidermal growth factor receptor 2) status offers a reliable predictor of treatment success.
MM: Today, the role of human genetic information to drive decisions in pharmaceutical R&D is firmly established. However, until single cell analysis came along, researchers were looking at an aggregated picture, the ’omics of a whole tissue system, rather than that of a single cell type. Now, single cell analysis has become a major focus of interest and is widely seen as the ‘game changer’ – with the potential to take precision medicine to the next level by adding ‘right cell or cell types’ into the mix.
What are biobanks and why are they useful in the development of precision medicine?
MM: Biobanks originally offered a repository for a variety of physical samples that could be accessed for research purposes. Today, they are also the home to massive, curated data sets of ’omics data about individuals and populations that have been derived from these samples along with extremely rich phenotype and healthcare data. The purpose is the same – to allow clinical researchers to test hypotheses of disease pathology, to look for genetic links that can be exploited for biomarkers or new drug targets, or to expand and find new uses for drugs that are already on the market. To give a little context and scale, the UK Biobank has 7400 categories of phenotypes along with single nucleotide polymorphism (SNP) and whole exome sequencing (WES) data from 500 000 participants – an important milestone in the availability of population health data. They are now embarking on whole genome sequencing and proteomics. Today most countries have or are creating national biobanks, of which 54gene, with head offices in Nigeria and Washington, DC, USA, is one of the newest with great potential for understanding human genomic variation.
It is no surprise that biobanks have played a part in COVID-19 research. For example, biobank data has been used in a COVID-19 antibody study to determine the extent of past infection rates across the UK. The study found that 99% of its participants who had tested positive for previous infection retained antibodies to SARS-CoV-2 for 3 months after being infected – an early indication that any vaccine produced that stimulated antibody production was likely to offer protection against infection.
DA: One thing that the COVID-19 pandemic has shown us is that to be truly beneficial and impactful, genomic research needs to be done at scale, and address the needs of the global population. Biobanks typically hold information from a relatively restricted population with, for example, fewer than 3% of the genomes analysed to date coming from Africans, who offer the largest genetic diversity. This needs to change. We must represent the wider population to really see the effects of precision medicine across the world, and we believe biobanks are key to supporting this goal.
What information can be obtained from biobanks?
MM: As we touched upon earlier, genetic information such as whole genome sequencing, WES and SNP, for example, is aggregated with a range of other data on the same individuals: health records such as GP data, hospitalizations, diagnoses, prescriptions, MRI scans, lab results from biochemistry and haematology, as well as patient reported data such as family history, behavioural history, and socio-demographics. These data are used in computing large association analyses that associate specific genetic variations and specific phenotypes with susceptibility to or protection from particular diseases. Paradigm4 performs these computationally intensive analyses at record price-performance levels such as one billion linear regressions in less than an hour. These methods uncover genetic markers that are candidates for drug targets and biomarkers that can be used to predict the presence of a disease. This information is made available to medical research projects and, depending on the project, different biobanks will offer different opportunities and restrictions for using the data. Companies using biobanks must adhere to the appropriate regulatory and ethical standards when they access data.
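As a rough illustration of what one of these association analyses involves, the sketch below regresses a single phenotype on each genetic variant in turn using ordinary least squares. It is a toy example only and not Paradigm4's or any biobank's implementation: the function names, the variant labels and the data are all hypothetical, and a real analysis would adjust for covariates, compute significance values and run across millions of variants and thousands of phenotypes.

```python
from statistics import mean


def simple_linear_regression(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("variant shows no variation across individuals")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def association_scan(genotypes, phenotype):
    """Run one regression per variant and return {variant: effect size}."""
    return {
        variant: simple_linear_regression(dosages, phenotype)[0]
        for variant, dosages in genotypes.items()
    }


# Hypothetical toy data: allele counts (0, 1 or 2) for six individuals
# at two made-up variants, plus one made-up quantitative phenotype.
genotypes = {
    "snp_a": [0, 1, 2, 1, 0, 2],
    "snp_b": [1, 1, 0, 2, 0, 2],
}
phenotype = [1.0, 2.0, 3.0, 2.0, 1.0, 3.0]

effects = association_scan(genotypes, phenotype)
for variant, slope in effects.items():
    print(variant, round(slope, 3))
```

Each variant gets its own independent regression, which is why the number of regressions grows so quickly: the total is roughly the number of variants multiplied by the number of phenotypes tested.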
How is this information obtained and what are the challenges?
MM: We are talking about a lot of complex data here, definitely within the definition of ‘Big Data’. As a result, rapid, cost-effective and scalable data analytics and management approaches are becoming increasingly necessary for data users to gain maximum advantage from the data they extract.
Current data storage, analytics platforms and scalable computing approaches are often slow and laborious, requiring computer scientists and ‘data wranglers’ to help researchers find answers to their questions in the data. Disruptive technologies, such as our SciDB™ scientific computing platform, make it much easier to ask and answer research questions by addressing the challenges of population-scale data management and optimizing the cost for biobank data analysis.
54gene faces the up-front challenges of specifying, collecting and curating data from many clinics across the countries in Africa as they build their African Health Information Ecosystem, a more challenging task than assembling a biobank from a single country with a national health system.
What are the challenges and limitations of biobanks for developing personalized medicine for, for example, African populations? How can these limitations be overcome?
DA: The samples typically collected by most biobanks do not come from a diverse range of populations, which limits the body of data they can offer. Presently, most genomic data used for research and development are from Europe, the United Kingdom and North America, with African genomic data accounting for only a tiny fraction. As a result, information that could prove beneficial to the improvement of healthcare for all populations across the world may be missed.
Socio-cultural factors as well as lack of awareness are limitations in Africa. Africa contains more genetic diversity than any other continent because it is widely seen as ‘the cradle of humankind’. The diversity in African DNA can provide insights into human evolution as well as common diseases. By gathering insights from the African genome, we could power medical breakthroughs and drug discoveries that will advance healthcare globally. By better understanding the genetic drivers of disease, we can ensure that African people and the global community benefit from cutting-edge medical innovation developed from the insights we have generated.
MM: In addition to the dangers of not properly researching and including a genetically diverse population that Delali mentions, there are challenges with access to the diversity and scale of the data that will become available in the 54gene biobank. Most organizations struggle to provide their scientists with a unified science-ready platform for systemic analysis of biobank data. Importantly, as the storage and use of biobanks scales up in the future, the time and effort it takes to manage, analyse and run advanced algorithms cannot increase with it, signalling the need for better performing platforms that optimize price and performance.
What is the aim of 54gene regarding precision medicine and what does it envisage for the future of precision medicine?
DA: We are tackling the disparity in precision medicine by building one of the world’s richest data sets from the most genetically diverse populations. We are applying deep analytics to derive key insights from this unique data set. Our mission is to deliver on the promise of precision medicine for Africans – and the global population – by bridging the disparity gap in genomics data and our goal is to be part of the reason that new drugs are discovered.
We are re-imagining a world where precision medicine is equalized, and everyone can live healthier and longer.