A new UW model can help narrow down which genetic mutations affect how genes splice and contribute to disease.
Between any two people, there are likely to be at least 10 million differences in the genetic sequence that makes up their DNA.
Most of these differences don’t alter the way cells behave or cause health problems. But some genetic variations greatly increase the likelihood that a person will develop cancer, diabetes, colour-blindness or a host of other diseases.
Despite rapid advances in our ability to map an individual’s genome — the precise coding that makes up his or her genes — we know much less about which mutations or anomalies actually cause disease.
Now, a new model and publicly available Web tool developed by University of Washington researchers can more accurately and quantitatively predict which genetic mutations significantly change how genes splice and may warrant increased attention from disease researchers and drug developers.
The model, the first to train a machine learning algorithm on vast amounts of genetic data created with synthetic biology techniques, is outlined in a recently published paper.
“Some people have variations in a particular gene, but what you really want to know is whether those matter or not,” said lead author Alexander Rosenberg, a UW electrical engineering doctoral student. “This model can help you narrow down the universe — hugely — of the mutations that might be most likely to cause disease.”
In particular, the model predicts how these genetic sequence variations affect alternative splicing — a critical process that enables a single gene to create many different forms of proteins by including or excluding snippets of RNA.
“This is an avenue that’s unexplored to a large extent,” said Rosenberg. “It’s fairly easy to look at how mutations affect proteins directly, but people have not been able to look at how mutations affect proteins through splicing.”
For example, a scientist studying the genetic underpinnings of lung cancer or depression or a particular birth defect could type the most commonly shared DNA sequence in a particular gene into the Web tool, as well as multiple variations. The model will tell the scientist which mutations cause outsized differences in how the gene splices — which could be a sign of trouble — and which have little or no effect.
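The workflow described above (score a reference sequence, score each variant, and flag the variants whose predicted splicing shifts the most) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `toy_splice_score`, `rank_variants`, the 0.05 threshold and the example sequences are all hypothetical, and the toy score is a stand-in that has nothing to do with the UW model's actual predictions or the Web tool's interface.

```python
from typing import Callable, Dict, List, Tuple


def toy_splice_score(seq: str) -> float:
    """Placeholder score in [0, 1]: share of adjacent base pairs equal to 'GT'.

    This is NOT the UW model. It only stands in for a real splicing predictor
    so that the comparison logic below has something to call.
    """
    seq = seq.upper()
    if len(seq) < 2:
        return 0.0
    hits = sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == "GT")
    return hits / (len(seq) - 1)


def rank_variants(
    reference: str,
    variants: Dict[str, str],
    score_fn: Callable[[str], float] = toy_splice_score,
    threshold: float = 0.05,  # hypothetical cutoff for an "outsized" change
) -> List[Tuple[str, float, bool]]:
    """Return (name, score change vs. reference, flagged) sorted by |change|."""
    ref_score = score_fn(reference)
    rows = []
    for name, seq in variants.items():
        delta = score_fn(seq) - ref_score
        rows.append((name, delta, abs(delta) >= threshold))
    # Largest predicted shifts first, since those are the ones worth a closer look.
    rows.sort(key=lambda row: abs(row[1]), reverse=True)
    return rows


# Made-up sequences, for illustration only.
reference = "ACGTACGTAC"
variants = {
    "silent": "ACGTACGTAA",      # leaves both 'GT' pairs intact
    "disruptive": "ACGAACGTAC",  # removes one 'GT' pair
}
for name, delta, flagged in rank_variants(reference, variants):
    print(f"{name}: change={delta:+.3f} flagged={flagged}")
```

Passing the predictor in as `score_fn` keeps the ranking logic separate from the scoring, so a real splicing predictor could replace the placeholder without touching the comparison step. As the article notes, a flagged variant would still need follow-up investigation.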
The researcher would still need to investigate whether a particular genetic sequence causes harmful changes, but the online tool can help rule out the many variations that aren’t likely to be of interest to health researchers.
To validate the model’s predictive powers, the UW team tested it on a handful of well-understood mutations such as those in the BRCA2 gene that have been linked to breast and ovarian cancer.
Compared to previously published models, the UW approach is roughly three times more accurate at predicting the extent to which a mutation will cause genetic material to be included or excluded in the protein-making process, which can change how those proteins function and cause biological processes to go awry.
University of Washington