Finding our way around DNA

Most of us would be lost without Google Maps or similar route-guidance technologies. And when those mapping tools include additional data about traffic or weather, we can navigate even more effectively. For scientists who navigate the mammalian genome to better understand genetic causes of disease, combining various types of data sets makes finding their way easier, too.

A team at the Salk Institute has developed a computational algorithm that integrates two different data types to locate key regions within the genome more precisely and accurately than other tools can. The method could help researchers conduct vastly more targeted searches for disease-causing genetic variants in the human genome, such as ones that promote cancer or cause metabolic disorders.

“Most of the variation between individuals is in noncoding regions of the genome,” says senior author Joseph Ecker, a Howard Hughes Medical Institute investigator and director of Salk’s Genomic Analysis Laboratory. “These regions don’t code for proteins, but they still contain genetic variants that cause disease. We just haven’t had very effective tools to locate these areas in a variety of tissues and cell types—until now.”

Only about two percent of our DNA is made up of genes, which code for proteins that keep us healthy and functional. For many years, the other 98 percent was thought to be extraneous “junk.” But, as science has developed ever more sophisticated tools to probe the genome, it has become clear that much of that so-called junk has vital regulatory roles. For example, sections of DNA called “enhancers” dictate where and when the gene information is read out.

Increasingly, mutations or disruptions in enhancers have been tied to major causes of human disease, but enhancers have been hard to locate within the genome. Clues about them can be found in certain types of experimental data, such as the binding of proteins that regulate gene activity, chemical modifications of the proteins (called histones) that DNA wraps around, or the presence in DNA of chemical compounds called methyl groups that turn genes on or off (an epigenetic factor called DNA methylation). Typically, computational methods for finding enhancers have relied on histone modification data. But Ecker’s new system, called REPTILE (for “regulatory-element prediction based on tissue-specific local epigenomic signatures”), combines histone modification and methylation data to predict which regions of the genome contain enhancers. In the team’s experiments, REPTILE proved more accurate at finding enhancers than algorithms that rely on histone modification alone.

“The novelty of this method is that it uses DNA methylation to really narrow down the candidate regulatory sequences suggested by histone modification data,” says Yupeng He, a Salk graduate student and first author of the paper. “We were then able to test REPTILE’s predictions in the lab and validate them with experimental data, which gave us a high degree of confidence in the algorithm’s ability to find enhancers.”

Salk Institute www.salk.edu/news-release/finding-way-around-dna/