Machine learning assisted protein structure analysis reveals SARS-CoV-2 virus tactics
How does the SARS-CoV-2 virus manage to evade immune defence and replicate itself in patients suffering from COVID-19? To shed light on this question, an international research team has assembled the most comprehensive view to date of the precise, three-dimensional shape of each SARS-CoV-2 protein – including the well-known spike protein.
To assemble this map, the team used high-throughput machine learning, an approach that enabled them to predict structural states likely to occur in coronavirus proteins, based on states that have been seen in related proteins. The final map consists of 2,060 atomic-resolution 3D models involving the coronavirus’s 27 proteins. All structural models are freely available through the Aquaria-COVID resource: https://aquaria.ws/covid
“This provides an unprecedented wealth of detail that will help researchers better understand the molecular mechanisms that cause COVID-19, and will help in developing therapies to fight the pandemic, for example, by identifying potential new targets for future treatments or vaccines,” says Burkhard Rost, professor of bioinformatics at the Technical University of Munich.
A second part of the study used a complementary approach, known as human-in-the-loop machine learning, to create a novel, visual interface that summarizes everything currently known – and not known – about the shape of SARS-CoV-2 proteins.
The visual interface, called a structural coverage map, can also be used as a navigation aid for researchers, helping them find and use structural models that relate to specific research questions. These models have already revealed several vital clues as to how coronaviruses are able to take over our cells.
Using their machine learning approaches, the team was able to identify three coronavirus proteins (NSP3, NSP13, and NSP16) that ‘mimic’ human proteins, successfully fooling a host cell into believing that they are native proteins operating in the best interest of the cell.
The modelling also revealed five coronavirus proteins (NSP1, NSP3, spike glycoprotein, envelope protein, and ORF9b protein) that the researchers say ‘hijack’ or disrupt processes in human cells, thereby helping the virus take control, complete its life cycle and spread to other cells.
“In analyzing these structural models, we also found new clues into precisely how the virus copies its own genome – which is the central process that allows the virus to spread rapidly in any person unlucky enough to become infected,” says Rost. “The insights from our study take us closer to understanding how the virus operates, and what we can do to stop it.”
“The longer the virus circulates, the more chances it has to mutate and form new variants such as the Delta strain,” says Professor Sean O’Donoghue, the lead author of the study from the Garvan Institute in Sydney. “Our resource will help researchers understand how new strains of the virus differ from each other – a piece of the puzzle that we hope will help in dealing with new variants as they emerge.”