
New AlphaSync database ensures protein structure predictions stay current

St. Jude Children’s Research Hospital scientists have launched AlphaSync, a free database addressing a critical gap in structural biology by continuously synchronising predicted protein structures with the latest sequence data from UniProt. The resource currently provides 2.6 million updated structural models across 925 species, including disease-relevant proteins previously missing from existing databases.

 

Solving the synchronisation problem in structural biology

The AlphaFold Protein Structure Database revolutionised structural biology in 2022 by providing predicted structures for nearly all known proteins. However, a fundamental challenge emerged: the database does not automatically update when new protein sequences are discovered or existing ones are corrected. This creates a widening gap between available structures and current biological knowledge.

The AlphaSync team, led by Benjamin Lang and M. Madan Babu at St. Jude, identified that 676 human proteins (3.3% of the human canonical reference proteome) had been newly added or updated since the AlphaFold database’s October 2022 release. These included 62 clinically important genes implicated in genetic disorders and 12 genes causally linked to cancer, such as BRCA2, MYC, AXIN2 and ATRX.

“In a rapidly evolving scientific landscape, having access to the most current and detailed information on protein structural models is essential for breakthroughs in medicine and biology,” said M. Madan Babu, St. Jude Senior Vice President of Data Science. “With AlphaSync, we ensure predicted protein structures stay continuously updated and enriched with key information such as amino acid interaction networks, surface accessibility and disorder status so that researchers can move from sequence to insight faster than ever before.”


Comprehensive coverage across species and isoforms

AlphaSync achieves complete, up-to-date proteome coverage for 42 species, including humans, key pathogens and model organisms such as mouse, fruit fly, roundworm and yeast. The team predicted 69,118 new structures for 40,016 proteins by running AlphaFold 2 on sequences from UniProt release 2025_03.

A significant development is the inclusion of protein isoforms – alternative versions of proteins arising from the same gene. The researchers noted a median absolute length difference of 73 amino acids between human canonical and alternative isoforms, sufficient to result in notably different structural models. One clinically relevant example is the vascular endothelial growth factor splice variant VEGF165B, which binds the KDR receptor but inhibits tumour growth rather than activating angiogenesis.

“AlphaSync performs an important job in keeping all of these predicted structures updated,” said first author Benjamin Lang. “The AlphaSync database ensures that the structure you are looking at matches the sequence of the protein you are working with.”

Enhanced annotations for functional analysis

Beyond updating structures, AlphaSync provides pre-computed residue-level annotations including solvent accessibility, dihedral angles, intrinsic disorder status and over 4.7 billion atom-level noncovalent contacts. The authors note in their Nature Structural & Molecular Biology paper: “Such information can guide protein design efforts, enable the functional interpretation of structures and inform the effects of disease-associated and phenotype-associated variants.”

The database converts complex three-dimensional structural information into a simpler two-dimensional tabular format, making it more accessible for downstream machine learning applications and biomedical research projects investigating disease mechanisms.

For antibody design, features such as solvent-accessible surface area can identify epitopes that must be exposed to be targetable. Conservation of noncovalent contacts between equivalent residues across species can help uncover determinants of protein folding, even when structurally equivalent residues are not identical.

Addressing technical challenges at scale

The AlphaSync team encountered substantial computational challenges in achieving complete proteome coverage. The 69,118 structure predictions required over 13 years of sequential multicore CPU and GPU compute time. The researchers split the AlphaFold 2 pipeline into a parallelisable CPU-only multiple-sequence alignment component and a GPU-only inference part, increasing GPU usage efficiency approximately fourfold.
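The two-stage split described above can be illustrated with a minimal Python sketch. It is not the AlphaSync implementation: `build_msa` and `predict_structure` are hypothetical placeholders for the CPU-only alignment step and the GPU-only inference step, and a thread pool stands in for whatever CPU cluster scheduling the team actually used. The point is only that alignments run in parallel while the inference stage consumes finished inputs as soon as they arrive.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Dict


def build_msa(seq_id: str, sequence: str) -> dict:
    """Hypothetical placeholder for the CPU-only alignment stage."""
    return {"id": seq_id, "msa": [sequence]}


def predict_structure(features: dict) -> str:
    """Hypothetical placeholder for the GPU-only inference stage."""
    return f"model for {features['id']}"


def run_pipeline(sequences: Dict[str, str], cpu_workers: int = 4) -> Dict[str, str]:
    """Run alignments in parallel and feed each finished one to inference."""
    results = {}
    with ThreadPoolExecutor(max_workers=cpu_workers) as pool:
        futures = [pool.submit(build_msa, sid, seq) for sid, seq in sequences.items()]
        # Inference starts as soon as any alignment is ready, so the
        # (single) inference stage never waits for the whole batch.
        for future in as_completed(futures):
            features = future.result()
            results[features["id"]] = predict_structure(features)
    return results


if __name__ == "__main__":
    print(run_pipeline({"P1": "MKT", "P2": "MAS"}))
```

In a real deployment the two stages would typically run as separate jobs on different hardware, with alignment results written to disk in between.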

To handle large proteins exceeding 2,700 amino acids – including titin at 34,350 amino acids – AlphaSync employs a fragmentation approach, splitting proteins into overlapping segments. The database processes these fragments into a single representation by averaging properties and discarding residues near artificial termini.
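A rough Python sketch of this idea follows. The window size, step and number of trimmed residues are illustrative parameters chosen here, not values reported for AlphaSync, and the function names are hypothetical. The first function cuts a sequence into overlapping windows; the second averages a per-residue property across fragments while ignoring residues close to a cut that does not correspond to a real protein terminus.

```python
from typing import List, Optional, Sequence, Tuple

Range = Tuple[int, int]  # 0-based, half-open (start, end)


def fragment_ranges(length: int, window: int, step: int) -> List[Range]:
    """Return overlapping fragment ranges that cover the whole sequence."""
    if length <= window:
        return [(0, length)]
    starts = list(range(0, length - window + 1, step))
    if starts[-1] + window < length:
        starts.append(length - window)  # last fragment ends at the true C-terminus
    return [(s, s + window) for s in starts]


def merge_fragments(
    length: int,
    fragments: Sequence[Tuple[Range, Sequence[float]]],
    trim: int,
) -> List[Optional[float]]:
    """Average per-residue values, skipping `trim` residues at artificial termini."""
    sums = [0.0] * length
    counts = [0] * length
    for (start, end), values in fragments:
        lo = start + trim if start > 0 else start  # keep the true N-terminus
        hi = end - trim if end < length else end   # keep the true C-terminus
        for pos in range(lo, hi):
            sums[pos] += values[pos - start]
            counts[pos] += 1
    # Positions not covered by any trimmed fragment are reported as None.
    return [s / c if c else None for s, c in zip(sums, counts)]


if __name__ == "__main__":
    ranges = fragment_ranges(length=10, window=6, step=2)
    per_fragment = [(r, [float(i + 1)] * 6) for i, r in enumerate(ranges)]
    print(ranges)
    print(merge_fragments(10, per_fragment, trim=1))
```

Trimming matters because residues next to an artificial cut lack their real neighbours, so their predicted properties are less trustworthy than those of the same residues seen in the middle of another fragment.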

The team also developed solutions for proteins containing nonstandard amino acid characters. Rather than excluding these proteins, they implemented reasonable substitutions, enabling structural prediction for 1,668 additional proteins, including 443 human proteins.
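The article does not list the substitutions used, so the following Python sketch only illustrates the general approach. The mapping is an assumption made here (for example, selenocysteine U to cysteine C, and the ambiguity code B to aspartate D) and should not be read as AlphaSync's actual table. Each change is recorded so it can be reported alongside the prediction.

```python
from typing import List, Tuple

STANDARD = set("ACDEFGHIKLMNPQRSTVWY")

# Illustrative mapping only (an assumption, not AlphaSync's published table).
SUBSTITUTIONS = {
    "U": "C",  # selenocysteine -> cysteine
    "O": "K",  # pyrrolysine -> lysine
    "B": "D",  # Asp/Asn ambiguity -> aspartate
    "Z": "E",  # Glu/Gln ambiguity -> glutamate
    "J": "L",  # Leu/Ile ambiguity -> leucine
}


def standardise(sequence: str) -> Tuple[str, List[Tuple[int, str, str]]]:
    """Replace nonstandard residues and return the new sequence plus a change log.

    The log holds (1-based position, original, replacement) tuples.
    Characters without a mapping (such as X) raise ValueError.
    """
    out = []
    changes = []
    for pos, aa in enumerate(sequence.upper(), start=1):
        if aa in STANDARD:
            out.append(aa)
        elif aa in SUBSTITUTIONS:
            out.append(SUBSTITUTIONS[aa])
            changes.append((pos, aa, SUBSTITUTIONS[aa]))
        else:
            raise ValueError(f"no substitution defined for {aa!r} at position {pos}")
    return "".join(out), changes


if __name__ == "__main__":
    print(standardise("MUKB"))
```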

Accessible interface and programmatic access

AlphaSync provides an intuitive web interface with an interactive Mol* structure viewer and Plotly-based predicted aligned error matrix visualisation. Users can search by protein names, gene symbols, database identifiers or sequences. Hovering over a residue in the sequence display shows detailed information about it and highlights its position in the structure viewer.

The database also offers a REST API for programmatic access, enabling advanced users to integrate AlphaSync data into automated workflows and software. All newly predicted structures are available for download in mmCIF format, following the AlphaFold database’s file naming pattern for direct integration into existing computational pipelines.
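As a small illustration of why a shared naming pattern helps, the Python sketch below builds and parses file names of the form used by the AlphaFold database, `AF-<accession>-F<fragment>-model_v<version>.cif`. That AlphaSync files match this pattern in every detail, including the version suffix and the handling of isoform accessions such as `P15692-8`, is an assumption here; the helper names are hypothetical and no AlphaSync API endpoints are shown.

```python
import re
from typing import NamedTuple


class ModelFile(NamedTuple):
    accession: str  # UniProt accession, optionally with an isoform suffix
    fragment: int   # fragment number (1 for unfragmented models)
    version: int    # model version number


_PATTERN = re.compile(
    r"^AF-(?P<accession>[A-Z0-9]+(?:-\d+)?)"
    r"-F(?P<fragment>\d+)-model_v(?P<version>\d+)\.cif$"
)


def model_filename(accession: str, fragment: int, version: int) -> str:
    """Build an AlphaFold-database-style mmCIF file name."""
    return f"AF-{accession}-F{fragment}-model_v{version}.cif"


def parse_model_filename(name: str) -> ModelFile:
    """Split an AlphaFold-database-style file name into its parts."""
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"not an AlphaFold-style model file name: {name!r}")
    return ModelFile(
        match["accession"], int(match["fragment"]), int(match["version"])
    )


if __name__ == "__main__":
    name = model_filename("P04637", fragment=1, version=4)
    print(name, parse_model_filename(name))
```

A pipeline that already globs for files matching this pattern can then pick up newly downloaded models without changes.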

The researchers envision updating AlphaSync with each new UniProt release, currently every two months, with each update estimated to take one to two weeks depending on the number of new structures requiring prediction.

Reference

Lang, B., Mészáros, B., Sejdiu, B. I., et al. (2025). AlphaSync is an enhanced AlphaFold structure database synchronized with UniProt. Nature Structural & Molecular Biology. Published online 11 November 2025. https://doi.org/10.1038/s41594-025-01719-x