Scientists develop GPU-powered AI tool to analyse single-cell gene data
A new machine learning algorithm developed by St. Jude Children’s Research Hospital researchers enables faster and more accurate analysis of large-scale single-cell gene expression datasets, potentially accelerating discoveries in cancer and other diseases.
Novel approach to handling vast datasets
Scientists have developed a computational method that overcomes key limitations in analysing the exponentially growing volumes of single-cell genomic data. The new tool, called CSI-GEP (Consensus and Scalable Inference of Gene Expression Programs), uses graphics processing units (GPUs) and unsupervised machine learning to process and interpret single-cell RNA sequencing data more efficiently and accurately than existing methods.
The research, published in Cell Genomics, demonstrates how CSI-GEP can effectively analyse datasets containing millions of individual cells while avoiding the biases and contradictory findings that often emerge from current analytical approaches.
Addressing a growing challenge
“We’ve implemented a new toolset that can be scaled as these single-cell RNA sequencing datasets continue to grow,” said corresponding author Paul Geeleher, PhD, of the St. Jude Department of Computational Biology. “There has been an exponential explosion in the compute time for single-cell analysis, and our method brings accurate analysis back into a tractable timeframe.”
The increasing adoption of single-cell analysis techniques has generated vast repositories of genomic data that offer unprecedented insights into cellular biology. However, existing computational methods struggle to handle such large volumes of data efficiently, often forcing researchers to make compromises that can introduce biases into their analyses.
The team’s solution employs GPUs – specialised processors typically used for graphics rendering – to handle the intensive computational requirements of single-cell data analysis. “We created a method that uses graphics processing units or GPUs,” explained first author Xueying Liu, PhD. “The GPU integration gave us the processing power to perform the computational load in a scalable way.”
CSI-GEP utilises unsupervised machine learning to automatically determine parameters for analysis, removing potential biases from manual parameter selection. “Our method uses unsupervised machine learning, which automatically determines more robust and less arbitrary parameters for the analysis,” Liu said. “It learns how to group cells based on their different active biological processes or cell type identities.”
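The article does not describe CSI-GEP's internals. As a purely illustrative sketch, the general idea of grouping cells by their active gene expression programs without labels can be shown with non-negative matrix factorisation (NMF), a common unsupervised technique chosen here as an assumption and not presented as the authors' method. The toy data, the function name nmf and every parameter below are hypothetical, and the example runs on a CPU with NumPy, not on a GPU.

```python
import numpy as np

def nmf(X, k, n_iter=300, seed=0, eps=1e-9):
    """Factorise a non-negative matrix X (cells x genes) as W @ H.

    W holds each cell's usage of k programs; H holds each program's gene weights.
    Uses the classic multiplicative update rules for the squared-error objective.
    """
    rng = np.random.default_rng(seed)
    n_cells, n_genes = X.shape
    W = rng.random((n_cells, k)) + eps
    H = rng.random((k, n_genes)) + eps
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    # Rescale so each program's gene weights sum to 1; this keeps W @ H unchanged
    # and makes usage values comparable across programs.
    scale = H.sum(axis=1)
    return W * scale, H / scale[:, None]

# Hypothetical toy data: 60 cells, 40 genes, two non-overlapping programs.
rng = np.random.default_rng(42)
true_H = np.zeros((2, 40))
true_H[0, :20] = 1.0          # program 0 uses the first 20 genes
true_H[1, 20:] = 1.0          # program 1 uses the last 20 genes
true_W = np.zeros((60, 2))
true_W[:30, 0] = 1.0          # first 30 cells express program 0
true_W[30:, 1] = 1.0          # last 30 cells express program 1
X = 5.0 * (true_W @ true_H) + 0.1 * rng.random((60, 40))

W, H = nmf(X, k=2)
labels = W.argmax(axis=1)     # assign each cell to its dominant program
rel_error = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
```

Here the number of programs, k, is fixed by hand. According to the researchers, CSI-GEP determines such parameters automatically, a step this sketch does not attempt.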
When tested against existing methods using both simulated and real datasets, CSI-GEP demonstrated superior accuracy and efficiency. The researchers validated the tool using various datasets, including a comprehensive mouse brain atlas containing 2.2 million cells. Notably, CSI-GEP identified previously unrecognised cell types and biological processes that other methods had missed.
The authors note in their paper that CSI-GEP’s methodological advances could have applications beyond single-cell RNA sequencing, potentially extending to other data modalities such as spatial transcriptomics and to multi-modal data analysis.
Democratising access
The researchers have made CSI-GEP freely available to the scientific community through GitHub (https://github.com/geeleherlab/CSI-GEP), enabling widespread adoption and potential improvements by other researchers. This open-source approach aligns with the team’s goal of accelerating biological research through improved computational tools.
Reference:
Liu, X., Chapple, R. H., Bennett, D., et al. (2025). CSI-GEP: A GPU-based unsupervised machine learning approach for recovering gene expression programs in atlas-scale single-cell RNA-seq data. Cell Genomics, 5(1), 100739. https://doi.org/10.1016/j.xgen.2024.100739
Image: Paul Geeleher, PhD, St. Jude Department of Computational Biology. Credit: St. Jude Children’s Research Hospital