Novel analyses improve identification of cancer-associated genes from microarray data
Dartmouth Institute for Quantitative Biomedical Sciences (iQBS) researchers developed a new gene expression analysis approach for identifying cancer genes. The study results challenge the current paradigm of microarray data analysis and suggest that the new method may improve identification of cancer-associated genes.
Typical microarray-based gene expression analyses compare gene expression in adjacent normal and cancerous tissues. In these analyses, genes with strong statistical differences in expression are identified. However, many genes are aberrantly expressed in tumours as a byproduct of tumorigenesis. These ‘passenger’ genes are differentially expressed between normal and tumour tissues, but they are not ‘drivers’ of tumorigenesis. Therefore, better analytical approaches that enrich the list of candidate genes with authentic cancer-associated ‘driver’ genes are needed.
Lead authors of the study, Ivan P. Gorlov, PhD, Associate Professor of Community and Family Medicine, and Christopher Amos, PhD, Professor of Community and Family Medicine and Director of the Center for Genomic Medicine, described a new method to analyse microarray data. The research team demonstrated that ranking genes based on inter-tumour variation in gene expression outperforms traditional analytical approaches. The results were consistent across four major cancer types: breast, colorectal, lung, and prostate cancer.
The team used text-mining to identify genes known to be associated with breast, colorectal, lung, and prostate cancers. Then, they estimated enrichment factors by determining how frequently those known cancer-associated genes occurred among the top gene candidates identified by different analysis methods. The enrichment factor described how frequently cancer-associated genes were identified compared with the frequency expected by chance alone. Across all four cancer types, the new method of selecting candidate genes based on inter-tumour variation in gene expression outperformed the other methods, including the standard method of comparing mean expression in adjacent normal and tumour tissues. Dr. Gorlov and colleagues also used this approach to identify novel cancer-associated genes.
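The enrichment factor described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the study's implementation: the function name `enrichment_factor`, the gene labels, and the exact formula (share of known cancer genes in the top candidates divided by their share among all genes) are assumed here, since the article describes the idea only in words.

```python
def enrichment_factor(ranked_genes, known_cancer_genes, top_n):
    """Observed / expected frequency of known cancer genes among the top_n candidates.

    Assumed definition (not taken from the study): the expected frequency is the
    share of known cancer genes among all ranked genes.
    """
    all_genes = list(ranked_genes)
    if top_n <= 0 or top_n > len(all_genes):
        raise ValueError("top_n must be between 1 and the number of ranked genes")
    known = set(known_cancer_genes) & set(all_genes)
    if not known:
        raise ValueError("no known cancer genes are present in the ranking")
    top = all_genes[:top_n]
    observed = sum(1 for gene in top if gene in known) / len(top)
    expected = len(known) / len(all_genes)
    return observed / expected


# Hypothetical toy ranking of ten genes, best candidate first.
toy_ranking = ["G1", "G2", "G3", "G4", "G5", "G6", "G7", "G8", "G9", "G10"]
# Hypothetical set of genes already linked to cancer (e.g. found by text-mining).
toy_known = {"G1", "G4"}

ef = enrichment_factor(toy_ranking, toy_known, top_n=2)
print(f"enrichment factor for the top 2 candidates: {ef:.2f}")
```

A value above 1 would mean that known cancer-associated genes appear among the top candidates more often than chance alone would predict; comparing this value across ranking methods is one way to read the comparison reported in the study.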
The authors cite tumour heterogeneity as the most likely reason for the success of their variance-based approach. The method is based on the knowledge that different tumours can be driven by different subsets of cancer genes. By identifying genes with high variation in expression between tumours, the method preferentially identifies genes specifically associated with cancer. This same feature, tumour heterogeneity, may reduce the ability to identify critical gene expression changes when comparing mean gene expression in adjacent tumour and normal tissues, as tumours of the same type may have different sets of genes differentially expressed.
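A small toy example may help show why the two ranking strategies can disagree. The sketch below uses invented numbers and hypothetical gene labels (`GENE_SUBSET`, `GENE_UNIFORM`); it is not the authors' data or code, and it makes no claim about which kind of gene is a driver. It only shows that a gene altered in a subset of tumours can rank low on mean difference yet high on inter-tumour variance.

```python
from statistics import mean, variance

# Hypothetical expression values in four adjacent normal samples and four tumours.
normal = {
    "GENE_SUBSET": [5.0, 5.0, 5.0, 5.0],
    "GENE_UNIFORM": [5.0, 5.0, 5.0, 5.0],
}
tumour = {
    "GENE_SUBSET": [5.0, 5.0, 5.0, 13.0],  # altered in only one of the four tumours
    "GENE_UNIFORM": [8.0, 8.0, 8.0, 8.0],  # shifted by the same amount in every tumour
}


def mean_difference(gene):
    """Absolute difference between mean tumour and mean normal expression."""
    return abs(mean(tumour[gene]) - mean(normal[gene]))


def inter_tumour_variance(gene):
    """Sample variance of expression across tumours only."""
    return variance(tumour[gene])


genes = list(tumour)
by_mean = sorted(genes, key=mean_difference, reverse=True)
by_variance = sorted(genes, key=inter_tumour_variance, reverse=True)

print("ranked by mean difference:", by_mean)
print("ranked by inter-tumour variance:", by_variance)
```

In this toy data the mean comparison places the uniformly shifted gene first, while the variance ranking places the gene altered in a single tumour first, because averaging dilutes a change that is present in only some tumours.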
The results of the study challenge the model that comparing mean gene expression in adjacent normal and cancer tissues is the best approach to identifying cancer-associated genes. Indeed, the team identified high variation in adjacent ‘normal’ tissue samples, which are typically used as control samples for comparison in analyses based on mean gene expression. The study suggests that methods based on variance may help get the most from existing and future global gene expression studies.
Dartmouth Institute for Quantitative Biomedical Sciences