Normalization and batch correction are critical steps in processing single-cell RNA sequencing (scRNA-seq) data, as they remove technical effects and systematic biases to reveal the biological signals of interest. Although many computational methods have been developed, there is little guidance on choosing appropriate procedures for different scenarios. In this study, we assessed the performance of popular scRNA-seq noise reduction procedures across multiple scenarios using simulated and real datasets. The scenarios accounted for several biological and technical factors that strongly affect denoising performance: the relative magnitude of batch effects, the extent of cell population imbalance, the complexity of cell group structures, the proportion and similarity of nonoverlapping cell populations, dropout rates and variable library sizes. We used multiple quantitative metrics, together with visualizations of low-dimensional cell embeddings, to evaluate how well each method mixes batches while preserving the original cell group and gene structures. Based on our results, we identified the technical and biological factors that affect the performance of each method and recommended appropriate methods for different scenarios. In addition, we highlighted one challenging scenario in which most methods failed, resulting in overcorrection.
Dr. Shih-Kai Chu has been a postdoctoral research fellow at the Institute of Statistical Science, Academia Sinica, since 2021. He received his Ph.D. in Bioinformatics from the Taiwan International Graduate Program (TIGP) in 2017. His research applies statistical methods to investigate interindividual variation and its links to adverse reactions and poor responses, and his results have shown interethnic effects in pharmaco-epigenetics. He also develops computational algorithms to prioritize candidate pathogenic variants, and these methods have been applied in clinical case studies.