COMPARATIVE ANALYSIS OF REFERENCE-BASED CELL TYPE MAPPING AND MANUAL ANNOTATION IN SINGLE CELL RNA SEQUENCING ANALYSIS

Authors

  • Larisa Goričan, Centre for Human Genetics and Pharmacogenomics, Faculty of Medicine, University of Maribor, Taborska ulica 8, SI-2000 Maribor, Slovenia
  • Boris Gole, Centre for Human Genetics and Pharmacogenomics, Faculty of Medicine, University of Maribor, Taborska ulica 8, SI-2000 Maribor, Slovenia
  • Gregor Jezernik, Centre for Human Genetics and Pharmacogenomics, Faculty of Medicine, University of Maribor, Taborska ulica 8, SI-2000 Maribor, Slovenia
  • Gloria Krajnc, Centre for Human Genetics and Pharmacogenomics, Faculty of Medicine, University of Maribor, Taborska ulica 8, SI-2000 Maribor; Department for Science and Research, University Medical Centre Maribor, Ljubljanska ulica 5, SI-2000 Maribor, Slovenia
  • Uroš Potočnik, Centre for Human Genetics and Pharmacogenomics, Faculty of Medicine, University of Maribor, Taborska ulica 8, SI-2000 Maribor; Laboratory for Biochemistry, Molecular Biology and Genomics, Faculty of Chemistry and Chemical Engineering, University of Maribor, Smetanova ulica 17, SI-2000 Maribor; Department for Science and Research, University Medical Centre Maribor, Ljubljanska ulica 5, SI-2000 Maribor, Slovenia
  • Mario Gorenjak *, Centre for Human Genetics and Pharmacogenomics, Faculty of Medicine, University of Maribor, Taborska ulica 8, SI-2000 Maribor, Slovenia, mario.gorenjak@um.si

DOI:

https://doi.org/10.26873/SVR-1920-2024

Keywords:

single-cell transcriptomics, peripheral blood mononuclear cells, reference mapping, cell-type annotation, immune system

Abstract

Single-cell RNA sequencing (scRNA-seq) offers unprecedented insight into cellular diversity in complex tissues such as peripheral blood mononuclear cells (PBMCs). Furthermore, differential gene expression at the single-cell level can provide a basis for understanding the specialized roles of individual cells and cell types in biological processes and disease mechanisms. Accurate annotation of cell types in scRNA-seq datasets is, however, challenging because of the high complexity of the data. Here, we compare two cell-type annotation strategies applied to PBMCs in scRNA-seq datasets: the automated reference-based tool Azimuth, and unsupervised Shared Nearest Neighbor (SNN) clustering followed by manual annotation. Our results highlight the strengths and limitations of the two approaches. Azimuth easily processed large-scale scRNA-seq datasets and reliably identified even relatively rare cell populations. However, it struggled with cell types outside its reference range. In contrast, unsupervised SNN clustering clearly delineated all the different cell populations in a sample. This makes it well suited for identifying rare or novel cell types, but the method requires time-consuming and bias-prone manual annotation. To minimize the bias, we used rigorous criteria and the collaborative expertise of multiple independent evaluators, which resulted in a manual annotation that closely matched the automated one. Finally, pseudo-temporal analysis of the major cell types further confirmed the validity of the Azimuth and manual annotations. In conclusion, each annotation method has its merits and downsides. Our research thus highlights the need to combine different clustering and annotation approaches to manage the complexity of scRNA-seq data and to improve the reliability and depth of scRNA-seq analyses.

Comparative analysis of reference-based cell type mapping and manual annotation in single-cell RNA sequencing analysis (translated from the Slovenian title: Primerjalna analiza referenčno osnovanega mapiranja celičnih tipov in ročne anotacije pri analizi sekvenciranja RNA posamezne celice)

Abstract (translated from the Slovenian): Single-cell RNA sequencing (scRNA-seq) provides a unique insight into the cellular diversity of complex tissues such as peripheral blood mononuclear cells (PBMCs). In addition, differential gene expression at the level of individual cells can serve as a basis for understanding the specialized roles of individual cells and cell types in biological processes and disease mechanisms. Owing to the high complexity, however, accurately determining cell types in scRNA-seq datasets is demanding. In this article, we compare two strategies for determining cell types that are applied to PBMCs in scRNA-seq datasets: the automated, reference-database-based tool Azimuth, and unsupervised Shared Nearest Neighbour (SNN) clustering followed by manual cell-type determination. Our results highlight the strengths and limitations of both approaches. Azimuth easily processed large scRNA-seq datasets and reliably recognized even relatively rare cell populations. It had difficulties, however, with cell types outside its reference range. In contrast, unsupervised SNN clustering clearly delineated all the different cell populations in the sample. The SNN method is therefore well suited for recognizing rare or novel cell types, but it requires time-consuming manual cell-type determination that is prone to bias. We minimized this bias through strict criteria and the combined expertise of several independent evaluators. Our manual cell-type determination thus deviated only slightly from the automated one. Finally, the validity of the cell-type determination with Azimuth and with the manual method was further confirmed by pseudotemporal analysis of the major cell types. Our research thus highlights the need to combine different approaches to clustering and determining cell populations in order to improve the reliability and depth of scRNA-seq analyses.

Keywords (translated from the Slovenian): single-cell transcriptomics; peripheral blood mononuclear cells; reference mapping; cell-type annotation; immune system

Published

2024-12-31

How to Cite

Goričan, L., Gole, B., Jezernik, G., Krajnc, G., Potočnik, U., & Gorenjak, M. (2024). COMPARATIVE ANALYSIS OF REFERENCE-BASED CELL TYPE MAPPING AND MANUAL ANNOTATION IN SINGLE CELL RNA SEQUENCING ANALYSIS. Slovenian Veterinary Research, 61(4), 245–61. https://doi.org/10.26873/SVR-1920-2024

Section

Original Research Article