CytoTRACE: predicting differentiation state of cells from single-cell RNA-sequencing data.
CytoTRACE (Cellular (Cyto) Trajectory Reconstruction Analysis using gene Counts and Expression) is a computational method that predicts the differentiation state of cells from single-cell RNA-sequencing data. CytoTRACE leverages a simple, yet robust, determinant of developmental potential—the number of detectably expressed genes per cell, or gene counts. We have validated CytoTRACE on ~150K single-cell transcriptomes spanning 315 cell phenotypes, 52 lineages, 14 tissue types, 9 scRNA-seq platforms, and 5 species.
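The "gene counts" quantity described above can be illustrated with a few lines of shell. This is only a rough sketch of the counting idea, not CytoTRACE's own implementation. The file name toy_expr.txt, its layout (tab-delimited, genes in rows, cells in columns, a header row of cell names), the values, and the "expression greater than zero" cutoff are all assumptions made for illustration.

```shell
# Build a tiny, made-up expression matrix (genes in rows, cells in columns).
printf 'gene\tcellA\tcellB\tcellC\n'  > toy_expr.txt
printf 'Rpl7\t3\t0\t1\n'             >> toy_expr.txt
printf 'Ybx1\t2\t0\t0\n'             >> toy_expr.txt
printf 'Ppia\t0\t0\t5\n'             >> toy_expr.txt

# Hypothetical helper: for each cell (column), count the genes (rows)
# whose expression value is greater than zero.
gene_counts() {
    awk -F'\t' '
        NR == 1 { for (i = 2; i <= NF; i++) name[i] = $i; ncol = NF; next }
        { for (i = 2; i <= NF; i++) if ($i + 0 > 0) n[i]++ }
        END { for (i = 2; i <= ncol; i++) printf "%s\t%d\n", name[i], n[i] + 0 }
    ' "$1"
}

gene_counts toy_expr.txt
```

In this toy matrix, cellA and cellC each express two genes and cellB expresses none, so cellA and cellC have the higher gene counts.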
References:
- Gunsagar S. Gulati, Shaheen S. Sikandar, Daniel J. Wesche, Anoop Manjunath, Anjan Bharadwaj,
Mark J. Berger, Francisco Ilagan, Angera H. Kuo, Robert W. Hsieh, Shang Cai, Maider Zabala,
Ferenc A. Scheeren, Neethan A. Lobo, Dalong Qian, Feiqiao B. Yu, Frederick M. Dirbas,
Michael F. Clarke, Aaron M. Newman
Single-cell transcriptional diversity is a hallmark of developmental potential.
Science 367, 405–411 (2020)
Documentation
Important Notes
- Module Name: cytotrace (see the modules page for more information)
- Unusual environment variables set
  - CT_HOME: installation directory
  - CT_BIN: executables directory
  - CT_SRC: source code directory
  - CT_DATA: sample input data directory
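To confirm that the module has set the variables listed above, a small check along these lines may help. The helper name ct_show_env is hypothetical. The sketch only reads the four variables named in this section and makes no assumption about what the directories contain.

```shell
# Hypothetical helper: print each CytoTRACE module variable, or a note
# when it is not set (for example, before `module load cytotrace`).
ct_show_env() {
    for var in CT_HOME CT_BIN CT_SRC CT_DATA; do
        # Look up the value of the variable whose name is stored in $var.
        eval "val=\${$var:-}"
        if [ -n "$val" ]; then
            printf '%s=%s\n' "$var" "$val"
        else
            printf '%s is not set\n' "$var"
        fi
    done
}

ct_show_env

# List the sample input data only if the directory actually exists.
if [ -d "${CT_DATA:-}" ]; then
    ls "$CT_DATA"
fi
```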
Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.
[user@biowulf]$ sinteractive --mem=20g --gres=lscratch:20 -c4
[user@cn0861 ~]$ module load cytotrace
[+] Loading singularity 3.10.5 on cn2372
[+] Loading cytotrace 0.3.4
[user@cn0861 ~]$ mkdir -p /data/$USER/CytoTrace && cd /data/$USER/CytoTrace
Download sample data:
[user@cn0861 ~]$ wget https://cytotrace.stanford.edu/dataset_marrow10x.txt
[user@cn0861 ~]$ wget https://cytotrace.stanford.edu/dataset_marrowplate.txt
Run CytoTRACE on these data:
[user@cn0861 ~]$ R-ct
> library(CytoTRACE)
Welcome to the CytoTRACE R package, a tool for the unbiased prediction of differentiation states in scRNA-seq data. For more information about this method, please visit https://cytotrace.stanford.edu or email us at cytotrace@gmail.com.
> CytoTRACE(read.table("dataset_marrow10x.txt"))
The number of cells in your dataset exceeds 3,000. CytoTRACE will now be run in fast mode (see documentation). You can multi-thread this run using the 'ncores' flag. To disable fast mode, please indicate 'enableFast = FALSE'.
CytoTRACE will be run on 3 sub-sample(s) of approximately 1142 cells each using 1 / 1 core(s)
Pre-processing data and generating similarity matrix...
Calculating gene counts signature...
Smoothing values with NNLS regression and diffusion...
Calculating genes associated with CytoTRACE...
...
$gcsGenes
       Eif5a         Ybx1         Ppia        Rps17        Cox8a
8.930324e-01 8.805865e-01 8.792047e-01 8.780918e-01 8.739148e-01
       Snrpg        Rpl28         Rps2       Atp5g2         Rpsa
8.717692e-01 8.711641e-01 8.658732e-01 8.639576e-01 8.627667e-01
...
Counts
X10X_P7_2_AAACCTGCAGTAACGG X10X_P7_2_AAACGGGAGGACGAAA
                      2301                       2627
X10X_P7_2_AAACGGGAGGTACTCT X10X_P7_2_AAACGGGAGGTGCTTT
                      3183                       1034
...
Gm106                    0.0000000                  0.0000000
Rpl7                     2.8272534                  2.2531575
Rdh10                    0.0000000                  0.0000000
        X10X_P7_3_TTTGTCAAGCGCTCCA X10X_P7_3_TTTGTCAAGGCAGTCA
Mrpl15                   0.2984039                  0.0000000
Lypla1                   0.2984039                  0.0000000
Tcea1                    0.0000000                  0.0000000
Atp6v1h                  0.0000000                  0.0000000
Rb1cc1                   0.5455397                  0.1230472
Pcmtd1                   0.0000000                  0.0000000
Rrs1                     0.5455397                  0.0000000
Adhfe1                   0.0000000                  0.0000000
Mybl1                    0.0000000                  0.0000000
...
Terf1                    0.0000000
Gm106                    0.0000000
Rpl7                     3.9047688
Rdh10                    0.0000000
[ reached getOption("max.print") -- omitted 13488 rows ]
Warning message:
In CytoTRACE(read.table("dataset_marrow10x.txt")) :
  9 genes have zero expression in the matrix and were filtered
> iCytoTRACE(list(read.table("dataset_marrow10x.txt"), read.table("dataset_marrowplate.txt")))
Would you like to create a default Python environment for the reticulate package?
(Yes/no/cancel) no
Found 13453 genes among all datasets
[[0. 0.65976072]
 [0. 0. ]]
Processing datasets (0, 1)
Found 13453 genes among all datasets
[[0. 0.65976072]
 [0. 0. ]]
Processing datasets (0, 1)
The number of cells in your integrated dataset is less than 10,000. Fast mode has been disabled.
CytoTRACE will be run on 1 sub-sample(s) of approximately 7869 cells each using 1 / 1 core(s)
Calculating genes associated with iCytoTRACE...
$exprMatrix
...
X10X_P7_2_GATGAAACACATTCGA -2.893816e-03
X10X_P7_2_GATGAAAGTGACAAAT  3.646599e-03
X10X_P7_2_GATGAAAGTGCACTTA -3.304935e-03
X10X_P7_2_GATGAAAGTTACGTCA  5.292558e-03
X10X_P7_2_GATGAGGAGCACCGCT  4.832055e-03
X10X_P7_2_GATGAGGAGGTGCACA  5.018071e-03
X10X_P7_2_GATGAGGCAGTCGATT -2.733117e-04
X10X_P7_2_GATGAGGCATGGTTGT  4.190494e-03
X10X_P7_2_GATGAGGGTTGATTGC -1.724375e-02
X10X_P7_2_GATGAGGTCAACACCA  5.849512e-03
X10X_P7_2_GATGAGGTCCTCCTAG -5.341377e-03
X10X_P7_2_GATGCTAAGTCACGCC -4.335768e-03
X10X_P7_2_GATGCTACATGGGAAC  8.643952e-03
X10X_P7_2_GATGCTAGTACCTACA  3.564311e-03
X10X_P7_2_GATTCAGTCACTCCTG  1.188439e-02
X10X_P7_2_GCAAACTAGATGAGAG -2.990178e-03
X10X_P7_2_GCAAACTAGCCTTGAT  1.439598e-02
X10X_P7_2_GCAAACTGTTCTGTTT -1.817505e-02
X10X_P7_2_GCAAACTTCGACAGCC -5.349718e-03
X10X_P7_2_GCAATCACAGTCGTGC  7.415567e-03
X10X_P7_2_GCAATCATCGGAGCAA -2.271764e-02
X10X_P7_2_GCAATCATCTAACCGA -1.381307e-02
X10X_P7_2_GCACATACATGGATGG -1.717161e-02
X10X_P7_2_GCACATATCTGAGGGA  9.387268e-05
X10X_P7_2_GCACTCTAGTGCCAGA  5.458449e-03
X10X_P7_2_GCACTCTCAATGGATA -2.407448e-03
X10X_P7_2_GCACTCTGTACTTGAC  4.371174e-03
X10X_P7_2_GCAGCCAAGGCAAAGA -1.165523e-03
X10X_P7_2_GCAGCCACAAGTCATC -8.224130e-03
X10X_P7_2_GCAGCCACATGATCCA -1.164582e-02
X10X_P7_2_GCAGCCAGTAGATTAG -3.261392e-04
X10X_P7_2_GCAGCCATCGGAGCAA -2.124615e-03
X10X_P7_2_GCAGCCATCGGCTACG  1.063609e-02
X10X_P7_2_GCAGTTAAGGAGTACC -6.749588e-03
X10X_P7_2_GCAGTTACAATAACGA  4.061926e-03
X10X_P7_2_GCAGTTATCTCAACTT -5.714046e-03
X10X_P7_2_GCATACACATGGGACA  4.263237e-03
X10X_P7_2_GCATACAGTGACTACT -1.923917e-02
X10X_P7_2_GCATACATCGTGACAT  5.930570e-03
X10X_P7_2_GCATGATAGAGACTTA  2.220203e-02
X10X_P7_2_GCATGATCAGTTAACC -3.296641e-03
X10X_P7_2_GCATGATTCCGCGTTT -1.341521e-02
X10X_P7_2_GCATGATTCTGAGGGA  8.092239e-03
X10X_P7_2_GCATGCGAGAAACCAT  1.087689e-02
X10X_P7_2_GCATGCGAGGTTCCTA  3.337084e-02
X10X_P7_2_GCATGCGCACGAGAGT -4.072700e-02
X10X_P7_2_GCATGCGCATGAAGTA -5.383050e-03
X10X_P7_2_GCATGTAAGGTGACCA -8.580112e-03
X10X_P7_2_GCATGTACAAAGGCGT -7.766561e-03
X10X_P7_2_GCATGTACAGTCTTCC -3.473503e-03
X10X_P7_2_GCATGTAGTCCAGTAT -1.260405e-02
X10X_P7_2_GCATGTATCACTTATC  2.053231e-02
X10X_P7_2_GCCAAATAGATCGGGT  5.880604e-03
X10X_P7_2_GCCTCTACACTCAGGC -2.391912e-03
X10X_P7_2_GCCTCTACATAAGACA  1.152437e-02
X10X_P7_2_GCGAGAATCTTCCTTC -1.009419e-03
X10X_P7_2_GCGCAACGTAAAGGAG -5.744237e-03
X10X_P7_2_GCGCAACGTATAAACG -6.318104e-03
X10X_P7_2_GCGCAACGTGTGGTTT  3.011639e-03
X10X_P7_2_GCGCAGTAGGCTACGA  5.207578e-03
X10X_P7_2_GCGCAGTAGTCATCCA  1.798909e-02
X10X_P7_2_GCGCAGTTCCCTAATT  3.193344e-03
X10X_P7_2_GCGCCAACAGCCAATT  2.698768e-02
X10X_P7_2_GCGCCAACAGGTCTCG -1.462139e-02
X10X_P7_2_GCGCCAACATCACAAC  1.835279e-04
[ reached getOption("max.print") -- omitted 6870 rows ]
$filteredCells
character(0)
[user@cn0861 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
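Calling CytoTRACE() at the prompt without assigning its result makes R print the whole returned list, which is why the output above is truncated by max.print. One possible alternative, sketched below, is to write a short R script that stores the result and prints only a summary. The script name run_cytotrace.R and the variable names are hypothetical. ncores = 4 is chosen to match the -c4 allocation and uses the 'ncores' option mentioned in the CytoTRACE message above. The list elements gcsGenes and filteredCells are the ones visible in the session output.

```shell
# Write a small R script; quoting 'EOF' stops the shell from expanding
# anything inside the heredoc (for example the $ in results$gcsGenes).
cat > run_cytotrace.R <<'EOF'
library(CytoTRACE)

# Same input file as in the interactive session.
expr <- read.table("dataset_marrow10x.txt")

# Store the result instead of printing it; ncores = 4 assumes a -c4 allocation.
results <- CytoTRACE(expr, ncores = 4)

# Summarize the returned list one level deep, then print two of its elements.
str(results, max.level = 1)
print(head(results$gcsGenes, 10))
print(results$filteredCells)
EOF
```

From inside the R-ct session, in the directory holding the downloaded data, the script could then be run with source("run_cytotrace.R").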