CytoSPACE implements an optimization method for mapping individual cells from a single-cell RNA sequencing atlas to spatial expression profiles. Across diverse platforms and tissue types, it outperforms previous methods with respect to noise tolerance and accuracy, enabling tissue cartography at single-cell resolution.
Allocate an interactive session and run the program. Sample session:
[user@biowulf]$ sinteractive --mem=12g -c8 --gres=lscratch:20
[user@cn3335 ~]$ module load cytospace
[+] Loading singularity 4.0.1 on cn3335
[+] Loading cytospace 1.0.6
[user@cn3335 ~]$ cytospace -h
connect localhost port 6000: Connection refused
usage: cytospace [-h] -sp SCRNA_PATH -ctp CELL_TYPE_PATH [-stp ST_PATH]
                 [-cp COORDINATES_PATH] [-srp SPACERANGER_PATH]
                 [-stctp ST_CELL_TYPE_PATH]
                 [-ctfep CELL_TYPE_FRACTION_ESTIMATION_PATH]
                 [-ncpsp N_CELLS_PER_SPOT_PATH] [-o OUTPUT_FOLDER]
                 [-op OUTPUT_PREFIX] [-mcn MEAN_CELL_NUMBERS]
                 [--downsample-off]
                 [-smtpc SCRNA_MAX_TRANSCRIPTS_PER_CELL] [-sc]
                 [-noss NUMBER_OF_SELECTED_SPOTS] [-sss]
                 [-nosss NUMBER_OF_SELECTED_SUB_SPOTS]
                 [-nop NUMBER_OF_PROCESSORS]
                 [-sm {lapjv,lapjv_compat,lap_CSPR}]
                 [-dm {Pearson_correlation,Spearman_correlation,Euclidean}]
                 [-sam {duplicates,place_holders}] [-se SEED] [-p]
                 [-g GEOMETRY] [-nc NUM_COLUMN] [-mp MAX_NUM_CELLS_PLOT]

CytoSPACE is a computational strategy for assigning single-cell
transcriptomes to in situ spatial transcriptomics (ST) data. Our method
solves single cell/spot assignment by minimizing a correlation-based cost
function through a linear programming-based optimization routine.

optional arguments:
  -h, --help            show this help message and exit
  -stp ST_PATH, --st-path ST_PATH
                        Path to spatial transcriptomics data (expressions)
  -cp COORDINATES_PATH, --coordinates-path COORDINATES_PATH
                        Path to transcriptomics data (coordinates)
  -srp SPACERANGER_PATH, --spaceranger-path SPACERANGER_PATH
                        Path to SpaceRanger tar.gz data file
  -stctp ST_CELL_TYPE_PATH, --st-cell-type-path ST_CELL_TYPE_PATH
                        Path to ST cell type file (recommended for
                        single-cell ST)
  -ctfep CELL_TYPE_FRACTION_ESTIMATION_PATH, --cell-type-fraction-estimation-path CELL_TYPE_FRACTION_ESTIMATION_PATH
                        Path to ST cell type fraction file (recommended for
                        bulk ST)
  -ncpsp N_CELLS_PER_SPOT_PATH, --n-cells-per-spot-path N_CELLS_PER_SPOT_PATH
                        Path to number of cells per ST spot file
  -o OUTPUT_FOLDER, --output-folder OUTPUT_FOLDER
                        Relative path to the output folder
  -op OUTPUT_PREFIX, --output-prefix OUTPUT_PREFIX
                        Prefix of results stored in the 'output_folder'
  -mcn MEAN_CELL_NUMBERS, --mean-cell-numbers MEAN_CELL_NUMBERS
                        Mean number of cells per spot, default 5
                        (appropriate for Visium). If analyzing legacy
                        spatial transcriptomics data, set to 20
  --downsample-off      Turn off downsampling for scRNA-seq data
  -smtpc SCRNA_MAX_TRANSCRIPTS_PER_CELL, --scRNA_max_transcripts_per_cell SCRNA_MAX_TRANSCRIPTS_PER_CELL
                        Number of transcripts per cell to downsample
                        scRNA-seq dataset to. This allows for assignments
                        that are not dependent on the overall expression
                        level
  -sc, --single-cell    Use single-cell spatial approach if specified
  -noss NUMBER_OF_SELECTED_SPOTS, --number-of-selected-spots NUMBER_OF_SELECTED_SPOTS
                        Number of selected spots from ST data used in each
                        iteration
  -sss, --sampling-sub-spots
                        Sample subspots to limit the number of mapped cells
                        if specified
  -nosss NUMBER_OF_SELECTED_SUB_SPOTS, --number-of-selected-sub-spots NUMBER_OF_SELECTED_SUB_SPOTS
                        Number of selected subspots from ST data to limit
                        the number of mapped cells
  -nop NUMBER_OF_PROCESSORS, --number-of-processors NUMBER_OF_PROCESSORS
                        Number of processors used for the analysis
  -sm {lapjv,lapjv_compat,lap_CSPR}, --solver-method {lapjv,lapjv_compat,lap_CSPR}
                        Which solver to use for the linear assignment
                        problem, default 'lapjv'
  -dm {Pearson_correlation,Spearman_correlation,Euclidean}, --distance-metric {Pearson_correlation,Spearman_correlation,Euclidean}
                        Which distance metric to use for the cost matrix,
                        default 'Pearson_correlation'
  -sam {duplicates,place_holders}, --sampling-method {duplicates,place_holders}
                        Which underlying method to use for dealing with
                        duplicated cells, default 'duplicates'
  -se SEED, --seed SEED
                        Set seed for random generators, default 1
  -p, --plot-off        Turn create plots on/off
  -g GEOMETRY, --geometry GEOMETRY
                        ST geometry, either 'honeycomb' or 'square' accepted
  -nc NUM_COLUMN, --num-column NUM_COLUMN
                        Number of columns in figure
  -mp MAX_NUM_CELLS_PLOT, --max-num-cells-plot MAX_NUM_CELLS_PLOT
                        Maximum number of cells to plot in single-cell
                        visualization

Required arguments:
  -sp SCRNA_PATH, --scRNA-path SCRNA_PATH
                        Path to scRNA-Seq data
  -ctp CELL_TYPE_PATH, --cell-type-path CELL_TYPE_PATH
                        Path to cell type labels

[user@cn3335 ~]$ mkdir /data/$USER/cytospace && cd /data/$USER/cytospace
[user@cn3335 ~]$ cp $CS_DATA/* .
[user@cn3335 ~]$ cytospace \
    -sp brca_scRNA_GEP.txt \
    -ctp brca_scRNA_celllabels.txt \
    -stp brca_STdata_GEP.txt \
    -cp brca_STdata_coordinates.txt
...
Read and validate data ...
100% |██████████████████████████████████████████████████| Reading data [done]
Estimating cell type fractions
2024-01-25 09:08:04 Load ST data
PC_ 1
Positive:  IGKC, IGHG1, DCN, IGHG2, IGHA1, COL6A3, APOE, JCHAIN, LUM, MMP2
           AEBP1, IGLC1, IGHG3, COL3A1, HLA-DRA, C1R, SFRP4, HMOX1, VIM, POSTN
           SPARC, COL6A1, IGHG4, LYZ, SFRP2, COL1A1, C3, APOC1, COL6A2, COL1A2
Negative:  AZGP1, MUCL1, SCGB2A2, ERBB2, KRT7, CD24, SCGB1D2, MAL2, CRISP3, ATG5
           TACSTD2, SPINT2, PPDPF, KRT8, LCN2, LTF, PIGR, SLPI, CLDN4, CRABP2
           KIAA1324, PSMD3, CFB, ORMDL3, FGB, ARPC1A, FOXA1, S100A9, PDZK1IP1, IFI6
PC_ 2
Positive:  SFRP4, DCN, COL6A3, IGKC, MMP2, SFRP2, AEBP1, IGHG1, LUM, C1R
           COL3A1, CCN2, COL1A2, COL1A1, SPARC, CCDC80, IGLC1, IGFBP7, FBLN1, C1S
           JCHAIN, IGHG4, CXCL14, IGHG2, C3, IGFBP4, CTSK, FBLN2, IGHA1, VCAN
Negative:  APOC1, APOE, FTL, SPP1, CTSL, IFI30, CTSB, CTSD, CD68, LAPTM5
           SLC11A1, ACP5, TYROBP, PLIN2, FABP5, GLUL, GPNMB, SDS, CTSZ, PLAUR
           HMOX1, SAT1, AQP9, SCD, FCGR3A, C1QB, LYZ, TREM2, FCER1G, PSAP
PC_ 3
Positive:  HMOX1, POSTN, CTSB, FGB, ACTA2, FGG, TAGLN, AEBP1, SPARC, BGN
           SPP1, COL1A1, CCN2, FTL, TGFBI, MYL9, COL1A2, COL5A1, LUM, TIMP1
           SULF1, CTSL, FN1, DCN, IGHG3, GLUL, COL3A1, APOE, CTSD, APOC1
Negative:  CCL19, TRAC, TRBC2, LTB, CXCL9, CCL5, TRBC1, IL7R, LAMP3, BIRC3
           CXCR4, ISG15, CXCL11, CD3D, PTPRC, IFITM1, CCR7, IFI6, IDO1, CORO1A
           CD37, SELL, UBD, CXCL13, IKZF1, ISG20, CD3E, RAC2, IFI44L, IL2RG
PC_ 4
Positive:  IGKC, IGHG1, IGHG2, JCHAIN, IGHA1, IGHG4, IGLC1, IGHG3, IGKV4-1, HMOX1
           IGHM, SFRP4, IGHJ6, C3, DERL3, MZB1, MMP2, DCN, PTGDS, IGHV6-1
           XBP1, PIM2, SCGB2A2, TENT5C, TXNDC5, SCD, POU2AF1, IGHD, CCDC80, SELENOP
Negative:  COL4A1, IGFBP7, ACTA2, COL4A2, HSPG2, MCAM, PLVAP, TIMP3, MYL9, VWF
           A2M, TIMP1, TAGLN, CST1, TPM2, ENG, SPARC, KRT5, S100A2, MMP11
           COL15A1, ID1, MYLK, CD93, LAMC2, PODXL, FN1, POSTN, CDH5, CALD1
PC_ 5
Positive:  LTF, FGB, FGG, RARRES1, CLU, LCN2, WFDC2, S100A9, SERPINA3, CP
           LBP, PDZK1IP1, ORM1, AGT, SLPI, CAPN13, TGM2, RDH10, TACSTD2, CHI3L2
           FGA, ORM2, MGP, UBD, SLC34A2, SOD2, ELF3, GPRC5A, GABRP, CFB
Negative:  SCGB2A2, FADS2, TOP2A, PPP1R1A, MUCL1, PEG10, SCGB1D2, HIST1H1B, PIP, HIST1H2BH
           GATA3, HIST1H2BG, C2orf72, HIST1H3H, CYP4Z1, HIST1H4A, UBE2C, DBI, HIST1H4D, TRPS1
           CDC6, ADAMTS1, NQO1, NPNT, NRG1, ASPH, SPDEF, CLEC3A, FASN, HIST1H2BO
2024-01-25 09:08:48 Load scRNA data
PC_ 1
Positive:  IGFBP7, SPARC, COL1A2, COL1A1, COL3A1, CALD1, COL6A2, TAGLN, BGN, MYL9
           LUM, THY1, DCN, TPM2, POSTN, COL5A2, COL6A3, IGFBP4, AEBP1, COL6A1
           CTHRC1, ACTA2, C1S, IFITM3, SFRP2, RARRES2, CTGF, TIMP3, VCAN, CTSK
Negative:  HLA-DRA, HLA-DRB1, TYROBP, CD74, HLA-DPB1, HLA-DPA1, HLA-DQA1, CCL4, FCER1G, CCL5
           CD69, SRGN, HLA-DQB1, LYZ, CXCR4, C1QB, C1QA, RGS1, C1QC, LAPTM5
           CCL3, NKG7, FCGR3A, CD52, AIF1, DUSP2, CD83, APOC1, CCL4L2, PTPRC
PC_ 2
Positive:  MUCL1, CD24, KRT7, CALML5, KRT18, SCGB1B2P, NKG7, FXYD3, KRT8, CD69
           GZMA, MGST1, CD7, GNLY, CLDN4, AZGP1, CCL5, SLPI, CD2, ERBB2
           KLRB1, RPL13A, PERP, CD3D, TACSTD2, CD3E, S100P, TM4SF1, GZMB, ELF3
Negative:  HLA-DRA, FTL, HLA-DRB1, CD74, HLA-DPA1, C1QA, C1QB, HLA-DPB1, HLA-DQA1, C1QC
           APOE, LYZ, TYROBP, CTSB, APOC1, FTH1, HLA-DQB1, FCER1G, CD68, CST3
           AIF1, MS4A6A, CTSD, FCGR3A, CTSS, PSAP, CTSZ, FN1, MS4A7, CCL3
PC_ 3
Positive:  CD24, KRT7, CALML5, MUCL1, TM4SF1, KRT18, KRT8, MGST1, FXYD3, AZGP1
           SLPI, CLDN4, ERBB2, TACSTD2, SPINT2, DBI, MIEN1, GRB7, S100P, ELF3
           CRIP2, PERP, C17orf89, PSMD3, KRT19, MIF, EPCAM, S100A14, TM7SF2, LMTK3
Negative:  CCL5, CD69, NKG7, GZMA, GNLY, CD7, CCL4, CXCR4, CD2, CST7
           GZMB, KLRB1, CD3E, RGCC, CD3D, IL7R, IL32, CD52, TRBC2, CTSW
           PTPRC, TRBC1, TNFAIP3, TRAC, KLRD1, DUSP2, B2M, IFNG, SRGN, RHOH
PC_ 4
Positive:  MUCL1, CD24, KRT7, CALML5, KRT18, MGST1, FXYD3, KRT8, AZGP1, CLDN4
           SLPI, DBI, SCGB1B2P, ERBB2, SPINT2, LUM, DCN, PERP, SFRP2, TACSTD2
           CTSK, S100P, RARRES2, ELF3, COL1A1, SDC1, MIEN1, COL3A1, GRB7, AEBP1
Negative:  PLVAP, RAMP2, CALCRL, VWF, PECAM1, SPARCL1, IGFBP7, HSPG2, AQP1, RAMP3
           ADGRL4, ESAM, EMCN, CLEC14A, GNG11, CD34, CD93, COL4A1, EGFL7, RNASE1
           ENG, COL4A2, IFITM3, A2M, IFITM1, ADAMTS1, SPRY1, CDH5, CXorf36, FLT1
PC_ 5
Positive:  NDUFA4L2, RGS5, COL18A1, MCAM, NOTCH3, SOD3, LHFP, PPP1R14A, CCDC102B, ADIRF
           HIGD1B, TBX2, PDGFA, NR2F2, CPE, C11orf96, PGF, PLXDC1, TPPP3, COL4A2
           COX4I2, EPS8, CALD1, COL4A1, ID4, ENPEP, SEPT4, PDGFRB, EGFL6, ACTA2
Negative:  CTHRC1, MMP2, SFRP2, CTSK, DCN, COL10A1, RARRES2, FBLN1, COL11A1, LUM
           MFAP5, THBS2, HTRA1, HSPG2, NBL1, RAMP2, SFRP4, PLVAP, COL8A1, WISP2
           VCAN, AEBP1, CCDC80, FAP, ITGBL1, VWF, CXCL12, PECAM1, PDGFRL, DPYSL3
2024-01-25 09:09:29 Integration
Performing PCA on the provided reference using 2118 features as input.
Projecting PCA
Finding neighborhoods
Finding anchors
Found 1129 anchors
Filtering anchors
Retained 1043 anchors
Finding integration vectors
Finding integration vector weights
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Predicting cell labels
100% |██████████████████████████████████████████████████| Reading data [done]
Time to read and validate data: 180.52 seconds
Estimating number of cells in each spot ...
Time to estimate number of cells per spot: 0.99 seconds
Down/up sample of scRNA-seq data according to estimated cell type fractions
Time to down/up sample scRNA-seq data: 6.02 seconds
Building cost matrix ...
Time to build cost matrix: 6.22 seconds
Solving linear assignment problem ...
Time to solve linear assignment problem: 96.06 seconds
Total time to run CytoSPACE core algorithm: 114.61 seconds
Saving results ...
100% |██████████████████████████████████████████████████| Reading data [done]
Detecting row and column indexing of Visium data; rescaling for coordinates
findfont: Font family ['sans-serif'] not found. Falling back to DejaVu Sans.
findfont: Generic family 'sans-serif' not found because none of the following families were found: Arial
findfont: Font family ['sans-serif'] not found. Falling back to DejaVu Sans.
findfont: Generic family 'sans-serif' not found because none of the following families were found: Arial
Detecting row and column indexing of Visium data; rescaling for coordinates
Total execution time: 363.29 seconds
[user@cn3335 ~]$ exit
[user@biowulf]$
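
The interactive run above could also be wrapped in a small script. The following is a minimal sketch, not an official Biowulf or CytoSPACE script: the flags are taken from the `cytospace -h` output above, while the function name `build_cytospace_cmd`, the output folder `cytospace_results`, and the `DRY_RUN` switch are hypothetical names introduced for illustration. It assumes the example input files have already been copied into the working directory and that Slurm has set `SLURM_CPUS_PER_TASK` for the allocation.

```shell
#!/bin/bash
# Sketch only: flags come from the `cytospace -h` output shown above.
# The output folder name and the DRY_RUN switch are illustrative assumptions.

# Fill the global array CYTOSPACE_CMD with the command for the BRCA example.
# $1: output folder, relative to the working directory (hypothetical name)
build_cytospace_cmd() {
    local outdir="$1"
    # Slurm sets SLURM_CPUS_PER_TASK inside an allocation; fall back to 1.
    local ncpu="${SLURM_CPUS_PER_TASK:-1}"
    CYTOSPACE_CMD=(
        cytospace
        -sp  brca_scRNA_GEP.txt
        -ctp brca_scRNA_celllabels.txt
        -stp brca_STdata_GEP.txt
        -cp  brca_STdata_coordinates.txt
        -o   "$outdir"
        -nop "$ncpu"
    )
}

build_cytospace_cmd "cytospace_results"

if [[ "${DRY_RUN:-1}" == "1" ]]; then
    # Default: print the command so it can be inspected before running.
    echo "${CYTOSPACE_CMD[*]}"
else
    # Run the command exactly as assembled (requires `module load cytospace`).
    "${CYTOSPACE_CMD[@]}"
fi
```

By default the script only prints the assembled command; setting `DRY_RUN=0` in the environment runs it. Building the command as an array keeps each argument as a separate word, which avoids quoting problems if a path ever contains spaces.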