CytoSPACE: Robust and rapid alignment of single-cell and spatial transcriptomes

CytoSPACE implements an optimization method for mapping individual cells from a single-cell RNA sequencing atlas to spatial expression profiles. Across diverse platforms and tissue types, it outperforms previous methods with respect to noise tolerance and accuracy, enabling tissue cartography at single-cell resolution.

References:

Documentation
Important Notes

Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program. Sample session:

[user@biowulf]$ sinteractive --mem=12g -c8 --gres=lscratch:20
[user@cn3335 ~]$ module load cytospace
[+] Loading singularity  4.0.1   on cn3335
[+] Loading cytospace  1.0.6 
[user@cn3335 ~]$ cytospace -h 
connect localhost port 6000: Connection refused
usage: cytospace [-h] -sp SCRNA_PATH -ctp CELL_TYPE_PATH [-stp ST_PATH] [-cp COORDINATES_PATH] [-srp SPACERANGER_PATH]
                 [-stctp ST_CELL_TYPE_PATH] [-ctfep CELL_TYPE_FRACTION_ESTIMATION_PATH] [-ncpsp N_CELLS_PER_SPOT_PATH]
                 [-o OUTPUT_FOLDER] [-op OUTPUT_PREFIX] [-mcn MEAN_CELL_NUMBERS] [--downsample-off]
                 [-smtpc SCRNA_MAX_TRANSCRIPTS_PER_CELL] [-sc] [-noss NUMBER_OF_SELECTED_SPOTS] [-sss]
                 [-nosss NUMBER_OF_SELECTED_SUB_SPOTS] [-nop NUMBER_OF_PROCESSORS] [-sm {lapjv,lapjv_compat,lap_CSPR}]
                 [-dm {Pearson_correlation,Spearman_correlation,Euclidean}] [-sam {duplicates,place_holders}] [-se SEED]
                 [-p] [-g GEOMETRY] [-nc NUM_COLUMN] [-mp MAX_NUM_CELLS_PLOT]

CytoSPACE is a computational strategy for assigning single-cell transcriptomes to in situ spatial transcriptomics (ST)
data. Our method solves single cell/spot assignment by minimizing a correlation-based cost function through a linear
programming-based optimization routine.

optional arguments:
  -h, --help            show this help message and exit
  -stp ST_PATH, --st-path ST_PATH
                        Path to spatial transcriptomics data (expressions)
  -cp COORDINATES_PATH, --coordinates-path COORDINATES_PATH
                        Path to transcriptomics data (coordinates)
  -srp SPACERANGER_PATH, --spaceranger-path SPACERANGER_PATH
                        Path to SpaceRanger tar.gz data file
  -stctp ST_CELL_TYPE_PATH, --st-cell-type-path ST_CELL_TYPE_PATH
                        Path to ST cell type file (recommended for single-cell ST)
  -ctfep CELL_TYPE_FRACTION_ESTIMATION_PATH, --cell-type-fraction-estimation-path CELL_TYPE_FRACTION_ESTIMATION_PATH
                        Path to ST cell type fraction file (recommended for bulk ST)
  -ncpsp N_CELLS_PER_SPOT_PATH, --n-cells-per-spot-path N_CELLS_PER_SPOT_PATH
                        Path to number of cells per ST spot file
  -o OUTPUT_FOLDER, --output-folder OUTPUT_FOLDER
                        Relative path to the output folder
  -op OUTPUT_PREFIX, --output-prefix OUTPUT_PREFIX
                        Prefix of results stored in the 'output_folder'
  -mcn MEAN_CELL_NUMBERS, --mean-cell-numbers MEAN_CELL_NUMBERS
                        Mean number of cells per spot, default 5 (appropriate for Visium). If analyzing legacy spatial
                        transcriptomics data, set to 20
  --downsample-off      Turn off downsampling for scRNA-seq data
  -smtpc SCRNA_MAX_TRANSCRIPTS_PER_CELL, --scRNA_max_transcripts_per_cell SCRNA_MAX_TRANSCRIPTS_PER_CELL
                        Number of transcripts per cell to downsample scRNA-seq dataset to. This allows for assignments
                        that are not dependent on the overall expression level
  -sc, --single-cell    Use single-cell spatial approach if specified
  -noss NUMBER_OF_SELECTED_SPOTS, --number-of-selected-spots NUMBER_OF_SELECTED_SPOTS
                        Number of selected spots from ST data used in each iteration
  -sss, --sampling-sub-spots
                        Sample subspots to limit the number of mapped cells if specified
  -nosss NUMBER_OF_SELECTED_SUB_SPOTS, --number-of-selected-sub-spots NUMBER_OF_SELECTED_SUB_SPOTS
                        Number of selected subspots from ST data to limit the number of mapped cells
  -nop NUMBER_OF_PROCESSORS, --number-of-processors NUMBER_OF_PROCESSORS
                        Number of processors used for the analysis
  -sm {lapjv,lapjv_compat,lap_CSPR}, --solver-method {lapjv,lapjv_compat,lap_CSPR}
                        Which solver to use for the linear assignment problem, default 'lapjv'
  -dm {Pearson_correlation,Spearman_correlation,Euclidean}, --distance-metric {Pearson_correlation,Spearman_correlation,Euclidean}
                        Which distance metric to use for the cost matrix, default 'Pearson_correlation'
  -sam {duplicates,place_holders}, --sampling-method {duplicates,place_holders}
                        Which underlying method to use for dealing with duplicated cells, default 'duplicates'
  -se SEED, --seed SEED
                        Set seed for random generators, default 1
  -p, --plot-off        Turn create plots on/off
  -g GEOMETRY, --geometry GEOMETRY
                        ST geometry, either 'honeycomb' or 'square' accepted
  -nc NUM_COLUMN, --num-column NUM_COLUMN
                        Number of columns in figure
  -mp MAX_NUM_CELLS_PLOT, --max-num-cells-plot MAX_NUM_CELLS_PLOT
                        Maximum number of cells to plot in single-cell visualization

Required arguments:
  -sp SCRNA_PATH, --scRNA-path SCRNA_PATH
                        Path to scRNA-Seq data
  -ctp CELL_TYPE_PATH, --cell-type-path CELL_TYPE_PATH
                        Path to cell type labels
[user@cn3335 ~]$ mkdir /data/$USER/cytospace && cd /data/$USER/cytospace
[user@cn3335 ~]$ cp $CS_DATA/* .
[user@cn3335 ~]$ cytospace \
   -sp brca_scRNA_GEP.txt \
   -ctp brca_scRNA_celllabels.txt \
   -stp brca_STdata_GEP.txt        \
   -cp brca_STdata_coordinates.txt
...
Read and validate data ...
100% |██████████████████████████████████████████████████| Reading data [done]
Estimating cell type fractions
2024-01-25 09:08:04 Load ST data
PC_ 1
Positive:  IGKC, IGHG1, DCN, IGHG2, IGHA1, COL6A3, APOE, JCHAIN, LUM, MMP2
           AEBP1, IGLC1, IGHG3, COL3A1, HLA-DRA, C1R, SFRP4, HMOX1, VIM, POSTN
           SPARC, COL6A1, IGHG4, LYZ, SFRP2, COL1A1, C3, APOC1, COL6A2, COL1A2
Negative:  AZGP1, MUCL1, SCGB2A2, ERBB2, KRT7, CD24, SCGB1D2, MAL2, CRISP3, ATG5
           TACSTD2, SPINT2, PPDPF, KRT8, LCN2, LTF, PIGR, SLPI, CLDN4, CRABP2
           KIAA1324, PSMD3, CFB, ORMDL3, FGB, ARPC1A, FOXA1, S100A9, PDZK1IP1, IFI6
PC_ 2
Positive:  SFRP4, DCN, COL6A3, IGKC, MMP2, SFRP2, AEBP1, IGHG1, LUM, C1R
           COL3A1, CCN2, COL1A2, COL1A1, SPARC, CCDC80, IGLC1, IGFBP7, FBLN1, C1S
           JCHAIN, IGHG4, CXCL14, IGHG2, C3, IGFBP4, CTSK, FBLN2, IGHA1, VCAN
Negative:  APOC1, APOE, FTL, SPP1, CTSL, IFI30, CTSB, CTSD, CD68, LAPTM5
           SLC11A1, ACP5, TYROBP, PLIN2, FABP5, GLUL, GPNMB, SDS, CTSZ, PLAUR
           HMOX1, SAT1, AQP9, SCD, FCGR3A, C1QB, LYZ, TREM2, FCER1G, PSAP
PC_ 3
Positive:  HMOX1, POSTN, CTSB, FGB, ACTA2, FGG, TAGLN, AEBP1, SPARC, BGN
           SPP1, COL1A1, CCN2, FTL, TGFBI, MYL9, COL1A2, COL5A1, LUM, TIMP1
           SULF1, CTSL, FN1, DCN, IGHG3, GLUL, COL3A1, APOE, CTSD, APOC1
Negative:  CCL19, TRAC, TRBC2, LTB, CXCL9, CCL5, TRBC1, IL7R, LAMP3, BIRC3
           CXCR4, ISG15, CXCL11, CD3D, PTPRC, IFITM1, CCR7, IFI6, IDO1, CORO1A
           CD37, SELL, UBD, CXCL13, IKZF1, ISG20, CD3E, RAC2, IFI44L, IL2RG
PC_ 4
Positive:  IGKC, IGHG1, IGHG2, JCHAIN, IGHA1, IGHG4, IGLC1, IGHG3, IGKV4-1, HMOX1
           IGHM, SFRP4, IGHJ6, C3, DERL3, MZB1, MMP2, DCN, PTGDS, IGHV6-1
           XBP1, PIM2, SCGB2A2, TENT5C, TXNDC5, SCD, POU2AF1, IGHD, CCDC80, SELENOP
Negative:  COL4A1, IGFBP7, ACTA2, COL4A2, HSPG2, MCAM, PLVAP, TIMP3, MYL9, VWF
           A2M, TIMP1, TAGLN, CST1, TPM2, ENG, SPARC, KRT5, S100A2, MMP11
           COL15A1, ID1, MYLK, CD93, LAMC2, PODXL, FN1, POSTN, CDH5, CALD1
PC_ 5
Positive:  LTF, FGB, FGG, RARRES1, CLU, LCN2, WFDC2, S100A9, SERPINA3, CP
           LBP, PDZK1IP1, ORM1, AGT, SLPI, CAPN13, TGM2, RDH10, TACSTD2, CHI3L2
           FGA, ORM2, MGP, UBD, SLC34A2, SOD2, ELF3, GPRC5A, GABRP, CFB
Negative:  SCGB2A2, FADS2, TOP2A, PPP1R1A, MUCL1, PEG10, SCGB1D2, HIST1H1B, PIP, HIST1H2BH
           GATA3, HIST1H2BG, C2orf72, HIST1H3H, CYP4Z1, HIST1H4A, UBE2C, DBI, HIST1H4D, TRPS1
           CDC6, ADAMTS1, NQO1, NPNT, NRG1, ASPH, SPDEF, CLEC3A, FASN, HIST1H2BO
2024-01-25 09:08:48 Load scRNA data
PC_ 1
Positive:  IGFBP7, SPARC, COL1A2, COL1A1, COL3A1, CALD1, COL6A2, TAGLN, BGN, MYL9
           LUM, THY1, DCN, TPM2, POSTN, COL5A2, COL6A3, IGFBP4, AEBP1, COL6A1
           CTHRC1, ACTA2, C1S, IFITM3, SFRP2, RARRES2, CTGF, TIMP3, VCAN, CTSK
Negative:  HLA-DRA, HLA-DRB1, TYROBP, CD74, HLA-DPB1, HLA-DPA1, HLA-DQA1, CCL4, FCER1G, CCL5
           CD69, SRGN, HLA-DQB1, LYZ, CXCR4, C1QB, C1QA, RGS1, C1QC, LAPTM5
           CCL3, NKG7, FCGR3A, CD52, AIF1, DUSP2, CD83, APOC1, CCL4L2, PTPRC
PC_ 2
Positive:  MUCL1, CD24, KRT7, CALML5, KRT18, SCGB1B2P, NKG7, FXYD3, KRT8, CD69
           GZMA, MGST1, CD7, GNLY, CLDN4, AZGP1, CCL5, SLPI, CD2, ERBB2
           KLRB1, RPL13A, PERP, CD3D, TACSTD2, CD3E, S100P, TM4SF1, GZMB, ELF3
Negative:  HLA-DRA, FTL, HLA-DRB1, CD74, HLA-DPA1, C1QA, C1QB, HLA-DPB1, HLA-DQA1, C1QC
           APOE, LYZ, TYROBP, CTSB, APOC1, FTH1, HLA-DQB1, FCER1G, CD68, CST3
           AIF1, MS4A6A, CTSD, FCGR3A, CTSS, PSAP, CTSZ, FN1, MS4A7, CCL3
PC_ 3
Positive:  CD24, KRT7, CALML5, MUCL1, TM4SF1, KRT18, KRT8, MGST1, FXYD3, AZGP1
           SLPI, CLDN4, ERBB2, TACSTD2, SPINT2, DBI, MIEN1, GRB7, S100P, ELF3
           CRIP2, PERP, C17orf89, PSMD3, KRT19, MIF, EPCAM, S100A14, TM7SF2, LMTK3
Negative:  CCL5, CD69, NKG7, GZMA, GNLY, CD7, CCL4, CXCR4, CD2, CST7
           GZMB, KLRB1, CD3E, RGCC, CD3D, IL7R, IL32, CD52, TRBC2, CTSW
           PTPRC, TRBC1, TNFAIP3, TRAC, KLRD1, DUSP2, B2M, IFNG, SRGN, RHOH
PC_ 4
Positive:  MUCL1, CD24, KRT7, CALML5, KRT18, MGST1, FXYD3, KRT8, AZGP1, CLDN4
           SLPI, DBI, SCGB1B2P, ERBB2, SPINT2, LUM, DCN, PERP, SFRP2, TACSTD2
           CTSK, S100P, RARRES2, ELF3, COL1A1, SDC1, MIEN1, COL3A1, GRB7, AEBP1
Negative:  PLVAP, RAMP2, CALCRL, VWF, PECAM1, SPARCL1, IGFBP7, HSPG2, AQP1, RAMP3
           ADGRL4, ESAM, EMCN, CLEC14A, GNG11, CD34, CD93, COL4A1, EGFL7, RNASE1
           ENG, COL4A2, IFITM3, A2M, IFITM1, ADAMTS1, SPRY1, CDH5, CXorf36, FLT1
PC_ 5
Positive:  NDUFA4L2, RGS5, COL18A1, MCAM, NOTCH3, SOD3, LHFP, PPP1R14A, CCDC102B, ADIRF
           HIGD1B, TBX2, PDGFA, NR2F2, CPE, C11orf96, PGF, PLXDC1, TPPP3, COL4A2
           COX4I2, EPS8, CALD1, COL4A1, ID4, ENPEP, SEPT4, PDGFRB, EGFL6, ACTA2
Negative:  CTHRC1, MMP2, SFRP2, CTSK, DCN, COL10A1, RARRES2, FBLN1, COL11A1, LUM
           MFAP5, THBS2, HTRA1, HSPG2, NBL1, RAMP2, SFRP4, PLVAP, COL8A1, WISP2
           VCAN, AEBP1, CCDC80, FAP, ITGBL1, VWF, CXCL12, PECAM1, PDGFRL, DPYSL3
2024-01-25 09:09:29 Integration
Performing PCA on the provided reference using 2118 features as input.
Projecting PCA
Finding neighborhoods
Finding anchors
        Found 1129 anchors
Filtering anchors
        Retained 1043 anchors
Finding integration vectors
Finding integration vector weights
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Predicting cell labels
100% |██████████████████████████████████████████████████| Reading data [done]
Time to read and validate data: 180.52 seconds
Estimating number of cells in each spot ...
Time to estimate number of cells per spot: 0.99 seconds
Down/up sample of scRNA-seq data according to estimated cell type fractions
Time to down/up sample scRNA-seq data: 6.02 seconds
Building cost matrix ...
Time to build cost matrix: 6.22 seconds
Solving linear assignment problem ...
Time to solve linear assignment problem: 96.06 seconds
Total time to run CytoSPACE core algorithm: 114.61 seconds
Saving results ...
100% |██████████████████████████████████████████████████| Reading data [done]
Detecting row and column indexing of Visium data; rescaling for coordinates
findfont: Font family ['sans-serif'] not found. Falling back to DejaVu Sans.
findfont: Generic family 'sans-serif' not found because none of the following families were found: Arial
findfont: Font family ['sans-serif'] not found. Falling back to DejaVu Sans.
findfont: Generic family 'sans-serif' not found because none of the following families were found: Arial
Detecting row and column indexing of Visium data; rescaling for coordinates
Total execution time: 363.29 seconds
[user@cn3335 ~]$ exit
[user@biowulf]$
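
Once the interactive run works, the same steps can be wrapped in a batch script. The following is a minimal sketch, not a site-provided template: the script name cytospace.sh and the output folder name results are hypothetical, the resource requests simply repeat the values from the sinteractive command above, and the -nop and -o options are taken from the help text shown earlier. The snippet writes the script and checks its syntax without executing it.

```shell
# Write a batch script (hypothetical name: cytospace.sh) that mirrors the interactive session.
cat > cytospace.sh <<'EOF'
#!/bin/bash
#SBATCH --cpus-per-task=8
#SBATCH --mem=12g
#SBATCH --gres=lscratch:20
set -e

module load cytospace

# Work in a data directory and copy the bundled example inputs.
mkdir -p "/data/$USER/cytospace"
cd "/data/$USER/cytospace"
cp "$CS_DATA"/* .

# Same inputs as the interactive run. "results" is a hypothetical folder name;
# -nop is set from the CPU count that Slurm allocated to the job.
cytospace \
   -sp brca_scRNA_GEP.txt \
   -ctp brca_scRNA_celllabels.txt \
   -stp brca_STdata_GEP.txt \
   -cp brca_STdata_coordinates.txt \
   -nop "$SLURM_CPUS_PER_TASK" \
   -o results
EOF

# Parse the script for syntax errors without running it.
bash -n cytospace.sh && echo "syntax ok"

# Submit once the script looks right (requires Slurm):
# sbatch cytospace.sh
```

Resource requests can also be passed on the sbatch command line instead of through #SBATCH lines. Adjust memory and CPU counts to the size of your own dataset; the values here are only those used in the sample session.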