Biowulf High Performance Computing at the NIH
cellSNP on Biowulf

cellSNP piles up the expressed alleles in single-cell or bulk RNA-seq data. Its output can be used directly for donor deconvolution in multiplexed single-cell RNA-seq data, particularly with vireo, which assigns cells to donors and detects doublets even without a genotyping reference.

cellSNP depends heavily on pysam, a Python interface for samtools and bcftools. It should give results very similar to those of samtools/bcftools mpileup, but it differs from bcftools mpileup in two major ways, which are described in the cellSNP documentation.

Documentation

https://github.com/single-cell-genetics/cellSNP

Important Notes
Submitting an interactive job

Allocate an interactive session and run the program there.

[biowulf]$ sinteractive --mem=5g --cpus-per-task=4
salloc.exe: Granted job allocation 789523
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn0135 are ready for job

[cn0135]$ cd /data/$USER/

[cn0135]$ module load cellsnp

[cn0135]$ cp -r /usr/local/apps/cellsnp/0.1.7/test .

[cn0135]$ cd test

[cn0135]$ bash ./test_10x.sh

[cn0135]$ exit
salloc.exe: Job allocation 789523 has been revoked.
[biowulf]$

Submitting a single batch job

1. Create a script file (myscript) similar to the one below

#!/bin/bash
# myscript
set -e

module load cellsnp || exit 1
cd /data/$USER/test/
cellSNP -p $SLURM_CPUS_PER_TASK -s file.bam -O OurDir -R file.csv.gz -b file.tsv --minCOUNT 20

2. Submit the script on biowulf:

[biowulf]$ sbatch --mem=5g --cpus-per-task=4 myscript
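
Before submitting, it can help to confirm that the input files named in the script exist, since a missing file would otherwise only surface once the job starts. The following is a minimal sketch, not part of cellSNP or Biowulf: check_inputs is a hypothetical helper, and the file names in the usage comment are simply those from the example script above.

```shell
#!/bin/bash
# check_inputs: hypothetical helper (an assumption, not a cellSNP or Biowulf tool).
# Returns 0 if every file given as an argument exists and is non-empty,
# and 1 otherwise, printing each problem file to stderr.
check_inputs() {
    local missing=0
    local f
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Possible usage, with the file names from the example script:
#   cd /data/$USER/test/ && check_inputs file.bam file.csv.gz file.tsv \
#       && sbatch --mem=5g --cpus-per-task=4 myscript
```

Chaining the check with && means sbatch only runs when all inputs are present.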

Using Swarm

Using the 'swarm' utility, one can submit many jobs to the cluster to run concurrently.

Set up a swarm command file (e.g. /data/$USER/cmdfile):

cd /data/$USER/dir1; cellSNP ...
cd /data/$USER/dir2; cellSNP ...
cd /data/$USER/dir3; cellSNP ...
...
cd /data/$USER/dir20; cellSNP ...
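
A command file like the one above can be generated with a short loop rather than typed by hand. This is a sketch under stated assumptions: the directories are numbered dir1 to dir20 as in the listing, and the variable names CMDFILE, NDIRS and CELLSNP_ARGS are hypothetical. The cellSNP options are left as the same "..." placeholder and must be replaced with real options.

```shell
#!/bin/bash
# Sketch: write one swarm command per input directory.
# CMDFILE, NDIRS and CELLSNP_ARGS are hypothetical names chosen for this example.
cmdfile="${CMDFILE:-cmdfile}"
ndirs="${NDIRS:-20}"
cellsnp_args="${CELLSNP_ARGS:-...}"   # placeholder; replace with real cellSNP options

: > "$cmdfile"                        # start from an empty command file
i=1
while [ "$i" -le "$ndirs" ]; do
    # $USER is kept literal (single quotes) so it is expanded when the command runs.
    printf 'cd /data/$USER/dir%d; cellSNP %s\n' "$i" "$cellsnp_args" >> "$cmdfile"
    i=$((i + 1))
done
echo "wrote $ndirs commands to $cmdfile"
```

If the options include -p $SLURM_CPUS_PER_TASK, as in the batch example, set CELLSNP_ARGS in single quotes so the variable is written literally into the command file rather than expanded when the file is generated.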

Submit the swarm job:

$ swarm -f cmdfile --module cellsnp -g 5 -t 4

For more information on running swarm, see the Biowulf swarm documentation (swarm.html).