High-Performance Computing at the NIH
Scanpy on Biowulf

Scanpy is a scalable toolkit for analyzing single-cell gene expression data. It includes preprocessing, visualization, clustering, pseudotime and trajectory inference, and differential expression testing. The Python-based implementation efficiently handles datasets of more than one million cells.


Important Notes

This application contains graphical content and requires an X-Windows connection. It is primarily meant to be run via Jupyter notebooks.
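As a minimal sketch of how a script might check for that X connection before trying to open a plot window: the helper below only inspects the standard `DISPLAY` environment variable, which X11 forwarding normally sets. The function name `has_x_display` is hypothetical and not part of Scanpy or the Biowulf module.

```python
import os


def has_x_display(environ=None):
    """Return True if an X display appears to be configured.

    Hypothetical helper: it only checks that DISPLAY is set and non-empty.
    It does not verify that the X server is actually reachable.
    """
    env = os.environ if environ is None else environ
    return bool(env.get("DISPLAY", "").strip())


if __name__ == "__main__":
    if has_x_display():
        print("X display found; interactive plot windows should be possible.")
    else:
        print("No X display found; consider saving figures to files instead.")
```

When no display is available, writing figures to files rather than showing them on screen is one reasonable fallback.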

Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program.
Sample session (user input follows the $ prompt):

[user@biowulf ~]$ sinteractive
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144 ~]$ module load scanpy
[user@cn3144 ~]$ cp $SCANPY_EXAMPLES/3K_cluster/* .
[user@cn3144 ~]$ python cluster.py

[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$

Batch job
Most jobs should be run as batch jobs.

Create a batch input file (e.g. scanpy.sh). For example:

#!/bin/bash
module load scanpy
python my_script.py

Submit this job using the Slurm sbatch command.

sbatch [--cpus-per-task=#] [--mem=#] scanpy.sh
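When `--cpus-per-task` is given, Slurm exports the value to the job as the `SLURM_CPUS_PER_TASK` environment variable. The sketch below shows one way `my_script.py` could read it so the script does not use more threads than were allocated. The helper name `allocated_cpus` and its fallback of 1 are assumptions made for illustration, not part of Scanpy or the Biowulf setup.

```python
import os


def allocated_cpus(environ=None, default=1):
    """Return the CPU count Slurm granted to this task.

    Hypothetical helper: falls back to `default` when the variable is
    missing (e.g. outside a Slurm job) or is not a positive integer.
    """
    env = os.environ if environ is None else environ
    raw = env.get("SLURM_CPUS_PER_TASK", "")
    try:
        count = int(raw)
    except ValueError:
        return default
    return count if count > 0 else default


if __name__ == "__main__":
    print(f"Using {allocated_cpus()} CPU(s)")
```

Inside a Scanpy script, the result could be assigned to `sc.settings.n_jobs` (with `import scanpy as sc`) so that routines honoring that setting stay within the allocation.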