High-Performance Computing at the NIH
Dropseq on Biowulf

Drop-seq is a technology that allows biologists to analyze genome-wide gene expression in thousands of individual cells in a single experiment.

Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program. Sample session:

[user@biowulf ~]$ sinteractive
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144 ~]$ module load dropseq

[user@cn3144 ~]$ cd /data/$USER/dir

[user@cn3144 ~]$ BAMTagHistogram -- -h
USAGE: BAMTagHistogram [options]

Create a histogram of values for the given tag
Version: 1.0(a568873_1439010606)


Options:

--help
-h                            Displays options specific to this tool.

--stdhelp
-H                            Displays options specific to this tool AND options common to all Picard command line 
                              tools.

--version                     Displays program version.

INPUT=File
I=File                        The input SAM or BAM file to analyze.  Must be coordinate sorted. (???)  Required. 

OUTPUT=File
O=File                        Output file of histogram of tag value frequencies. This supports zipped formats like gz 
                              and bz2.  Required. 

TAG=String                    Tag to extract  Required. 

FILTER_PCR_DUPLICATES=Boolean Filter PCR Duplicates.  Default value: false. This option can be set to 'null' to clear 
                              the default value. Possible values: {true, false} 

READ_QUALITY=Integer          Read quality filter.  Filters all reads lower than this mapping quality.  Defaults to 10.  
                              Set to 0 to not filter reads by map quality.  Default value: 10. This option can be set 
                              to 'null' to clear the default value. 

[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$

Batch job
Most jobs should be run as batch jobs.

Create a batch input file (e.g. dropseq.sh). For example:

#!/bin/bash
module load dropseq
cd /data/$USER/dir
BAMTagHistogram I=file1 O=file2 ....
....
....

Submit this job using the Slurm sbatch command.

sbatch dropseq.sh
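
Resource requests can also be placed in the batch file itself as #SBATCH directives. The following is a minimal sketch, not a recommended configuration: the job name, memory and walltime values are illustrative assumptions, as is TAG=XC. The options use the KEY=value form shown in the BAMTagHistogram help output above. The snippet writes dropseq.sh and then only syntax-checks it.

```shell
#!/bin/bash
# Sketch: write dropseq.sh with Slurm resource directives, then syntax-check it.
# The resource values and TAG=XC are illustrative assumptions, not recommendations.
cat > dropseq.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=dropseq
#SBATCH --mem=8g
#SBATCH --time=02:00:00

set -e                      # stop at the first failing command
module load dropseq
cd /data/$USER/dir
BAMTagHistogram I=file1 O=file2 TAG=XC
EOF

bash -n dropseq.sh          # parse only; nothing is executed
```

The resulting file is submitted the same way, with sbatch dropseq.sh. Options given on the sbatch command line take precedence over #SBATCH directives in the file.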

Swarm of Jobs
A swarm of jobs is an easy way to submit a set of independent commands requiring identical resources.

Create a swarmfile (e.g. dropseq.swarm). For example:

cd /data/user/run1/; BAMTagHistogram I=file1 O=file2
cd /data/user/run2/; BAMTagHistogram I=file1 O=file2
cd /data/user/run3/; BAMTagHistogram I=file1 O=file2
........

Submit this job using the swarm command.

swarm -f dropseq.swarm --module dropseq

where

--module dropseq    Loads the dropseq module for each subjob in the swarm
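
When there are many run directories, the swarmfile can be generated with a short loop instead of being typed by hand. The following is a minimal sketch under stated assumptions: the function name make_swarmfile, the run* directory pattern, the file names file1 and file2, and TAG=XC are all hypothetical and chosen only for illustration. The options use the KEY=value form shown in the BAMTagHistogram help output above.

```shell
#!/bin/bash
# Sketch: write one swarm command line per run directory that has an input file.
# make_swarmfile, the run* pattern, file1/file2 and TAG=XC are illustrative
# assumptions, not part of the dropseq module.
make_swarmfile() {
    local base="$1"        # directory containing run1/, run2/, ...
    local swarmfile="$2"   # swarmfile to (over)write
    local dir

    : > "$swarmfile"                       # start with an empty swarmfile
    for dir in "$base"/run*/; do
        [ -d "$dir" ] || continue          # glob matched nothing
        [ -f "${dir}file1" ] || continue   # skip runs without an input file
        printf 'cd %s; BAMTagHistogram I=file1 O=file2 TAG=XC\n' "$dir" >> "$swarmfile"
    done
}
```

For example, make_swarmfile /data/$USER dropseq.swarm writes one line per qualifying run directory, after which the file can be submitted with the swarm command shown above. Directories without file1 are skipped so that no subjob fails on a missing input.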