Biowulf High Performance Computing at the NIH
Dr.seq on Biowulf

Dr.seq is a quality control (QC) and analysis pipeline for Drop-seq data. It takes a FASTQ file of barcode reads and a FASTQ file of transcript reads, along with supporting files (a gene annotation and an alignment index), and reports QC metrics at the level of reads, individual cells, bulk cells, and cell clustering.
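In Drop-seq, each barcode read starts with a cell barcode followed by a UMI. The commands on this page select these with `--cellbarcoderange 1:12` and `--umirange 13:20`. The sketch below is not part of Dr.seq. It is a hypothetical `split_barcode_read` shell function that shows how such ranges divide a barcode read, assuming the ranges are 1-based and inclusive.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of Dr.seq: print read name, cell barcode and UMI
# for each record of a barcode FASTQ read on stdin. Ranges are assumed to be
# 1-based and inclusive, defaulting to 1:12 (cell barcode) and 13:20 (UMI).
split_barcode_read() {
  local cb_start=${1:-1} cb_end=${2:-12} umi_start=${3:-13} umi_end=${4:-20}
  awk -v cs="$cb_start" -v ce="$cb_end" -v us="$umi_start" -v ue="$umi_end" '
    NR % 4 == 1 { name = substr($1, 2) }      # header line: drop the leading "@"
    NR % 4 == 2 {                             # sequence line
      print name "\t" substr($0, cs, ce - cs + 1) "\t" substr($0, us, ue - us + 1)
    }'
}

# One synthetic record: 12 nt barcode, 8 nt UMI, then extra bases.
printf '@read1\nAAACCCGGGTTTACGTACGTNNNN\n+\nIIIIIIIIIIIIIIIIIIIIIIII\n' | split_barcode_read
# prints: read1<TAB>AAACCCGGGTTT<TAB>ACGTACGT
```

The optional arguments take the same start and end positions as the Dr.seq flags, so other barcode layouts can be checked by passing, for example, `1 12 13 20`.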

Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program. Sample session:

[user@biowulf ~]$ sinteractive --cpus-per-task=6 --mem=10g --gres=lscratch:30
salloc.exe: Pending job allocation 12668997
salloc.exe: job 12668997 queued and waiting for resources
salloc.exe: job 12668997 has been allocated resources
salloc.exe: Granted job allocation 12668997
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn0873 are ready for job
srun: error: x11: no local DISPLAY defined, skipping
error: unable to open file /tmp/slurm-spank-x11.12668997.0
slurmstepd: error: x11: unable to read DISPLAY value

[user@cn0873 ~]$ cd /lscratch/$SLURM_JOB_ID

[user@cn0873 12668997]$ module load bowtie drseq
[+] Loading bowtie  2-2.4.2
[+] Loading bedtools  2.30.0
[+] Loading gcc  9.2.0  ...
[+] Loading GSL 2.6 for GCC 9.2.0 ...
[-] Unloading gcc  9.2.0  ...
[+] Loading gcc  9.2.0  ...
[+] Loading openmpi 3.1.4  for GCC 9.2.0
[+] Loading ImageMagick  7.0.8  on cn0873
[+] Loading HDF5  1.10.4
[-] Unloading gcc  9.2.0  ...
[+] Loading gcc  9.2.0  ...
[+] Loading NetCDF 4.7.4_gcc9.2.0
[+] Loading pandoc  2.13  on cn0873
[+] Loading pcre2 10.21  ...
[+] Loading R 4.0.3
[+] Loading singularity  3.7.3  on cn0873
[+] Loading macs  2.2.7.1
[+] Loading drseq  2.2.0

[user@cn0873 12668997]$ cp ${DRSEQ_TESTDATA}/* .

[user@cn0873 12668997]$ gunzip *.gz

[user@cn0873 12668997]$ DrSeq simple -b drseq_test_1.fastq -r drseq_test_2.fastq -n test -f \
    -g /lscratch/${SLURM_JOB_ID}/mm10_refgenes.txt --maptool bowtie2 \
    --mapindex /fdb/igenomes/Mus_musculus/UCSC/mm10/Sequence/Bowtie2Index/genome \
    --cellbarcoderange 1:12   --umirange 13:20   --clean   \
    --thread $SLURM_CPUS_PER_TASK
Start Drseq
Step0: Data integrate
Detected input file format is fastq
use bowtie2 as alignment tools
option setting:
mapping thread is 6
/bin/sh: pdflatex: command not found
pdflatex was not installed, Dr.seq is still processing but no summary QC report generated
Step0 Data integrate DONE
Step1: alignment
Now start mapping in /lscratch/12668997/test/mapping/ , all mapping result will be here

[snip...]
Step5 summary DONE, check /lscratch/12668997/test/summary/ for final outputs

[user@cn0873 12668997]$ exit
exit
salloc.exe: Relinquishing job allocation 12668997

[user@biowulf ~]$
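Files under `/lscratch/$SLURM_JOB_ID` are deleted when the job ends, so any Dr.seq output you want to keep should be copied elsewhere before exiting the session. A minimal sketch follows; `save_drseq_output` is a hypothetical function, and the destination under `/data/$USER` is only an example to adapt to your own directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Dr.seq or Biowulf): copy a Dr.seq output
# directory out of local scratch before the job ends.
save_drseq_output() {
  local src=$1 dest=$2
  if [ ! -d "$src" ]; then
    echo "no such output directory: $src" >&2
    return 1
  fi
  mkdir -p "$dest"          # create the destination if needed
  cp -r "$src" "$dest"/     # copies src as dest/<basename of src>
}

# Example, run inside the session before 'exit' (destination is an assumption):
# save_drseq_output /lscratch/$SLURM_JOB_ID/test /data/$USER/drseq_results
```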
Batch job on Biowulf

Create a batch script similar to the following example:

#!/bin/bash
# this file is drseq_batch.sh
module load bowtie
module load drseq

DrSeq simple \
  -b drseq_test_1.fastq \
  -r drseq_test_2.fastq \
  -n test_out -f \
  -g $PWD/mm10_refgenes.txt \
  --maptool bowtie2 \
  --mapindex /fdb/igenomes/Mus_musculus/UCSC/mm10/Sequence/Bowtie2Index/genome \
  --cellbarcoderange 1:12 \
  --umirange 13:20 \
  --clean \
  --thread $SLURM_CPUS_PER_TASK

Note that this example uses test data from the Dr.seq home page and a gene annotation obtained from the UCSC Genome Browser.
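Slurm starts a batch job in the directory from which `sbatch` was called, so the script expects `drseq_test_1.fastq`, `drseq_test_2.fastq` and `mm10_refgenes.txt` to be present there. A small pre-submission check can catch a missing file before the job waits in the queue. This is a sketch only; `check_drseq_inputs` is a hypothetical function, not part of Dr.seq.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of Dr.seq): report any input file
# that is missing or empty, and return non-zero if at least one is.
check_drseq_inputs() {
  local missing=0 f
  for f in "$@"; do
    if [ ! -s "$f" ]; then
      echo "missing or empty: $f" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example, run from the directory you will submit from:
# check_drseq_inputs drseq_test_1.fastq drseq_test_2.fastq mm10_refgenes.txt \
#   && sbatch --cpus-per-task=6 --mem=10g drseq_batch.sh
```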

Submit to the queue with sbatch:

biowulf$ sbatch --cpus-per-task=6 --mem=10g drseq_batch.sh
Swarm of jobs on Biowulf

Create a swarm command file similar to the following example:

# this file is drseq.swarm
DrSeq simple \
  -b sample1_1.fastq \
  -r sample1_2.fastq \
  -n sample1 -f \
  -g $PWD/mm10_refgenes.txt \
  --maptool bowtie2 \
  --mapindex /fdb/igenomes/Mus_musculus/UCSC/mm10/Sequence/Bowtie2Index/genome \
  --cellbarcoderange 1:12 \
  --umirange 13:20 \
  --clean \
  --thread $SLURM_CPUS_PER_TASK
DrSeq simple \
  -b sample2_1.fastq \
  -r sample2_2.fastq \
  -n sample2 -f \
  -g $PWD/mm10_refgenes.txt \
  --maptool bowtie2 \
  --mapindex /fdb/igenomes/Mus_musculus/UCSC/mm10/Sequence/Bowtie2Index/genome \
  --cellbarcoderange 1:12 \
  --umirange 13:20 \
  --clean \
  --thread $SLURM_CPUS_PER_TASK
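Writing one entry per sample by hand becomes tedious for larger experiments. A minimal sketch of generating the swarm file with a loop follows; `make_drseq_swarm` is a hypothetical helper, and it assumes every sample has files named `<sample>_1.fastq` (barcodes) and `<sample>_2.fastq` (reads) in the submission directory, as in the example above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write one single-line DrSeq command per sample to a
# swarm file. Usage: make_drseq_swarm <swarm file> <sample> [<sample> ...]
make_drseq_swarm() {
  local out=$1; shift
  local index=/fdb/igenomes/Mus_musculus/UCSC/mm10/Sequence/Bowtie2Index/genome
  # The escaped \$ keeps $PWD and $SLURM_CPUS_PER_TASK literal in the file,
  # so they are expanded when each swarm subjob runs, not now.
  local opts="-g \$PWD/mm10_refgenes.txt --maptool bowtie2 --mapindex ${index}"
  opts+=" --cellbarcoderange 1:12 --umirange 13:20 --clean --thread \$SLURM_CPUS_PER_TASK"
  : > "$out"                # create or truncate the swarm file
  local s
  for s in "$@"; do
    printf 'DrSeq simple -b %s_1.fastq -r %s_2.fastq -n %s -f %s\n' \
      "$s" "$s" "$s" "$opts" >> "$out"
  done
}
```

For example, `make_drseq_swarm drseq.swarm sample1 sample2` writes a two-line swarm file with the same options as the one above, with each command on a single line instead of being split with backslashes.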

Submit to the queue with swarm:

biowulf$ swarm -f drseq.swarm -g 10 -t 6 --module bowtie,drseq