High-Performance Computing at the NIH
Dr.seq on Biowulf & Helix

Description

Dr.seq is a quality control (QC) and analysis pipeline for Drop-seq data. It takes a fastq file with barcode data and a fastq file of reads along with supporting files (annotation and indices for alignment) to provide QC data at the level of reads, individual cells, bulk cells, and cell-clustering.

The command line interface changes between versions. The documentation here refers to the newest version.

There may be multiple versions of Dr.seq available. An easy way of selecting the version is to use modules. To see the modules available, type

module avail drseq 

To select a module use

module load drseq/[version]

where [version] is the version of choice.

Dr.seq is a multithreaded application. Make sure the number of CPUs requested matches the number of threads passed with --thread.
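One way to keep the two numbers in sync is to take the thread count from the Slurm allocation instead of hard-coding it. The following is a minimal sketch, assuming the script runs under Slurm, which sets SLURM_CPUS_PER_TASK when --cpus-per-task is given. The helper name and the fallback of 2 threads outside of a job are arbitrary choices made for illustration:

```shell
# Hypothetical helper: print the thread count to pass to --thread.
# Uses the Slurm allocation when available, otherwise falls back to 2.
drseq_threads() {
    echo "${SLURM_CPUS_PER_TASK:-2}"
}

threads=$(drseq_threads)
echo "would run: DrSeq simple ... --thread ${threads}"
```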

Environment variables set

Dependencies

Dr.seq can use either bowtie2 or STAR as its short read mapper. Since the choice is up to the user, neither module is loaded by default. Please load the module for the mapper used in your analysis manually.
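As an illustration of keeping the mapper and the loaded module consistent, the sketch below maps a --maptool value to a module name. The helper is hypothetical. "bowtie" is the module the examples on this page load for bowtie2, while the "STAR" module name and the STAR value for --maptool are assumptions that should be checked against module avail and the Dr.seq documentation:

```shell
# Hypothetical helper: map a --maptool value to the module that provides it.
# "bowtie" matches the examples on this page; "STAR" is an assumed module name.
mapper_module() {
    case "$1" in
        bowtie2) echo "bowtie" ;;
        STAR)    echo "STAR" ;;
        *)       echo "unknown mapper: $1" >&2; return 1 ;;
    esac
}

maptool=bowtie2
# Print the module command rather than running it, so the sketch works anywhere.
echo "module load $(mapper_module "$maptool") drseq"
```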

When analyzing Drop-ChIP or ATAC-Seq data, please load the MACS 1.4 module.

R, samtools and bedtools are loaded automatically.

References

Documentation

On Helix

This package should not be used on Helix. If you need to do interactive work, please use an interactive session on Biowulf.

Batch job on Biowulf

Create a batch script similar to the following example:

#! /bin/bash
# this file is drseq_batch.sh
module load bowtie
module load drseq

DrSeq simple \
  -b drseq_test_1.fastq \
  -r drseq_test_2.fastq \
  -n test_out -f \
  -g $PWD/mm10_refgenes.txt \
  --maptool bowtie2 \
  --mapindex /fdb/igenomes/Mus_musculus/UCSC/mm10/Sequence/Bowtie2Index/genome \
  --cellbarcoderange 1:12 \
  --umirange 13:20 \
  --clean \
  --thread $SLURM_CPUS_PER_TASK

Note that this example uses data obtained from the Dr.seq home page as well as annotation obtained from the UCSC browser.
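To make the --cellbarcoderange and --umirange settings concrete, the following sketch slices a made-up 20-base barcode read in the same way. It assumes the ranges are 1-based and inclusive, so that 1:12 covers the first 12 bases and 13:20 the next 8. The read sequence and variable names are invented for illustration; cut -c uses the same 1-based inclusive convention:

```shell
# Made-up barcode read: 12 bases of cell barcode followed by 8 bases of UMI.
barcode_read="AAAACCCCGGGGTTTTACGT"

# --cellbarcoderange 1:12 -> bases 1 through 12
cell_barcode=$(printf '%s\n' "$barcode_read" | cut -c1-12)
# --umirange 13:20 -> bases 13 through 20
umi=$(printf '%s\n' "$barcode_read" | cut -c13-20)

echo "cell barcode: $cell_barcode"
echo "UMI:          $umi"
```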

Submit to the queue with sbatch:

biowulf$ sbatch --cpus-per-task=6 --mem=10g drseq_batch.sh

Swarm of jobs on Biowulf

Create a swarm command file similar to the following example:

# this file is drseq.swarm
DrSeq simple \
  -b sample1_1.fastq \
  -r sample1_2.fastq \
  -n sample1 -f \
  -g $PWD/mm10_refgenes.txt \
  --maptool bowtie2 \
  --mapindex /fdb/igenomes/Mus_musculus/UCSC/mm10/Sequence/Bowtie2Index/genome \
  --cellbarcoderange 1:12 \
  --umirange 13:20 \
  --clean \
  --thread $SLURM_CPUS_PER_TASK
DrSeq simple \
  -b sample2_1.fastq \
  -r sample2_2.fastq \
  -n sample2 -f \
  -g $PWD/mm10_refgenes.txt \
  --maptool bowtie2 \
  --mapindex /fdb/igenomes/Mus_musculus/UCSC/mm10/Sequence/Bowtie2Index/genome \
  --cellbarcoderange 1:12 \
  --umirange 13:20 \
  --clean \
  --thread $SLURM_CPUS_PER_TASK
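For more than a few samples, a file like the one above can be generated with a loop rather than written by hand. The following is a minimal sketch, assuming the fastq files follow the <sample>_1.fastq and <sample>_2.fastq naming used above; the sample list is a placeholder. Each command is written on a single line, and single quotes keep $PWD and $SLURM_CPUS_PER_TASK literal so that they expand when the job runs rather than when the file is generated:

```shell
# Placeholder sample names; each needs <sample>_1.fastq and <sample>_2.fastq.
samples="sample1 sample2"
index=/fdb/igenomes/Mus_musculus/UCSC/mm10/Sequence/Bowtie2Index/genome

: > drseq.swarm   # start with an empty swarm file
for s in $samples; do
    # One DrSeq command per sample, appended as a single line.
    printf 'DrSeq simple -b %s_1.fastq -r %s_2.fastq -n %s -f -g $PWD/mm10_refgenes.txt --maptool bowtie2 --mapindex %s --cellbarcoderange 1:12 --umirange 13:20 --clean --thread $SLURM_CPUS_PER_TASK\n' \
        "$s" "$s" "$s" "$index" >> drseq.swarm
done

# Report how many commands were written.
wc -l < drseq.swarm
```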

Submit to the queue with swarm:

biowulf$ swarm -f drseq.swarm -g10 -t6 --module drseq --module bowtie

Interactive job on Biowulf

Allocate an interactive session with sinteractive and run Dr.seq as described above:

biowulf$ sinteractive --cpus-per-task=6 --mem=10g
node$ module load bowtie drseq
node$ DrSeq simple \
  -b drseq_test_1.fastq \
  -r drseq_test_2.fastq \
  -n test -f \
  -g $PWD/mm10_refgenes.txt \
  --maptool bowtie2 \
  --mapindex /fdb/igenomes/Mus_musculus/UCSC/mm10/Sequence/Bowtie2Index/genome \
  --cellbarcoderange 1:12 \
  --umirange 13:20 \
  --clean \
  --thread $SLURM_CPUS_PER_TASK
Start Drseq
Step0: Data integrate
Detected input file format is fastq
use bowtie2 as alignment tools
option setting: 
mapping thread is 4
Step0 Data integrate DONE
Step1: alignment
[...snip...]
node$ exit
biowulf$