Dr.seq on Biowulf
Dr.seq is a quality control (QC) and analysis pipeline for Drop-seq data. It takes a fastq file of barcode reads and a fastq file of sequence reads, together with supporting files (gene annotation and alignment indices), and reports QC metrics at the level of reads, individual cells, bulk cells, and cell clustering.
Important Notes
- The command-line interface has changed between versions. This documentation refers to the newest version.
- Multiple versions of Dr.seq may be available. The easiest way to select a version is to use modules. To list the available versions, type module avail drseq.
- Dr.seq is a multithreaded application. Match the number of threads (--thread) to the number of CPUs requested, for example with --thread $SLURM_CPUS_PER_TASK.
- Environment variables set:
  - $PATH
  - $DRSEQ_TESTDATA
- Dr.seq can use either bowtie2 or STAR as its short-read mapper. Because the choice is up to you, neither module is loaded by default. Load the one you need manually; bowtie2 is provided by the bowtie module, as in the examples below.
- When analyzing Drop-ChIP or ATAC-Seq data, load the macs 1.4 module.
- R, samtools and bedtools are loaded automatically.
- Dependencies:
  - bowtie2 or STAR
  - macs 1.4 (for Drop-ChIP and ATAC-Seq data)
  - R
  - bedtools
  - samtools
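To keep --thread in step with the CPUs requested via --cpus-per-task (sbatch) or -t (swarm), a job can read the thread count from the Slurm allocation. The bash sketch below does this. The function name drseq_threads is hypothetical, and the fallback to a single thread outside a Slurm job is an assumption, not Dr.seq behavior.

```shell
#!/usr/bin/env bash
# Sketch: print the value to pass to DrSeq's --thread option, taken from the
# Slurm allocation so that threads and allocated CPUs always match.
# drseq_threads is a hypothetical helper, not part of Dr.seq.
drseq_threads() {
    if [ -n "${SLURM_CPUS_PER_TASK:-}" ]; then
        echo "$SLURM_CPUS_PER_TASK"
    else
        # Assumption: outside a Slurm job, fall back to a single thread
        echo "SLURM_CPUS_PER_TASK is not set; using 1 thread" >&2
        echo 1
    fi
}
```

Inside a job script, the function would be used as --thread "$(drseq_threads)" on the DrSeq command line.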
Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.
Allocate an interactive session and run the program. Sample session:
```
[user@biowulf ~]$ sinteractive --cpus-per-task=6 --mem=10g --gres=lscratch:30
salloc.exe: Pending job allocation 12668997
salloc.exe: job 12668997 queued and waiting for resources
salloc.exe: job 12668997 has been allocated resources
salloc.exe: Granted job allocation 12668997
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn0873 are ready for job
srun: error: x11: no local DISPLAY defined, skipping
error: unable to open file /tmp/slurm-spank-x11.12668997.0
slurmstepd: error: x11: unable to read DISPLAY value
[user@cn0873 ~]$ cd /lscratch/$SLURM_JOB_ID
[user@cn0873 12668997]$ module load bowtie drseq
[+] Loading bowtie 2-2.4.2
[+] Loading bedtools 2.30.0
[+] Loading gcc 9.2.0 ...
[+] Loading GSL 2.6 for GCC 9.2.0 ...
[-] Unloading gcc 9.2.0 ...
[+] Loading gcc 9.2.0 ...
[+] Loading openmpi 3.1.4 for GCC 9.2.0
[+] Loading ImageMagick 7.0.8 on cn0873
[+] Loading HDF5 1.10.4
[-] Unloading gcc 9.2.0 ...
[+] Loading gcc 9.2.0 ...
[+] Loading NetCDF 4.7.4_gcc9.2.0
[+] Loading pandoc 2.13 on cn0873
[+] Loading pcre2 10.21 ...
[+] Loading R 4.0.3
[+] Loading singularity 3.7.3 on cn0873
[+] Loading macs 2.2.7.1
[+] Loading drseq 2.2.0
[user@cn0873 12668997]$ cp ${DRSEQ_TESTDATA}/* .
[user@cn0873 12668997]$ gunzip *.gz
[user@cn0873 12668997]$ DrSeq simple -b drseq_test_1.fastq -r drseq_test_2.fastq -n test -f \
    -g /lscratch/${SLURM_JOB_ID}/mm10_refgenes.txt --maptool bowtie2 \
    --mapindex /fdb/igenomes/Mus_musculus/UCSC/mm10/Sequence/Bowtie2Index/genome \
    --cellbarcoderange 1:12 --umirange 13:20 --clean \
    --thread $SLURM_CPUS_PER_TASK
Start Drseq
Step0: Data integrate
Detected input file format is fastq
use bowtie2 as alignment tools
option setting: mapping thread is 6
/bin/sh: pdflatex: command not found
pdflatex was not installed, Dr.seq is still processing but no summary QC report generated
Step0 Data integrate DONE
Step1: alignment
Now start mapping in /lscratch/12668997/test/mapping/ , all mapping result will be here
[snip...]
Step5 summary DONE, check /lscratch/12668997/test/summary/ for final outputs
[user@cn0873 12668997]$ exit
exit
salloc.exe: Relinquishing job allocation 12668997
[user@biowulf ~]$
```
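In the command above, the options --cellbarcoderange 1:12 and --umirange 13:20 tell Dr.seq where the cell barcode and the UMI sit within each barcode read. Assuming the ranges are 1-based and inclusive, which matches the Drop-seq layout of a 12-nt cell barcode followed by an 8-nt UMI, the bash sketch below shows which bases each range selects. extract_range and the example read are hypothetical. This is not Dr.seq's own parsing code.

```shell
#!/usr/bin/env bash
# Illustration only: extract_range is a hypothetical helper, not part of Dr.seq.
# It prints the bases of a read covered by a 1-based, inclusive START:END range,
# the format assumed here for --cellbarcoderange and --umirange.
extract_range() {
    local seq=$1 range=$2
    local start=${range%%:*}   # text before the colon
    local end=${range##*:}     # text after the colon
    # Bash substring offsets are 0-based, hence start-1
    printf '%s\n' "${seq:start-1:end-start+1}"
}

# Made-up 20-nt barcode read: 12-nt cell barcode followed by an 8-nt UMI
read1=ACGTACGTACGTTTTTGGGG
echo "cell barcode: $(extract_range "$read1" 1:12)"   # ACGTACGTACGT
echo "UMI:          $(extract_range "$read1" 13:20)"  # TTTTGGGG
```

If your library uses a different barcode or UMI length, you would change the two ranges accordingly on the DrSeq command line.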
Batch job on Biowulf
Create a batch script similar to the following example:
```
#!/bin/bash
# this file is drseq_batch.sh
module load bowtie
module load drseq
DrSeq simple \
    -b drseq_test_1.fastq \
    -r drseq_test_2.fastq \
    -n test_out -f \
    -g $PWD/mm10_refgenes.txt \
    --maptool bowtie2 \
    --mapindex /fdb/igenomes/Mus_musculus/UCSC/mm10/Sequence/Bowtie2Index/genome \
    --cellbarcoderange 1:12 \
    --umirange 13:20 \
    --clean \
    --thread $SLURM_CPUS_PER_TASK
```
This example uses data obtained from the Dr.seq home page and annotation obtained from the UCSC browser. The script expects these files in the directory from which the job is submitted.
Submit to the queue with sbatch:
```
biowulf$ sbatch --cpus-per-task=6 --mem=10g drseq_batch.sh
```
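The batch script reads drseq_test_1.fastq, drseq_test_2.fastq and mm10_refgenes.txt from the directory the job runs in. To try the script on the test data shipped with the module, you can first repeat the cp and gunzip steps from the interactive session in that directory. The sketch below wraps those steps in a hypothetical function, stage_drseq_testdata. It assumes that $DRSEQ_TESTDATA (set by the drseq module) holds gzip-compressed files, as in the interactive example.

```shell
#!/usr/bin/env bash
# Sketch: copy and decompress the Dr.seq test data into a directory, repeating
# the cp/gunzip steps of the interactive session. The function name is hypothetical.
stage_drseq_testdata() {
    local dest=${1:-.}
    # DRSEQ_TESTDATA is set by the drseq module
    if [ -z "${DRSEQ_TESTDATA:-}" ]; then
        echo "DRSEQ_TESTDATA is not set; load the drseq module first" >&2
        return 1
    fi
    cp "$DRSEQ_TESTDATA"/* "$dest"/ || return 1
    # -f lets gunzip overwrite files left over from an earlier run
    gunzip -f "$dest"/*.gz
}
```

For example, after module load drseq, running stage_drseq_testdata "$PWD" in the submission directory places the inputs where drseq_batch.sh looks for them.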
Swarm of jobs on Biowulf
Create a swarm command file similar to the following example:
```
# this file is drseq.swarm
DrSeq simple \
    -b sample1_1.fastq \
    -r sample1_2.fastq \
    -n sample1 -f \
    -g $PWD/mm10_refgenes.txt \
    --maptool bowtie2 \
    --mapindex /fdb/igenomes/Mus_musculus/UCSC/mm10/Sequence/Bowtie2Index/genome \
    --cellbarcoderange 1:12 \
    --umirange 13:20 \
    --clean \
    --thread $SLURM_CPUS_PER_TASK
DrSeq simple \
    -b sample2_1.fastq \
    -r sample2_2.fastq \
    -n sample2 -f \
    -g $PWD/mm10_refgenes.txt \
    --maptool bowtie2 \
    --mapindex /fdb/igenomes/Mus_musculus/UCSC/mm10/Sequence/Bowtie2Index/genome \
    --cellbarcoderange 1:12 \
    --umirange 13:20 \
    --clean \
    --thread $SLURM_CPUS_PER_TASK
```
Then submit to the queue with swarm:

```
biowulf$ swarm -f drseq.swarm -g 10 -t 6 --module drseq,bowtie
```
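With many samples, writing the swarm file by hand is tedious and error-prone. The bash sketch below generates one DrSeq command per sample, using the same options as the hand-written file. make_drseq_swarm is a hypothetical helper and is not part of Dr.seq or swarm. It assumes each sample has <sample>_1.fastq (barcode reads) and <sample>_2.fastq (sequence reads) in the submission directory. Each command is written on a single line, so no line continuations are needed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write one DrSeq command per sample into a swarm file.
# Assumes <sample>_1.fastq and <sample>_2.fastq exist in the submission directory.
make_drseq_swarm() {
    local out=$1
    shift
    local index=/fdb/igenomes/Mus_musculus/UCSC/mm10/Sequence/Bowtie2Index/genome
    # Single quotes keep $PWD and $SLURM_CPUS_PER_TASK literal, so they are
    # expanded inside each swarm subjob, as in the hand-written swarm file.
    local fmt='DrSeq simple -b %s_1.fastq -r %s_2.fastq -n %s -f'
    fmt+=' -g $PWD/mm10_refgenes.txt --maptool bowtie2 --mapindex %s'
    fmt+=' --cellbarcoderange 1:12 --umirange 13:20 --clean'
    fmt+=' --thread $SLURM_CPUS_PER_TASK\n'
    local sample
    : > "$out"    # create or truncate the swarm file
    for sample in "$@"; do
        # Sample names must not contain '%', since fmt is used as a printf format
        printf "$fmt" "$sample" "$sample" "$sample" "$index" >> "$out"
    done
}
```

For example, make_drseq_swarm drseq.swarm sample1 sample2 writes the same two commands as the file above, one per line. You then submit the file with the same swarm command.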