Discovar on Biowulf

DISCOVAR is a variant caller and DISCOVAR de novo is a genome assembler. Both are designed for a specific type of sequencing data, chosen to optimize quality while keeping costs low. They currently take as input Illumina reads of length 250 or longer, produced on MiSeq or HiSeq 2500, from a single PCR-free library. These data enable a level of completeness and continuity that was not previously possible.

Documentation

Discovar Main Site

Important Notes

Module Name: discovar (see the modules page for more information)
Multithreaded app (set the number of threads with the NUM_THREADS option)
Environment variables set:
- DISCOVAR_TEST = /usr/local/apps/discovar/TEST_DATA
Example files are in $DISCOVAR_TEST

Interactive job

Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program. Sample session:

[user@biowulf]$ sinteractive -c 4 --mem 10g
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144 ~]$ mkdir -p /data/$USER/discovar && cd /data/$USER/discovar

[user@cn3144 ~]$ module load discovar

[user@cn3144 ~]$ cp $DISCOVAR_TEST/* .

[user@cn3144 ~]$ ./run-discovar-assembly.sh
running Discovar READS=sample-reads.bam REGIONS='10:30892106-30933760' OUT_HEAD=./discovar-assembly/assembly TMP=./discovar-assembly/tmp
Performing re-exec to adjust stack size.

--------------------------------------------------------------------------------
Tue Jul 24 08:57:12 2018 run on cn3206, pid=56652 [Jul 23 2018 16:14:14 R52488 ]
Discovar READS=sample-reads.bam REGIONS=10:30892106-30933760                   \
         OUT_HEAD=./discovar-assembly/assembly TMP=./discovar-assembly/tmp
--------------------------------------------------------------------------------
Tue Jul 24 08:57:12 2018: there are 9,644 reads
Tue Jul 24 08:57:12 2018: mean read length = 250.0
Tue Jul 24 08:57:12 2018: mean base quality = 25.8
[...]

DISCOVAR SUMMARY STATS

1 components
45 edges
45553 kmers

Tue Jul 24 08:57:31 2018: done, time used = 19.1 seconds, peak mem used = 0.7 GB

====================================================================================

Discovar has completed correctly.  See the output in ./discovar-assembly

[user@cn3144 ~]$ ./run-discovar-variants.sh
running Discovar READS=sample-reads.bam REFERENCE=sample-genome.fasta REGIONS='10:30892106-30933760' OUT_HEAD=./discovar-variants/assembly TMP=./discovar-variants/tmp
Performing re-exec to adjust stack size.
[...]
DISCOVAR SUMMARY STATS

1 components
45 edges
45553 kmers

Tue Jul 24 08:58:20 2018: done, time used = 23.3 seconds, peak mem used = 1.5 GB

====================================================================================

Discovar has completed correctly.  See the output in ./discovar-variants

[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$

Batch job

Most jobs should be run as batch jobs.

Create a batch input file (e.g. discovar.sh). For example:

#!/bin/bash
module load discovar
cd /data/$USER/discovar

Discovar READS=sample-reads.bam REGIONS='10:30892106-30933760' OUT_HEAD=./output/assembly TMP=/lscratch/$SLURM_JOBID NUM_THREADS=$SLURM_CPUS_PER_TASK

Submit this job using the Slurm sbatch command.

sbatch --cpus-per-task=10 --mem=20g --gres=lscratch:10 discovar.sh
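
The batch script above depends on variables that Slurm sets inside a job. The following is a minimal sketch, not part of the Discovar distribution, of how the command line could be assembled with fallbacks when those variables are missing, for example when testing outside a job or when `--cpus-per-task` was not given. The helper name `build_discovar_cmd`, the `./tmp` fallback directory, and the single-thread default are assumptions made for illustration. The helper only prints the command so it can be checked before it is run.

```shell
#!/bin/bash
# Sketch only: build_discovar_cmd is a hypothetical helper, not part of Discovar.
# It prints the Discovar command line instead of running it.
build_discovar_cmd() {
    local reads=$1 out_head=$2
    shift 2                                   # remaining arguments are passed through unchanged

    # Fall back to one thread if Slurm did not set SLURM_CPUS_PER_TASK (assumed default).
    local threads=${SLURM_CPUS_PER_TASK:-1}

    # Use node-local scratch inside a job; otherwise assume ./tmp (illustrative fallback).
    local tmp=./tmp
    if [ -n "${SLURM_JOBID:-}" ]; then
        tmp=/lscratch/$SLURM_JOBID
    fi

    local args=("READS=$reads" "OUT_HEAD=$out_head" "TMP=$tmp" "NUM_THREADS=$threads" "$@")
    echo "Discovar ${args[*]}"
}

# Print the command for the example data set used on this page.
build_discovar_cmd sample-reads.bam ./output/assembly REGIONS=10:30892106-30933760
```

Inside a job submitted with the sbatch command shown above, the printed command should use /lscratch/$SLURM_JOBID and the allocated CPU count; outside a job it falls back to the assumed defaults.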

Swarm of Jobs

A swarm of jobs is an easy way to submit a set of independent commands requiring identical resources.

Create a swarmfile (e.g. discovar.swarm). For example:

Discovar READS=sample1-reads.bam [...] OUT_HEAD=./output1 TMP=/lscratch/$SLURM_JOBID NUM_THREADS=$SLURM_CPUS_PER_TASK
Discovar READS=sample2-reads.bam [...] OUT_HEAD=./output2 TMP=/lscratch/$SLURM_JOBID NUM_THREADS=$SLURM_CPUS_PER_TASK
Discovar READS=sample3-reads.bam [...] OUT_HEAD=./output3 TMP=/lscratch/$SLURM_JOBID NUM_THREADS=$SLURM_CPUS_PER_TASK
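
For more than a handful of samples, a swarmfile like the one above could be generated rather than typed by hand. The following is a minimal sketch under stated assumptions: the function name `make_swarmfile`, the `EXTRA_ARGS` variable (a stand-in for the arguments elided as `[...]` above), and the `./output-<sample>` naming are hypothetical and not part of swarm or Discovar. Input files are assumed to follow the `<sample>-reads.bam` pattern.

```shell
#!/bin/bash
# Sketch only: make_swarmfile and EXTRA_ARGS are hypothetical names.
# Writes one Discovar command per BAM file into the given swarmfile.
make_swarmfile() {
    local swarmfile=$1
    shift
    : > "$swarmfile"                          # create or truncate the swarmfile
    local bam sample
    for bam in "$@"; do
        # Derive the sample name by stripping the directory and the -reads.bam suffix.
        sample=$(basename "$bam" -reads.bam)
        # The single-quoted format keeps $SLURM_JOBID and $SLURM_CPUS_PER_TASK
        # literal, so they are expanded later inside each subjob.
        printf 'Discovar READS=%s %s OUT_HEAD=./output-%s TMP=/lscratch/$SLURM_JOBID NUM_THREADS=$SLURM_CPUS_PER_TASK\n' \
            "$bam" "${EXTRA_ARGS:-}" "$sample" >> "$swarmfile"
    done
}
```

A possible invocation, with any additional Discovar arguments supplied through the assumed `EXTRA_ARGS` variable, is `EXTRA_ARGS="REGIONS=10:30892106-30933760" make_swarmfile discovar.swarm sample*-reads.bam`.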

Submit this job using the swarm command.

swarm -f discovar.swarm --gres=lscratch:10 -g 10 -t 10 --module discovar

where

`-g #`	Number of gigabytes of memory required for each process (1 line in the swarm command file)
`-t #`	Number of threads/CPUs required for each process (1 line in the swarm command file)
`--gres=lscratch:#`	Number of gigabytes of local scratch space (/lscratch) allocated for each process
`--module discovar`	Loads the discovar module for each subjob in the swarm