High-Performance Computing at the NIH
delly on Biowulf & Helix

Description

DELLY is an integrated structural variant prediction method that can detect deletions, tandem duplications, inversions and translocations at single-nucleotide resolution in short-read massively parallel sequencing data. It uses paired-end and split-read alignments to sensitively and accurately delineate genomic rearrangements throughout the genome.

Multiple versions of delly may be available. The easiest way to select a version is to use modules. To see the available modules, type:

module avail delly

To select a module, use:

module load delly/[version]

where [version] is the version of choice.

delly is multithreaded, and it parallelizes primarily over input samples, so the number of threads should not exceed the number of input BAM files. Match the number of CPUs requested to the number of threads, which is set by the OMP_NUM_THREADS environment variable.
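Because extra threads beyond the number of samples are not used, a thread count can be capped at the sample count. The sketch below is a minimal illustration under that assumption; the function name delly_threads is hypothetical and is not part of delly or the delly module, and the sample count (3) is passed in by hand.

```shell
#!/bin/sh
# Hypothetical helper (not part of delly or the module): pick a thread
# count for delly. delly parallelizes mainly over input samples, so
# threads beyond the number of BAM files give no extra speedup.
#   $1 = CPUs allocated to the job
#   $2 = number of input BAM files
delly_threads() {
    if [ "$2" -lt "$1" ]; then
        echo "$2"
    else
        echo "$1"
    fi
}

# Example: use the Slurm allocation (1 outside a job) with 3 samples
OMP_NUM_THREADS=$(delly_threads "${SLURM_CPUS_PER_TASK:-1}" 3)
export OMP_NUM_THREADS
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"
```

In practice it is simpler to request exactly as many CPUs as there are samples, as the batch example does with 4 BAM files and 4 CPUs.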

Batch job on Biowulf

Create a batch script similar to the following example:

#!/bin/bash
# this file is delly.batch

function die {
    echo "$@" >&2
    exit 1
}

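module load delly || die "Could not load module"
cd /data/$USER/data_for_delly || die "Could not change to data directory"
# 4 input BAM files, so request 4 CPUs (one thread per sample)
delly call -t DEL -o del.vcf -g ref.fa \
    sample1.bam sample2.bam sample3.bam sample4.bam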

Submit to the queue with sbatch:

biowulf$ sbatch --cpus-per-task=4 --mem=10g delly.batch

Loading the module inside the batch script automatically sets OMP_NUM_THREADS to the number of allocated CPUs. If you do not load the module in the batch script, set OMP_NUM_THREADS explicitly, for example to $SLURM_CPUS_PER_TASK.
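For batch scripts that do not load the module, a minimal sketch of setting the variable explicitly is shown here. It assumes a POSIX shell, keeps any value that is already set (for example one set by the module), and otherwise falls back to SLURM_CPUS_PER_TASK, or 1 outside a Slurm job. The function name ensure_omp_threads is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: make sure OMP_NUM_THREADS is set before delly runs.
# Keeps an existing value (e.g. one set by "module load delly"); otherwise
# uses the Slurm CPU allocation, or 1 when not inside a Slurm job.
ensure_omp_threads() {
    if [ -z "${OMP_NUM_THREADS:-}" ]; then
        OMP_NUM_THREADS="${SLURM_CPUS_PER_TASK:-1}"
    fi
    export OMP_NUM_THREADS
    echo "delly will use OMP_NUM_THREADS=$OMP_NUM_THREADS" >&2
}

ensure_omp_threads
```

Call it once near the top of the batch script, before the delly call line.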

Swarm of jobs on Biowulf

Create a swarm command file similar to the following example:

# this file is delly.swarm
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK; cd /data/$USER/dir1; delly call -t DEL -o del.vcf -g ref.fa sample1.bam sample2.bam sample3.bam
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK; cd /data/$USER/dir2; delly call -t DEL -o del.vcf -g ref.fa sample1.bam sample2.bam sample3.bam
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK; cd /data/$USER/dir3; delly call -t DEL -o del.vcf -g ref.fa sample1.bam sample2.bam sample3.bam
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK; cd /data/$USER/dir4; delly call -t DEL -o del.vcf -g ref.fa sample1.bam sample2.bam sample3.bam
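With many directories, the command file can be generated rather than written by hand. This is a minimal sketch assuming the same layout as the example (dir1 through dir4 under /data/$USER, each holding ref.fa and sample1.bam to sample3.bam); adjust the directory list and sample names to your data.

```shell
#!/bin/sh
# Sketch: generate delly.swarm with one command line per data directory.
# Assumption: each of dir1..dir4 under /data/$USER holds ref.fa and
# sample1.bam..sample3.bam, matching the hand-written example.
# \$ keeps $SLURM_CPUS_PER_TASK and $USER literal in the file, so they are
# expanded when swarm runs each command, not when the file is written.
samples="sample1.bam sample2.bam sample3.bam"
: > delly.swarm   # create or truncate the command file
for d in dir1 dir2 dir3 dir4; do
    printf '%s\n' "export OMP_NUM_THREADS=\$SLURM_CPUS_PER_TASK; cd /data/\$USER/$d; delly call -t DEL -o del.vcf -g ref.fa $samples" >> delly.swarm
done
```

The generated file is identical to the hand-written one and can be submitted the same way.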

Submit the command file to the queue with swarm:

biowulf$ swarm -f delly.swarm -g 10 -t 3 --module delly

Interactive job on Biowulf

Allocate an interactive session with sinteractive and run delly as described above:

biowulf$ sinteractive --cpus-per-task=2 --mem=10g
node$ module load delly
node$ cd /data/$USER/dir
node$ delly call -t DEL -o del.vcf -g ref.fa sample1.bam sample2.bam
[...snip...]
node$ exit
biowulf$