Biowulf High Performance Computing at the NIH
SVE on Biowulf

From the GitHub description:

SVE is a python script based execution engine for Structural Variation (SV) detection and can be used for any levels of data inputs, raw FASTQs, aligned BAMs, or variant call format (VCFs), and generates a unified VCF as its output. By design, SVE consists of alignment, realignment and the ensemble of state-of-the-art SV-calling algorithms by default. They are BreakDancer, BreakSeq, cnMOPS, CNVnator, DELLY, Hydra and LUMPY. FusorSV is also embedded that is a data mining approach to assess performance and merge callsets from an ensemble of SV-calling algorithms.

References:

  • T. Becker et al. FusorSV: an algorithm for optimally combining data from multiple structural variation detection methods. Genome Biology 19:38 (2018).
Documentation
Important Notes

Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program. Sample session:

[user@biowulf]$ sinteractive --mem=32g --cpus-per-task=16 --gres=lscratch:200
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144]$ cd /lscratch/$SLURM_JOB_ID
[user@cn3144]$ ml sve
[user@cn3144]$ mkdir fastq && cp -L ${SVE_TEST_DATA:-none}/* fastq
[user@cn3144]$ ls -lh fastq
total 2.3G
-rw-r--r-- 1 user group 1.2G Nov  1 20:27 ERR194158_NA12889_chr20_R1.fastq.gz
-rw-r--r-- 1 user group 1.2G Nov  1 20:27 ERR194158_NA12889_chr20_R2.fastq.gz
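
Before aligning, it can be worth confirming that every R1 file has a non-empty R2 mate, since a truncated copy would otherwise only surface later. The following is a minimal sketch; the helper name check_fastq_pairs is hypothetical, and it assumes the _R1/_R2 naming used by the test data listed above.

```shell
# check_fastq_pairs DIR
# Hypothetical helper: report any *_R1.fastq.gz in DIR that has no
# non-empty *_R2.fastq.gz mate. Returns non-zero if any mate is missing.
check_fastq_pairs() {
    local dir="$1"
    local r1 r2 missing=0
    for r1 in "$dir"/*_R1.fastq.gz; do
        # An unmatched glob stays literal, so bail out if no R1 file exists.
        [ -e "$r1" ] || { echo "no R1 files found in $dir" >&2; return 1; }
        r2="${r1%_R1.fastq.gz}_R2.fastq.gz"
        if [ ! -s "$r2" ]; then
            echo "missing or empty mate for $r1" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

In the session above this would be called as: check_fastq_pairs fastq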

Align the chr20 WGS reads. Note that other genome builds may work better.

[user@cn3144]$ sve align -r /fdb/igenomes/Homo_sapiens/UCSC/hg38/Sequence/BWAIndex/genome.fa \
                    -o align  \
                    -R '@RG\tID:1\tSM:NA12889\tLB:D0UYCACXX\tPL:ILLUMINA' \
                    -t 16 \
                    fastq/ERR194158_NA12889_chr20_R1.fastq.gz \
                    fastq/ERR194158_NA12889_chr20_R2.fastq.gz 
[user@cn3144]$ ls -lh align
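
When aligning more than one sample, the read-group string passed to -R has to change per sample. A small sketch of one way to build it follows; the function name make_read_group and its argument order are illustrative assumptions, while the output has the same form as the -R value in the command above.

```shell
# make_read_group SAMPLE LIBRARY [ID] [PLATFORM]
# Hypothetical helper: print a read-group header with literal "\t"
# separators, in the same form as the -R argument used above.
make_read_group() {
    local sample="$1" library="$2"
    local id="${3:-1}" platform="${4:-ILLUMINA}"
    # "\\t" in the printf format emits a backslash followed by "t",
    # not a tab character.
    printf '@RG\\tID:%s\\tSM:%s\\tLB:%s\\tPL:%s\n' \
        "$id" "$sample" "$library" "$platform"
}
```

The result could then be substituted into the align command, for example: -R "$(make_read_group NA12889 D0UYCACXX)"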

Call SVs with a few different callers. Note that some of the callers do not appear to be functional.

[user@cn3144]$ for method in breakdancer cnvnator delly; do
                    sve call -r /fdb/igenomes/Homo_sapiens/UCSC/hg38/Sequence/BWAIndex/genome.fa \
                        -g hg38 \
                        -a $method \
                        -o calls \
                        -t 16 \
                        align/*.bam
                done
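
Because some callers may fail, it can help to record which ones succeeded rather than reading through interleaved output. The sketch below wraps the same sve call options as the loop above. The function name run_callers and the logs directory are hypothetical, and it assumes that sve call exits with a non-zero status when a caller fails, which has not been verified here.

```shell
REF=/fdb/igenomes/Homo_sapiens/UCSC/hg38/Sequence/BWAIndex/genome.fa

# run_callers CALLER...
# Hypothetical helper: run "sve call" once per caller, write each caller's
# output to logs/CALLER.log, and print one status line per caller.
# Assumes a non-zero exit status from "sve call" signals failure.
run_callers() {
    local method failed=0
    mkdir -p logs
    for method in "$@"; do
        if sve call -r "$REF" -g hg38 -a "$method" -o calls -t 16 align/*.bam \
                > "logs/$method.log" 2>&1; then
            echo "$method ok"
        else
            echo "$method FAILED (see logs/$method.log)"
            failed=$((failed + 1))
        fi
    done
    # Succeed only if every caller succeeded.
    [ "$failed" -eq 0 ]
}
```

For the three callers used above: run_callers breakdancer cnvnator delly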

[user@cn3144]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf]$

Batch job
Most jobs should be run as batch jobs.

Create a batch script (e.g. sve.sh). For example:

#!/bin/bash
module load sve || exit 1
cd /lscratch/$SLURM_JOB_ID || exit 1
mkdir fastq && cp -L ${SVE_TEST_DATA:-none}/* fastq
sve align -r /fdb/igenomes/Homo_sapiens/UCSC/hg38/Sequence/BWAIndex/genome.fa \
    -o align  \
    -R '@RG\tID:1\tSM:NA12889\tLB:D0UYCACXX\tPL:ILLUMINA' \
    -t 16 \
    fastq/ERR194158_NA12889_chr20_R1.fastq.gz \
    fastq/ERR194158_NA12889_chr20_R2.fastq.gz
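
The script above writes its output under /lscratch/$SLURM_JOB_ID, which is node-local scratch that is removed when the job ends, so results need to be copied elsewhere before the script exits. A minimal sketch of such a step follows; the helper name save_results and the /data/$USER destination are assumptions, not part of the original example.

```shell
# save_results SRC_DIR DEST_DIR
# Hypothetical helper: copy a results directory out of node-local scratch
# into DEST_DIR, creating DEST_DIR if needed.
save_results() {
    local src="$1" dest="$2"
    [ -d "$src" ] || { echo "nothing to save: $src not found" >&2; return 1; }
    mkdir -p "$dest" && cp -r "$src" "$dest"/
}

# Possible last line of sve.sh (destination path is an assumption):
# save_results align "/data/$USER/sve_results_$SLURM_JOB_ID"
```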

Submit this job using the Slurm sbatch command.

sbatch --cpus-per-task=16 --mem=30g --gres=lscratch:50 sve.sh