Biowulf High Performance Computing at the NIH
svtk on Biowulf

SVTK: Utilities for consolidating, filtering, resolving, and annotating structural variants. It was written by the Talkowski lab.


Important Notes

Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program. Sample session:

[user@biowulf]$ sinteractive --mem=4g
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144 ~]$ module load svtk
[+] Loading svtk  0.1

[user@cn3144 ~]$ svtk
usage: svtk [-h] <subcommand> [options]
[ Preprocessing ]
    standardize    Convert SV calls to a standardized format.
    rdtest2vcf     Convert an RdTest-formatted bed to a standardized VCF.
    vcf2bed        Convert a standardized VCF to an RdTest-formatted bed.

[ Algorithm integration ]
    vcfcluster     Cluster SV calls from a list of VCFs. (Generally PE/SR.)
    bedcluster     Cluster SV calls from a BED. (Generally depth.)

[ Statistics ]
    count-svtypes  Count instances of each svtype in each sample in a VCF

[ Read-depth analysis ]
    bincov         Calculate normalized genome-wide depth of coverage.
    rdtest*        Calculate comparative coverage statistics at CNV sites.

[ PE/SR analysis ]
    collect-pesr   Count clipped reads and extract discordant pairs genomewide.
    sr-test        Calculate enrichment of clipped reads at SV breakpoints.
    pe-test        Calculate enrichment of discordant pairs at SV breakpoints.

[ Variant analysis ]
    resolve        Resolve complex variants from VCF of breakpoints.
    annotate       Annotate genic effects and overlap with noncoding elements.

[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
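The usage listing above names the subcommands but not their options. A minimal sketch for printing each subcommand's help, assuming every subcommand accepts -h the way the top-level svtk command does (rdtest is left out because the listing marks it with an asterisk). When svtk is not on the PATH, the loop only prints the commands it would run:

```bash
#!/bin/bash
# Subcommand names copied from the svtk usage listing.
subcommands=(standardize rdtest2vcf vcf2bed vcfcluster bedcluster
             count-svtypes bincov collect-pesr sr-test pe-test
             resolve annotate)

for sub in "${subcommands[@]}"; do
    if command -v svtk >/dev/null 2>&1; then
        # Assumption: each subcommand supports -h like the top-level command.
        svtk "$sub" -h
    else
        # svtk is not loaded (e.g. outside Biowulf): show the command only.
        echo "svtk $sub -h"
    fi
done
```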

Batch job
Most jobs should be run as batch jobs.

Create a batch input file. For example:

#!/bin/bash

module load svtk || exit 1
svtk <subcommand> [options]

Submit this job using the Slurm sbatch command.

sbatch --cpus-per-task=2 --mem=10g --gres=lscratch:20 <batch_script>
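The --gres=lscratch:20 option in the sbatch command above requests 20 GB of node-local scratch space, which the example batch script does not use. A minimal sketch of how a script might work there, assuming the usual Biowulf convention that the allocation appears as /lscratch/$SLURM_JOB_ID. The pick_workdir helper is hypothetical, and outside a Slurm job it falls back to a temporary directory:

```bash
#!/bin/bash
# Sketch: run inside node-local scratch when it is available.
# Assumption: --gres=lscratch:N creates /lscratch/$SLURM_JOB_ID on the node.

pick_workdir() {
    if [ -n "${SLURM_JOB_ID:-}" ] && [ -d "/lscratch/${SLURM_JOB_ID}" ]; then
        echo "/lscratch/${SLURM_JOB_ID}"
    else
        # Not inside a job with lscratch: use a temporary directory instead.
        mktemp -d
    fi
}

workdir="$(pick_workdir)"
cd "$workdir" || exit 1
echo "Working in $workdir"

# On Biowulf, the svtk steps would follow here, for example:
#   module load svtk || exit 1
#   svtk <subcommand> [options]
# Copy any results out of lscratch before the job ends.
```

Node-local scratch is removed when the job finishes, so results should be copied to permanent storage as the last step of the script.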

Swarm of Jobs
A swarm of jobs is an easy way to submit a set of independent commands requiring identical resources.

Create a swarmfile (e.g. svtk.swarm). For example:

svtk <subcommand> [options]
svtk <subcommand> [options]

Submit this job using the swarm command.

swarm -f svtk.swarm -g 10 --module svtk --gres=lscratch:10

where

-g #           Number of gigabytes of memory required for each process (1 line in the swarm command file)
-t #           Number of threads/CPUs required for each process (1 line in the swarm command file)
--module svtk  Loads the svtk module for each subjob in the swarm
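Writing a swarmfile by hand becomes tedious with many samples. A minimal sketch that generates one line per input file, using vcf2bed from the listing above as the example subcommand. The directory names, the *.vcf.gz naming, the make_swarmfile helper, and the positional argument order are assumptions for illustration, so check `svtk vcf2bed -h` before relying on it:

```bash
#!/bin/bash
# Sketch: write one "svtk vcf2bed" command per VCF into a swarmfile.
# Assumptions (not from this page): the directory layout, the *.vcf.gz
# naming, and the argument order "svtk vcf2bed <vcf> <bed>".
# Paths containing spaces are not handled.

make_swarmfile() {
    local vcf_dir="$1" out_dir="$2" swarmfile="$3"
    local vcf sample
    mkdir -p "$out_dir"
    : > "$swarmfile"                      # start from an empty swarmfile
    for vcf in "$vcf_dir"/*.vcf.gz; do
        [ -e "$vcf" ] || continue         # no matches: the glob stays literal
        sample="$(basename "$vcf" .vcf.gz)"
        printf 'svtk vcf2bed %s %s\n' "$vcf" "$out_dir/$sample.bed" >> "$swarmfile"
    done
}

# Hypothetical layout: inputs in ./vcfs, outputs in ./beds.
if [ -d vcfs ]; then
    make_swarmfile vcfs beds svtk.swarm
    echo "Wrote $(wc -l < svtk.swarm) commands to svtk.swarm"
fi
```

The resulting svtk.swarm can then be submitted with the swarm command shown above.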