High-Performance Computing at the NIH
CONTRA on Biowulf & Helix

CONTRA is a tool for detecting copy number variations (CNVs) in targeted resequencing data, such as whole-exome capture data. It calls copy number gains and losses for each target region. Key strategies include the use of base-level log-ratios to remove GC-content bias, correction for the effect of imbalanced library sizes on log-ratios, and estimation of log-ratio variation via binning and interpolation. CONTRA takes standard alignment formats (BAM/SAM) as input and writes its output in Variant Call Format (VCF 4.0) for easy integration with other next-generation sequencing analysis packages.

CONTRA uses environment modules. Type

module load CONTRA

at the prompt.

Running CONTRA on Helix
module load CONTRA
contra.py \
  -t $CONTRAHOME/Test_Files/0247401_D_BED_20090724_hg19_MERGED.bed \
  -s $CONTRAHOME/Test_Files/P0667T_GATKrealigned_duplicates_marked.bam \
  -c $CONTRAHOME/Test_Files/P0667N_GATKrealigned_duplicates_marked.bam \
  -f /fdb/GATK_resource_bundle/b37/human_g1k_v37.fasta \
  -o P0667Test
Running a single CONTRA batch job on Biowulf

Create a batch script (e.g. CONTRA.sh):

#!/bin/bash
# ----  this file is called CONTRA.sh ---------
module load CONTRA 2>&1
contra.py \
  -t /path/to/bed.file \
  -s /path/to/test.bam \
  -c /path/to/control.bam \
  -f /path/to/reference.fasta \
  -o /path/to/output.folder

Modify the paths to point to your own BED file, BAM files, reference FASTA, and output folder, and submit the script like so:

sbatch --mem=8GB --cpus-per-task=2 CONTRA.sh

contra.py uses two CPUs, one per input BAM file. This example assumes that 8 GB of memory is sufficient; increase --mem if your data need more.
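A mistyped path in CONTRA.sh is only discovered after the job has waited in the queue and started. One option is to check the inputs on the login node before calling sbatch. The sketch below is not part of CONTRA: check_inputs is a hypothetical helper, and the /path/to/... arguments are the same placeholders used in CONTRA.sh.

```shell
#!/bin/bash
# Hypothetical pre-submission check (not part of CONTRA): make sure every
# input exists and is readable before queueing CONTRA.sh with sbatch.

# Returns 0 if every argument is readable, 1 otherwise.
check_inputs() {
    local missing=0
    local f
    for f in "$@"; do
        if [ ! -r "$f" ]; then
            echo "missing or unreadable: $f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Placeholder paths; use the same values you put in CONTRA.sh.
if check_inputs /path/to/bed.file /path/to/test.bam \
                /path/to/control.bam /path/to/reference.fasta; then
    sbatch --mem=8GB --cpus-per-task=2 CONTRA.sh
else
    echo "Fix the paths above before submitting CONTRA.sh." >&2
fi
```

The check only confirms that the files can be read; it does not validate their contents.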

Running a swarm of CONTRA batch jobs on Biowulf

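If there are multiple sets of alignments, CONTRA can be run as a swarm. Create a swarmfile (e.g. CONTRA.swarm) with one contra.py command per line, one line per test/control pair:

contra.py -t /path/to/bed.file -s /path/to/test1.bam -c /path/to/control1.bam -f /path/to/reference.fasta -o /path/to/output1
contra.py -t /path/to/bed.file -s /path/to/test2.bam -c /path/to/control2.bam -f /path/to/reference.fasta -o /path/to/output2

Modify the paths as with the batch job above, and submit the swarmfile like so: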

swarm --module CONTRA -g 8 -t 2 -f CONTRA.swarm
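Here -g 8 and -t 2 give each subjob 8 GB of memory and two CPUs, matching the single batch job above, and --module CONTRA loads the module for every subjob. When there are many test/control pairs, writing the swarmfile by hand is error-prone. The following is a minimal sketch that generates it, assuming a hypothetical naming scheme in which <sample>_T.bam is the test BAM and <sample>_N.bam its matched control; the sample names and all /path/to/... locations are placeholders.

```shell
#!/bin/bash
# Sketch: write one contra.py command per test/control pair into CONTRA.swarm.
# Sample names and every /path/to/... location are hypothetical placeholders.
set -euo pipefail

BED=/path/to/bed.file
REF=/path/to/reference.fasta
BAMDIR=/path/to/bams
OUTDIR=/path/to/output

# Hypothetical naming scheme: <sample>_T.bam = test, <sample>_N.bam = control.
samples=(P0001 P0002 P0003)

: > CONTRA.swarm   # start with an empty swarmfile
for s in "${samples[@]}"; do
    # swarm runs each line as an independent subjob; each sample gets its
    # own output folder so the runs do not overwrite one another.
    echo "contra.py -t $BED -s $BAMDIR/${s}_T.bam -c $BAMDIR/${s}_N.bam -f $REF -o $OUTDIR/$s" >> CONTRA.swarm
done

wc -l CONTRA.swarm
```

The generated file is submitted with the same swarm command shown above.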
Running an interactive job on Biowulf

Allocate an interactive node, then load the module and run CONTRA on the bundled test data:

sinteractive --mem=8GB --cpus-per-task=2
module load CONTRA
contra.py \
  -t $CONTRAHOME/Test_Files/0247401_D_BED_20090724_hg19_MERGED.bed \
  -s $CONTRAHOME/Test_Files/P0667T_GATKrealigned_duplicates_marked.bam \
  -c $CONTRAHOME/Test_Files/P0667N_GATKrealigned_duplicates_marked.bam \
  -f <(zcat /fdb/GATK_resource_bundle/b37/human_g1k_v37.fasta.gz) \
  -o P0667Test
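The -f argument above uses bash process substitution, <(zcat ...), so the gzipped reference is decompressed on the fly instead of being copied to disk. The command receives a path to a pipe rather than a regular file, so this only works for programs that read the file once from start to finish; a pipe cannot be rewound. A minimal demonstration with a small throwaway file (demo.fa.gz is a hypothetical name):

```shell
#!/bin/bash
# Demonstrates bash process substitution on a gzipped file.
set -euo pipefail

# Create a tiny gzipped FASTA to stand in for a real reference.
printf '>chr1\nACGTACGT\n' | gzip > demo.fa.gz

# Prints the path the function receives and the first line read from it.
show_path_and_header() {
    echo "path: $1"
    head -n 1 "$1"
}

# <(zcat demo.fa.gz) expands to a path such as /dev/fd/63 that streams the
# decompressed text; the function sees a filename, not a .gz file.
show_path_and_header <(zcat demo.fa.gz)
```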
Documentation