Biowulf High Performance Computing at the NIH
CONTRA on Biowulf

CONTRA is a tool for detecting copy number variation (CNV) in targeted resequencing data, such as whole-exome capture data. CONTRA calls copy number gains and losses for each target region. Its key strategies include the use of base-level log-ratios to remove GC-content bias, correction for the effect of imbalanced library sizes on log-ratios, and estimation of log-ratio variation via binning and interpolation. It accepts standard alignment formats (BAM/SAM) and writes its output in Variant Call Format (VCF 4.0) for easy integration with other next-generation sequencing analysis packages.
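To make the idea of a library-size-scaled, base-level log-ratio concrete, the sketch below computes log2((test depth / test library size) / (control depth / control library size)) at each base. This is a simplified illustration only, not CONTRA's implementation: CONTRA's own correction for imbalanced library sizes, its GC handling, and its binning and interpolation steps are more involved and are not reproduced here. The function name `base_log_ratio`, the assumed input format (tab-separated chromosome, position, depth, with both files listing the same positions in the same order), and passing total read counts as library sizes are all assumptions made for this example.

```shell
#!/usr/bin/env bash
# Simplified illustration only: NOT CONTRA's implementation.
# Hypothetical helper that prints "chrom pos log2ratio" for every base where
# both samples have coverage, after scaling each depth by its library size.
# Assumed inputs: two tab-separated "chrom pos depth" files listing the same
# positions in the same order, plus the total read count of each library.
base_log_ratio() {
  local test_depth=$1 control_depth=$2 test_total=$3 control_total=$4
  # paste joins the files line by line: $1-$3 = test, $4-$6 = control.
  paste "$test_depth" "$control_depth" |
    awk -v nt="$test_total" -v nc="$control_total" '
      # Refuse to compare different positions.
      $1 != $4 || $2 != $5 {
        print "position mismatch at line " NR > "/dev/stderr"
        exit 1
      }
      # Skip bases with zero depth in either sample (avoids log(0) and
      # division by zero).
      $3 > 0 && $6 > 0 {
        ratio = ($3 / nt) / ($6 / nc)
        printf "%s\t%s\t%.4f\n", $1, $2, log(ratio) / log(2)
      }'
}
```

A hypothetical call, with illustrative library sizes, would be `base_log_ratio test.depth control.depth 45000000 42000000 > ratios.tsv`. Scaling by library size before taking the ratio keeps a sample that was simply sequenced more deeply from appearing to carry a copy number gain everywhere.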

CONTRA uses environment modules. Type

module load CONTRA

at the prompt.

Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program. Sample session:

[user@biowulf]$ sinteractive --mem=8GB --cpus-per-task=2
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144 ~]$ module load CONTRA

[user@cn3144 ~]$ \
  -t $CONTRAHOME/Test_Files/0247401_D_BED_20090724_hg19_MERGED.bed \
  -s $CONTRAHOME/Test_Files/P0667T_GATKrealigned_duplicates_marked.bam \
  -c $CONTRAHOME/Test_Files/P0667N_GATKrealigned_duplicates_marked.bam \
  -f /fdb/GATK_resource_bundle/b37/human_g1k_v37.fasta \
  -o P0667Test

[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
Running a single CONTRA batch job on Biowulf

Create a batch script such as the following:

# ----  this file is called ---------
module load CONTRA 2>&1 \
  -t /path/to/bed.file \
  -s /path/to/test.bam \
  -c /path/to/control.bam \
  -f /path/to/reference.fasta \
  -o /path/to/output.folder

Modify the /path/to/... entries to point to your own input and output files, and submit the script like so:

sbatch --mem=8GB --cpus-per-task=2

CONTRA requires two CPUs, one per input BAM file. This example assumes that 8 GB of memory is sufficient.

Running a swarm of CONTRA batch jobs on Biowulf

If there are multiple sets of alignments, CONTRA can be run as a swarm. Create a swarmfile (e.g. CONTRA.swarm) with one line per set of inputs, for example:

-t /path/to/bed1.file -s /path/to/test1.bam -c /path/to/control1.bam -f /path/to/reference.fasta -o /path/to/output1.folder
-t /path/to/bed2.file -s /path/to/test2.bam -c /path/to/control2.bam -f /path/to/reference.fasta -o /path/to/output2.folder
-t /path/to/bed3.file -s /path/to/test3.bam -c /path/to/control3.bam -f /path/to/reference.fasta -o /path/to/output3.folder
-t /path/to/bed4.file -s /path/to/test4.bam -c /path/to/control4.bam -f /path/to/reference.fasta -o /path/to/output4.folder
-t /path/to/bed5.file -s /path/to/test5.bam -c /path/to/control5.bam -f /path/to/reference.fasta -o /path/to/output5.folder

Modify the paths as with the batch job above, and submit the swarm like so:

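swarm --module CONTRA -g 8 -t 2 -f CONTRA.swarm

Here --module CONTRA loads the CONTRA module for each subjob, -g 8 requests 8 GB of memory per command, and -t 2 requests two CPUs per command, matching the single batch job above.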
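With many sample pairs, writing the swarmfile by hand is error-prone. The sketch below builds one swarmfile line per row of a sample sheet. The function name `make_contra_swarm` and the sample-sheet layout (tab-separated columns: BED file, test BAM, control BAM, output folder; blank lines and lines starting with `#` ignored) are hypothetical conventions for this example. The CONTRA command itself is passed in as the first argument rather than hard-coded, so use whatever executable the CONTRA module provides.

```shell
#!/usr/bin/env bash
# Hypothetical helper: writes one CONTRA command per line for swarm.
# Assumed sample-sheet columns (tab-separated):
#   bed_file  test_bam  control_bam  output_folder
# Usage: make_contra_swarm COMMAND REFERENCE_FASTA SAMPLE_SHEET > CONTRA.swarm
make_contra_swarm() {
  local cmd=$1 ref=$2 sheet=$3
  local tab bed test_bam control_bam out
  tab=$(printf '\t')
  # The "|| [ -n ... ]" clause also handles a last line without a newline.
  while IFS=$tab read -r bed test_bam control_bam out || [ -n "$bed" ]; do
    # Skip blank lines and comment lines.
    case $bed in
      '' | '#'*) continue ;;
    esac
    printf '%s -t %s -s %s -c %s -f %s -o %s\n' \
      "$cmd" "$bed" "$test_bam" "$control_bam" "$ref" "$out"
  done < "$sheet"
}
```

A hypothetical invocation is `make_contra_swarm COMMAND /path/to/reference.fasta samples.tsv > CONTRA.swarm`, where COMMAND is replaced by the CONTRA executable. The reference FASTA is given once on the command line because it is the same for every line of the swarmfile shown above. Paths containing spaces are not quoted in the output, so keep sample paths free of whitespace.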