Biowulf High Performance Computing at the NIH
Canvas on Biowulf

Canvas is a tool for calling copy number variants (CNVs) from human DNA sequencing data. It works with either germline data or paired tumor/normal samples. Its primary input is aligned reads (in .bam format), and its primary output is a report (in a .vcf file) giving the copy number status of the genome.

Canvas is developed by Illumina.

Canvas is developed on a Windows platform. On Biowulf it is built as a Singularity container. A wrapper script starts the Singularity container and runs your command within it, as shown in the following examples. If you follow the examples described on the Illumina site, note that your command on Biowulf should start with the Canvas.dll command -- you do not need 'dotnet' or '/CanvasDIR/'.
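
To illustrate the note about 'dotnet' and '/CanvasDIR/', the following sketch rewrites a command line copied from an upstream example into the form used on Biowulf. The function name biowulf_canvas_cmd is hypothetical, and the pattern assumes the upstream command begins with "dotnet" followed by a directory and Canvas.dll.

```shell
#!/bin/bash
# Hypothetical helper: convert an upstream-style Canvas command line
# into the form used on Biowulf, where the command starts with Canvas.dll.
# Assumption: the upstream example looks like "dotnet <dir>/Canvas.dll ...".
biowulf_canvas_cmd() {
    # Drop an optional leading "dotnet " and any directory in front of Canvas.dll.
    printf '%s\n' "$1" |
        sed -E 's#^(dotnet[[:space:]]+)?([^[:space:]]*/)?Canvas\.dll#Canvas.dll#'
}

# Print the Biowulf form of an upstream-style command.
biowulf_canvas_cmd "dotnet /CanvasDIR/Canvas.dll Germline-WGS -b sample.bam -o out"
```

The helper only edits text; it does not run Canvas. A command that already starts with Canvas.dll passes through unchanged.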


Important Notes

Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program. Sample session:

[user@biowulf]$ sinteractive -c 16 --mem 40g
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144 ~]$ mkdir -p /data/$USER/CANVAS_TEST

[user@cn3144 ~]$ cd /data/$USER/CANVAS_TEST

[user@cn3144 ~]$ cp /fdb/platinum_genomes/strelka_vcf/variants.vcf.gz .

[user@cn3144 ~]$ module load Canvas

[user@cn3144 ~]$ Canvas.dll -h
Usage: Canvas.exe [MODE] [OPTIONS]+
Available modes:
	Germline-WGS - CNV calling of a germline sample from whole genome sequencing data
	Somatic-Enrichment - CNV calling of a somatic sample from targeted sequencing data
	Somatic-WGS - CNV calling of a somatic sample from whole genome sequencing data
	Tumor-normal-enrichment - CNV calling of a tumor/normal pair from targeted sequencing data
	SmallPedigree-WGS - CNV calling of a small pedigree from whole genome sequencing data
  -h, --help                 show this message and exit
  -v, --version              print version and exit

[user@cn3144 ~]$ Canvas.dll SmallPedigree-WGS -b /fdb/platinum_genomes/bam/sorted.bam \
-r /fdb/Canvas/hg19/Sequence/kmer.fa \
-g /fdb/Canvas/hg19/Sequence/WholeGenomeFasta \
--sample-b-allele-vcf=/data/$USER/CANVAS_TEST/variants.vcf.gz \
-f /fdb/Canvas/hg19/Sequence/filter13.bed \
-o /data/$USER/CANVAS_TEST/out

[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
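
After a run like the one above finishes, a quick sanity check is to count the variant records in the output VCF. This is a minimal sketch: the helper name count_vcf_records is hypothetical, and the file name CNV.vcf.gz is an assumption, so list your own output directory to confirm what Canvas wrote.

```shell
#!/bin/bash
# Hypothetical helper: count the non-header records in a gzipped VCF.
count_vcf_records() {
    # VCF header lines start with '#'; every other line is a record.
    gzip -dc "$1" | grep -vc '^#'
}

# Assumed output location and file name; adjust to match your run.
vcf=/data/$USER/CANVAS_TEST/out/CNV.vcf.gz
if [ -f "$vcf" ]; then
    echo "records in $vcf: $(count_vcf_records "$vcf")"
else
    echo "no file at $vcf; list the output directory to find the VCF" >&2
fi
```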

Batch job
Most jobs should be run as batch jobs.

Create a batch input file. For example:

#!/bin/bash
# Stop at the first command that fails.
set -e
module load Canvas
Canvas.dll Germline-WGS -b /fdb/platinum_genomes/bam/sorted.bam \
-r /fdb/Canvas/hg19/Sequence/kmer.fa \
-g /fdb/Canvas/hg19/Sequence/WholeGenomeFasta \
--sample-b-allele-vcf=/data/teacher/canvas/variants.vcf.gz \
-f /fdb/Canvas/hg19/Sequence/filter13.bed \
-o /data/teacher/canvas/out
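
Before submitting, it can save a queue wait to confirm that the input paths named in the batch script exist. The following is a minimal sketch; the function name require_files is hypothetical, and the paths are the ones used in the batch script above.

```shell
#!/bin/bash
# Hypothetical helper: report every missing path and return non-zero if any is missing.
require_files() {
    local f missing=0
    for f in "$@"; do
        if [ ! -e "$f" ]; then
            echo "missing input: $f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Paths taken from the batch script above.
if require_files /fdb/platinum_genomes/bam/sorted.bam \
                 /fdb/Canvas/hg19/Sequence/kmer.fa \
                 /fdb/Canvas/hg19/Sequence/WholeGenomeFasta \
                 /fdb/Canvas/hg19/Sequence/filter13.bed; then
    echo "inputs found"
else
    echo "fix the paths listed above before submitting" >&2
fi
```

The check uses -e rather than -f so that directory arguments such as WholeGenomeFasta are accepted as well as regular files.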

Submit this job using the Slurm sbatch command, appending the name of your batch input file to the command below.

sbatch --cpus-per-task=16 --mem=30g --time 4:00:00

Swarm of Jobs
A swarm of jobs is an easy way to submit a set of independent commands requiring identical resources.

Create a swarmfile (e.g. canvas.swarm). For example:

Canvas.dll Germline-WGS -b sample1.bam --sample-b-allele-vcf=sample1.vcf -o sample1 ...
Canvas.dll Germline-WGS -b sample2.bam --sample-b-allele-vcf=sample2.vcf -o sample2 ...
Canvas.dll Germline-WGS -b sample3.bam --sample-b-allele-vcf=sample3.vcf -o sample3 ...

In each line, ... stands for the rest of the required options (see the batch script above).
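
For more than a handful of samples, the swarmfile can be generated rather than typed. The following is a minimal sketch, assuming each sample has a NAME.bam and NAME.vcf in the current directory as in the lines above, and reusing the reference paths from the batch script. The function name make_canvas_swarm is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print one complete Canvas command line per sample name.
# Assumption: reference paths are the hg19 ones used in the batch script above.
make_canvas_swarm() {
    local ref=/fdb/Canvas/hg19/Sequence
    local s
    for s in "$@"; do
        printf 'Canvas.dll Germline-WGS -b %s.bam --sample-b-allele-vcf=%s.vcf -r %s/kmer.fa -g %s/WholeGenomeFasta -f %s/filter13.bed -o %s\n' \
            "$s" "$s" "$ref" "$ref" "$ref" "$s"
    done
}

# Write one line per sample into the swarmfile.
make_canvas_swarm sample1 sample2 sample3 > canvas.swarm
```

Each output line is a full command, since swarm treats every line of the swarmfile as one independent process.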

Submit this job using the swarm command.

swarm -f canvas.swarm [-g 30] [-t 16] --module Canvas

where
-g #             Number of gigabytes of memory required for each process (1 line in the swarm command file)
-t #             Number of threads/CPUs required for each process (1 line in the swarm command file)
--module Canvas  Loads the Canvas module for each subjob in the swarm