High-Performance Computing at the NIH
Illumina Canvas Copy Number Variant Caller on Biowulf

Canvas is a tool for calling copy number variants (CNVs) from human DNA sequencing data. It can work with either germline data or paired tumor/normal samples. Its primary input is aligned reads (in .bam format), and its primary output is a report (in a .vcf file) giving the copy number status of the genome.

Canvas is developed by Illumina. [Canvas website]

Canvas is developed on a Windows platform. On Biowulf it is built as a Singularity container, and within the container Canvas is installed in /usr/local/Canvas-1.25/. The script Canvas.sh starts the Singularity container and runs your command within it, as shown in the following examples. If you follow the examples described on the Illumina site, note that your command on Biowulf should start with Canvas.dll; you do not need 'dotnet' or '/CanvasDIR/'.
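
As a rough illustration of that translation, the sketch below rewrites an Illumina-style command line into the form used on Biowulf. The function name to_biowulf is hypothetical (it is not part of Canvas or the Biowulf module), and the sample command is shortened for illustration only.

```shell
# Hypothetical helper: drop the leading "dotnet" and any directory
# in front of Canvas.dll, leaving the rest of the command unchanged.
to_biowulf() {
    printf '%s\n' "$1" | sed -E 's#^dotnet +([^ ]*/)?Canvas\.dll#Canvas.dll#'
}

# Illumina-style command, shortened for illustration
to_biowulf 'dotnet /CanvasDIR/Canvas.dll SmallPedigree-WGS --bam=father.bam'
```

Everything after Canvas.dll (the mode and its options) is passed through as written.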

On Helix

Canvas usually requires significant CPU and memory, and so should not be run on Helix.

Batch job on Biowulf

Create a batch script (e.g. canvas.sh). For example:

#!/bin/bash

module load Canvas

export BS=/data/$USER/illumina-basespace/

Canvas.dll SmallPedigree-WGS --bam=$BS/father.bam --bam=$BS/mother.bam --bam=$BS/child1.bam \
   --mother=mother --father=father --proband=child1 -r $BS/kmer.fa \
   -g /fdb/Canvas/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta \
   --sample-b-allele-vcf $BS/Pedigree.vcf.gz -f $BS/filter13.bed -o /data/$USER/Canvas/out \
   --ploidy-vcf="$BS/MultiSamplePloidy.vcf"

Submit this job using the Slurm sbatch command.

sbatch --cpus-per-task=24 --mem=50g --time=6:00:00 canvas.sh
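
Because a job like this can wait in the queue before it starts, it may be worth confirming that the input files exist before submitting. The following is a minimal sketch of such a check, not something Canvas or Biowulf provides: check_inputs is a hypothetical helper, and the file names are simply those used in the example script above.

```shell
# Hypothetical pre-flight check: report every input that is missing or empty.
# Returns 0 only if all files exist and have non-zero size.
check_inputs() {
    ci_missing=0
    for ci_file in "$@"; do
        if [ ! -s "$ci_file" ]; then
            echo "missing or empty input: $ci_file" >&2
            ci_missing=1
        fi
    done
    return "$ci_missing"
}

# Same directory and file names as in the example batch script
BS=/data/$USER/illumina-basespace
check_inputs "$BS/father.bam" "$BS/mother.bam" "$BS/child1.bam" \
    "$BS/kmer.fa" "$BS/Pedigree.vcf.gz" "$BS/filter13.bed" \
    "$BS/MultiSamplePloidy.vcf" || echo "fix the inputs above before running sbatch"
```

The check only tests that each file is present and non-empty; it does not validate the contents.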

Note: if you get an error about running out of memory, try

export COMPlus_gcAllowVeryLargeObjects=1
before running Canvas. See this GitHub thread for more info.
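
In a batch script, the export would go before the Canvas.dll line. The short sketch below sets the variable and confirms that a child process inherits it; the echo line is purely illustrative and is not required by Canvas.

```shell
# Set the .NET runtime option before launching Canvas
export COMPlus_gcAllowVeryLargeObjects=1

# Illustrative check: a child shell sees the exported value
sh -c 'echo "COMPlus_gcAllowVeryLargeObjects=$COMPlus_gcAllowVeryLargeObjects"'
```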

Swarm of Jobs on Biowulf

Create a swarmfile (e.g. myfile.swarm). For example:

Canvas.dll SmallPedigree-WGS --bam=file1a.bam --bam=file1b.bam --bam=file1c.bam ......
Canvas.dll SmallPedigree-WGS --bam=file2a.bam --bam=file2b.bam --bam=file2c.bam ......
Canvas.dll SmallPedigree-WGS --bam=file3a.bam --bam=file3b.bam --bam=file3c.bam ......
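
For many families, a swarmfile like this can be generated rather than typed by hand. The sketch below assumes the naming pattern shown above (a shared prefix followed by a, b, and c); make_swarm and EXTRA_ARGS are hypothetical names, and EXTRA_ARGS is left as a placeholder for the remaining Canvas options, which are elided in the example.

```shell
# Hypothetical generator: one Canvas command line per family prefix.
# $1 is the string of remaining options; the other arguments are prefixes.
make_swarm() {
    ms_extra=$1
    shift
    for ms_fam in "$@"; do
        printf 'Canvas.dll SmallPedigree-WGS --bam=%sa.bam --bam=%sb.bam --bam=%sc.bam %s\n' \
            "$ms_fam" "$ms_fam" "$ms_fam" "$ms_extra"
    done
}

EXTRA_ARGS="......"   # placeholder: replace with the remaining Canvas options
make_swarm "$EXTRA_ARGS" file1 file2 file3 > myfile.swarm
```

Each output line is an independent command, which is the form the swarm command expects.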

Submit this job using the swarm command.

swarm -f myfile.swarm -t 24 -g 20 --module Canvas --time=6:00:00

Interactive job on Biowulf

Allocate an interactive job and run Canvas there. Sample session:

[user@biowulf ~]$ sinteractive --cpus-per-task=24 --mem=50g
salloc.exe: Pending job allocation 44234078
salloc.exe: job 44234078 queued and waiting for resources
salloc.exe: job 44234078 has been allocated resources
salloc.exe: Granted job allocation 44234078
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3261 are ready for job

[user@cn3261 ~]$ module load Canvas
[+] Loading Canvas 1.25 on cn3261
[+] Loading singularity 2.3.1 on cn3261

[user@cn3261 ~]$ export BS=/scratch/$USER/illumina-bs

[user@cn3261 ~]$ Canvas.dll SmallPedigree-WGS --bam=$BS/father.bam --bam=$BS/mother.bam --bam=$BS/child1.bam \
    --mother=mother --father=father --proband=child1 -r $BS/kmer.fa \
    -g $BS/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta \
    --sample-b-allele-vcf $BS/Pedigree.vcf.gz -f $BS/filter13.bed -o /data/$USER/Canvas/gHapMixDemo \
    --ploidy-vcf="$BS/MultiSamplePloidy.vcf"
WARNING: Bind file source does not exist on host: /etc/resolv.conf
2017-06-29T19:26:10,Running Canvas SmallPedigree-WGS 1.25.0.49+master
2017-06-29T19:26:10,Command-line arguments: SmallPedigree-WGS --bam=/scratch/$USER/illumina-bs//father.bam --bam=/scratch/$USER/illumina-bs//mother.bam --bam=/scratch/$USER/illumina-bs//child1.bam --mother=mother --father=father --proband=child1 -r /scratch/$USER/illumina-bs//kmer.fa -g /scratch/$USER/illumina-bs//Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta --sample-b-allele-vcf /scratch/$USER/illumina-bs//Pedigree.vcf.gz -f /scratch/$USER/illumina-bs//filter13.bed -o /data/$USER/Canvas/gHapMixDemo --ploidy-vcf=/scratch/$USER/illumina-bs//MultiSamplePloidy.vcf
2017-06-29T19:26:10,Running checkpoint 01: CanvasSNV
2017-06-29T19:26:10,Running checkpoint 02: CanvasBin
Invoking 24 processor jobs...for sample father
CanvasSNV start for sample father
2017-06-29T19:26:10,Launch process: /usr/bin/dotnet /usr/local/Canvas-1.25/CanvasSNV.dll  -c chr15 -v /scratch/$USER/illumina-bs/Pedigree.vcf.gz -b /scratch/$USER/illumina-bs/father.bam -o /data/$USER/Canvas/gHapMixDemo/TempCNV_father/chr15-father.SNV.txt.gz -n father
2017-06-29T19:26:10,Launch process: /usr/bin/dotnet /usr/local/Canvas-1.25/CanvasSNV.dll  -c chr21 -v /scratch/$USER/illumina-bs/Pedigree.vcf.gz -b /scratch/$USER/illumina-bs/father.bam -o /data/$USER/Canvas/gHapMixDemo/TempCNV_father/chr21-father.SNV.txt.gz -n father
[...]

[user@cn3261 ~]$ exit
Documentation