lancet on NIH HPC Systems

Lancet is a somatic variant caller (SNVs and indels) for short-read data. It uses a localized micro-assembly strategy to detect somatic mutations in a tumor/normal pair with high sensitivity and accuracy. Lancet is based on the colored de Bruijn graph assembly paradigm, in which tumor and normal reads are analyzed jointly within the same graph. On-the-fly repeat composition analysis and a self-tuning k-mer strategy are used together to increase specificity in regions of low-complexity sequence. Lancet requires the raw reads to be aligned with BWA (see the BWA description for more information).

Narzisi G, Corvelo A, Arora K, Bergmann E, Shah M, Musunuri R, Emde AK, Robine N, Vacic V, Zody MC. Lancet: genome-wide somatic variant calling using localized colored DeBruijn graphs. bioRxiv 196311 (2017). doi: 10.1101/196311

Batch job on Biowulf

Create a batch input file (e.g. template.sh). For example:

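#!/bin/bash
# Stop at the first command that fails
set -e
module load lancet

cd /data/$USER/mydir
# Lancet writes the VCF to standard output; use all CPUs allocated with --cpus-per-task
lancet --tumor T.bam --normal N.bam --ref ref.fa --reg 22:1-51304566 --num-threads $SLURM_CPUS_PER_TASK > out.vcf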
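Before submitting, it can be worth checking the inputs from the login node. The shell function below is a minimal sketch, not part of Lancet or the Biowulf module: check_lancet_inputs is a hypothetical name, and it assumes that Lancet needs indexed BAM files (.bai) and a samtools-style reference index (ref.fa.fai), and that the contig named in --reg must appear in that index.

```shell
# Hypothetical pre-flight check for a Lancet run (not part of Lancet itself).
# Assumptions: indexed BAMs (.bam.bai or .bai), an indexed reference (.fa.fai),
# and the --reg contig must be a sequence name listed in the .fai file.

# check_lancet_inputs TUMOR_BAM NORMAL_BAM REF_FA REGION
# Prints one line per problem found; returns 1 if any problem was found.
check_lancet_inputs() {
    local tumor=$1 normal=$2 ref=$3 region=$4
    local status=0 bam contig

    for bam in "$tumor" "$normal"; do
        [[ -f $bam ]] || { echo "missing BAM: $bam"; status=1; }
        # Accept both index naming styles: X.bam.bai and X.bai
        [[ -f $bam.bai || -f ${bam%.bam}.bai ]] || { echo "missing index for: $bam"; status=1; }
    done

    [[ -f $ref ]] || { echo "missing reference: $ref"; status=1; }
    if [[ -f $ref.fai ]]; then
        # Contig name is everything before the last ':' in contig:start-end
        contig=${region%:*}
        # Column 1 of a .fai file is the sequence name; exit 0 only if it matches
        if ! awk -F'\t' -v c="$contig" '$1 == c { found = 1 } END { exit !found }' "$ref.fai"; then
            echo "contig '$contig' not found in $ref.fai"
            status=1
        fi
    else
        echo "missing reference index: $ref.fai"
        status=1
    fi
    return $status
}
```

For example, check_lancet_inputs T.bam N.bam ref.fa 22:1-51304566 && sbatch --cpus-per-task=8 template.sh only submits the job if no problem is printed. A contig naming mismatch, such as 22 versus chr22, is reported before any time is spent in the queue.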

Submit this job using the Slurm sbatch command.

sbatch --cpus-per-task=8 template.sh

Swarm of Jobs on Biowulf

Create a swarmfile (e.g. template.swarm). For example:

lancet --tumor T1.bam --normal N1.bam --ref ref.fa --reg 22:1-51304566 --num-threads $SLURM_CPUS_PER_TASK > out1.vcf
lancet --tumor T2.bam --normal N2.bam --ref ref.fa --reg 22:1-51304566 --num-threads $SLURM_CPUS_PER_TASK > out2.vcf
lancet --tumor T3.bam --normal N3.bam --ref ref.fa --reg 22:1-51304566 --num-threads $SLURM_CPUS_PER_TASK > out3.vcf
[...]
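Writing one line per sample by hand gets tedious for many pairs. As a sketch, the hypothetical function make_lancet_swarm below generates the swarmfile from a tab-separated list of sample names, tumor BAMs and normal BAMs; the pairs.tsv layout is an assumption, not a Lancet or swarm format. $SLURM_CPUS_PER_TASK is written literally, so it is expanded when each swarm subjob runs rather than when the file is generated.

```shell
# Hypothetical helper: write one lancet command per tumor/normal pair.
# Assumed input format: "sample<TAB>tumor.bam<TAB>normal.bam" per line.

# make_lancet_swarm PAIRS_TSV REF_FA REGION   (swarmfile goes to stdout)
make_lancet_swarm() {
    local pairs=$1 ref=$2 region=$3
    local sample tumor normal
    # "|| [[ -n $sample ]]" also handles a last line without a trailing newline
    while IFS=$'\t' read -r sample tumor normal || [[ -n $sample ]]; do
        # Skip blank lines and comment lines
        [[ -z $sample || $sample == \#* ]] && continue
        # Single-quoted format keeps $SLURM_CPUS_PER_TASK unexpanded in the output
        printf 'lancet --tumor %s --normal %s --ref %s --reg %s --num-threads $SLURM_CPUS_PER_TASK > %s.vcf\n' \
            "$tumor" "$normal" "$ref" "$region" "$sample"
    done < "$pairs"
}
```

Usage: make_lancet_swarm pairs.tsv ref.fa 22:1-51304566 > template.swarm, then submit with swarm as shown next.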

Submit this job using the swarm command.

swarm -t 8 -f template.swarm

where '-t 8' requests 8 CPUs for each lancet command; swarm makes this value available to each command as $SLURM_CPUS_PER_TASK, which the swarmfile passes to --num-threads.
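The region used throughout these examples, 22:1-51304566, covers all of chromosome 22 (51,304,566 bp is its length in GRCh37/hg19). The same contig:1-length form can be built for every contig from the reference's .fai index, giving one swarm command per contig for a single tumor/normal pair. The function below is a hypothetical sketch: it assumes a samtools-style ref.fa.fai whose first two columns are contig name and length, and it does not merge the per-contig results.

```shell
# Hypothetical helper: one lancet command per reference contig for a single
# tumor/normal pair, so swarm can run the contigs as separate subjobs.
# Assumption: REF_FA.fai exists; column 1 = contig name, column 2 = length.

# make_per_contig_swarm TUMOR_BAM NORMAL_BAM REF_FA PREFIX [CONTIG...]
# With no CONTIG arguments, every contig listed in the .fai is used.
make_per_contig_swarm() {
    local tumor=$1 normal=$2 ref=$3 prefix=$4
    shift 4
    awk -F'\t' -v t="$tumor" -v n="$normal" -v r="$ref" -v p="$prefix" -v want="$*" '
        # Build the set of requested contigs (empty set means "all")
        BEGIN { k = split(want, w, " "); for (i = 1; i <= k; i++) keep[w[i]] = 1 }
        (k == 0 || ($1 in keep)) {
            # $SLURM_CPUS_PER_TASK stays literal; swarm expands it at run time
            printf "lancet --tumor %s --normal %s --ref %s --reg %s:1-%s --num-threads $SLURM_CPUS_PER_TASK > %s.%s.vcf\n", t, n, r, $1, $2, p, $1
        }' "$ref.fai"
}
```

For example, make_per_contig_swarm T.bam N.bam ref.fa pairA 21 22 > template.swarm writes two commands, one for each listed contig. The per-contig VCFs still have to be combined afterwards, for example with bcftools concat.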
Interactive job on Biowulf

Sample session (user input follows the $ prompt):

biowulf$ sinteractive --cpus-per-task=8
salloc.exe: Pending job allocation 50862939
salloc.exe: job 50862939 queued and waiting for resources
salloc.exe: job 50862939 has been allocated resources
salloc.exe: Granted job allocation 50862939
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3133 are ready for job

cn3133 $  module load lancet
[+] Loading lancet version 1.0.1 ...

cn3133 $  lancet --tumor 231ESRP.25K.rep-1.bam --normal 231ESRP.25K.rep-2.bam --ref /fdb/genome/hg19/hg19.fa --num-threads 8 --reg 22:1-51304566
0 total windows to process

starting thread 1 on 0 windows
starting thread 2 on 0 windows
starting thread 3 on 0 windows
Process reads
Process reads
Process reads
starting thread 4 on 0 windows
starting thread 5 on 0 windows
Process reads
starting thread 6 on 0 windows
Process reads
starting thread 7 on 0 windows
Process reads
starting thread 8 on 0 windows
Process reads
Process reads
Main: completed thread id :1 exiting with status :0
Main: completed thread id :2 exiting with status :0
Main: completed thread id :3 exiting with status :0
Main: completed thread id :4 exiting with status :0
Main: completed thread id :5 exiting with status :0
Main: completed thread id :6 exiting with status :0
Main: completed thread id :7 exiting with status :0
Main: completed thread id :8 exiting with status :0
Merge variants
Total # of skipped windows: 0 (-nan%)
- # of windows with SNVs only: 0
- # of windows with indels only: 0
- # of windows with softclips only: 0
- # of windows with indels or softclips: 0
- # of windows with SNVs or indels: 0
- # of windows with SNVs or softclips: 0
- # of windows with SNVs or indels or softclips: 0
Export variants to VCF file
##fileformat=VCFv4.1
##fileDate=Mon Oct  2 12:45:06 2017
##source=lancet 1.0.1 (beta), September 30 2017
##reference=/fdb/genome/hg19/hg19.fa
##INFO=
##INFO=
##INFO=
[....]
cn3133 $  exit
exit
salloc.exe: Relinquishing job allocation 50862939
biowulf$ 
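In the session above the VCF was printed to the terminal; in the batch and swarm examples it is redirected to a file with > out.vcf. A quick way to summarize such a file is sketched below: the hypothetical summarize_vcf function counts data records and how many of them have PASS in the FILTER column (column 7 of the VCF format). It relies only on standard tab-separated VCF, nothing Lancet-specific.

```shell
# Hypothetical helper: summarize a VCF file such as out.vcf.
# Counts non-header records and those whose FILTER column (7) is PASS.

# summarize_vcf VCF_FILE   -> prints "records=N pass=M"
summarize_vcf() {
    awk -F'\t' '
        /^#/ { next }                               # skip ## meta lines and the #CHROM header
        { total++; if ($7 == "PASS") pass++ }       # count records and PASS records
        END { printf "records=%d pass=%d\n", total, pass }
    ' "$1"
}
```

Usage: summarize_vcf out.vcf prints a single line of the form records=N pass=M.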
Documentation