Lancet is a somatic variant caller (SNVs and indels) for short-read data. It uses a localized micro-assembly strategy to detect somatic mutations with high sensitivity and accuracy in a tumor/normal pair. Lancet is based on the colored de Bruijn graph assembly paradigm, in which tumor and normal reads are jointly analyzed within the same graph. On-the-fly repeat composition analysis and a self-tuning k-mer strategy are used together to increase specificity in regions characterized by low-complexity sequences. Lancet requires the raw reads to be aligned with BWA (see the BWA description for more information).
Narzisi G, Corvelo A, Arora K, Bergmann E, Shah M, Musunuri R, Emde AK, Robine N, Vacic V, Zody MC. Lancet: genome-wide somatic variant calling using localized colored DeBruijn graphs. bioRxiv 196311 (2017).
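Because Lancet expects BWA-aligned reads, it can be worth checking a BAM header before submitting a job. The following is a minimal sketch, not part of Lancet: the function name header_mentions_bwa is hypothetical, and it assumes the aligner left an @PG record in the header whose ID or PN field starts with "bwa". Headers can be rewritten or stripped, so a failed check is a hint rather than proof.

```shell
#!/bin/bash
# Hypothetical helper (not part of Lancet): reads SAM header text on stdin and
# succeeds if any @PG record has an ID or PN field that starts with "bwa".
header_mentions_bwa() {
    grep '^@PG' | grep -Eq $'\t(ID|PN):bwa'
}

# Example use, assuming samtools is available in the current environment:
#   samtools view -H T.bam | header_mentions_bwa || echo "T.bam: no bwa @PG record found" >&2
```

The prefix match also accepts program names such as bwa-mem2; tighten the pattern if that is not wanted.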
Create a batch script (e.g. template.sh). For example:
#!/bin/bash
module load lancet
cd /data/$USER/mydir
lancet --tumor T.bam --normal N.bam --ref ref.fa --reg 22:1-51304566 --num-threads $SLURM_CPUS_PER_TASK > out.vcf
Submit this job using the Slurm sbatch command.
sbatch --cpus-per-task=8 template.sh
Create a swarmfile (e.g. template.swarm). For example:
lancet --tumor T1.bam --normal N1.bam --ref ref.fa --reg 22:1-51304566 --num-threads $SLURM_CPUS_PER_TASK > out1.vcf
lancet --tumor T2.bam --normal N2.bam --ref ref.fa --reg 22:1-51304566 --num-threads $SLURM_CPUS_PER_TASK > out2.vcf
lancet --tumor T3.bam --normal N3.bam --ref ref.fa --reg 22:1-51304566 --num-threads $SLURM_CPUS_PER_TASK > out3.vcf
[...]
Submit this job using the swarm command.
swarm -t 8 -f template.swarm
where '-t 8' indicates that each lancet command should spawn 8 threads.
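For more than a handful of tumor/normal pairs, the swarmfile can be generated rather than written by hand. The following is a minimal sketch under stated assumptions: the function make_swarm and the tab-separated pairs file (columns: sample name, tumor BAM, normal BAM) are hypothetical and not part of Lancet or swarm. It uses only the lancet options shown above.

```shell
#!/bin/bash
# Hypothetical helper: print one lancet command per tumor/normal pair.
# Arguments: <pairs_file> <reference_fasta> <region>
# pairs_file is tab-separated: <sample_name> <tumor.bam> <normal.bam>
make_swarm() {
    local pairs_file=$1 ref=$2 region=$3
    local name tumor normal
    # The "|| [[ -n $name ]]" part keeps a final line that lacks a trailing newline.
    while IFS=$'\t' read -r name tumor normal || [[ -n $name ]]; do
        # Skip blank lines and lines starting with '#'.
        [[ -z $name || $name == \#* ]] && continue
        # The format string is single-quoted, so $SLURM_CPUS_PER_TASK is written
        # literally and is only expanded later, when the swarm job runs.
        printf 'lancet --tumor %s --normal %s --ref %s --reg %s --num-threads $SLURM_CPUS_PER_TASK > %s.vcf\n' \
            "$tumor" "$normal" "$ref" "$region" "$name"
    done < "$pairs_file"
}

# Example use (pairs.tsv is a hypothetical file name):
#   make_swarm pairs.tsv ref.fa 22:1-51304566 > template.swarm
```

The resulting file can then be submitted with the swarm command as above. File names containing spaces would need quoting that this sketch does not add.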
Sample session (user input follows the $ prompt):
biowulf$ sinteractive --cpus-per-task=8
salloc.exe: Pending job allocation 50862939
salloc.exe: job 50862939 queued and waiting for resources
salloc.exe: job 50862939 has been allocated resources
salloc.exe: Granted job allocation 50862939
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3133 are ready for job
cn3133 $ module load lancet
[+] Loading lancet version 1.0.1 ...
cn3133 $ lancet --tumor 231ESRP.25K.rep-1.bam --normal 231ESRP.25K.rep-2.bam --ref /fdb/genome/hg19/hg19.fa --num-threads 8 --reg 22:1-51304566
0 total windows to process
starting thread 1 on 0 windows
starting thread 2 on 0 windows
starting thread 3 on 0 windows
Process reads
Process reads
Process reads
starting thread 4 on 0 windows
starting thread 5 on 0 windows
Process reads
starting thread 6 on 0 windows
Process reads
starting thread Process reads
7 on 0 windows
starting thread 8Process reads
 on 0 windows
Process reads
Main: completed thread id :1 exiting with status :0
Main: completed thread id :2 exiting with status :0
Main: completed thread id :3 exiting with status :0
Main: completed thread id :4 exiting with status :0
Main: completed thread id :5 exiting with status :0
Main: completed thread id :6 exiting with status :0
Main: completed thread id :7 exiting with status :0
Main: completed thread id :8 exiting with status :0
Merge variants
Total # of skipped windows: 0 (-nan%)
- # of windows with SNVs only: 0
- # of windows with indels only: 0
- # of windows with softclips only: 0
- # of windows with indels or softclips: 0
- # of windows with SNVs or indels: 0
- # of windows with SNVs or softclips: 0
- # of windows with SNVs or indels or softclips: 0
Export variants to VCF file
##fileformat=VCFv4.1
##fileDate=Mon Oct 2 12:45:06 2017
##source=lancet 1.0.1 (beta), September 30 2017
##reference=/fdb/genome/hg19/hg19.fa
##INFO=
##INFO=
##INFO=
[....]
cn3133 $ exit
exit
salloc.exe: Relinquishing job allocation 50862939
biowulf$