High-Performance Computing at the NIH
smoove: structural variant calling and genotyping with existing tools, but, smoothly

smoove simplifies and speeds up calling and genotyping structural variants (SVs) from short reads. It also improves specificity by removing many spurious alignment signals that indicate low-level noise and often contribute to false-positive calls.

Documentation
Important Notes

Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program. Sample session:

[user@biowulf]$ sinteractive
[user@cn3200 ~]$ module load smoove
[+] Loading singularity  on cn3200 
[+] Loading smoove 0.2.1  ...

[user@cn3200 ~]$ mkdir out_dir
[user@cn3200 ~]$ REF=$SMOOVE_DATA/subset.fa.gz
[user@cn3200 ~]$ export TMPDIR=$(pwd)

[user@cn3200 ~]$ smoove -h
smoove version: 0.2.1

smoove calls several programs. Those with 'Y' are found on your $PATH. Only those with '*' are required.

 *[Y] bgzip [ sort   -> (compress) ->   index ]
 *[Y] gsort [(sort)  ->  compress   ->  index ]
 *[Y] tabix [ sort   ->  compress   -> (index)]
 *[Y] lumpy
 *[Y] lumpy_filter
 *[Y] samtools
 *[Y] svtyper
 *[Y] mosdepth [extra filtering of split and discordant files for better scaling]

  [Y] duphold [(optional) annotate calls with depth changes]
  [Y] svtools [only needed for large cohorts].

Available sub-commands are below. Each can be run with -h for additional help.

call     : call lumpy (and optionally svtyper)
merge    : merge and sort (using svtools) calls from multiple samples
genotype : parallelize svtyper on an input VCF
paste    : square final calls from multiple samples (each with same number of variants)
annotate : annotate a VCF with gene and quality of SV call
hipstr   : run hipSTR in parallel
cnvnator : run cnvnator in parallel
duphold  : run duphold in parallel (this can be done by adding a flag to call or genotype)

[user@cn3200 ~]$ smoove call -o out_dir --processes 2 -x --genotype --fasta $REF --name NA24385-svs-cram --excludechroms '~^GL,~^HLA,~_random,~^chrUn,~alt,~decoy' $SMOOVE_DATA/NA24385_chr1.cram
[smoove] 2018/11/13 19:39:17 starting with version 0.1.11
[smoove] 2018/11/13 19:39:17 calculating bam stats for 1 bams
[smoove] 2018/11/13 19:39:18 [samfaipath] build FASTA index...
[smoove]: 2018/11/13 19:39:19 finished process: lumpy-filter (lumpy_filter -f /usr/local/apps/smoove/0.2.1/test_data/subset.fa.gz /usr/local/apps/smoove/0.2.1/tes) in user-time:696.679ms system-time:165.358ms
[smoove] 2018/11/13 19:39:19 done calculating bam stats
[smoove] 2018/11/13 19:39:19 removed 0 alignments out of 16201 (0.00%) with depth > 800 or from excluded chroms from NA24385_chr1.disc.bam in 0 seconds
[smoove] 2018/11/13 19:39:19 removed 834 alignments out of 16201 (5.15%) that were bad interchromosomals or flanked-splitters from NA24385_chr1.disc.bam
[smoove] 2018/11/13 19:39:19 removed 5135 singletons out of 15367 reads (33.42%) from NA24385_chr1.disc.bam in 0 seconds
[smoove] 2018/11/13 19:39:19 removed 0 alignments out of 13212 (0.00%) with depth > 800 or from excluded chroms from NA24385_chr1.split.bam in 0 seconds
[smoove] 2018/11/13 19:39:19 removed 1205 alignments out of 13212 (9.12%) that were bad interchromosomals or flanked-splitters from NA24385_chr1.split.bam
[smoove] 2018/11/13 19:39:19 removed 4617 singletons out of 12007 reads (38.45%) from NA24385_chr1.split.bam in 0 seconds
[smoove] 2018/11/13 19:39:19 starting lumpy
[smoove] 2018/11/13 19:39:19 wrote lumpy command to out_dir/NA24385-svs-cram-lumpy-cmd.sh
[smoove] 2018/11/13 19:39:19 writing sorted, indexed file to out_dir/NA24385-svs-cram-smoove.genotyped.vcf.gz
[smoove] 2018/11/13 19:39:19 excluding variants with all unknown or homozygous reference genotypes
[smoove] 2018/11/13 19:39:19 > gsort version 0.0.6
[smoove] 2018/11/13 19:39:19 657	
[smoove] 2018/11/13 19:39:19 0
[smoove] 2018/11/13 19:39:19 1	1000000
[smoove] 2018/11/13 19:39:26 wrote sorted, indexed file to out_dir/NA24385-svs-cram-smoove.genotyped.vcf.gz
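The `--excludechroms` value used above is a comma-separated list. As we read the `smoove call -h` description, an entry starting with `~` is treated as a regular expression and any other entry as an exact chromosome name. The function below is a minimal sketch under that assumption that previews which contigs in a FASTA index (`.fai`) such a list would match. `preview_excluded` is a hypothetical helper, not part of smoove, and it tests patterns with `grep -E` rather than smoove's own regular-expression engine, which should agree for simple patterns like these.

```shell
# Hypothetical helper (not part of smoove): list the contigs in a .fai that an
# --excludechroms-style list would match.
# Assumption: entries are comma-separated; an entry beginning with "~" is a
# regular expression (tested here with grep -E), any other entry must equal
# the contig name exactly.
preview_excluded() {
  local fai="$1" patterns="$2" entry
  local -a entries
  # Split the comma-separated list into an array.
  IFS=',' read -r -a entries <<< "$patterns"
  # Column 1 of a .fai holds the contig names, in reference order.
  cut -f1 -- "$fai" | while IFS= read -r contig; do
    for entry in "${entries[@]}"; do
      if [[ "$entry" == "~"* ]]; then
        # Regex entry: strip the leading "~" and match against the name.
        if printf '%s\n' "$contig" | grep -Eq -- "${entry:1}"; then
          printf '%s\n' "$contig"
          break
        fi
      elif [[ "$contig" == "$entry" ]]; then
        # Plain entry: exact name match only.
        printf '%s\n' "$contig"
        break
      fi
    done
  done
}
```

For example, `preview_excluded /path/to/reference.fa.fai '~^GL,~^HLA,~_random,~^chrUn,~alt,~decoy'` prints the matching contig names (the `.fai` path is a placeholder; `samtools faidx` produces such an index). Note that smoove applies the list to SV breakpoints and alignments, so this only previews the name matching.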

[user@cn3200 ~]$ smoove call -o out_dir --processes 2 --fasta $REF --name NA24385-svs-cram --excludechroms '~^GL,~^HLA,~_random,~^chrUn,~alt,~decoy' $SMOOVE_DATA/NA24385_chr1.cram
[smoove] 2018/11/13 19:42:20 starting with version 0.1.11
[smoove] 2018/11/13 19:42:20 calculating bam stats for 1 bams
[smoove] 2018/11/13 19:42:21 done calculating bam stats
[smoove] 2018/11/13 19:42:21 removed 0 alignments out of 10232 (0.00%) with depth > 800 or from excluded chroms from NA24385_chr1.disc.bam in 0 seconds
[smoove] 2018/11/13 19:42:21 removed 0 alignments out of 10232 (0.00%) that were bad interchromosomals or flanked-splitters from NA24385_chr1.disc.bam
[smoove] 2018/11/13 19:42:21 removed 0 singletons out of 10232 reads (0.00%) from NA24385_chr1.disc.bam in 0 seconds
[smoove] 2018/11/13 19:42:21 removed 0 alignments out of 7390 (0.00%) with depth > 800 or from excluded chroms from NA24385_chr1.split.bam in 0 seconds
[smoove] 2018/11/13 19:42:21 removed 0 alignments out of 7390 (0.00%) that were bad interchromosomals or flanked-splitters from NA24385_chr1.split.bam
[smoove] 2018/11/13 19:42:21 removed 0 singletons out of 7390 reads (0.00%) from NA24385_chr1.split.bam in 0 seconds
[smoove] 2018/11/13 19:42:21 starting lumpy
[smoove] 2018/11/13 19:42:21 wrote lumpy command to out_dir/NA24385-svs-cram-lumpy-cmd.sh
[smoove] 2018/11/13 19:42:21 657	
[smoove] 2018/11/13 19:42:21 0
[smoove] 2018/11/13 19:42:21 1	1000000
[smoove] 2018/11/13 19:42:22 wrote to out_dir/NA24385-svs-cram-smoove.vcf.gz
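To get a quick overview of a call set such as `out_dir/NA24385-svs-cram-smoove.vcf.gz`, the sketch below tallies records by SV type. It assumes each record carries an `SVTYPE=` key in the INFO column, as lumpy-style VCFs do; `count_svtypes` is a hypothetical helper name. `gzip -dc` can read the bgzip-compressed output because BGZF is gzip-compatible.

```shell
# Hypothetical helper: count VCF records per SVTYPE.
# Assumes SVTYPE=... appears in the INFO column (8th tab-separated field).
count_svtypes() {
  local vcf="$1"
  gzip -dc -- "$vcf" \
    | awk -F'\t' '!/^#/ {
        # Split INFO on ";" and print the value of the SVTYPE key.
        n = split($8, info, ";")
        for (i = 1; i <= n; i++)
          if (info[i] ~ /^SVTYPE=/) { sub(/^SVTYPE=/, "", info[i]); print info[i] }
      }' \
    | sort | uniq -c \
    | awk '{ print $2 "\t" $1 }'   # emit "TYPE<TAB>count", sorted by type
}
```

Usage: `count_svtypes out_dir/NA24385-svs-cram-smoove.vcf.gz`.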

[user@cn3200 ~]$ smoove merge --fasta $REF -o out_dir --name NA24385-svs-cram out_dir/NA24385-svs-cram-smoove.vcf.gz
[smoove] 2018/11/13 19:43:05 starting with version 0.1.11
[smoove] 2018/11/13 19:43:05 merging 1 files
[smoove] 2018/11/13 19:43:05 finished sorting 1 files; merge starting.
[smoove] 2018/11/13 19:43:09 wrote sites file to out_dir/NA24385-svs-cram.sites.vcf.gz

[user@cn3200 ~]$ smoove genotype -x --fasta $REF -o out_dir --name NA24385-svs-cram-g --vcf out_dir/NA24385-svs-cram.sites.vcf.gz $SMOOVE_DATA/NA24385_chr1.cram
[smoove] 2018/11/13 19:44:22 starting with version 0.1.11
[smoove] 2018/11/13 19:44:22 writing sorted, indexed file to out_dir/NA24385-svs-cram-g-smoove.genotyped.vcf.gz
[smoove] 2018/11/13 19:44:22 > gsort version 0.0.6
[smoove] 2018/11/13 19:44:28 wrote sorted, indexed file to out_dir/NA24385-svs-cram-g-smoove.genotyped.vcf.gz
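The call, merge, and genotype steps shown above for one sample extend naturally to a cohort: call each sample separately, merge all calls into one sites file, then genotype every sample at those sites. The function below is a sketch of that loop under stated assumptions. `run_cohort` is a hypothetical name, the sample name is taken from each CRAM's basename, the output names follow the pattern seen in the session above, and setting `SMOOVE=echo` gives a dry run that only prints the commands. It uses only flags that appear in the session. The final `smoove paste` step listed in `smoove -h` is left out because its options are not shown here.

```shell
# Hypothetical cohort wrapper: call -> merge -> genotype, as in the session.
# Usage: run_cohort REF OUTDIR sample1.cram [sample2.cram ...]
# Set SMOOVE=echo for a dry run that prints the commands instead of running them.
run_cohort() {
  local ref="$1" outdir="$2"; shift 2
  local smoove="${SMOOVE:-smoove}" cram name
  local excl='~^GL,~^HLA,~_random,~^chrUn,~alt,~decoy'
  local -a called=()
  # 1. Call SVs in each sample separately (no --genotype at this stage).
  for cram in "$@"; do
    name=$(basename "$cram" .cram)
    $smoove call -o "$outdir" --processes 2 --fasta "$ref" --name "$name" \
      --excludechroms "$excl" "$cram" || return 1
    called+=("$outdir/$name-smoove.vcf.gz")
  done
  # 2. Merge all per-sample calls; writes $outdir/cohort.sites.vcf.gz.
  $smoove merge --fasta "$ref" -o "$outdir" --name cohort "${called[@]}" || return 1
  # 3. Genotype every sample at the merged sites.
  for cram in "$@"; do
    name=$(basename "$cram" .cram)
    $smoove genotype -x --fasta "$ref" -o "$outdir" --name "$name-g" \
      --vcf "$outdir/cohort.sites.vcf.gz" "$cram" || return 1
  done
}
```

For example, `SMOOVE=echo run_cohort "$REF" out_dir sampleA.cram sampleB.cram` prints the five commands for two (hypothetical) samples without running them.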

End the interactive session:
[user@cn3200 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$