High-Performance Computing at the NIH
SVseq on Biowulf & Helix

SVseq2 takes a BAM file with soft-clip signatures as input, is faster than SVseq1, and calls both deletions and insertions.

Reference: An improved approach for accurate and efficient calling of structural variations with low-coverage sequence data, Jin Zhang, Jiayin Wang and Yufeng Wu.

There may be multiple versions of SVseq available. An easy way of selecting the version is to use modules. To see the modules available, type

module avail SVseq

To select a module use

module load SVseq/[version]

where [version] is the version of choice.

Note that most genome reference data is available in /fdb/igenomes/

On Helix

Sample session:

[user@helix ~]$ module load SVseq
[+] Loading samtools 1.3.1 ...
[+] Loading SVseq 2_2 ...

[user@helix ~]$ SVseq2_2 -r /fdb/igenomes/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta/genome.fa -b 2_7035GAAXX.bam -c chr21
chrM	1
chr1	16572
[...]

Batch job on Biowulf

Create a batch script similar to the following example:

#! /bin/bash
# this file is svseq.sh

module load SVseq
cd /data/$USER/mydir

SVseq2_2 -r /fdb/igenomes/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta/genome.fa -b 2_7035GAAXX.bam -c chr21 --o SVseq.out

And submit to the queue with sbatch:

biowulf$ sbatch --mem=5g svseq.sh

The output from this run will appear in a file called SVseq.out, and any standard output/error in slurm-#####.out, where ##### is the job number.

Swarm of jobs on Biowulf

Create a swarm command file similar to the following example:

# this file is svseq.swarm
SVseq2_2 -r /fdb/igenomes/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta/genome.fa \
	-b 2_7035GAAXX.bam -c chr21 --o 2_7035GAAXX.svseq
SVseq2_2 -r /fdb/igenomes/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta/genome.fa \
	-b KM12.bam -c chr21 --o KM12.svseq
SVseq2_2 -r /fdb/igenomes/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta/genome.fa \
	-b UO-31.bam -c chr21 --o UO-31.svseq

And submit to the queue with swarm:

biowulf$ swarm -g 5 -f svseq.swarm --module SVseq/2_2
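
When there are many BAM files, the swarm file can be generated rather than typed by hand. The following is a minimal sketch and not part of the SVseq distribution: make_swarm is a hypothetical helper, and the reference path and the .svseq output suffix are simply carried over from the example above.

```shell
# Reference genome used in the examples above
REF=/fdb/igenomes/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta/genome.fa

# make_swarm (hypothetical helper): print one SVseq2_2 command per BAM file.
# Usage: make_swarm CHR BAM [BAM ...]
make_swarm() {
    local chr=$1
    shift
    local bam sample
    for bam in "$@"; do
        # Name the output after the BAM file, without directory or .bam suffix
        sample=$(basename "$bam" .bam)
        printf 'SVseq2_2 -r %s -b %s -c %s --o %s.svseq\n' \
            "$REF" "$bam" "$chr" "$sample"
    done
}
```

For example, `make_swarm chr21 *.bam > svseq.swarm` would write one command line per BAM file in the current directory; the resulting file can then be submitted with the swarm command shown above.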

Documentation

------------- SVseq2_2 README file -----------------------
SVseq2 is a newer version of SVseq1. 
(It is actually quite different in terms of the basic utilities. This version no longer uses BWT, but uses focal regions to find deletions. However, SVseq2 also uses paired-end information, as SVseq1 does.)

SVseq2 takes a BAM file with soft-clip signatures as input, is faster than SVseq1, and calls both deletions and insertions.

usage:

(1) Calling deletions:
./SVseq2 -r reference -b bam_file_list -c chr --o output_file[result.txt] --c cut_off[3] --is insert std

must have:
-r: the reference in fasta format
-b: list of BAM file names
-c: chromosome name

optional:
--c cut off
--o output file name
--is insert size and standard deviation (If provided, all the files in the list are assumed to have the same insert size and std; if not provided, SVseq2 estimates the values for each file.)
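
As an illustration of how the optional flags combine with the required ones, the sketch below only prints a deletion-calling command line and does not run it. svseq_del_cmd is a hypothetical helper, the executable name SVseq2_2 is taken from the Biowulf examples rather than from this README, and passing --is as two values (insert size, then standard deviation) is an assumption based on the usage line.

```shell
# svseq_del_cmd (hypothetical helper): print a deletion-calling command line.
# Usage: svseq_del_cmd REF BAM CHR OUT CUTOFF [INSERT STD]
svseq_del_cmd() {
    local cmd="SVseq2_2 -r $1 -b $2 -c $3 --o $4 --c $5"
    # --is is optional: add it only when both insert size and std are supplied
    if [ -n "${6:-}" ] && [ -n "${7:-}" ]; then
        cmd="$cmd --is $6 $7"
    fi
    printf '%s\n' "$cmd"
}
```

Calling `svseq_del_cmd genome.fa sample.bam chr21 result.txt 3` prints a command without --is, so SVseq2 would determine the insert size itself; appending two further arguments (the values are up to the user and depend on the library) adds the --is option.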

(2) Calling insertions:
./SVseq2 -insertion -b bam_file_list -c chr --o output_file [result.txt] --ci cut_off[3]

must have:
-b: list of BAM file names
-c: chromosome name

optional:
--ci cut off
--o output file name
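
The two modes above can be combined in a small wrapper that calls deletions and then insertions for the same BAM file and chromosome. This is only a sketch under assumptions: call_svs and the output file names are hypothetical, and the executable is assumed to be SVseq2_2 as in the Biowulf examples rather than ./SVseq2.

```shell
# call_svs (hypothetical wrapper): run deletion and insertion calling in turn.
# Usage: call_svs REF BAM CHR
call_svs() {
    local ref=$1 bam=$2 chr=$3
    local sample
    sample=$(basename "$bam" .bam)

    # (1) Deletion calling requires the reference (-r); stop if it fails
    SVseq2_2 -r "$ref" -b "$bam" -c "$chr" --o "$sample.$chr.del.txt" || return 1

    # (2) Insertion calling uses -insertion and takes no reference
    SVseq2_2 -insertion -b "$bam" -c "$chr" --o "$sample.$chr.ins.txt"
}
```

Writing the two result sets to separate files keeps the deletion and insertion calls from overwriting each other, since both modes default to the same output file name.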



For deletion calling, there is another option, -nosplit2, which restricts the program to using only the type I pattern.

SVseq2_2 ignores repeat annotations in the read-mapping step (SVseq2_0_1 is aware of them). Indeed, some SVs can be caused by repeats.

SVseq2_2 also fixes a bug. Previously, when a split read was partially mapped to the first chromosome at a very small coordinate, the focal region could point to negative locations. This could cause an overflow and stop the program when it performed other checks.