High-Performance Computing at the NIH
ScanIndel on Biowulf & Helix

Description

ScanIndel is a small Python program that ties together several tools (bwa, blat, samtools, freebayes, inchworm) to detect insertions and deletions (indels) in NGS data.

Several versions of ScanIndel may be available. The easiest way to select one is with environment modules. To see the available modules, type

module avail scanindel

To select a module use

module load scanindel/[version]

where [version] is the version of choice.

ScanIndel uses bwa mem to align reads and is hardwired to use 8 threads. Therefore batch and interactive jobs should be submitted with --cpus-per-task=8.
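Because the thread count is fixed, it can be useful to confirm the allocation before starting a run. The following is a minimal sketch, assuming the job runs under Slurm, which sets SLURM_CPUS_PER_TASK only when --cpus-per-task is given. The function name check_cpus is hypothetical and not part of ScanIndel.

```shell
# Sketch: warn when the Slurm allocation is smaller than ScanIndel's fixed 8 threads.
# check_cpus is a hypothetical helper, not part of ScanIndel.
check_cpus() {
    required=8
    # Fall back to 1 when the variable is unset (no --cpus-per-task given).
    allocated="${SLURM_CPUS_PER_TASK:-1}"
    if [ "$allocated" -lt "$required" ]; then
        echo "allocated $allocated CPUs, need $required" >&2
        return 1
    fi
    return 0
}
```

In a job script this could be called as `check_cpus || exit 1` right after loading the module.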

Environment variables set

Dependencies

All dependencies are loaded automatically by the scanindel environment module.

References

Documentation

On Helix

Set up the environment and copy the example data

helix$ module load scanindel
helix$ cp -r $SCANINDEL_TESTDATA .
helix$ cd example
helix$ # set up a directory for blat
helix$ mkdir hg19
helix$ ln -s /fdb/genomebrowser/gbdb/hg19/hg19.2bit hg19

Then run ScanIndel:

helix$ ScanIndel.py -F 0 -p input_data/config.txt -i input_data/sample.txt

Note that while it is OK to run small test data sets such as this one on Helix, please use batch jobs or interactive sessions for real data.

Batch job on Biowulf

Create a batch script similar to the following example:

#!/bin/bash
#SBATCH --mem=10g
#SBATCH --cpus-per-task=8
# this file is scanindel_job.sh

module load scanindel || exit 1
ScanIndel.py -F 0 -p input_data/config.txt -i input_data/sample.txt
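A batch job that cannot find its input only fails after waiting in the queue, so a script like the one above may benefit from an early check. This is a sketch only: require_files is a hypothetical helper, and the file names are the ones used in the example.

```shell
# Sketch: stop early if any input file is missing or empty.
# require_files is a hypothetical helper, not part of ScanIndel.
require_files() {
    for f in "$@"; do
        if [ ! -s "$f" ]; then     # -s: file exists and is not empty
            echo "missing or empty input: $f" >&2
            return 1
        fi
    done
}
```

Placed before the ScanIndel.py line, a call such as `require_files input_data/config.txt input_data/sample.txt || exit 1` would end the job immediately when an input is absent.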

Submit to the queue with sbatch:

b2$ sbatch scanindel_job.sh

Swarm of jobs on Biowulf

Create a swarm command file similar to the following example:

# this file is scanindel_jobs.swarm
ScanIndel.py -F 0 -p input_data/config.txt -i input_data/sample1.txt
ScanIndel.py -F 0 -p input_data/config.txt -i input_data/sample2.txt
ScanIndel.py -F 0 -p input_data/config.txt -i input_data/sample3.txt
ScanIndel.py -F 0 -p input_data/config.txt -i input_data/sample4.txt
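For more than a handful of samples, a command file like this one can be generated rather than typed. The sketch below assumes the sample lists follow the sample*.txt naming used in the example and share one config.txt. The function name make_swarm is hypothetical.

```shell
# Sketch: write one ScanIndel command per sample file into a swarm command file.
# make_swarm is a hypothetical helper; arguments are the input directory
# and the name of the swarm file to write.
make_swarm() {
    input_dir="$1"
    out="$2"
    echo "# this file is $out" > "$out"
    for sample in "$input_dir"/sample*.txt; do
        [ -e "$sample" ] || continue   # skip the literal pattern if nothing matched
        echo "ScanIndel.py -F 0 -p $input_dir/config.txt -i $sample" >> "$out"
    done
}
```

Calling `make_swarm input_data scanindel_jobs.swarm` from the working directory would produce one line per matching sample file, in the same form as the example.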

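Submit to the queue with swarm, requesting 8 threads and 10 GB of memory per command and loading the scanindel module for each one:

b2$ swarm -f scanindel_jobs.swarm -t 8 -g 10 --module scanindel

Interactive job on Biowulf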

Allocate an interactive session with sinteractive, requesting the 8 CPUs that ScanIndel needs, and run ScanIndel as described above:

b2$ sinteractive --cpus-per-task=8 --mem=10g
node$ module load scanindel
node$ ...
node$ exit
b2$