ScanIndel on Biowulf
ScanIndel is a small Python program that ties together a number of third-party tools to detect indels. From the ScanIndel GitHub page:
ScanIndel is a python program to detect indels (insertions and deletions) from NGS data by re-align and de novo assemble soft clipped reads.
Important Notes
- Module Name: scanindel (see the modules page for more information)
- ScanIndel aligns reads with bwa mem and is hardwired to use 8 threads, so both batch and interactive jobs should be submitted with --cpus-per-task=8.
- Example files in $SCANINDEL_TEST_DATA
Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.
Allocate an interactive session and run the program. Sample session:
[user@biowulf]$ sinteractive --gres=lscratch:10 --mem=30g --cpus-per-task=8
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144]$ module load scanindel
[user@cn3144]$ cd /lscratch/$SLURM_JOB_ID
[user@cn3144]$ cp -r $SCANINDEL_TEST_DATA .
[user@cn3144]$ cd example
[user@cn3144]$ mkdir hg19
[user@cn3144]$ ln -s /fdb/genomebrowser/gbdb/hg19/hg19.2bit hg19
[user@cn3144]$ cat input_data/config.txt
#name=path
bwa=/fdb/igenomes/Homo_sapiens/UCSC/hg19/Sequence/BWAIndex/genome.fa
blat=hg19
freebayes=/fdb/igenomes/Homo_sapiens/UCSC/hg19/Sequence/BWAIndex/genome.fa
[user@cn3144]$ ScanIndel.py -F 0 -p input_data/config.txt -i input_data/sample.txt
...
[user@cn3144]$ ls -lh
...
-rw-r--r-- 1 user group 8.4K May 21 14:27 test.assembly.indel.vcf
-rw-r--r-- 1 user group  868 May 21 14:27 test.contigs.bam
-rw-r--r-- 1 user group  28K May 21 14:27 test.contigs.bam.bai
-rw-r--r-- 1 user group 8.4K May 21 14:27 test.mapping.indel.vcf
-rw-r--r-- 1 user group    0 May 21 14:27 test.merged.indel.vcf
-rw-r--r-- 1 user group 160K May 21 14:26 test.reads.bam
-rw-r--r-- 1 user group  29K May 21 14:26 test.reads.bam.bai

[user@cn3144]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf]$
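The config file printed in the session above maps each tool name to a reference path, one name=path pair per line. A minimal sketch of writing such a file from the shell, reusing the hg19 paths shown above; the output filename my_config.txt is only an illustrative choice, and the relative blat entry assumes an hg19 directory holding hg19.2bit has been set up as in the session:

```shell
# Write a ScanIndel config file (name=path, one entry per line).
# my_config.txt is a hypothetical filename chosen for this sketch.
genome=/fdb/igenomes/Homo_sapiens/UCSC/hg19/Sequence/BWAIndex/genome.fa

cat > my_config.txt <<EOF
#name=path
bwa=${genome}
blat=hg19
freebayes=${genome}
EOF

# Show the result for a quick visual check.
cat my_config.txt
```

The file would then be passed to ScanIndel.py with -p in place of input_data/config.txt.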
Batch job
Most jobs should be run as batch jobs.
Create a batch input file (e.g. ScanIndel.sh) similar to the following example:
#! /bin/bash
module load scanindel/1.3 || exit 1
ScanIndel.py -F 0 -p input_data/config.txt -i input_data/sample.txt
Submit this job using the Slurm sbatch command.
sbatch --cpus-per-task=8 --mem=30g ScanIndel.sh
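Because the thread count is fixed at 8, a batch script can check its own allocation before starting the run. A minimal sketch, assuming Slurm exports SLURM_CPUS_PER_TASK when --cpus-per-task is given; the function name check_cpus is hypothetical and not part of ScanIndel:

```shell
# check_cpus: succeed only if the job was allocated at least the
# required number of CPUs. Hypothetical helper, not part of ScanIndel.
check_cpus() {
    local required=$1
    # SLURM_CPUS_PER_TASK is unset when --cpus-per-task was not given;
    # this sketch treats that case as a single CPU.
    local allocated=${SLURM_CPUS_PER_TASK:-1}
    if [ "$allocated" -lt "$required" ]; then
        echo "allocated $allocated CPUs, need $required" >&2
        return 1
    fi
    return 0
}

# In ScanIndel.sh this could precede the ScanIndel.py call, e.g.
#   check_cpus 8 || exit 1
```

Failing early this way avoids running 8 bwa threads on a smaller allocation.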
Swarm of Jobs
A swarm of jobs is an easy way to submit a set of independent commands requiring identical resources.
Create a swarmfile (e.g. ScanIndel.swarm). For example:
ScanIndel.py -F 0 -p input_data/config.txt -i input_data/sample1.txt
ScanIndel.py -F 0 -p input_data/config.txt -i input_data/sample2.txt
ScanIndel.py -F 0 -p input_data/config.txt -i input_data/sample3.txt
ScanIndel.py -F 0 -p input_data/config.txt -i input_data/sample4.txt
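With many samples, the swarmfile can be generated instead of typed by hand. A minimal sketch, assuming the sample list files follow the input_data/sampleN.txt naming used above; the mkdir and touch lines only create empty placeholder files so the sketch runs on its own, and should be dropped when real inputs exist:

```shell
# Placeholder inputs so the sketch is self-contained (drop for real data).
mkdir -p input_data
touch input_data/sample1.txt input_data/sample2.txt \
      input_data/sample3.txt input_data/sample4.txt

# Write one ScanIndel.py command per sample file into the swarmfile.
: > ScanIndel.swarm
for sample in input_data/sample*.txt; do
    echo "ScanIndel.py -F 0 -p input_data/config.txt -i $sample" >> ScanIndel.swarm
done

# Show the generated swarmfile.
cat ScanIndel.swarm
```

Each line of the resulting file is one independent command, which is the form a swarmfile expects.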
Submit this job using the swarm command.
swarm -f ScanIndel.swarm -g 30 -t 8 --module scanindel/1.3
where
-g # | Number of gigabytes of memory required for each process (1 line in the swarm command file) |
-t # | Number of threads/CPUs required for each process (1 line in the swarm command file) |
--module scanindel | Loads the scanindel module for each subjob in the swarm |