High-Performance Computing at the NIH
Pindel on Biowulf & Helix

Pindel can detect breakpoints of large deletions, medium-sized insertions, inversions, tandem duplications, and other structural variants at single-base resolution from next-generation sequencing data. It uses a pattern-growth approach to identify the breakpoints of these variants from paired-end short reads.
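Pindel takes its input BAM files through a configuration file (the -i argument in the commands below), with one line per BAM file giving the path, the expected insert size, and a sample label. A sketch of the format (paths and values here are illustrative, not the contents of the demo's simulated_config.txt):

```
/data/$USER/demo/sample1.bam  250  sample1
/data/$USER/demo/sample2.bam  300  sample2
```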

Sample files can be copied to your data directory:

$ cd /data/$USER/
$ cp -r /usr/local/apps/pindel/demo .

NOTE: By default, Pindel writes its temporary files under /tmp, which can fill up and affect the system. Therefore, before submitting jobs, create a directory in your own scratch area (such as /scratch/$USER/pindel) for the temporary files and add the following lines to the end of your ~/.bashrc file:

export TMPDIR=/scratch/$USER/pindel/tmp/
if [ ! -d $TMPDIR ]; then
    mkdir -p $TMPDIR
fi
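The same create-if-missing logic can be exercised anywhere by substituting a stand-in path for the scratch area (the path below is an assumption for illustration, not the real Biowulf location):

```shell
#!/bin/bash
# Sketch of the ~/.bashrc snippet with a stand-in path so it runs on any machine;
# on Biowulf the real value would be /scratch/$USER/pindel/tmp/
export TMPDIR="${HOME}/pindel-tmp-demo/tmp"   # stand-in location (assumption)
if [ ! -d "$TMPDIR" ]; then
    mkdir -p "$TMPDIR"                        # create the directory tree if absent
fi
echo "TMPDIR ready: $TMPDIR"
```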
  
Running on Helix

$ module load pindel
$ cd /data/$USER/demo
$ pindel -i simulated_config.txt -f simulated_reference.fa -o outfile -c ALL

Running a single batch job on Biowulf

1. Create a batch script containing lines similar to the following:

#!/bin/bash


module load pindel
cd /data/$USER/demo
pindel -i simulated_config.txt -f simulated_reference.fa -o outfile -c ALL

2. Submit the script on Biowulf:

$ sbatch jobscript

If more memory is required (the default is 4 GB), specify --mem, for example --mem=10g:

$ sbatch --mem=10g jobscript
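Resource requests can also be written into the job script itself as #SBATCH directives, so the sbatch command line stays simple. A sketch (the memory value is an example):

```
#!/bin/bash
#SBATCH --mem=10g

module load pindel
cd /data/$USER/demo
pindel -i simulated_config.txt -f simulated_reference.fa -o outfile -c ALL
```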

Running a swarm of jobs on Biowulf

Set up a swarm command file:

  cd /data/$USER/dir1; pindel -i simulated_config.txt -f simulated_reference.fa -o outfile -c ALL
  cd /data/$USER/dir2; pindel -i simulated_config.txt -f simulated_reference.fa -o outfile -c ALL
  cd /data/$USER/dir3; pindel -i simulated_config.txt -f simulated_reference.fa -o outfile -c ALL
	[......]
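When many directories follow the same naming pattern, the swarm file can be generated with a loop rather than written by hand. A sketch, assuming the dir1, dir2, ... layout shown above:

```shell
#!/bin/bash
# Write one Pindel command per directory into the swarm file.
# $USER is escaped so the swarm file contains the literal string,
# matching the style of the hand-written example above.
for d in dir1 dir2 dir3; do
    echo "cd /data/\$USER/$d; pindel -i simulated_config.txt -f simulated_reference.fa -o outfile -c ALL"
done > swarmfile
```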
  

Submit the swarm file, where -f specifies the swarm file name and --module loads the required module for each command line in the file:

  $ swarm -f swarmfile --module pindel

If more memory is needed for each command, use -g; the example below allocates 10 GB per command:

  $ swarm -f swarmfile -g 10 --module pindel

For more information about running swarm, see the swarm documentation (swarm.html).

Running an interactive job on Biowulf

It may be useful for debugging purposes to run jobs interactively. Such jobs should not be run on the Biowulf login node. Instead, allocate an interactive node as described below and run the job there.

biowulf$ sinteractive 
salloc.exe: Granted job allocation 16535

cn999$ module load pindel
cn999$ cd /data/$USER/demo
cn999$ pindel -i simulated_config.txt -f simulated_reference.fa -o outfile -c ALL
[...etc...]

cn999$ exit
exit

biowulf$

Make sure to exit the job once finished.

If more memory is needed, use --mem. For example:

biowulf$ sinteractive --mem=8g

Documentation

https://trac.nbic.nl/pindel/wiki