High-Performance Computing at the NIH
GenomeTools on Biowulf & Helix

The GenomeTools genome analysis system is a collection of bioinformatics tools (in the realm of genome informatics) combined into a single binary named gt. It is based on a C library named “libgenometools”, which consists of several modules.

Running on Helix
$ module load genometools
$ cd /data/$USER/
$ gt genomediff [options] (INDEX | -indexname NAME SEQFILE SEQFILE [...])

Running a single batch job on Biowulf

1. Create a script file containing lines similar to those below.

#!/bin/bash


module load genometools
cd /data/$USER/
gt genomediff [options] (INDEX | -indexname NAME SEQFILE SEQFILE [...])

2. Submit the script on Biowulf:

$ sbatch jobscript

To request more memory than the default (4 GB), use the --mem flag:

$ sbatch --mem=10g jobscript
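
The memory request can also be recorded in the script itself instead of on the command line. The sketch below writes such a script to a file and checks it for syntax errors. The #SBATCH --mem directive is standard Slurm; the index name myindex and the sequence files seq1.fa and seq2.fa are hypothetical placeholders, not names from this page.

```shell
# Write a batch script that carries its own memory request.
# 'myindex', 'seq1.fa' and 'seq2.fa' are hypothetical placeholders.
cat > jobscript <<'EOF'
#!/bin/bash
#SBATCH --mem=10g

module load genometools
cd /data/$USER/
gt genomediff -indexname myindex seq1.fa seq2.fa
EOF

# Check the script for shell syntax errors before submitting it.
bash -n jobscript && echo "jobscript syntax OK"
```

The script is then submitted with a plain "sbatch jobscript". Options given on the sbatch command line take precedence over #SBATCH directives in the script.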

Running a swarm of jobs on Biowulf

Set up a swarm command file:

  cd /data/$USER/dir1; gt genomediff [options] (INDEX | -indexname NAME SEQFILE SEQFILE [...])
  cd /data/$USER/dir2; gt genomediff [options] (INDEX | -indexname NAME SEQFILE SEQFILE [...])
  cd /data/$USER/dir3; gt genomediff [options] (INDEX | -indexname NAME SEQFILE SEQFILE [...])
	[......]
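
When there are many directories, the command file can be generated rather than typed by hand. The following is a minimal sketch, assuming the same arguments apply in every directory; dir1 to dir3 follow the example above, while myindex, seq1.fa and seq2.fa are hypothetical placeholders for your own inputs.

```shell
# Generate one swarm command line per working directory.
# 'myindex', 'seq1.fa' and 'seq2.fa' are hypothetical placeholders.
: > swarmfile    # start with an empty file
for d in dir1 dir2 dir3; do
    # \$USER is written literally so that it expands when the job runs.
    echo "cd /data/\$USER/$d; gt genomediff -indexname myindex seq1.fa seq2.fa" >> swarmfile
done

# Show the generated commands for review before submitting.
cat swarmfile
```

Each line of the resulting file is an independent command, which is the layout swarm expects.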
  

Submit the swarm file:

  $ swarm -f swarmfile --module genometools

-f: specify the swarmfile name
--module: load the required module for each command line in the file

To allocate more memory, use the -g flag, which sets the memory in GB for each command:

  $ swarm -f swarmfile -g 20 --module genometools

For more information on running swarm, see swarm.html

Running an interactive job on Biowulf

It may be useful for debugging purposes to run jobs interactively. Such jobs should not be run on the Biowulf login node. Instead, allocate an interactive node as described below and run the interactive job there.

biowulf$ sinteractive
salloc.exe: Granted job allocation 16535

cn999$ module load genometools
cn999$ cd /data/$USER/dir
cn999$ gt genomediff [options] (INDEX | -indexname NAME SEQFILE SEQFILE [...])
[...etc...]

cn999$ exit
exit

biowulf$

Make sure to exit the job once finished.

If more memory is needed, use --mem. For example:

biowulf$ sinteractive --mem=20g

Documentation

http://genometools.org/manuals.html