High-Performance Computing at the NIH
Cgatools on Biowulf & Helix

The Complete Genomics Analysis Tools (cgatools) is an open-source project that provides tools for downstream analysis of Complete Genomics data. The general areas of functionality include genome comparison, format conversion, and reference tools.

Example files can be downloaded from
ftp://ftp2.completegenomics.com/

The reference files can be downloaded from
ftp://ftp.completegenomics.com/ReferenceFiles/

Running on Helix

$ module load cgatools
$ cd /data/$USER/Examples
$ cgatools fasta2crr --input build36.fa.bz2 --output build36.crr

Running a single batch job on Biowulf

1. Create a script file similar to the lines below.

#!/bin/bash

# Load cgatools, then convert the reference FASTA file to CRR format.
module load cgatools
cd /data/$USER/Examples
cgatools fasta2crr --input build36.fa.bz2 --output build36.crr

2. Submit the script on Biowulf:

$ sbatch jobscript

To request more memory than the default (4 GB), use the --mem flag:

$ sbatch --mem=10g jobscript
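
The memory request can also be stored in the script itself as an #SBATCH directive, a standard Slurm feature, so that a plain "sbatch jobscript" picks it up. The following is a minimal sketch that reuses the 10g value and the file name jobscript from the examples above; it writes such a script and checks its syntax without running it.

```shell
# Write the batch script; the quoted 'EOF' keeps $USER unexpanded in the file.
cat > jobscript <<'EOF'
#!/bin/bash
#SBATCH --mem=10g

module load cgatools
cd /data/$USER/Examples
cgatools fasta2crr --input build36.fa.bz2 --output build36.crr
EOF

# Parse the script without executing it; prints nothing if the syntax is valid.
bash -n jobscript
```

A --mem value given on the sbatch command line takes precedence over the directive in the script.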

Running a swarm of jobs on Biowulf

Set up a swarm command file, with one command line per job:

  cd /data/$USER/dir1; cgatools fasta2crr --input build36.fa.bz2 --output build36.crr
  cd /data/$USER/dir2; cgatools fasta2crr --input build36.fa.bz2 --output build36.crr
  cd /data/$USER/dir3; cgatools fasta2crr --input build36.fa.bz2 --output build36.crr
	[......]
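
When the directories follow a pattern, the command file can be generated with a short loop instead of being typed by hand. The sketch below assumes the directory names dir1 through dir3 from the example above and writes the result to a file named swarmfile in the current directory; both are illustrative and should be adjusted to your own layout.

```shell
#!/bin/bash
# Sketch: write one swarm command line per directory.
# The directory names (dir1..dir3) and the output name "swarmfile" are
# illustrative; the cgatools arguments are copied from the example above.
swarmfile=swarmfile
: > "$swarmfile"    # create the file, or truncate it if it already exists

for d in dir1 dir2 dir3; do
    # The single-quoted format string keeps $USER literal, so it is
    # expanded when the job runs rather than when this file is written.
    printf 'cd /data/$USER/%s; cgatools fasta2crr --input build36.fa.bz2 --output build36.crr\n' "$d" >> "$swarmfile"
done
```

Inspect the result with "cat swarmfile" before submitting it.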

Submit the swarm file:

  $ swarm -f swarmfile --module cgatools

-f: name of the swarm command file
--module: load the named module(s) for each command line in the file

To allocate more memory, use the -g flag:

  $ swarm -f swarmfile -g 10 --module cgatools

-g: memory, in GB, to allocate for each command line (10 GB in this example)

For more information regarding running swarm, see swarm.html

Running an interactive job on Biowulf

It may be useful for debugging purposes to run jobs interactively. Such jobs should not be run on the Biowulf login node. Instead, allocate an interactive node as described below and run the interactive job there.

biowulf$ sinteractive 
salloc.exe: Granted job allocation 16535

cn999$ module load cgatools
cn999$ cd /data/$USER/Examples
cn999$ cgatools fasta2crr --input build36.fa.bz2 --output build36.crr
[...etc...]

cn999$ exit
exit

biowulf$

Make sure to exit the job once finished.

If more memory is needed, use the --mem flag. For example:

biowulf$ sinteractive --mem=10g

Documentation

http://cgatools.sourceforge.net/