High-Performance Computing at the NIH
Mafft on Biowulf & Helix

Description

MAFFT is a multiple sequence alignment program for Unix-like operating systems. It offers a range of multiple alignment methods, including L-INS-i (accurate; for alignment of <∼200 sequences) and FFT-NS-2 (fast; for alignment of <∼10,000 sequences).
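
The two methods named above correspond to documented MAFFT option sets: --localpair --maxiterate 1000 for L-INS-i and --retree 2 --maxiterate 0 for FFT-NS-2. The following is a minimal sketch, not an official recipe, that picks between them by counting FASTA headers. The function name choose_mafft_opts and the hard cutoff of 200 sequences are assumptions made for illustration.

```shell
#!/bin/bash
# Sketch: pick MAFFT options from the number of sequences in a FASTA file.
# choose_mafft_opts is a hypothetical helper; the 200-sequence cutoff is
# taken from the rough guidance in the description, not from MAFFT itself.

choose_mafft_opts() {
    local fasta=$1
    local nseq
    # Each FASTA record starts with a ">" header line.
    nseq=$(grep -c '^>' "$fasta")
    if [ "$nseq" -lt 200 ]; then
        # L-INS-i: accurate, iterative refinement with local pairwise alignment
        echo "--localpair --maxiterate 1000"
    else
        # FFT-NS-2: fast progressive method
        echo "--retree 2 --maxiterate 0"
    fi
}

# Usage (needs mafft on the PATH, e.g. after "module load mafft"):
#   mafft $(choose_mafft_opts ex1) ex1 > out1
```

MAFFT also accepts --auto, which lets the program choose a strategy from the input size, so a helper like this is only worthwhile when the choice should be explicit.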

Running on Helix

1. First, download an example file:
$ cd /data/$USER/dir
$ wget http://mafft.cbrc.jp/alignment/software/ex1

2. To run mafft:

$ cd /data/$USER/dir
$ module load mafft
$ mafft ex1 > out1
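
After a run like the one above, it can be useful to confirm that the output is a well-formed alignment. By default MAFFT writes aligned FASTA, in which every record has the same length once gaps are counted. The sketch below checks that property with awk; the function name check_alignment is hypothetical and the check is a sanity test only, not a measure of alignment quality.

```shell
#!/bin/bash
# Sketch: verify that all records in an aligned FASTA file have equal
# length (gap characters included). check_alignment is a hypothetical name.

check_alignment() {
    awk '
        BEGIN { n = 0; bad = 0 }
        # A header line closes the previous record and starts a new one.
        /^>/  { if (n > 0) check(); n++; len = 0; next }
        # Sequence lines may be wrapped, so add up their lengths.
              { len += length($0) }
        END   {
            if (n > 0) check()
            if (n == 0 || bad) { print "not a valid alignment"; exit 1 }
            print n " sequences, alignment length " ref
        }
        # The first record sets the reference length; later ones must match.
        function check() { if (n == 1) ref = len; else if (len != ref) bad = 1 }
    ' "$1"
}

# Usage:
#   check_alignment out1
```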

Running a single batch job on Biowulf

1. Create a batch script along the following lines:

#!/bin/bash

# this script is called myscript

module load mafft
cd /data/$USER/dir
mafft input > output

2. On the Biowulf login node, submit the job:

$ sbatch myscript
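
The batch script above runs mafft with its default single thread. MAFFT has a --thread option, and Slurm exports SLURM_CPUS_PER_TASK when --cpus-per-task is set, so a multithreaded variant could look like the sketch below. The script name myscript_mt and the values of 4 CPUs and 8g of memory are placeholders chosen for illustration, not site recommendations.

```shell
#!/bin/bash
# Sketch: write a multithreaded variant of the batch script to a file.
# myscript_mt, 4 CPUs and 8g are hypothetical example values.

cat > myscript_mt <<'EOF'
#!/bin/bash
#SBATCH --cpus-per-task=4
#SBATCH --mem=8g

# Use the CPU count Slurm allocated; fall back to 1 if it is unset.
threads=${SLURM_CPUS_PER_TASK:-1}

module load mafft
cd /data/$USER/dir
mafft --thread "$threads" input > output
EOF

# Submit from the Biowulf login node:
#   sbatch myscript_mt
```

Tying --thread to SLURM_CPUS_PER_TASK keeps the thread count and the allocation in step, so changing the #SBATCH line is enough to resize the job.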

Running a swarm of batch jobs on Biowulf

1. Create a swarm file along the following lines:

cd dir1; mafft input > output
cd dir2; mafft input > output
cd dir3; mafft input > output
[....]

2. Submit this swarm with:

swarm -f swarmfile --module mafft

For more information regarding running swarm, see swarm.html
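
When there are many directories, the swarm file can be generated rather than typed by hand. The sketch below writes one line per directory in the same form as the swarm file shown above. It assumes each directory holds a file literally named input, as in that example; the function name make_swarmfile is hypothetical.

```shell
#!/bin/bash
# Sketch: write one swarm command per directory that contains a file
# named "input". make_swarmfile is a hypothetical helper.

make_swarmfile() {
    local out=$1
    shift
    : > "$out"                      # start with an empty swarm file
    local d
    for d in "$@"; do
        # Skip directories without an input file.
        [ -f "$d/input" ] || continue
        echo "cd $d; mafft input > output" >> "$out"
    done
}

# Usage:
#   make_swarmfile swarmfile dir1 dir2 dir3
#   swarm -f swarmfile --module mafft
```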

Running an interactive job on Biowulf

It may be useful for debugging purposes to run jobs interactively. Such jobs should not be run on the Biowulf login node. Instead, allocate an interactive node as described below and run the interactive job there.

biowulf$ sinteractive 
salloc.exe: Granted job allocation 16535
slurm stepprolog here!
                                             Begin slurm taskprolog!
End slurm taskprolog!
cn999$ module load mafft
cn999$ cd /data/$USER/dir
cn999$ mafft input > output
[...etc...]

cn999$ exit
exit

biowulf$

Make sure to exit the job once you have finished your run.

If more memory is needed, it can be requested with --mem. For example:

biowulf$ sinteractive --mem=8g

Documentation

http://mafft.cbrc.jp/alignment/software/