High-Performance Computing at the NIH
ClustalW & ClustalO on Biowulf & Helix

ClustalW is a general-purpose multiple sequence alignment program for DNA or proteins. It greatly improves the sensitivity of the commonly used progressive multiple sequence alignment method for divergent protein sequences. It can be run interactively, or with options supplied on the command line.

Clustal Omega (ClustalO) is a newer member of the Clustal family that scales far better than previous versions, allowing hundreds of thousands of sequences to be aligned in only a few hours. It also makes use of multiple processors where present. In addition, alignment quality is superior to previous versions, as measured by a range of popular benchmarks.

ClustalW is no longer being maintained or updated by its developers. ClustalO is actively being maintained. ClustalO can also run in multi-threaded mode, which may make your sequence alignments faster.

ClustalW references
Clustal-Omega references.

A web interface to ClustalW and other multiple sequence alignment programs is available on our systems. It was developed in-house and lets the user run a multiple sequence alignment with several different programs and compare the results.

ClustalW & ClustalO on Helix

Use 'module load clustalw' or 'module load clustalo' to set up the environment appropriately for each program. Then type 'clustalw' or 'clustalo' to run the program.


ClustalW/ClustalO batch job on Biowulf

The following example uses the test data in /usr/local/apps/clustalo/test_data/test1.fa.
Set up a batch script along the following lines:

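#!/bin/bash

# Load the ClustalW environment (for ClustalO, load clustalo instead)
module load clustalw
# module load clustalo

# Change to the directory containing the input file; stop if it does not exist
cd /data/$USER/mydir || exit 1

# Align the sequences in test1.fa with ClustalW
clustalw -INFILE=test1.fa -ALIGN

# ClustalO equivalent, using the CPUs allocated by Slurm
# clustalo -i test1.fa -o test1.out --threads=$SLURM_CPUS_PER_TASK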
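The batch script above can be used as a model for either ClustalW or ClustalO runs. As written, the clustalo commands are commented out; to run ClustalO instead, comment out the two clustalw lines and uncomment the corresponding clustalo lines.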
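As an alternative to commenting lines in and out, the choice of aligner could be made with a single argument. The following is only a sketch: the function name build_cmd is hypothetical, it assumes the input file name ends in .fa, and it prints the command line rather than running it so that it can be tried without either program loaded. The command-line options are the ones used in the batch script above.

```shell
#!/bin/bash

# Hypothetical helper (not part of ClustalW or ClustalO): print the alignment
# command for the chosen program, using the same options as the batch script.
build_cmd() {
    local aligner="$1"
    local infile="$2"
    case "$aligner" in
        clustalw)
            echo "clustalw -INFILE=$infile -ALIGN"
            ;;
        clustalo)
            # Assumes the input name ends in .fa; the output gets a .out suffix.
            # SLURM_CPUS_PER_TASK is set by Slurm; fall back to 1 outside a job.
            echo "clustalo -i $infile -o ${infile%.fa}.out --threads=${SLURM_CPUS_PER_TASK:-1}"
            ;;
        *)
            echo "unknown aligner: $aligner" >&2
            return 1
            ;;
    esac
}

build_cmd clustalw test1.fa
```

In a real job script the printed command would be executed after the matching 'module load' line, for example with eval "$(build_cmd clustalo test1.fa)".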

Submit this job with

sbatch --mem=2g --cpus-per-task=2 jobscript

From previous runs we know that the input file /usr/local/apps/clustalo/test_data/test1.fa requires 1.7 GB to process. Therefore the job is submitted requesting 2 GB of memory (--mem=2g).
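The step from a measured 1.7 GB to a 2 GB request is a round-up to the next whole gigabyte. A minimal sketch of that arithmetic follows; the function name mem_request is hypothetical, and rounding to whole gigabytes is an assumption about how you might choose the value, not a Biowulf rule.

```shell
#!/bin/bash

# Hypothetical helper: round a measured peak memory (in GB) up to the next
# whole gigabyte and print it in the form accepted by sbatch --mem.
mem_request() {
    awk -v gb="$1" 'BEGIN { n = int(gb); if (n < gb) n++; printf "%dg\n", n }'
}

# Print (rather than run) the resulting submission command for a 1.7 GB peak.
echo "sbatch --mem=$(mem_request 1.7) --cpus-per-task=2 jobscript"
```

For input files whose memory use is not yet known, a more generous first request followed by a check of the actual usage is a safer starting point than a tight round-up.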

Note that clustalo has a --threads option, but in many cases the application does not run multi-threaded for much of its run. We therefore recommend setting this value to no more than 2 to 4 threads.
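One way to follow this advice even when a job was submitted with more CPUs is to cap the thread count inside the job script. The sketch below assumes a ceiling of 4; the function name cap_threads is hypothetical, and the clustalo command is printed rather than executed so the snippet can be tried outside a job.

```shell
#!/bin/bash

# Hypothetical helper: return the requested thread count, limited to a
# ceiling (default 4, the upper end of the recommendation above).
cap_threads() {
    local requested="${1:-1}"
    local ceiling="${2:-4}"
    if [ "$requested" -gt "$ceiling" ]; then
        echo "$ceiling"
    else
        echo "$requested"
    fi
}

# SLURM_CPUS_PER_TASK is set by Slurm inside a job; fall back to 1 otherwise.
threads=$(cap_threads "${SLURM_CPUS_PER_TASK:-1}")
echo "clustalo -i test1.fa -o test1.out --threads=$threads"
```

Requesting only as many CPUs as the capped value with --cpus-per-task avoids leaving allocated CPUs idle.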

Running ClustalW/ClustalO interactively on Biowulf

Allocate an interactive node with 'sinteractive' and then run the commands. Sample session:

[susanc@biowulf]$ sinteractive --mem=5g --cpus-per-task=2
salloc.exe: Granted job allocation 14738
slurm stepprolog here!
Begin slurm taskprolog!
End slurm taskprolog!

[susanc@p23]$ clustalo -i globins630.fa -o clustalo.out --threads=$SLURM_CPUS_PER_TASK

[susanc@p23]$ exit
exit
slurm stepepilog here!
salloc.exe: Relinquishing job allocation 14738
salloc.exe: Job allocation 14738 has been revoked.

ClustalW Help

Run 'clustalw -help' to display the program's help output.

Clustal-Omega Help

Run 'clustalo --help' to display the program's help output.

Documentation

Documentation for ClustalW and Clustal Omega at clustal.org