High-Performance Computing at the NIH
mfold on Biowulf & Helix

Description

mfold predicts RNA and DNA secondary structure using nearest-neighbor thermodynamic rules. It accepts input in GenBank, GCG, and FASTA formats.

References

See the mfold references page on the mfold site for a detailed list of references.

On Helix

mfold processes a single sequence in GenBank, EMBL, or FASTA format. FASTA files containing more than one sequence will cause errors. As an example, let's analyze a yeast tRNA-Phe, using a simple constraint file that forces base pairing between the first 7 nucleotides and their complementary bases from the known structure:

helix$ module load mfold
helix$ mfold SEQ=/usr/local/apps/mfold/TEST_DATA/Phe-GAA-1-1-nointron.fa \
    AUX=/usr/local/apps/mfold/TEST_DATA/Phe-GAA-1-1-nointron.constraint \
    NA=RNA
mfold version 3.6
REUSE= NO
Constraint file is  /usr/local/apps/mfold/TEST_DATA/Phe-GAA-1-1-nointron.constraint
5Z3eI2.pnt created.
Sequence length is 73
Folding at 37 degrees using version 3.0 dat files.
10,20,30,40,50,60,70,
End of Fill
Save file created using nafold.
Minimum folding energy is -24.00 kcal/mol.
Energy increment is 1.20 kcal/mol.
H-num file created from plot file.
1,2,3,4,5,6,7,
Suboptimal foldings created.
Energy dot plot created.
Free energies re-evaluated using efn2 and added to ct file.
1       2       3       4       5       6       7
Structure plots generated.
All done.

mfold will generate output files in different formats for the energy dot plot, the optimal structure, and suboptimal structures (if any). The command above, for example, produces these two graphs amongst others:

Energy dot plot
Optimal structure

The loop containing the 'GAA' anticodon is at the top. See the output section of the mfold manual for more details.
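
Because multi-sequence FASTA files cause errors, it can help to count the records in a file before calling mfold. The following is a minimal sketch, not part of mfold itself: the function names count_fasta_records and fold_single are hypothetical, and the sketch assumes every record starts with a '>' header line and that the sequence is RNA.

```shell
#!/bin/bash
# Count FASTA records: each record begins with a '>' header line.
# grep -c exits non-zero when the count is 0, hence the '|| true'.
count_fasta_records() {
    grep -c '^>' "$1" || true
}

# Hypothetical wrapper: run mfold only if the file holds exactly one sequence.
fold_single() {
    local fasta=$1
    local n
    n=$(count_fasta_records "$fasta")
    if [ "$n" -ne 1 ]; then
        echo "skipping $fasta: found $n sequences, expected 1" >&2
        return 1
    fi
    # NA=RNA is an assumption; adjust for DNA input.
    mfold SEQ="$fasta" NA=RNA
}
```

After loading the mfold module, the wrapper would be called as fold_single followed by the path to a FASTA file.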

Batch job on Biowulf

The following batch script analyzes the first 10 sequences in the FASTA file (skipping entries whose names contain a '?'), one at a time:

#! /bin/bash
# filename: mfold_batch.sh
set -e

module load emboss || exit 1
module load mfold  || exit 2

mkdir -p mfold.out
cd mfold.out
MFA=/usr/local/apps/mfold/TEST_DATA/sacCer1-tRNAs.fa
for sname in $(grep '>' "$MFA" | grep -vF '?' | awk 'NR < 11 {print substr($1, 2)}'); do
    echo "processing $sname"
    mkdir -p $sname
    cd $sname
    seqret -outseq ${sname}.fa ${MFA}:${sname}
    mfold SEQ=${sname}.fa NA=RNA
    cd ..
done

A batch process is then started with

biowulf$ sbatch mfold_batch.sh
612486
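
The batch script above uses seqret from EMBOSS to pull individual sequences out of the multi-sequence file. Where EMBOSS is not available, a plain awk split is one possible substitute. This is a sketch under assumptions, not the method used on this page: split_fasta is a hypothetical helper, the output directory must already exist, and each sequence name (the first word after '>') must be usable as a file name.

```shell
#!/bin/bash
# Hypothetical helper: split a multi-sequence FASTA file into one file
# per record, named <sequence name>.fa, inside an existing directory.
# Usage: split_fasta input.fa output_dir
split_fasta() {
    awk -v dir="$2" '
        /^>/ {
            # New record: close the previous file and open the next one.
            if (out) close(out)
            name = substr($1, 2)      # first word of the header, without ">"
            out = dir "/" name ".fa"
        }
        out { print > out }           # copy header and sequence lines
    ' "$1"
}
```

Each resulting file holds a single record and can be passed to mfold with SEQ=.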

Swarm of jobs on Biowulf

To process a number of sequences in parallel, set up a swarmfile like this:

mfold SEQ=Saccharomyces_cerevisiae_Ala-AGC-1-10.fa NA=RNA
mfold SEQ=Saccharomyces_cerevisiae_Ala-AGC-1-11.fa NA=RNA
mfold SEQ=Saccharomyces_cerevisiae_Ala-AGC-1-1.fa NA=RNA
mfold SEQ=Saccharomyces_cerevisiae_Ala-AGC-1-2.fa NA=RNA

and start your swarm with

biowulf$ swarm -f swarmfile --module mfold
613208
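
For more than a handful of sequences, the swarmfile can be generated rather than typed by hand. The sketch below writes one mfold command per .fa file in a directory. The helper name make_swarmfile is hypothetical, and the sketch assumes each file contains a single RNA sequence.

```shell
#!/bin/bash
# Hypothetical helper: write one mfold command line per *.fa file.
# Usage: make_swarmfile fasta_dir swarmfile
make_swarmfile() {
    local dir=$1
    local out=$2
    local fa
    : > "$out"                      # start with an empty swarmfile
    for fa in "$dir"/*.fa; do
        [ -e "$fa" ] || continue    # no matches: the glob stays literal
        printf 'mfold SEQ=%s NA=RNA\n' "$fa" >> "$out"
    done
}
```

The resulting file can then be submitted with the swarm command shown above.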

Interactive job on Biowulf

To use mfold interactively, allocate an interactive session with

biowulf$ sinteractive
salloc.exe: Granted job allocation 614363
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn0016 are ready for job
cn0016$ module load mfold
cn0016$ mfold SEQ=Saccharomyces_cerevisiae_Ala-AGC-1-2.fa NA=RNA
[...]
cn0016$ exit
biowulf$

Documentation