Biowulf High Performance Computing at the NIH
randfold on Biowulf

Description

Randfold computes the probability that, for a given RNA sequence, the Minimum Free Energy (MFE) of its secondary structure differs from the distribution of MFEs computed for random sequences obtained by permuting the input sequence. A low value indicates that the input sequence folds more stably than its shuffled versions.
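
One common way to turn such a shuffling procedure into a number is an empirical permutation p-value. The formula below is a sketch under that assumption and has not been checked against the randfold source: with N randomizations and S the number of shuffled sequences whose MFE is less than or equal to the MFE of the input sequence,

```latex
p = \frac{S + 1}{N + 1},
\qquad
p_{\min} = \frac{1}{N + 1} = \frac{1}{999 + 1} = 0.001 \quad (N = 999,\ S = 0).
```

Under this assumption the result is consistent with the cel-let-7 example on this page, where 999 randomizations give a p-value of 0.001000, meaning no shuffled sequence folded as stably as the native precursor. It also means the number of randomizations sets a floor on the reportable p-value: with 99 randomizations, nothing below 0.01 can be reported.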

Randfold is not a parallel program. Small numbers of randfold jobs, or interactive randfold runs, can be run on Helix or on Biowulf interactive nodes. If you have many randfold jobs to run, the swarm utility is recommended.

Running randfold on Helix

Example: running randfold on a single miRNA sequence


helix$ module load randfold
helix$ randfold
FATAL: Usage: randfold <method> <file name> <number of randomizations>

methods:
-s simple mononucleotide shuffling
-d dinucleotide shuffling
-m markov chain 1 shuffling

Example: randfold -d let7.tfa 999

helix$ cat > cel-let7.fa <<EOF
>cel-let-7 Caenorhabditis elegans let-7 precursor RNA
UACACUGUGGAUCCGGUGAGGUAGUAGGUUGUAUAGUUUGGAAUAUUACCACCGGUGAAC
UAUGCAAUUUUCUACCUUACCGGAGACAGAACUCUUCGA
EOF
helix$ randfold -d cel-let7.fa 999
cel-let-7       -42.90  0.001000

Running a single randfold batch job on Biowulf2

This example runs randfold on a set of mouse miRNA sequences in series; randfold processes the sequences one at a time.

First, create an example input data set containing all putative mouse miRNA hairpins from miRBase:


biowulf2$ mkdir -p /data/$USER/test_data/randfold
biowulf2$ cd /data/$USER/test_data/randfold
biowulf2$ wget "ftp://mirbase.org/pub/mirbase/CURRENT/hairpin.fa.gz"
biowulf2$ gunzip hairpin.fa.gz
biowulf2$ module load emboss
biowulf2$ seqret -outseq mouse_hairpins.fa 'hairpin.fa:mmu-*'
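
Before going further, it may be worth checking how many hairpins seqret extracted. A minimal sketch, using a hypothetical helper named count_fasta (it is not provided by the randfold or EMBOSS modules) that counts FASTA header lines, i.e. lines starting with '>':

```bash
# Hypothetical helper (not provided by any module): count the sequences
# in a FASTA file by counting its header lines, which start with '>'.
count_fasta() {
    grep -c '^>' "$1"
}
```

For example, count_fasta mouse_hairpins.fa prints the number of mouse hairpins that randfold will process.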

Then create a batch script, here called randfold_batch_script.sh:


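#!/bin/bash
#SBATCH --job-name=randfold
set -e

module load randfold
# input: all mouse hairpins in a single multi-sequence FASTA file
inf=/data/$USER/test_data/randfold/mouse_hairpins.fa
outf=/data/$USER/test_data/randfold/mouse_hairpins.fa.randfold
# dinucleotide shuffling (-d) with 99 randomizations per sequence
randfold -d "$inf" 99 > "$outf"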

Submit the batch script with:

biowulf2$ sbatch randfold_batch_script.sh
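
When the job finishes, mouse_hairpins.fa.randfold is expected to hold one result line per hairpin in the same layout as the cel-let-7 example above (sequence name, MFE, p-value). A minimal sketch for listing only the hairpins below a p-value cutoff, assuming whitespace-separated columns in that order; the function name filter_randfold and the default cutoff of 0.05 are illustrative choices, not randfold options:

```bash
# Hypothetical helper: print randfold result lines whose p-value
# (3rd whitespace-separated column) is below a cutoff.
# Assumes lines of the form "name  MFE  p-value".
filter_randfold() {
    local file=$1
    local cutoff=${2:-0.05}   # illustrative default, not a randfold setting
    awk -v cut="$cutoff" 'NF >= 3 && $3 + 0 < cut + 0' "$file"
}
```

For example: filter_randfold /data/$USER/test_data/randfold/mouse_hairpins.fa.randfold 0.01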

Running a swarm of randfold batch jobs on Biowulf2

Again, first create a set of input files, this time with one hairpin per file so that swarm can process all hairpins in parallel:


biowulf2$ mkdir -p /data/$USER/test_data/randfold
biowulf2$ cd /data/$USER/test_data/randfold
biowulf2$ wget "ftp://mirbase.org/pub/mirbase/CURRENT/hairpin.fa.gz"
biowulf2$ gunzip hairpin.fa.gz
biowulf2$ module load emboss
biowulf2$ mkdir mouse_hairpins
biowulf2$ seqret -osdirectory2 mouse_hairpins -ossingle2 -auto 'hairpin.fa:mmu-*'

Then create a swarm file, here called swarmfile, with one randfold command per hairpin file, for example:


randfold -d mouse_hairpins/mmu-let-7a-1.fasta 999 > mouse_hairpins/mmu-let-7a-1.fasta.randfold
randfold -d mouse_hairpins/mmu-let-7a-2.fasta 999 > mouse_hairpins/mmu-let-7a-2.fasta.randfold
randfold -d mouse_hairpins/mmu-let-7b.fasta 999 > mouse_hairpins/mmu-let-7b.fasta.randfold
randfold -d mouse_hairpins/mmu-let-7c-1.fasta 999 > mouse_hairpins/mmu-let-7c-1.fasta.randfold
randfold -d mouse_hairpins/mmu-let-7c-2.fasta 999 > mouse_hairpins/mmu-let-7c-2.fasta.randfold
randfold -d mouse_hairpins/mmu-let-7d.fasta 999 > mouse_hairpins/mmu-let-7d.fasta.randfold
randfold -d mouse_hairpins/mmu-let-7e.fasta 999 > mouse_hairpins/mmu-let-7e.fasta.randfold
randfold -d mouse_hairpins/mmu-let-7f-1.fasta 999 > mouse_hairpins/mmu-let-7f-1.fasta.randfold
randfold -d mouse_hairpins/mmu-let-7f-2.fasta 999 > mouse_hairpins/mmu-let-7f-2.fasta.randfold
randfold -d mouse_hairpins/mmu-let-7g.fasta 999 > mouse_hairpins/mmu-let-7g.fasta.randfold
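
Writing one line per hairpin by hand is impractical for several hundred mouse hairpins. A minimal sketch that generates the same kind of swarm file with a shell loop, assuming the per-hairpin files sit in mouse_hairpins/ with a .fasta extension as produced by the seqret command above; make_swarmfile is a hypothetical name, and the -d method and 999 randomizations simply mirror the example lines:

```bash
# Hypothetical helper: write one randfold command per .fasta file in a
# directory to a swarm file, mirroring the hand-written lines above.
make_swarmfile() {
    local dir=$1              # directory with one hairpin per .fasta file
    local out=${2:-swarmfile} # swarm file to create
    local n=${3:-999}         # number of randomizations, as in the examples
    : > "$out"                # start from an empty swarm file
    for f in "$dir"/*.fasta; do
        [ -e "$f" ] || continue   # skip the literal pattern if nothing matched
        echo "randfold -d $f $n > $f.randfold" >> "$out"
    done
}
```

For example, running make_swarmfile mouse_hairpins swarmfile in /data/$USER/test_data/randfold writes one line per hairpin; wc -l swarmfile shows how many subjobs swarm will create.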

Then submit it with swarm's default settings:


biowulf2$ swarm -f swarmfile --module randfold
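
Each swarm subjob writes its own .randfold file next to its input. Once all subjobs have finished, a sketch like the following could merge them into a single table sorted by increasing p-value. collect_randfold is a hypothetical name, the output file name is an arbitrary choice, and the column layout (name, MFE, p-value) is assumed from the cel-let-7 example output:

```bash
# Hypothetical helper: merge per-hairpin randfold outputs into one file,
# sorted by increasing p-value (3rd whitespace-separated column).
# The default output uses a .tsv extension so it is never picked up by the glob.
collect_randfold() {
    local dir=$1
    local out=${2:-mouse_hairpins_randfold.tsv}
    cat "$dir"/*.randfold | sort -g -k3,3 > "$out"
}
```

For example: collect_randfold mouse_hairpins, run from /data/$USER/test_data/randfold.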

Running an interactive job on Biowulf2

After starting an interactive session on a compute node with sinteractive, run randfold as described above. For example:


biowulf2$ sinteractive
salloc.exe: Granted job allocation nnnnnn
srun: error: x11: no local DISPLAY defined, skipping
cn0147$ module load randfold
cn0147$ randfold -d cel-let7.fa 999
cel-let-7       -42.90  0.001000
cn0147$ exit
biowulf2$