High-Performance Computing at the NIH
fastphylo on Biowulf & Helix

Description

fastphylo contains tools for efficiently estimating nucleotide/protein distance matrices (fastdist, fastprot, fastprot_mpi) and for reconstructing phylogenies from distance matrices (fnj) with the neighbor joining algorithm.

The distance programs can take fasta, phylip, or xml format input. The neighbor joining program fnj can read phylip, xml, or binary distance matrices.

There may be multiple versions of fastphylo available. An easy way of selecting the version is to use modules. To see the modules available, type

module avail fastphylo 

To select a module use

module load fastphylo/[version]

where [version] is the version of choice.

Environment variables set

References

Documentation

On Helix

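Create a small alignment in phylip format, compute its distance matrix, and build a neighbor joining tree from the matrix saved in xml format: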

helix$ cat > seq.phylip <<EOF
   3   13
Alpha     AAC GTGG
Beta      AAG GTCG
Gamma     CAG TTCG
          CCAC AT
          CCAC AC
          CCAC AA
EOF

helix$ fastdist -I phylip -O phylip seq.phylip
    3
Alpha       0.000000  0.299650  0.733169
Beta        0.299650  0.000000  0.309520
Gamma       0.733169  0.309520  0.000000
helix$ fastdist -I phylip seq.phylip > dm.xml
helix$ fnj -O newick --bootstraps=100 dm.xml
(Gamma,Beta,Alpha);
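
The header line of seq.phylip (3 and 13) declares the number of sequences and the alignment length; in this interleaved layout each sequence is split across two blocks (7 + 6 characters). As a minimal sketch, the header can be checked against the content before running fastdist. The function name check_phylip is hypothetical and not part of fastphylo, and the check assumes that names contain no spaces and that continuation lines are indented:

```shell
# Hypothetical helper (not part of fastphylo): compare the PHYLIP header
# with the sequences actually present in an interleaved file.
# Assumes names contain no spaces and continuation lines start with whitespace.
check_phylip() {
    awk '
        NR == 1 { nseq = $1; nsites = $2; next }   # header: count and length
        NF == 0 { next }                            # skip blank lines
        {
            # Lines in the first block start with a name in column 1;
            # continuation lines are indented and follow the same order.
            named = ($0 !~ /^[ \t]/)
            if (named) { idx = ++nnamed; start = 2 }
            else       { idx = (ncont++ % nseq) + 1; start = 1 }
            # Sum residues over all whitespace-separated chunks on the line.
            for (i = start; i <= NF; i++) len[idx] += length($i)
        }
        END {
            ok = (nnamed == nseq)
            for (i = 1; i <= nnamed; i++) if (len[i] != nsites) ok = 0
            if (ok) print "OK: " nseq " sequences x " nsites " sites"
            else    print "MISMATCH: header says " nseq " x " nsites
            exit (ok ? 0 : 1)
        }
    ' "$1"
}
```

Running check_phylip seq.phylip on the file above prints "OK: 3 sequences x 13 sites" and returns a zero exit status; a file whose header disagrees with its content returns a non-zero status.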

The tools read from stdin if no file name is provided, so they can be pipelined easily:

helix$ fastdist -I phylip seq.phylip | fnj -O newick 
(Gamma,Beta,Alpha);

Batch job on Biowulf

For this example we will calculate a neighbor joining tree of ~900 polyproteins from North American Dengue isolates obtained from the NCBI Virus Variation Resource and aligned with muscle. Protein distance matrices are bootstrapped 100 times. Create a batch script similar to the following example:

#! /bin/bash
# this is file prot_nj.sh

module load fastphylo || exit 1
TD=/usr/local/apps/fastphylo/TEST_DATA
ALN=20160113_Dengue_prot_NAmerica_noX_aln_small

fastprot -I fasta -O xml -b 100 ${TD}/${ALN}.fa > ${ALN}.dm.xml
fnj -m BIONJ ${ALN}.dm.xml > ${ALN}.tree.xml

Submit to the queue with sbatch:

b2$ sbatch prot_nj.sh

Swarm of jobs on Biowulf

Create a swarm command file (named fastphylo.swarm in this example) similar to the following, with one pipeline per job:

fastprot -b 100 /usr/local/apps/fastphylo/TEST_DATA/20160113_Dengue_prot_NAmerica_noX_aln_small.fa \
  | fnj -m BIONJ > prot_tree.xml
fastdist -b 100 /usr/local/apps/fastphylo/TEST_DATA/20160113_Dengue_nt_NAmerica_noN_aln_small.fa \
  | fnj -m BIONJ > nt_tree.xml

Submit to the queue with swarm:

b2$ swarm -f fastphylo.swarm
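
When there are many alignments, the swarm command file can be generated rather than written by hand. The following is a minimal sketch, assuming a directory of protein alignments in fasta format with a .fa extension and paths without spaces. The function name make_swarm and the output file names are hypothetical; only the fastprot and fnj options already used in this page appear in the generated lines:

```shell
# Hypothetical helper: write one swarm command line per protein alignment.
# Usage: make_swarm ALIGNMENT_DIR > fastphylo.swarm
# The body runs in a subshell, so its variables do not leak to the caller.
make_swarm() (
    dir=$1
    for aln in "$dir"/*.fa; do
        [ -e "$aln" ] || continue          # no matches: the glob stays literal
        base=$(basename "$aln" .fa)
        # One line per alignment: distances are piped straight into fnj.
        printf 'fastprot -b 100 %s | fnj -m BIONJ > %s_tree.xml\n' "$aln" "$base"
    done
)
```

For nucleotide alignments the same pattern would apply with fastdist in place of fastprot. The generated file is then submitted with swarm -f as shown above.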

Interactive job on Biowulf

Allocate an interactive session with sinteractive and run the tools as described above:

b2$ sinteractive 
node$ module load fastphylo/r131
[+] Loading fastphylo r131
node$ fastprot -b 200 /usr/local/apps/fastphylo/TEST_DATA/20160113_Dengue_prot_NAmerica_noX_aln_small.fa \
  | fnj -m BIONJ \
  > tree.xml
node$ exit
b2$