High-Performance Computing at the NIH
Phylip on Biowulf & Helix

PHYLIP (PHYlogeny Inference Package) is a package of programs for inferring phylogenies (evolutionary trees). Methods that are available in the package include parsimony, distance matrix, and likelihood methods, including bootstrapping and consensus trees. Data types that can be handled include molecular sequences, gene frequencies, restriction sites and fragments, distance matrices, and discrete characters.

PHYLIP was developed at the University of Washington. PHYLIP website.

The PHYLIP programs are single-threaded and are intended to be used interactively on Helix. The package consists of a large number of individual programs, which are installed in /usr/local/phylip/exe. List of PHYLIP programs.

On Helix

Sample session

helix% module load phylip
helix% dnapars
dnapars: can't find input file "infile"
Please enter a new file name> seq

DNA parsimony algorithm, version 3.65

Setting for this run:
  U                 Search for best tree?  Yes
  S                        Search option?  More thorough search
  V              Number of trees to save?  10000
  J   Randomize input order of sequences?  No. Use input order
  O                        Outgroup root?  No, use as outgroup species  1
  T              Use Threshold parsimony?  No, use ordinary parsimony
  N           Use Transversion parsimony?  No, count all steps
  W                       Sites weighted?  No
  M           Analyze multiple data sets?  No
  I          Input sequences interleaved?  Yes
  0   Terminal type (IBM PC, ANSI, none)?  ANSI
  1    Print out the data at start of run  No
  2  Print indications of progress of run  Yes
  3                        Print out tree  Yes
  4          Print out steps in each site  No
  5  Print sequences at all nodes of tree  No
  6       Write out trees onto tree file?  Yes

  Y to accept these or type the letter for one to change
Y
Adding species:
   1. Archaeopt
   2. Hesperorni
   3. Baluchithe
   4. B. virgini
   5. Brontosaur
   6. B.subtilis

Doing global rearrangements on all trees tied for best
  !-----------!
   ...........
   ...........

Collapsing best trees

Output written to file "outfile"

Tree also written onto file "outtree"

Done.

% ls
outfile  outtree  seq
% cat outfile

DNA parsimony algorithm, version 3.65

One most parsimonious tree found:

                 +---------------B.subtilis
  +-------------4
  |              +----------Brontosaur
  |
  |                       +-B. virgini
  |        +-------------3
  1--------2              +------Baluchithe
  |        |
  |        +-----------Hesperorni
  |
  +-----------Archaeopt

requires a total of     21.000

  between      and       length
  -------      ---       ------
     1           4       0.230769
     4      B.subtilis   0.269231
     4      Brontosaur   0.192308
     1           2       0.153846
     2           3       0.230769
     3      B. virgini   0.038462
     3      Baluchithe   0.115385
     2      Hesperorni   0.192308
     1      Archaeopt    0.192308

Batch job on Biowulf

Set up a batch job along the following lines:


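#!/bin/bash

cd /data/$USER/somedir
module load phylip/3.696

# Responses are read in the order dnaml would ask for them interactively.
# The two shown here (input file name, then Y to accept the menu) are illustrative.
dnaml << EOF
seq
Y
EOF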

A batch job must supply PHYLIP with the same input that the program would expect if you ran it interactively. You can include these responses directly in the batch script, as in the example above, or you can put them into a file (e.g. 'phylip_input') and include:

dnaml < phylip_input > output
in your batch script.
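
A minimal sketch of the response-file approach follows. It assumes that the alignment is named seq and that the default menu settings are acceptable; both are illustrative assumptions, and the actual responses depend on the program and on which files are already present. The guard around dnaml is there only so that the snippet can be tried on a machine where the phylip module is not loaded.

```shell
#!/bin/bash
# Sketch: keep PHYLIP's interactive responses in a file and redirect it to the program.
# Assumptions (not from the original page): the alignment is named "seq" and
# the default menu settings are accepted with "Y".

# One response per line, in the order the program would ask for them.
printf '%s\n' seq Y > phylip_input

# PHYLIP asks before overwriting existing output files, which would
# consume a scripted response; clear stale outputs first.
rm -f outfile outtree

# Run only when dnaml is on the PATH (i.e. after "module load phylip").
if command -v dnaml > /dev/null 2>&1; then
    dnaml < phylip_input > output
else
    echo "dnaml not found; load the phylip module first" >&2
fi
```

Running the program once interactively and noting each prompt is the most reliable way to work out which responses belong in the file.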

Submit this job with:

sbatch  batch_script
If the job requires more than the default 4 GB of memory, request the amount needed, replacing # with the number of gigabytes:
sbatch --mem=#g  batch_script
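
The memory request can also be recorded in the batch script itself with an #SBATCH directive, which sbatch reads from comment lines placed before the first command. The sketch below assumes an 8 GB request and the 'phylip_input' response file described above; the value is purely illustrative.

```shell
#!/bin/bash
# Illustrative memory request (8 GB is an assumed value, not a recommendation).
# sbatch only honors "#SBATCH" lines that appear before the first command.
#SBATCH --mem=8g

cd /data/$USER/somedir
module load phylip/3.696

# Feed the scripted responses to dnaml and capture the screen output.
dnaml < phylip_input > output
```

A script written this way is submitted with a plain 'sbatch batch_script'; a --mem value given on the command line overrides the directive in the script.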
Interactive job on Biowulf

Allocate an interactive node with 'sinteractive' and run the program as in the Helix example above.

Sample session:

[susanc@biowulf ~]$ sinteractive --mem=2g
salloc.exe: Granted job allocation 17440
slurm stepprolog here!
                                            Begin slurm taskprolog!
End slurm taskprolog!

[susanc@p20 ~]$ cp -r /usr/local/apps/phylip/example .

[susanc@p20 ~]$ cd example

[susanc@p20 example]$ module load phylip

[susanc@p20 example]$ dnapars
dnapars: can't find input file "infile"
Please enter a new file name> seq

[susanc@p20 example]$ exit
                                                   slurm stepepilog here!
salloc.exe: Relinquishing job allocation 17440
salloc.exe: Job allocation 17440 has been revoked.

[susanc@biowulf ~]$ 

A primer to phylogenetic analysis using PHYLIP, by Jarno Tuimala. (PDF)
Documentation for all PHYLIP programs