PhyloBayes: a Bayesian software package for phylogenetic reconstruction and molecular dating

PhyloBayes is a software package for Bayesian phylogenetic reconstruction and molecular dating. It offers a large variety of amino acid replacement and nucleotide substitution models, including empirical mixtures and non-parametric models, as well as alternative clock relaxation processes.

References:

Lartillot N., Philippe H.
A Bayesian Mixture Model for Across-Site Heterogeneities in the Amino-Acid Replacement Process.
Molecular Biology and Evolution 2004 21(6): 1095-1109.

Lartillot N., Philippe H.
Computing Bayes factors using thermodynamic integration.
Systematic Biology 2006 55: 195-207.

Lartillot N., Brinkmann H., Philippe H.
Suppression of long-branch attraction artefacts in the animal phylogeny using a site-heterogeneous model.
BMC Evolutionary Biology 2007 7(Suppl 1): S4.

Lartillot N., Lepage T., Blanquart S.
PhyloBayes 3: a Bayesian software package for phylogenetic reconstruction and molecular dating.
Bioinformatics 2009 25(17): 2286-2288.

Important Notes

Module Name: phylobayes (see the modules page for more information)
Unusual environment variables set
- PHYLOBAYES_HOME installation directory
- PHYLOBAYES_BIN executable directory
- PHYLOBAYES_DATA sample data directory
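
A quick way to confirm that the module has set these variables is to print them. The following is a minimal sketch, not part of the PhyloBayes package: the function name pb_env_report is hypothetical, and only the three variable names come from the list above.

```shell
# Hypothetical helper: report whether the variables set by
# "module load phylobayes" are present in the current shell.
pb_env_report() {
    status=0
    for var in PHYLOBAYES_HOME PHYLOBAYES_BIN PHYLOBAYES_DATA; do
        # Look up the value of the variable whose name is stored in $var.
        eval "val=\${$var:-}"
        if [ -n "$val" ]; then
            printf '%s=%s\n' "$var" "$val"
        else
            printf '%s is not set\n' "$var"
            status=1
        fi
    done
    # Non-zero return value if at least one variable is missing.
    return "$status"
}

pb_env_report || echo "Load the module first: module load phylobayes"
```

The function only reads the environment, so it is safe to run before or after loading the module.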

Interactive job

Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program. Sample session:

[user@biowulf]$ sinteractive --mem=4g
[user@cn3406 ~]$ module load phylobayes 
[+] Loading phylobayes  4.1c

The PhyloBayes package includes the following executables:

[user@cn3406]$  ls $PHYLOBAYES_BIN
ancestral  bpcomp  pb     readcv   readpb     stoppb   subsample  tracecomp
bf         cvrep   ppred  readdiv  stopafter  subdata  sumcv      tree2ps

To display a usage message for any of the executables, run it without arguments. For example:

[user@cn3406]$ pb

initialising random
seed was : 993557

pb [options] <chainname>
	creates a new chain, sampling from the posterior distribution, conditional on specified data

data options:
	-d <filename>    : file containing an alignment in phylip or nexus format; dna, rna or amino acids

tree:
	-t <treefile>     : starts from specified tree
	-T <treefile>     : chain run under fixed, specified tree
	-r <outgroup>     : re-root the tree (useful under clock models)

substitution model:
	mixture configuration
		-cat         : infinite mixture of profiles of equilibirium frequencies (Dirichlet Process)
		-ncat <ncat> : finite mixture of profiles of equilibirium frequencies
		-catfix <pr> : specifying a fixed set of pre-defined categories
		-qmmfix <mt> : specifying a fixed set of pre-defined matrices
	choice of relative rates of substitution
		-lg          : Le and Gascuel 2008
		-wag         : Whelan and Goldman 2001
		-jtt         : Jones, Taylor, Thornton 1992
		-mtrev       : Hadachi and Hasegawa 1996
		-mtzoa       : Rota Stabelli et al 2009
		-mtart       : Rota Stabelli et al 2009
		-gtr         : General Time Reversible
		-poisson     : Poisson matrix, all relative rates equal to 1 (Felsenstein 1981)
	underlying across-site rate variations
		-uni         : uniform rates across sites
		-dgam <ncat> : discrete gamma. ncat = number of categories (4 by default)
		-cgam        : continuous gamma distribution
		-ratecat     : rates across sites modelled using a Dirichlet process

relaxed clock models:
	-cl  : strict molecular clock
	-ln  : log normal (Thorne et al, 1998)
	-cir : CIR process (Lepage et al, 2007)
	-wn  : white noise (flexible but non-autocorrelated clock)
	-ugam: independent gamma multipliers (Drummond et al, 2006)

priors on divergence times (default: uniform):
	-dir : dirichlet
	-bd  : birth death

	-cal <calibrations> : impose a set of calibrations
	-sb                 : soft bounds
	-rp  <mean> <stdev> : impose a gamma prior on root age

additional options
	-x <every> <until>  : saving frequency, and chain length
	-f                  : forcing checks
	-s                  : "saveall" option. without it, only the trees are saved

pb <name>
	starts an already existing chain

see manual for details

[user@cn3406]$ readpb
initialising random
seed was : 399252

readpb [-x <burnin> <every> <until>] <chainname> 

	defaults : burnin = 0, every = 1, until the end

additional options:
	-c <cutoff> : collapses all groups with posterior probability lower than cutoff
	-m          : posterior distribution of the number of modes
	-ss         : mean posterior site-specific stationaries
	-r          : mean posterior site-specific rates (continuous gamma only)

	-ncat <n>   : defines number of bins for rate histogram (default 20)

	-cl         : mode clustering
		-ms: cluster min size (default : 10)
		-md: aggregating distance threshold (default : 0.03)

	-ps         : postscript output for tree (requires LateX), or for site-specific profiles
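
The readpb usage message above places the burn-in and thinning values after -x and the chain name last. As a hedged illustration, the sketch below assembles such a command line and prints it without running anything. The function name readpb_cmd and the values 100 and 10 are illustrative assumptions, mocln1 is the chain name used in the sample run on this page, and the sketch assumes that <until> may be omitted, as the "defaults" line of the usage message suggests.

```shell
# Hypothetical helper: build a readpb command line of the form
#   readpb -x <burnin> <every> [<until>] <chainname>
# It only prints the command; it does not run readpb.
readpb_cmd() {
    chain=$1
    burnin=${2:-0}   # readpb default: burnin = 0
    every=${3:-1}    # readpb default: every = 1
    upto=${4:-}      # optional <until>; empty means "until the end"
    printf 'readpb -x %s %s' "$burnin" "$every"
    if [ -n "$upto" ]; then
        printf ' %s' "$upto"
    fi
    printf ' %s\n' "$chain"
}

# Discard the first 100 points, then keep one point in every 10.
readpb_cmd mocln1 100 10
# prints: readpb -x 100 10 mocln1
```

Printing the command first makes it easy to review before pasting it into a session where the module is loaded.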

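The following session copies the sample data and runs pb on the moc dataset with a fixed tree topology (-T), an outgroup for re-rooting (-r), a set of calibrations (-cal), a log-normal relaxed clock (-ln), and a gamma prior on the root age (-rp):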

[user@cn3406 ~]$ cp -r $PHYLOBAYES_DATA/* .
[user@cn3406 ~]$ cd moc
[user@cn3406 moc]$ pb -d moc.ali -T moc.tree -r bikont.outgroup -cal calib -ln -rp 2000 2000 mocln1
 
initialising random
seed was : 809586


((((Ascaridida:0.374371,Diplogaste:0.374371):0.100692,(Chelicerat:0.273771,Mammalia:0.273771):0.201292):0.502348,(((Actinopter:0.827009,((Urochordat:0.099633,Drosophila:0.099633):0.604053,((Hymenopter:0.091704,Lepidopter:0.091704):0.138185,Caenorhabd:0.229889):0.473797):0.123322):0.090506,(((Tylenchida:0.395718,Strongyloi:0.395718):0.001518,((Spirurida:0.105024,Trichoceph:0.105024):0.249117,Platyhelmi:0.354142):0.043094):0.484948,((Choanoflag:0.297959,Basidiomyc:0.297959):0.345423,Sordariale:0.643383):0.238801):0.035331):0.057157,((Candida:0.769711,(Schizosacc:0.2549,Saccharomy:0.2549):0.514811):0.188829,Dictyostel:0.95854):0.016133):0.002737):0,((((stramenopi:0.184185,Trypanosoz:0.184185):0.2215,Bryophyta:0.405685):0.25628,(Rhodophyta:0.344971,(Ciliophora:0.117303,Plasmodium:0.117303):0.227668):0.316994):0.129341,(Cryptospor:0.75897,(((Sarcocysti:0.326341,(Piroplasmi:0.313463,Trypanosom:0.313463):0.012877):0.177239,(Leishmania:0.39955,(Arabidopsi:0.047498,Liliopsida:0.047498):0.352052):0.10403):0.220903,Chlorophyt:0.724484):0.034486):0.032337):0);
draw empirical mode
calibrations
Bryophyta	Arabidopsi	61
Liliopsida	Arabidopsi	62
Actinopter	Mammalia	53
Schizosacc	Sordariale	55
Chelicerat	Drosophila	49
Drosophila	Lepidopter	50
36	71

phylobayes version 4.1

init seed : 809586

data file : moc.ali
number of taxa : 36
number of sites: 7954

fast computation / high memory use 

fixed tree topology: moc.tree
outgroup : bikont.outgroup

branch lengths ~ iid gamma of mean mu and variance epsilon*mu^2
mu ~ exponential of mean 0.1
epsilon ~ exponential of mean 1

CAT model
   Dirichlet process of equilibrium frequency profiles
   flexible prior on profiles:
   profile ~ iid from a Dirichlet of center pi0 and concentration delta
   pi0 ~ uniform
   delta ~ exponential of mean Nstate (=20 on amino-acid data, 4 on nucleotide data)

relative exchange rates:
uniform (Poisson or Felsenstein81 processes, Felsenstein 1981)

rates ~ gamma of mean 1 and variance 1/alpha
4 discrete categories (Yang 1994)
alpha ~ exponential of mean 1

lognormal autocorrelated relaxed clock
nu ~ exponential of mean 1

calibrations : calib
hard bounds
lower bounds as in Paml 4.2 (truncated Cauchy), p = 0.1 and c = 1
prior on root age: gamma of mean 2000 and standard deviation 2000

WARNING: it is always advised to check the prior on divergence times in the presence of calibrations, by using the -prior option

prior on divergence times : uniform

(((((Trypanosoz,Trypanosom),Leishmania),(((((Plasmodium,Piroplasmi),Sarcocysti),Cryptospor),Ciliophora),stramenopi)),((((Arabidopsi,Liliopsida),Bryophyta),Chlorophyt),Rhodophyta)),(Dictyostel,(((((Candida,Saccharomy),Sordariale),Schizosacc),Basidiomyc),((((Mammalia,Actinopter),Urochordat),((((Hymenopter,Lepidopter),Drosophila),Chelicerat),((((((Caenorhabd,Diplogaste),(Ascaridida,Spirurida)),Tylenchida),Strongyloi),Trichoceph),Platyhelmi))),Choanoflag))));
initial log prior     : -7903.22
initial log likelihood: 404214
chain started

draw incremental dp modes
...
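
The WARNING in the output above recommends checking the prior on divergence times with the -prior option, and the pb usage message lists -x <every> <until> for the saving frequency and chain length. The sketch below prints variants of the sample command that use these options. It does not run pb. The function name pb_cmd, the chain names mocln1_prior and mocln2, and the values 1 and 1000 are illustrative assumptions, and placing the extra options just before the chain name is also an assumption.

```shell
# Options copied from the sample run above.
PB_BASE='-d moc.ali -T moc.tree -r bikont.outgroup -cal calib -ln -rp 2000 2000'

# Hypothetical helper: print (not run) a pb command line.
# Usage: pb_cmd <chainname> [extra options...]
pb_cmd() {
    chain=$1
    shift
    if [ "$#" -gt 0 ]; then
        printf 'pb %s %s %s\n' "$PB_BASE" "$*" "$chain"
    else
        printf 'pb %s %s\n' "$PB_BASE" "$chain"
    fi
}

# Run under the prior only, as the WARNING advises.
pb_cmd mocln1_prior -prior

# Set the saving frequency and chain length explicitly (illustrative values).
pb_cmd mocln2 -x 1 1000

# Restart an already existing chain, per the "pb <name>" form in the usage message.
echo "pb mocln1"
```

Using a separate chain name for the prior run keeps its output files apart from those of the posterior chain.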

End the interactive session:

[user@cn3406 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$