PhyloBayes is a software package for Bayesian phylogenetic reconstruction and molecular dating. It supports a wide range of amino acid replacement and nucleotide substitution models, including empirical mixtures and non-parametric models, as well as alternative clock relaxation processes.
Allocate an interactive session and run the program. Sample session:
[user@biowulf]$ sinteractive --mem=4g
[user@cn3406 ~]$ module load phylobayes
[+] Loading phylobayes 4.1c

The PhyloBayes package includes the following executables:
[user@cn3406]$ ls $PHYLOBAYES_BIN
ancestral  bpcomp  pb     readcv   readpb     stoppb   subsample  tracecomp
bf         cvrep   ppred  readdiv  stopafter  subdata  sumcv      tree2ps

To display a usage message for any of the executables, type its name. For example:
[user@cn3406]$ pb
initialising random seed was : 993557

pb [options] <chainname>
    creates a new chain, sampling from the posterior distribution, conditional on specified data

data options:
    -d <filename>       : file containing an alignment in phylip or nexus format; dna, rna or amino acids

tree:
    -t <treefile>       : starts from specified tree
    -T <treefile>       : chain run under fixed, specified tree
    -r <outgroup>       : re-root the tree (useful under clock models)

substitution model:
    mixture configuration
        -cat            : infinite mixture of profiles of equilibirium frequencies (Dirichlet Process)
        -ncat <ncat>    : finite mixture of profiles of equilibirium frequencies
        -catfix <pr>    : specifying a fixed set of pre-defined categories
        -qmmfix <mt>    : specifying a fixed set of pre-defined matrices
    choice of relative rates of substitution
        -lg             : Le and Gascuel 2008
        -wag            : Whelan and Goldman 2001
        -jtt            : Jones, Taylor, Thornton 1992
        -mtrev          : Hadachi and Hasegawa 1996
        -mtzoa          : Rota Stabelli et al 2009
        -mtart          : Rota Stabelli et al 2009
        -gtr            : General Time Reversible
        -poisson        : Poisson matrix, all relative rates equal to 1 (Felsenstein 1981)

underlying across-site rate variations
    -uni                : uniform rates across sites
    -dgam <ncat>        : discrete gamma. ncat = number of categories (4 by default)
    -cgam               : continuous gamma distribution
    -ratecat            : rates across sites modelled using a Dirichlet process

relaxed clock models:
    -cl                 : strict molecular clock
    -ln                 : log normal (Thorne et al, 1998)
    -cir                : CIR process (Lepage et al, 2007)
    -wn                 : white noise (flexible but non-autocorrelated clock)
    -ugam               : independent gamma multipliers (Drummond et al, 2006)

priors on divergence times (default: uniform):
    -dir                : dirichlet
    -bd                 : birth death
    -cal <calibrations> : impose a set of calibrations
    -sb                 : soft bounds
    -rp <mean> <stdev>  : impose a gamma prior on root age

additional options
    -x <every> <until>  : saving frequency, and chain length
    -f                  : forcing checks
    -s                  : "saveall" option. without it, only the trees are saved

pb <name>
    starts an already existing chain

see manual for details

[user@cn3406]$ readpb
initialising random seed was : 399252

readpb [-x <burnin> <every> <until>] <chainname>
    defaults : burnin = 0, every = 1, until the end

additional options:
    -c <cutoff>         : collapses all groups with posterior probability lower than cutoff
    -m                  : posterior distribution of the number of modes
    -ss                 : mean posterior site-specific stationaries
    -r                  : mean posterior site-specific rates (continuous gamma only)
        -ncat <n>       : defines number of bins for rate histogram (default 20)
    -cl                 : mode clustering
        -ms             : cluster min size (default : 10)
        -md             : aggregating distance threshold (default : 0.03)
    -ps                 : postscript output for tree (requires LateX), or for site-specific profiles

Here is how one can run the pb executable on test data:
[user@cn3406 ~]$ cp -r $PHYLOBAYES_DATA/* .
[user@cn3406 ~]$ cd moc
[user@cn3406 ~]$ pb -d moc.ali -T moc.tree -r bikont.outgroup -cal calib -ln -rp 2000 2000 mocln1
initialising random seed was : 360313
error : cannot find data file moc.ali
[user@cn3406 test_dir]$ cd moc
[user@cn3406 moc]$ pb -d moc.ali -T moc.tree -r bikont.outgroup -cal calib -ln -rp 2000 2000 mocln1
initialising random seed was : 809586
((((Ascaridida:0.374371,Diplogaste:0.374371):0.100692,(Chelicerat:0.273771,Mammalia:0.273771):0.201292):0.502348,(((Actinopter:0.827009,((Urochordat:0.099633,Drosophila:0.099633):0.604053,((Hymenopter:0.091704,Lepidopter:0.091704):0.138185,Caenorhabd:0.229889):0.473797):0.123322):0.090506,(((Tylenchida:0.395718,Strongyloi:0.395718):0.001518,((Spirurida:0.105024,Trichoceph:0.105024):0.249117,Platyhelmi:0.354142):0.043094):0.484948,((Choanoflag:0.297959,Basidiomyc:0.297959):0.345423,Sordariale:0.643383):0.238801):0.035331):0.057157,((Candida:0.769711,(Schizosacc:0.2549,Saccharomy:0.2549):0.514811):0.188829,Dictyostel:0.95854):0.016133):0.002737):0,((((stramenopi:0.184185,Trypanosoz:0.184185):0.2215,Bryophyta:0.405685):0.25628,(Rhodophyta:0.344971,(Ciliophora:0.117303,Plasmodium:0.117303):0.227668):0.316994):0.129341,(Cryptospor:0.75897,(((Sarcocysti:0.326341,(Piroplasmi:0.313463,Trypanosom:0.313463):0.012877):0.177239,(Leishmania:0.39955,(Arabidopsi:0.047498,Liliopsida:0.047498):0.352052):0.10403):0.220903,Chlorophyt:0.724484):0.034486):0.032337):0);
draw empirical mode
calibrations
Bryophyta Arabidopsi 61
Liliopsida Arabidopsi 62
Actinopter Mammalia 53
Schizosacc Sordariale 55
Chelicerat Drosophila 49
Drosophila Lepidopter 50
36 71

phylobayes version 4.1
init seed : 809586
data file : moc.ali
number of taxa : 36
number of sites: 7954
fast computation / high memory use
fixed tree topology: moc.tree
outgroup : bikont.outgroup
branch lengths ~ iid gamma of mean mu and variance epsilon*mu^2
mu ~ exponential of mean 0.1
epsilon ~ exponential of mean 1
CAT model
Dirichlet process of equilibrium frequency profiles
flexible prior on profiles:
profile ~ iid from a Dirichlet of center pi0 and concentration delta
pi0 ~ uniform
delta ~ exponential of mean Nstate (=20 on amino-acid data, 4 on nucleotide data)
relative exchange rates: uniform (Poisson or Felsenstein81 processes, Felsenstein 1981)
rates ~ gamma of mean 1 and variance 1/alpha
4 discrete categories (Yang 1994)
alpha ~ exponential of mean 1
lognormal autocorrelated relaxed clock
nu ~ exponential of mean 1
calibrations : calib
hard bounds
lower bounds as in Paml 4.2 (truncated Cauchy), p = 0.1 and c = 1
prior on root age: gamma of mean 2000 and standard deviation 2000
WARNING: it is always advised to check the prior on divergence times in the presence of calibrations, by using the -prior option
prior on divergence times : uniform
(((((Trypanosoz,Trypanosom),Leishmania),(((((Plasmodium,Piroplasmi),Sarcocysti),Cryptospor),Ciliophora),stramenopi)),((((Arabidopsi,Liliopsida),Bryophyta),Chlorophyt),Rhodophyta)),(Dictyostel,(((((Candida,Saccharomy),Sordariale),Schizosacc),Basidiomyc),((((Mammalia,Actinopter),Urochordat),((((Hymenopter,Lepidopter),Drosophila),Chelicerat),((((((Caenorhabd,Diplogaste),(Ascaridida,Spirurida)),Tylenchida),Strongyloi),Trichoceph),Platyhelmi))),Choanoflag))));
initial log prior : -7903.22
initial log likelihood: 404214
chain started
draw incremental dp modes
...

End the interactive session:
[user@cn3406 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
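
The test run shown earlier starts a chain without an explicit length. According to the usage messages, pb accepts -x <every> <until> to set the saving frequency and chain length, and readpb accepts -x <burnin> <every> <until> to choose which saved points to summarize. The following is a minimal sketch of how these two options might be combined in a new session, run from the moc directory after loading the module. The chain name mocln2, the numeric values, the run helper, and the DRY_RUN variable are all illustrative assumptions and not part of PhyloBayes. By default the script only prints the commands; set DRY_RUN=0 to execute them.

```shell
#!/bin/bash
# Sketch only: bound the chain length with -x, then summarize with readpb.
# DRY_RUN and run() are hypothetical helpers, not part of PhyloBayes.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

CHAIN=mocln2   # hypothetical new chain name (mocln1 is used in the session above)
EVERY=10       # assumed saving frequency for pb -x
UNTIL=1000     # assumed chain length for pb -x
BURNIN=100     # assumed burn-in for readpb -x

# Same model options as the sample session, plus -x <every> <until>.
run pb -d moc.ali -T moc.tree -r bikont.outgroup -cal calib -ln -rp 2000 2000 \
    -x "$EVERY" "$UNTIL" "$CHAIN"

# Summarize the chain, discarding the burn-in and reading every saved point.
run readpb -x "$BURNIN" 1 "$UNTIL" "$CHAIN"
```

How the <every> and <until> values of pb relate to those of readpb, and which readpb summaries are meaningful for a chain run under a fixed tree, should be checked against the PhyloBayes manual before relying on these numbers.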