Pplacer places query sequences on a fixed reference phylogenetic tree to maximize phylogenetic likelihood or posterior probability according to a reference alignment. Pplacer is designed to be fast, to give useful information about uncertainty, and to offer advanced visualization and downstream analysis.
Allocate an interactive session and run the program.
Sample session (user input in bold):
[user@biowulf]$ sinteractive salloc.exe: Pending job allocation 46116226 salloc.exe: job 46116226 queued and waiting for resources salloc.exe: job 46116226 has been allocated resources salloc.exe: Granted job allocation 46116226 salloc.exe: Waiting for resource configuration salloc.exe: Nodes cn3144 are ready for job [user@cn3144 ~]$ mkdir /data/$USER/pplacer; cd /data/$USER/pplacer [user@cn3144 pplacer]$ tar xvzf /usr/local/apps/pplacer/tutorial.tar.gz [...] [user@cn3144 pplacer]$ cd fhcrc-microbiome-demo-730d268 [user@cn3144 pplacer]$ module avail pplacer ----------------------------------------- /usr/local/lmod/modulefiles -------------------------- pplacer/1.1 [user@cn3144 pplacer]$ module load pplacer [+] Loading pplacer 1.1 [user@cn3144 pplacer]$ sh pplacer_demo.sh # Phylogenetic placement # ---------------------- # This makes p4z1r2.jplace, which is a "place" file in JSON format. Place files # contain information about collections of phylogenetic placements on a tree. # You may notice that one of the arguments to this command is # `vaginal_16s.refpkg`, which is a "reference package". Reference packages are # simply an organized collection of files including a reference tree, reference # alignment, and taxonomic information. We have the beginnings of a # [database](http://microbiome.fhcrc.org/apps/refpkg/) of reference packages # and some [software](http://github.com/fhcrc/taxtastic) for putting them # together. pplacer -c vaginal_16s.refpkg src/p4z1r36.fasta Running pplacer v1.1.alpha19-0-g807f6f3 analysis on src/p4z1r36.fasta... Found reference sequences in given alignment file. Using those for reference alignment. Pre-masking sequences... sequence length cut from 2196 to 275. Determining figs... figs disabled. Allocating memory for internal nodes... done. Caching likelihood information on reference tree... done. Pulling exponents... done. Preparing the edges for baseball... done. warning: rank below_species not represented in the lineage of any sequence in reference package vaginal_16s. [...] pause Please press return to continue... # That's it for the demo. For further information, please consult the # [pplacer documentation](http://matsen.github.com/pplacer/). echo "Thanks!" Thanks! [user@cn3144 ~]$ exit salloc.exe: Relinquishing job allocation 46116226 [user@biowulf ~]$
Create a batch input file (e.g. pplacer.sh). For example:
#!/bin/bash set -e module load pplacer mkdir /data/$USER/pplacer cd /data/$USER/pplacer tar xvzf /usr/local/apps/pplacer/tutorial.tar.gz cd fhcrc-microbiome-demo-730d268 pplacer -c vaginal_16s.refpkg src/p4z1r36.fasta
Submit this job using the Slurm sbatch command.
sbatch [--mem=#] pplacer.sh
Create a swarmfile (e.g. pplacer.swarm). For example:
guppy kr_heat -c vaginal_16s.refpkg/ file1a.jplace file2a.jplace guppy kr_heat -c vaginal_16s.refpkg/ file1b.jplace file2b.jplace guppy kr_heat -c vaginal_16s.refpkg/ file1c.jplace file2c.jplace
Submit this job using the swarm command.
swarm -f pplacer.swarm [-g #] --module pplacerwhere
-g # | Number of Gigabytes of memory required for each process (1 line in the swarm command file) |
--module pplacer | Loads the pplacer module for each subjob in the swarm |