Efficient software for phylogenomic inference
The IQ-TREE software was created as the successor of IQPNNI and TREE-PUZZLE (thus the name IQ-TREE). IQ-TREE was motivated by the rapid accumulation of phylogenomic data, leading to a need for efficient phylogenomic software that can handle a large amount of data and provide more complex models of sequence evolution. To this end, IQ-TREE can utilize multicore computers and distributed parallel computing to speed up the analysis. IQ-TREE automatically performs checkpointing to resume an interrupted analysis.
Allocate an interactive session and run the program. Sample session:
[user@biowulf]$ sinteractive --mem=6g -c 8 salloc.exe: Pending job allocation 46116226 salloc.exe: job 46116226 queued and waiting for resources salloc.exe: job 46116226 has been allocated resources salloc.exe: Granted job allocation 46116226 salloc.exe: Waiting for resource configuration salloc.exe: Nodes cn3144 are ready for job [user@cn3144 ~]$ cd /data/$USER [user@cn3144 user]$ module load iqtree [user@cn3144 user]$ cp $IQTREE_TEST_DATA/* . [user@cn3144 user]$ iqtree2 -s example.phy -nt $SLURM_CPUS_PER_TASK IQ-TREE multicore version 2.1.2 COVID-edition for Linux 64-bit built Oct 22 2020 Developed by Bui Quang Minh, James Barbetti, Nguyen Lam Tung, Olga Chernomor, Heiko Schmidt, Dominik Schrempf, Michael Woodhams. Host: cn3144 (AVX512, FMA3, 377 GB RAM) Command: iqtree2 -s example.phy -nt 8 Seed: 830607 (Using SPRNG - Scalable Parallel Random Number Generator) Time: Mon Oct 26 10:33:27 2020 Kernel: AVX+FMA - 8 threads (8 CPU cores detected) Reading alignment file example.phy ... Phylip format detected Alignment most likely contains DNA/RNA sequences Alignment has 17 sequences with 1998 columns, 1152 distinct patterns 1009 parsimony-informative, 303 singleton sites, 686 constant sites [...] Total number of iterations: 102 CPU time used for tree search: 34.761 sec (0h:0m:34s) Wall-clock time used for tree search: 4.757 sec (0h:0m:4s) Total CPU time used: 44.325 sec (0h:0m:44s) Total wall-clock time used: 6.151 sec (0h:0m:6s) Analysis results written to: IQ-TREE report: example.phy.iqtree Maximum-likelihood tree: example.phy.treefile Likelihood distances: example.phy.mldist Screen log file: example.phy.log [user@cn3144 user]$ exit salloc.exe: Relinquishing job allocation 46116226 [user@biowulf ~]$
Create a batch input file (e.g. iqtree.sh). For example:
#!/bin/bash module load iqtree iqtree2 -s example.phy -nt $SLURM_CPUS_PER_TASK
Submit this job using the Slurm sbatch command.
sbatch --cpus-per-task=8 --mem=6g iqtree.sh
Create a swarmfile (e.g. iqtree.swarm). For example:
iqtree2 -s example1.phy -nt $SLURM_CPUS_PER_TASK iqtree2 -s example2.phy -nt $SLURM_CPUS_PER_TASK iqtree2 -s example3.phy -nt $SLURM_CPUS_PER_TASK
Submit this job using the swarm command.
swarm -f iqtree.swarm -g 6 -t 8 --module iqtreewhere
-g # | Number of Gigabytes of memory required for each process (1 line in the swarm command file) |
-t # | Number of threads/CPUs required for each process (1 line in the swarm command file). |
--module iqtree | Loads the iqtree module for each subjob in the swarm |