From the MetaPhlAn documentation:
MetaPhlAn is a computational tool for profiling the composition of microbial communities (Bacteria, Archaea and Eukaryotes) from metagenomic shotgun sequencing data (i.e. not 16S) with species-level resolution. With the newly added StrainPhlAn module, it is now possible to perform accurate strain-level microbial profiling. MetaPhlAn relies on ~1.1M unique clade-specific marker genes identified from ~100,000 reference genomes (~99,500 bacterial and archaeal and ~500 eukaryotic), allowing:
- unambiguous taxonomic assignments;
- accurate estimation of organismal relative abundance;
- species-level resolution for bacteria, archaea, eukaryotes and viruses;
- strain identification and tracking;
- orders of magnitude speedups compared to existing methods;
- metagenomic strain-level population genomics.
MetaPhlAn installations include StrainPhlAn, PhyloPhlAn, and hclust2.
Example input files are available in $METAPHLAN_TEST_DATA.
Allocate an interactive session and run the program. Sample session:
[user@biowulf]$ sinteractive -c4 --mem=20g --gres=lscratch:10
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job
[user@cn3144]$ cd /lscratch/$SLURM_JOB_ID
[user@cn3144]$ module load metaphlan
[user@cn3144]$ mkdir fasta
[user@cn3144]$ cp ${METAPHLAN_TEST_DATA:-none}/*.fasta.gz fasta
[user@cn3144]$ ls -lh fasta
-rw-r--r-- 1 user group 690K Nov 4 13:11 SRS014459-Stool.fasta.gz
-rw-r--r-- 1 user group 608K Nov 4 13:11 SRS014464-Anterior_nares.fasta.gz
-rw-r--r-- 1 user group 704K Nov 4 13:11 SRS014470-Tongue_dorsum.fasta.gz
-rw-r--r-- 1 user group 748K Nov 4 13:11 SRS014472-Buccal_mucosa.fasta.gz
-rw-r--r-- 1 user group 696K Nov 4 13:11 SRS014476-Supragingival_plaque.fasta.gz
-rw-r--r-- 1 user group 687K Nov 4 13:11 SRS014494-Posterior_fornix.fasta.gz
[user@cn3144]$ for f in fasta/*.gz
do
    name=$(basename $f .fasta.gz)
    metaphlan --nproc $SLURM_CPUS_PER_TASK --bowtie2out ${name}.bt2.bz2 \
        --input_type fasta --bowtie2db $METAPHLAN_DB $f > ${name}_profile.txt
done
[user@cn3144]$ merge_metaphlan_tables.py *_profile.txt > merged_abundance_table.txt
[user@cn3144]$ hclust2.py -c bbcry --top 0.25 \
    -l --in merged_abundance_table.txt \
    --out abundance_heatmap.png
[user@cn3144]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf]$
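The profiles written in the session above list one clade per line. As a minimal sketch, assuming the usual MetaPhlAn layout (tab-separated columns, comment lines starting with #, and clade names joined with | using prefixes such as k__, p__, s__ and t__), species-level rows could be pulled out as follows. The helper names species_rows and demo_profile are hypothetical, and the demo rows are made up and abbreviated for illustration; they are not real MetaPhlAn output.

```shell
#!/bin/bash
# species_rows: hypothetical helper, not part of MetaPhlAn.
# Reads a profile on stdin and prints the species label followed by the
# remaining columns for rows that stop at species level
# (clade name contains s__ but not t__).
species_rows() {
    awk -F'\t' -v OFS='\t' '
        /^#/ { next }                          # skip header and comment lines
        $1 ~ /s__/ && $1 !~ /t__/ {
            n = split($1, levels, "[|]")       # split the lineage on |
            $1 = levels[n]                     # keep only the last level (s__...)
            print
        }'
}

# Made-up, abbreviated rows for illustration only.
demo_profile() {
    printf '#clade_name\trelative_abundance\n'
    printf 'k__Bacteria\t100.0\n'
    printf 'k__Bacteria|p__Firmicutes\t60.0\n'
    printf 'k__Bacteria|p__Firmicutes|s__Species_A\t60.0\n'
    printf 'k__Bacteria|p__Firmicutes|s__Species_A|t__SGB_X\t60.0\n'
}

demo_profile | species_rows
```

On a real run the same filter would be applied with something like species_rows < merged_abundance_table.txt; check the header of your own file first, since the column layout can differ between MetaPhlAn versions.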
Create a batch input file (e.g. metaphlan.sh). For example:
#!/bin/bash
module load metaphlan || exit 1
METAPHLAN_DB=/fdb/metaphlan/vOct22
fastq=/path/to/sample.fastq.gz
name=$(dirname $fastq)/$(basename $fastq .fastq.gz)
metaphlan --nproc $SLURM_CPUS_PER_TASK --bowtie2out ${name}.bt2.bz2 \
    --input_type fastq --bowtie2db $METAPHLAN_DB $fastq > ${name}_profile.txt
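The batch script builds its output prefix from the FASTQ path with dirname and basename, so both output files land next to the input. A small sketch of that derivation in isolation, using a hypothetical helper named output_prefix (not part of MetaPhlAn or the module), makes it easy to check where the files will be written before submitting:

```shell
#!/bin/bash
# output_prefix: hypothetical helper mirroring the name= line of the batch script.
# Prints the input directory joined with the file name minus its .fastq.gz suffix.
output_prefix() {
    local fastq=$1
    printf '%s/%s\n' "$(dirname "$fastq")" "$(basename "$fastq" .fastq.gz)"
}

prefix=$(output_prefix /path/to/sample.fastq.gz)
echo "${prefix}.bt2.bz2"       # where the --bowtie2out file would go
echo "${prefix}_profile.txt"   # where the taxonomic profile would go
```

Note that basename only strips the exact suffix it is given, so an input named sample.fq.gz would keep its extension in the prefix unless the suffix argument is changed.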
Submit this job using the Slurm sbatch command.
sbatch --cpus-per-task=6 --mem=12g metaphlan.sh
Create a swarmfile (e.g. metaphlan.swarm). For example:
metaphlan --nproc $SLURM_CPUS_PER_TASK --bowtie2out sample1.bt2.bz2 \
    --input_type fastq --bowtie2db $METAPHLAN_DB sample1.fastq.gz > sample1_profile.txt
metaphlan --nproc $SLURM_CPUS_PER_TASK --bowtie2out sample2.bt2.bz2 \
    --input_type fastq --bowtie2db $METAPHLAN_DB sample2.fastq.gz > sample2_profile.txt
metaphlan --nproc $SLURM_CPUS_PER_TASK --bowtie2out sample3.bt2.bz2 \
    --input_type fastq --bowtie2db $METAPHLAN_DB sample3.fastq.gz > sample3_profile.txt
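With more than a few samples, the swarmfile can be generated rather than typed by hand. The following is a sketch only: make_swarm is a hypothetical helper, and it assumes every input ends in .fastq.gz. It emits one single-line command per file, with the same options as the example above.

```shell
#!/bin/bash
# make_swarm: hypothetical helper that prints one metaphlan command per FASTQ file.
# The single-quoted format strings keep $SLURM_CPUS_PER_TASK and $METAPHLAN_DB
# literal, so they are expanded inside each subjob and not while the
# swarmfile is being written.
make_swarm() {
    local fq name
    for fq in "$@"; do
        name=$(basename "$fq" .fastq.gz)
        printf 'metaphlan --nproc $SLURM_CPUS_PER_TASK --bowtie2out %s.bt2.bz2 ' "$name"
        printf -- '--input_type fastq --bowtie2db $METAPHLAN_DB %s > %s_profile.txt\n' "$fq" "$name"
    done
}

# Print the commands; redirect to metaphlan.swarm to create the swarmfile.
make_swarm sample1.fastq.gz sample2.fastq.gz sample3.fastq.gz
```

In practice the arguments would typically be a glob such as /path/to/*.fastq.gz, with the output redirected to metaphlan.swarm.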
Submit this job using the swarm command.
swarm -f metaphlan.swarm -g 12 -t 6 --module metaphlan
where
-g # | Number of gigabytes of memory required for each process (1 line in the swarm command file) |
-t # | Number of threads/CPUs required for each process (1 line in the swarm command file) |
--module metaphlan | Loads the metaphlan module for each subjob in the swarm |