metaWRAP: a flexible pipeline for genome-resolved metagenomic data analysis

MetaWRAP is a modular pipeline for shotgun metagenomic data analysis. It deploys state-of-the-art software to handle metagenomic data processing starting from raw sequencing reads and ending in metagenomic bins and their analysis. It includes hybrid algorithms that leverage the strengths of a variety of software to extract and refine high-quality bins from metagenomic data through bin consolidation and reassembly.

References:

Gherman V. Uritskiy, Jocelyne DiRuggiero & James Taylor,
MetaWRAP—a flexible pipeline for genome-resolved metagenomic data analysis
Microbiome 6, Article number: 158 (2018).

Documentation

Important Notes

Module Name: metawrap (see the modules page for more information)
Unusual environment variables set
- MW_HOME installation directory
- MW_BIN executable directory
- MW_SRC source code directory
- MW_DATA sample data directory
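
As a quick sanity check after loading the module, the variables listed above can be printed. The following is a minimal sketch; the helper name show_mw_vars is hypothetical and is not provided by the module.

```shell
# Hypothetical helper: report each metaWRAP module variable, or note that it is missing.
show_mw_vars() {
    for name in MW_HOME MW_BIN MW_SRC MW_DATA; do
        # printenv prints the value of an exported variable and fails if it is unset
        if value=$(printenv "$name"); then
            printf '%s=%s\n' "$name" "$value"
        else
            printf '%s is not set\n' "$name"
        fi
    done
}

show_mw_vars
```

If any variable is reported as not set, the module is most likely not loaded in the current shell.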

Interactive job

Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program. Sample session:

[user@biowulf]$ sinteractive --mem=64g --gres=lscratch:20 -c48 --time=24:00:00
[user@cn0911 ~]$module load metawrap   
[+] Loading blast 2.14.0+  ...
[+] Loading bwa 0.7.17 on cn4272
[+] Loading bowtie  2-2.3.5
[+] Loading kraken  1.1
[+] Loading kronatools  2.8.1  on cn4272
[+] Loading perl 5.34.0 on cn4272
[+] Loading samtools 1.9  ...
[+] Loading singularity  4.0.1  on cn4272
[+] Loading salmon  1.7.0
[+] Loading prokka  1.14.6
[+] Loading metabat  2.15  on cn4272
[+] Loading quast  5.2.0  on cn4272
[+] Loading cutadapt  4.4
[+] Loading eigen 3.4.0-1092574b  ...
[+] Loading trimgalore  0.6.6  ...
[+] Loading fastqc  0.11.8
[+] Loading checkm2  1.0.2
[+] Loading megahit, version 1.2.9...
[+] Loading spades  3.15.5
[+] Loading gcc  11.3.0  ...
[+] Loading HDF5  1.12.2
[+] Loading netcdf  4.9.0
[+] Loading openmpi/4.1.3/gcc-11.3.0  ...
[+] Loading pandoc  2.18  on cn4272
[+] Loading pcre2  10.40
[+] Loading R 4.3.0
[+] Loading metawrap 1.3.2  ...
[user@cn0911 ~]$mkdir /data/$USER/metawrap && cd /data/$USER/metawrap
[user@cn0911 ~]$cp -r $MW_SRC/* .

Copy the sample data, unzip it, and place the files in the RAW_READS folder:

[user@cn0911 ~]$cp $MW_DATA/* .
[user@cn0911 ~]$gunzip *.gz
[user@cn0911 ~]$mkdir RAW_READS && mv *fastq RAW_READS
[user@cn0911 ~]$ls RAW_READS
ERR011347_1.fastq
ERR011347_2.fastq
ERR011348_1.fastq
ERR011348_2.fastq
ERR011349_1.fastq
ERR011349_2.fastq
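
Because every later step expects both mates of each paired-end sample, it can help to confirm that each _1.fastq file has a matching _2.fastq file before starting. This is a sketch only; the function name check_pairs is illustrative and not part of metaWRAP.

```shell
# Hypothetical helper: verify that every *_1.fastq in a folder has a *_2.fastq mate.
# Usage: check_pairs RAW_READS
check_pairs() {
    dir=$1
    rc=0
    for fwd in "$dir"/*_1.fastq; do
        # If the glob matched nothing, $fwd is the literal pattern and does not exist
        [ -e "$fwd" ] || { echo "no *_1.fastq files in $dir"; return 1; }
        rev="${fwd%_1.fastq}_2.fastq"   # swap the _1 suffix for _2
        if [ -e "$rev" ]; then
            echo "paired: ${fwd##*/} ${rev##*/}"
        else
            echo "missing mate for ${fwd##*/}"
            rc=1
        fi
    done
    return $rc
}
```

The function returns a non-zero status if any mate is missing, so it can guard the start of a longer script.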

Perform the analysis steps that are described in more detail in the Usage Tutorial:

Step 1: Run metaWRAP-Read_qc to trim the reads and remove human contamination:

[user@cn0911 ~]$mkdir -p READ_QC/011347 
[user@cn0911 ~]$metawrap read_qc -1 RAW_READS/ERR011347_1.fastq -2 RAW_READS/ERR011347_2.fastq -t 48 -o READ_QC/011347
...
real    31m13.819s
user    29m14.221s
sys     2m51.263s
[user@cn0911 ~]$mkdir -p READ_QC/011348 
[user@cn0911 ~]$metawrap read_qc -1 RAW_READS/ERR011348_1.fastq -2 RAW_READS/ERR011348_2.fastq -t 48 -o READ_QC/011348
...
real    17m41.812s
user    17m25.779s
sys     1m35.917s
[user@cn0911 ~]$mkdir -p READ_QC/011349 
[user@cn0911 ~]$metawrap read_qc -1 RAW_READS/ERR011349_1.fastq -2 RAW_READS/ERR011349_2.fastq -t 48 -o READ_QC/011349
...
real    39m34.887s
user    39m51.397s
sys     1m34.431s
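
The three read_qc invocations differ only in the sample accession, so they could be generated by a loop instead of being typed out. The sketch below only prints the commands, so that they can be reviewed before running; the function name read_qc_cmds and its argument order are illustrative choices, not part of metaWRAP.

```shell
# Hypothetical helper: print (not run) the read_qc commands for a list of samples.
# First argument: thread count; remaining arguments: numeric part of each accession.
read_qc_cmds() {
    threads=$1
    shift
    for id in "$@"; do
        echo "mkdir -p READ_QC/${id}"
        echo "metawrap read_qc -1 RAW_READS/ERR${id}_1.fastq -2 RAW_READS/ERR${id}_2.fastq -t ${threads} -o READ_QC/${id}"
    done
}

read_qc_cmds 48 011347 011348 011349
```

Piping the output to a shell (for example, appending "| bash") would execute the printed commands one after another.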

Step 2: Assemble the metagenomes with the metaWRAP-Assembly module:

[user@cn0911 ~]$mkdir -p CLEAN_READS ASSEMBLY
[user@cn0911 ~]$for i in READ_QC/*; do 
                	b=${i#*/}
	                mv ${i}/final_pure_reads_1.fastq CLEAN_READS/ERR${b}_1.fastq
	                mv ${i}/final_pure_reads_2.fastq CLEAN_READS/ERR${b}_2.fastq
                   done 
[user@cn0911 ~]$cat CLEAN_READS/ERR*_1.fastq > CLEAN_READS/ALL_READS_1.fastq
[user@cn0911 ~]$cat CLEAN_READS/ERR*_2.fastq > CLEAN_READS/ALL_READS_2.fastq
[user@cn0911 ~]$metawrap assembly -1 CLEAN_READS/ALL_READS_1.fastq \
                                     -2 CLEAN_READS/ALL_READS_2.fastq \
                                     -m 200 -t 48 --metaspades -o ASSEMBLY
...
real    55m28.629s
user    966m36.557s
sys     19m15.560s
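
The renaming loop in this step relies on the shell expansion ${i#*/}, which removes the shortest leading match of "*/" from the value and therefore turns READ_QC/011347 into 011347. The sketch below isolates that logic in a small function so the mapping can be inspected without moving any files; the function name clean_read_path is hypothetical.

```shell
# Hypothetical helper: show where the reads from one READ_QC subfolder end up.
# Arguments: the READ_QC subfolder and the mate number (1 or 2).
clean_read_path() {
    dir=$1
    mate=$2
    b=${dir#*/}    # strip everything up to and including the first slash
    echo "CLEAN_READS/ERR${b}_${mate}.fastq"
}

for i in READ_QC/011347 READ_QC/011348 READ_QC/011349; do
    echo "${i}/final_pure_reads_1.fastq -> $(clean_read_path "$i" 1)"
done
```

Because the loop iterates over everything in READ_QC, any stray subfolder there would also be processed, so the folder names need to match the sample accessions exactly.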

Step 3: Run the Kraken module on both the reads and the assembly:

[user@cn0911 ~]$metawrap kraken -o KRAKEN -t 48 -s 1000000 CLEAN_READS/ERR*fastq ASSEMBLY/final_assembly.fasta
...
real    91m23.363s
user    10m51.474s
sys     57m11.215s

Step 4: Bin the co-assembly with three different algorithms using the Binning module:

[user@cn0911 ~]$metawrap binning -o INITIAL_BINNING -t 48 -a ASSEMBLY/final_assembly.fasta \
                   --metabat2 --maxbin2 --concoct CLEAN_READS/ERR*fastq 
...
real    8m42.032s
user    17m53.586s
sys     9m8.455s

Step 5: Consolidate bin sets with the Bin_refinement module:

[user@cn0911 ~]$metawrap bin_refinement -o BIN_REFINEMENT -t 48 \
                                           -A INITIAL_BINNING/metabat2_bins/ \
                                           -B INITIAL_BINNING/maxbin2_bins/ \
                                           -C INITIAL_BINNING/concoct_bins/ \
                                           -c 50 \
                                           -x 10 
...
real    45m13.304s
user    116m11.825s
sys     5m22.803s
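
The -c and -x options set the minimum completion and maximum contamination (both in percent) that a refined bin must meet, and the folder used by the later steps, metawrap_50_10_bins, carries the same two numbers. Assuming the folder name always follows that pattern, keeping the thresholds in variables avoids a mismatch when they are changed. The variable and function names below are illustrative.

```shell
# Assumed naming pattern: BIN_REFINEMENT/metawrap_<completion>_<contamination>_bins
MIN_COMPLETION=50       # value passed to -c
MAX_CONTAMINATION=10    # value passed to -x

# Hypothetical helper: build the refined-bin folder name from the two thresholds.
refined_bins_dir() {
    echo "BIN_REFINEMENT/metawrap_${1}_${2}_bins"
}

BINS=$(refined_bins_dir "$MIN_COMPLETION" "$MAX_CONTAMINATION")
echo "$BINS"
```

The resulting $BINS value could then be passed wherever the later steps name the refined-bin folder explicitly.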

Step 6: Visualize the community and the extracted bins with the Blobology module:

[user@cn0911 ~]$metawrap blobology -a ASSEMBLY/final_assembly.fasta \
                                      -t 48 -o BLOBOLOGY \
                                      --bins BIN_REFINEMENT/metawrap_50_10_bins CLEAN_READS/ERR*fastq 
...
real    57m33.652s
user    185m26.082s
sys     166m38.753s

Step 7: Find the abundances of the draft genomes (bins) across the samples:

[user@cn0911 ~]$metawrap quant_bins -b BIN_REFINEMENT/metawrap_50_10_bins \
                                       -o QUANT_BINS \
                                       -a ASSEMBLY/final_assembly.fasta CLEAN_READS/ERR*fastq 
...
real    5m41.060s
user    6m58.301s
sys     0m7.729s

Step 8: Re-assemble the consolidated bin set with the Reassemble_bins module:

[user@cn0911 ~]$metawrap reassemble_bins -o BIN_REASSEMBLY \
                                            -1 CLEAN_READS/ALL_READS_1.fastq \
                                            -2 CLEAN_READS/ALL_READS_2.fastq \
                                            -t 48 -m 800 -c 50 -x 10 \
                                            -b BIN_REFINEMENT/metawrap_50_10_bins 
...
real    106m54.790s
user    496m15.997s
sys     45m18.069s

Step 9: Determine the taxonomy of each bin with the Classify_bins module:

[user@cn0911 ~]$metawrap classify_bins -b BIN_REASSEMBLY/reassembled_bins \
                                          -o BIN_CLASSIFICATION -t 48

Step 10: Functionally annotate bins with the Annotate_bins module:

[user@cn0911 ~]$metawrap annotate_bins -o FUNCT_ANNOT -t 48 -b BIN_REASSEMBLY/reassembled_bins/

End the interactive session:

[user@cn0911 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$