metaWRAP: a flexible pipeline for genome-resolved metagenomic data analysis

MetaWRAP is a modular pipeline for shotgun metagenomic data analysis. It deploys state-of-the-art software to handle metagenomic data processing starting from raw sequencing reads and ending in metagenomic bins and their analysis. It includes hybrid algorithms that leverage the strengths of a variety of software to extract and refine high-quality bins from metagenomic data through bin consolidation and reassembly.

Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program. Sample session:

[user@biowulf]$ sinteractive --mem=4g --gres=lscratch:20 -c8
[user@cn0911 ~]$ module load metawrap
[+] Loading blast 2.14.0+  ...
[+] Loading bwa 0.7.17 on cn4272
[+] Loading bowtie  2-2.3.5
[+] Loading kraken  1.1
[+] Loading kronatools  2.8.1  on cn4272
[+] Loading perl 5.34.0 on cn4272
[+] Loading samtools 1.9  ...
[+] Loading singularity  4.0.1  on cn4272
[+] Loading salmon  1.7.0
[+] Loading prokka  1.14.6
[+] Loading metabat  2.15  on cn4272
[+] Loading quast  5.2.0  on cn4272
[+] Loading cutadapt  4.4
[+] Loading eigen 3.4.0-1092574b  ...
[+] Loading trimgalore  0.6.6  ...
[+] Loading fastqc  0.11.8
[+] Loading checkm2  1.0.2
[+] Loading megahit, version 1.2.9...
[+] Loading spades  3.15.5
[+] Loading gcc  11.3.0  ...
[+] Loading HDF5  1.12.2
[+] Loading netcdf  4.9.0
[+] Loading openmpi/4.1.3/gcc-11.3.0  ...
[+] Loading pandoc  2.18  on cn4272
[+] Loading pcre2  10.40
[+] Loading R 4.3.0
[+] Loading metawrap 1.3.2  ...
[user@cn0911 ~]$ mkdir /data/$USER/metawrap && cd /data/$USER/metawrap
[user@cn0911 ~]$ cp -r $MW_SRC/* .
Copy the sample data, decompress it, and move the FASTQ files into the folder RAW_READS:
[user@cn0911 ~]$ cp $MW_DATA/* .
[user@cn0911 ~]$ gunzip *.gz
[user@cn0911 ~]$ mkdir RAW_READS && mv *fastq RAW_READS
[user@cn0911 ~]$ ls RAW_READS
ERR011347_1.fastq
ERR011347_2.fastq
ERR011348_1.fastq
ERR011348_2.fastq
ERR011349_1.fastq
ERR011349_2.fastq
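A quick way to confirm that every sample has both mate files before starting the pipeline is a small bash function. This is a hypothetical helper, not part of metaWRAP, and it assumes the `<sample>_1.fastq` / `<sample>_2.fastq` naming shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of metaWRAP): print each sample name found in a
# reads directory and report any sample whose _2 mate file is missing.
# Assumes the <sample>_1.fastq / <sample>_2.fastq naming used in RAW_READS.
check_pairs() {
    local dir=${1:-RAW_READS} r1 sample status=0
    for r1 in "$dir"/*_1.fastq; do
        # An unmatched glob stays literal, so a missing file means no samples.
        [[ -e $r1 ]] || { echo "no *_1.fastq files in $dir" >&2; return 1; }
        sample=$(basename "$r1" _1.fastq)
        if [[ -e $dir/${sample}_2.fastq ]]; then
            echo "$sample"
        else
            echo "missing mate: $dir/${sample}_2.fastq" >&2
            status=1
        fi
    done
    return "$status"
}
```

With the sample data, `check_pairs RAW_READS` should print ERR011347, ERR011348 and ERR011349, one per line, and return 0.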
Perform the analysis steps, which are described in more detail in the Usage Tutorial:

Step 1: Run metaWRAP-Read_qc to trim the reads and remove human contamination:
[user@cn0911 ~]$ mkdir READ_QC
[user@cn0911 ~]$ metawrap read_qc -1 RAW_READS/ERR011347_1.fastq -2 RAW_READS/ERR011347_2.fastq -t 24 -o READ_QC/ERR011347
[user@cn0911 ~]$ metawrap read_qc -1 RAW_READS/ERR011348_1.fastq -2 RAW_READS/ERR011348_2.fastq -t 24 -o READ_QC/ERR011348
[user@cn0911 ~]$ metawrap read_qc -1 RAW_READS/ERR011349_1.fastq -2 RAW_READS/ERR011349_2.fastq -t 24 -o READ_QC/ERR011349
[user@cn0911 ~]$ mkdir CLEAN_READS
[user@cn0911 ~]$ for i in READ_QC/*; do
                     b=${i#*/}   # sample name, e.g. READ_QC/ERR011347 -> ERR011347
                     mv "$i/final_pure_reads_1.fastq" "CLEAN_READS/${b}_1.fastq"
                     mv "$i/final_pure_reads_2.fastq" "CLEAN_READS/${b}_2.fastq"
                 done
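Typing one `read_qc` command per sample does not scale beyond a handful of samples. The function below is a minimal bash sketch of the same step; the name `run_read_qc_all` and the `DRY_RUN` switch are hypothetical, and taking `-t` from `SLURM_CPUS_PER_TASK` is an assumption meant to keep the thread count in line with the cores requested with `sinteractive -c`.

```shell
#!/usr/bin/env bash
# Sketch: run `metawrap read_qc` for every sample in RAW_READS and move the
# cleaned reads into CLEAN_READS, as in Step 1 above.
# Assumptions (not part of metaWRAP): the thread count is taken from
# SLURM_CPUS_PER_TASK, which Slurm sets when cores are requested with -c,
# and DRY_RUN=1 only prints the commands instead of running them.
run_read_qc_all() {
    local threads=${SLURM_CPUS_PER_TASK:-1} r1 sample
    local -a cmd
    mkdir -p READ_QC CLEAN_READS || return 1
    for r1 in RAW_READS/*_1.fastq; do
        [[ -e $r1 ]] || { echo "no *_1.fastq files in RAW_READS" >&2; return 1; }
        sample=$(basename "$r1" _1.fastq)
        cmd=(metawrap read_qc -1 "$r1" -2 "RAW_READS/${sample}_2.fastq"
             -t "$threads" -o "READ_QC/$sample")
        if [[ ${DRY_RUN:-0} == 1 ]]; then
            echo "${cmd[*]}"
            continue
        fi
        "${cmd[@]}" || return 1
        mv "READ_QC/$sample/final_pure_reads_1.fastq" "CLEAN_READS/${sample}_1.fastq" &&
            mv "READ_QC/$sample/final_pure_reads_2.fastq" "CLEAN_READS/${sample}_2.fastq" ||
            return 1
    done
}
```

For the three sample pairs above, `DRY_RUN=1 run_read_qc_all` should print one `metawrap read_qc` line per sample; running it without `DRY_RUN` performs the trimming and the file moves in one go.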
Step 2: Assemble the metagenomes with the metaWRAP-Assembly module:
[user@cn0911 ~]$ cat CLEAN_READS/ERR*_1.fastq > CLEAN_READS/ALL_READS_1.fastq
[user@cn0911 ~]$ cat CLEAN_READS/ERR*_2.fastq > CLEAN_READS/ALL_READS_2.fastq
[user@cn0911 ~]$ metawrap assembly -1 CLEAN_READS/ALL_READS_1.fastq \
                                     -2 CLEAN_READS/ALL_READS_2.fastq \
                                     -m 200 -t 96 --use-metaspades -o ASSEMBLY
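Because the shell expands `ERR*_1.fastq` and `ERR*_2.fastq` in the same sorted order, each read in ALL_READS_1.fastq keeps its mate at the same position in ALL_READS_2.fastq. A cheap sanity check, sketched below as hypothetical bash functions, is that both files hold the same number of reads; it assumes standard four-line FASTQ records.

```shell
#!/usr/bin/env bash
# Hypothetical sanity check (not a metaWRAP command): a FASTQ record is four
# lines, so the read count is the line count divided by 4. The forward and
# reverse co-assembly inputs should contain the same number of reads.
count_reads() {
    local lines
    lines=$(wc -l < "$1") || return 1
    echo $(( lines / 4 ))
}

check_same_read_count() {
    local n1 n2
    n1=$(count_reads "$1") || return 1
    n2=$(count_reads "$2") || return 1
    if (( n1 != n2 )); then
        echo "read counts differ: $1=$n1 $2=$n2" >&2
        return 1
    fi
    echo "$n1"
}
```

Usage: `check_same_read_count CLEAN_READS/ALL_READS_1.fastq CLEAN_READS/ALL_READS_2.fastq` prints the shared read count, or an error and a non-zero status if the files are out of step.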
Step 3: Run the Kraken module on both the reads and the assembly:
[user@cn0911 ~]$ metawrap kraken -o KRAKEN -t 96 -s 1000000 CLEAN_READS/ERR*fastq ASSEMBLY/final_assembly.fasta
Step 4: Bin the co-assembly with three different algorithms using the Binning module:
[user@cn0911 ~]$ metawrap binning -o INITIAL_BINNING -t 96 -a ASSEMBLY/final_assembly.fasta \
                   --metabat2 --maxbin2 --concoct CLEAN_READS/ERR*fastq
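After binning, a per-binner bin count gives a first impression of how the three algorithms behaved on the co-assembly. The sketch below is a hypothetical helper and assumes that each bin is written as a separate `.fa` file in a `<binner>_bins` folder, matching the `metabat2_bins`, `maxbin2_bins` and `concoct_bins` paths used in Step 5.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report how many bins each binner produced.
# Assumption: every bin is written as a separate *.fa file inside
# <outdir>/<binner>_bins/ (e.g. INITIAL_BINNING/metabat2_bins/).
count_bins() {
    local outdir=${1:-INITIAL_BINNING} d n
    for d in "$outdir"/*_bins/; do
        [[ -d $d ]] || continue           # skip if the glob matched nothing
        n=$(find "$d" -maxdepth 1 -name '*.fa' | wc -l)
        printf '%s\t%d\n' "$(basename "$d")" "$((n))"
    done
}
```

Running `count_bins INITIAL_BINNING` prints one tab-separated line per binner folder, for example `metabat2_bins` followed by its bin count.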
Step 5: Consolidate bin sets with the Bin_refinement module:
[user@cn0911 ~]$ metawrap bin_refinement -o BIN_REFINEMENT -t 96 \
                                           -A INITIAL_BINNING/metabat2_bins/ \
                                           -B INITIAL_BINNING/maxbin2_bins/ \
                                           -C INITIAL_BINNING/concoct_bins/ \
                                           -c 50 -x 10
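The `-c 50 -x 10` options set a minimum completeness of 50% and a maximum contamination of 10% for the consolidated bins, and the same two numbers appear in the output folder name `metawrap_50_10_bins` used in Steps 6 to 8. The awk sketch below illustrates that filter on a stats table; the column layout and the function names are assumptions for illustration, not the exact format metaWRAP writes.

```shell
#!/usr/bin/env bash
# Sketch of the -c/-x thresholds: keep bins with completeness >= min_comp and
# contamination <= max_cont. The input layout is an assumption: a tab-separated
# table with a header line, bin name in column 1, completeness (%) in column 2
# and contamination (%) in column 3.
filter_bins() {
    local table=$1 min_comp=${2:-50} max_cont=${3:-10}
    awk -F'\t' -v c="$min_comp" -v x="$max_cont" \
        'NR > 1 && $2 + 0 >= c + 0 && $3 + 0 <= x + 0 { print $1 }' "$table"
}

# Hypothetical helper: the refined bin folder name follows the pattern seen
# above, e.g. -c 50 -x 10 gives BIN_REFINEMENT/metawrap_50_10_bins.
refined_dir() {
    echo "BIN_REFINEMENT/metawrap_${1}_${2}_bins"
}
```

Keeping the thresholds in shell variables and deriving the folder name from them avoids mismatched paths if `-c` or `-x` is later changed.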
Step 6: Visualize the community and the extracted bins with the Blobology module:
[user@cn0911 ~]$ metawrap blobology -a ASSEMBLY/final_assembly.fasta \
                                      -t 96 -o BLOBOLOGY \
                                      --bins BIN_REFINEMENT/metawrap_50_10_bins CLEAN_READS/ERR*fastq
Step 7: Find the abundances of the draft genomes (bins) across the samples:
[user@cn0911 ~]$ metawrap quant_bins -b BIN_REFINEMENT/metawrap_50_10_bins \
                                       -o QUANT_BINS \
                                       -a ASSEMBLY/final_assembly.fasta CLEAN_READS/ERR*fastq
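The Quant_bins module estimates how abundant each bin is in each sample. To pick out the dominant bin per sample from such a table, a short awk sketch is enough; it assumes a tab-separated table with sample names in the header row and one bin per row, so check the actual file in QUANT_BINS before relying on it. The function name `top_bin_per_sample` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: for each sample column of a bin abundance table, print
# the bin with the highest value. Assumed layout: tab-separated, header row
# with sample names from column 2 on, one row per bin with its name in column 1.
top_bin_per_sample() {
    awk -F'\t' '
        NR == 1 { for (i = 2; i <= NF; i++) sample[i] = $i; n = NF; next }
        {
            for (i = 2; i <= NF; i++)
                if (!(i in best) || $i + 0 > best[i]) { best[i] = $i + 0; bin[i] = $1 }
        }
        END { for (i = 2; i <= n; i++) print sample[i] "\t" bin[i] }
    ' "$1"
}
```

The output has one line per sample: the sample name, a tab, and the name of its most abundant bin.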
Step 8: Re-assemble the consolidated bin set with the Reassemble_bins module:
[user@cn0911 ~]$ metawrap reassemble_bins -o BIN_REASSEMBLY \
                                            -1 CLEAN_READS/ALL_READS_1.fastq \
                                            -2 CLEAN_READS/ALL_READS_2.fastq \
                                            -t 96 -m 800 -c 50 -x 10 \
                                            -b BIN_REFINEMENT/metawrap_50_10_bins
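The example session requests `--mem=4g`, while the assembly and reassembly steps pass `-m 200` and `-m 800`. Assuming `-m` is a memory size in GB, a quick pre-flight check against the Slurm allocation can catch the mismatch before a long job fails. `SLURM_MEM_PER_NODE` is the variable Slurm sets (in MB) when `--mem` is used; the function name `check_mem` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: warn when the memory passed to metawrap with
# -m (assumed to be in GB) exceeds what Slurm allocated to the job.
# SLURM_MEM_PER_NODE is set by Slurm when --mem is requested and is in MB.
check_mem() {
    local want_gb=$1
    local have_mb=${SLURM_MEM_PER_NODE:-}
    if [[ -z $have_mb ]]; then
        echo "SLURM_MEM_PER_NODE not set; cannot check" >&2
        return 2
    fi
    if (( want_gb * 1024 > have_mb )); then
        echo "requested -m ${want_gb} GB but only $(( have_mb / 1024 )) GB allocated" >&2
        return 1
    fi
}
```

Inside the 4 GB session shown above, `check_mem 800` would print a warning and return 1, a hint to request a larger allocation first.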
Step 9: Determine the taxonomy of each bin with the Classify_bins module:
[user@cn0911 ~]$ metawrap classify_bins -b BIN_REASSEMBLY/reassembled_bins \
                                          -o BIN_CLASSIFICATION -t 48
Step 10: Functionally annotate the bins with the Annotate_bins module:
[user@cn0911 ~]$ metawrap annotate_bins -o FUNCT_ANNOT -t 96 -b BIN_REASSEMBLY/reassembled_bins/
End the interactive session:
[user@cn0911 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$