metaWRAP: a flexible pipeline for genome-resolved metagenomic data analysis
MetaWRAP is a modular pipeline for shotgun metagenomic data analysis. It deploys state-of-the-art software to handle metagenomic data processing starting from raw sequencing reads and ending in metagenomic bins and their analysis. It includes hybrid algorithms that leverage the strengths of a variety of software to extract and refine high-quality bins from metagenomic data through bin consolidation and reassembly.
References:
- Gherman V. Uritskiy, Jocelyne DiRuggiero & James Taylor,
MetaWRAP—a flexible pipeline for genome-resolved metagenomic data analysis
BMC Microbiome 6, Article number: 158 (2018).
Documentation
Important Notes
- Module Name: metawrap (see the modules page for more information)
- Unusual environment variables set:
  - MW_HOME: installation directory
  - MW_BIN: executable directory
  - MW_SRC: source code directory
  - MW_DATA: sample data directory
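To confirm that the module defined these variables in the current shell, something like the following could be used. This is a minimal sketch, not part of metaWRAP or the module itself: the function name check_mw_env is made up for illustration, and it assumes the variables are exported to the environment.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with metaWRAP): report which of the
# module's variables are present in the environment.
# Prints one line per variable; returns 1 if any of them is missing or empty.
check_mw_env() {
    local name value missing=0
    for name in MW_HOME MW_BIN MW_SRC MW_DATA; do
        # printenv only sees exported variables; "|| true" keeps the
        # assignment from failing when the variable is absent.
        value=$(printenv "$name" || true)
        if [ -n "$value" ]; then
            printf '%s=%s\n' "$name" "$value"
        else
            printf '%s is not set\n' "$name"
            missing=1
        fi
    done
    return "$missing"
}
```

Calling check_mw_env after module load metawrap should list all four paths; a "is not set" line suggests the module was not loaded in that shell.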
Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.
Allocate an interactive session and run the program. Sample session:
[user@biowulf]$ sinteractive --mem=64g --gres=lscratch:20 -c48 --time=24:00:00
[user@cn0911 ~]$ module load metawrap
[+] Loading blast 2.14.0+ ...
[+] Loading bwa 0.7.17 on cn4272
[+] Loading bowtie 2-2.3.5
[+] Loading kraken 1.1
[+] Loading kronatools 2.8.1 on cn4272
[+] Loading perl 5.34.0 on cn4272
[+] Loading samtools 1.9 ...
[+] Loading singularity 4.0.1 on cn4272
[+] Loading salmon 1.7.0
[+] Loading prokka 1.14.6
[+] Loading metabat 2.15 on cn4272
[+] Loading quast 5.2.0 on cn4272
[+] Loading cutadapt 4.4
[+] Loading eigen 3.4.0-1092574b ...
[+] Loading trimgalore 0.6.6 ...
[+] Loading fastqc 0.11.8
[+] Loading checkm2 1.0.2
[+] Loading megahit, version 1.2.9...
[+] Loading spades 3.15.5
[+] Loading gcc 11.3.0 ...
[+] Loading HDF5 1.12.2
[+] Loading netcdf 4.9.0
[+] Loading openmpi/4.1.3/gcc-11.3.0 ...
[+] Loading pandoc 2.18 on cn4272
[+] Loading pcre2 10.40
[+] Loading R 4.3.0
[+] Loading metawrap 1.3.2 ...
[user@cn0911 ~]$ mkdir /data/$USER/metawrap && cd /data/$USER/metawrap
[user@cn0911 ~]$ cp -r $MW_SRC/* .
Copy the sample data, unzip it, and place the reads into the folder RAW_READS:
[user@cn0911 ~]$ cp $MW_DATA/* .
[user@cn0911 ~]$ gunzip *.gz
[user@cn0911 ~]$ mkdir RAW_READS && mv *fastq RAW_READS
[user@cn0911 ~]$ ls RAW_READS
ERR011347_1.fastq ERR011347_2.fastq ERR011348_1.fastq ERR011348_2.fastq ERR011349_1.fastq ERR011349_2.fastq
Perform the analysis steps that are described in more detail in the Usage Tutorial:
Step 1: Run metaWRAP-Read_qc to trim the reads and remove human contamination:
[user@cn0911 ~]$ mkdir -p READ_QC/011347
[user@cn0911 ~]$ metawrap read_qc -1 RAW_READS/ERR011347_1.fastq -2 RAW_READS/ERR011347_2.fastq -t 48 -o READ_QC/011347
...
real 31m13.819s
user 29m14.221s
sys 2m51.263s
[user@cn0911 ~]$ mkdir -p READ_QC/011348
[user@cn0911 ~]$ metawrap read_qc -1 RAW_READS/ERR011348_1.fastq -2 RAW_READS/ERR011348_2.fastq -t 48 -o READ_QC/011348
...
real 17m41.812s
user 17m25.779s
sys 1m35.917s
[user@cn0911 ~]$ mkdir -p READ_QC/011349
[user@cn0911 ~]$ metawrap read_qc -1 RAW_READS/ERR011349_1.fastq -2 RAW_READS/ERR011349_2.fastq -t 48 -o READ_QC/011349
...
real 39m34.887s
user 39m51.397s
sys 1m34.431s
Step 2: Assemble the metagenomes with the metaWRAP-Assembly module:
[user@cn0911 ~]$ mkdir -p CLEAN_READS ASSEMBLY
[user@cn0911 ~]$ for i in READ_QC/*; do
    b=${i#*/}
    mv ${i}/final_pure_reads_1.fastq CLEAN_READS/ERR${b}_1.fastq
    mv ${i}/final_pure_reads_2.fastq CLEAN_READS/ERR${b}_2.fastq
done
[user@cn0911 ~]$ cat CLEAN_READS/ERR*_1.fastq > CLEAN_READS/ALL_READS_1.fastq
[user@cn0911 ~]$ cat CLEAN_READS/ERR*_2.fastq > CLEAN_READS/ALL_READS_2.fastq
[user@cn0911 ~]$ metawrap assembly -1 CLEAN_READS/ALL_READS_1.fastq \
    -2 CLEAN_READS/ALL_READS_2.fastq \
    -m 200 -t 48 --metaspades -o ASSEMBLY
...
real 55m28.629s
user 966m36.557s
sys 19m15.560s
Step 3: Run Kraken module on both reads and the assembly:
[user@cn0911 ~]$ metawrap kraken -o KRAKEN -t 48 -s 1000000 CLEAN_READS/ERR*fastq ASSEMBLY/final_assembly.fasta
...
real 91m23.363s
user 10m51.474s
sys 57m11.215s
Step 4: Bin the co-assembly with three different algorithms with the Binning module:
[user@cn0911 ~]$ metawrap binning -o INITIAL_BINNING -t 48 -a ASSEMBLY/final_assembly.fasta \
    --metabat2 --maxbin2 --concoct CLEAN_READS/ERR*fastq
...
real 8m42.032s
user 17m53.586s
sys 9m8.455s
Step 5: Consolidate bin sets with the Bin_refinement module:
[user@cn0911 ~]$ metawrap bin_refinement -o BIN_REFINEMENT -t 48 \
    -A INITIAL_BINNING/metabat2_bins/ \
    -B INITIAL_BINNING/maxbin2_bins/ \
    -C INITIAL_BINNING/concoct_bins/ \
    -c 50 \
    -x 10
...
real 45m13.304s
user 116m11.825s
sys 5m22.803s
Step 6: Visualize the community and the extracted bins with the Blobology module:
[user@cn0911 ~]$ metawrap blobology -a ASSEMBLY/final_assembly.fasta \
    -t 48 -o BLOBOLOGY \
    --bins BIN_REFINEMENT/metawrap_50_10_bins CLEAN_READS/ERR*fastq
...
real 57m33.652s
user 185m26.082s
sys 166m38.753s
Step 7: Find the abundances of the draft genomes (bins) across the samples:
[user@cn0911 ~]$ metawrap quant_bins -b BIN_REFINEMENT/metawrap_50_10_bins \
    -o QUANT_BINS \
    -a ASSEMBLY/final_assembly.fasta CLEAN_READS/ERR*fastq
...
real 5m41.060s
user 6m58.301s
sys 0m7.729s
Step 8: Re-assemble the consolidated bin set with the Reassemble_bins module:
[user@cn0911 ~]$ metawrap reassemble_bins -o BIN_REASSEMBLY \
    -1 CLEAN_READS/ALL_READS_1.fastq \
    -2 CLEAN_READS/ALL_READS_2.fastq \
    -t 48 -m 800 -c 50 -x 10 \
    -b BIN_REFINEMENT/metawrap_50_10_bins
...
real 106m54.790s
user 496m15.997s
sys 45m18.069s
Step 9: Determine the taxonomy of each bin with the Classify_bins module:
[user@cn0911 ~]$ metawrap classify_bins -b BIN_REASSEMBLY/reassembled_bins \
    -o BIN_CLASSIFICATION -t 48
Step 10: Functionally annotate bins with the Annotate_bins module:
[user@cn0911 ~]$ metawrap annotate_bins -o FUNCT_ANNOT -t 48 -b BIN_REASSEMBLY/reassembled_bins/
End the interactive session:
[user@cn0911 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
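Step 1 above repeats the same read_qc call once per sample. With more samples, the per-sample commands could be generated from the file names instead of typed by hand. The following is a minimal sketch under stated assumptions: the function name print_read_qc_cmds is hypothetical, reads are assumed to be named <sample>_1.fastq and <sample>_2.fastq as in RAW_READS, and the ERR prefix is stripped so that output folders match the READ_QC/011347 naming used in the session. It only prints commands built from the flags shown in Step 1 (-1, -2, -t, -o); it does not run metawrap.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (do not run) one "metawrap read_qc" command
# per paired sample found in a directory of raw reads.
#   $1 = directory holding <sample>_1.fastq / <sample>_2.fastq
#   $2 = number of threads to pass with -t
print_read_qc_cmds() {
    local raw_dir=$1 threads=$2 r1 r2 sample
    for r1 in "$raw_dir"/*_1.fastq; do
        [ -e "$r1" ] || continue          # the glob matched nothing
        r2=${r1%_1.fastq}_2.fastq         # expected mate file
        if [ ! -e "$r2" ]; then
            echo "skipping $r1: no matching _2.fastq" >&2
            continue
        fi
        sample=${r1##*/}                  # drop the directory part
        sample=${sample%_1.fastq}         # e.g. ERR011347
        sample=${sample#ERR}              # e.g. 011347, as in READ_QC/011347
        printf 'mkdir -p READ_QC/%s\n' "$sample"
        printf 'metawrap read_qc -1 %s -2 %s -t %s -o READ_QC/%s\n' \
            "$r1" "$r2" "$threads" "$sample"
    done
}
```

Running print_read_qc_cmds RAW_READS 48 from the working directory lists the commands for review; once they look right, the output can be piped to bash or pasted into a batch script.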