MetaWRAP is a modular pipeline for shotgun metagenomic data analysis. It deploys state-of-the-art software to handle metagenomic data processing starting from raw sequencing reads and ending in metagenomic bins and their analysis. It includes hybrid algorithms that leverage the strengths of a variety of software to extract and refine high-quality bins from metagenomic data through bin consolidation and reassembly.
Allocate an interactive session and run the program. Sample session:
[user@biowulf]$ sinteractive --mem=64g --gres=lscratch:20 -c48 --time=24:00:00
[user@cn0911 ~]$ module load metawrap
[+] Loading blast 2.14.0+ ...
[+] Loading bwa 0.7.17 on cn4272
[+] Loading bowtie 2-2.3.5
[+] Loading kraken 1.1
[+] Loading kronatools 2.8.1 on cn4272
[+] Loading perl 5.34.0 on cn4272
[+] Loading samtools 1.9 ...
[+] Loading singularity 4.0.1 on cn4272
[+] Loading salmon 1.7.0
[+] Loading prokka 1.14.6
[+] Loading metabat 2.15 on cn4272
[+] Loading quast 5.2.0 on cn4272
[+] Loading cutadapt 4.4
[+] Loading eigen 3.4.0-1092574b ...
[+] Loading trimgalore 0.6.6 ...
[+] Loading fastqc 0.11.8
[+] Loading checkm2 1.0.2
[+] Loading megahit, version 1.2.9...
[+] Loading spades 3.15.5
[+] Loading gcc 11.3.0 ...
[+] Loading HDF5 1.12.2
[+] Loading netcdf 4.9.0
[+] Loading openmpi/4.1.3/gcc-11.3.0 ...
[+] Loading pandoc 2.18 on cn4272
[+] Loading pcre2 10.40
[+] Loading R 4.3.0
[+] Loading metawrap 1.3.2 ...
[user@cn0911 ~]$ mkdir /data/$USER/metawrap && cd /data/$USER/metawrap
[user@cn0911 ~]$ cp -r $MW_SRC/* .

Download the sample data, unzip it, and move the files into the folder RAW_READS:
[user@cn0911 ~]$ cp $MW_DATA/* .
[user@cn0911 ~]$ gunzip *.gz
[user@cn0911 ~]$ mkdir RAW_READS && mv *fastq RAW_READS
[user@cn0911 ~]$ ls RAW_READS
ERR011347_1.fastq  ERR011347_2.fastq  ERR011348_1.fastq  ERR011348_2.fastq  ERR011349_1.fastq  ERR011349_2.fastq

Perform the analysis steps, which are described in more detail in the Usage Tutorial:
Step 1: Run read quality control on each sample with the Read_qc module:
[user@cn0911 ~]$ mkdir -p READ_QC/011347
[user@cn0911 ~]$ metawrap read_qc -1 RAW_READS/ERR011347_1.fastq -2 RAW_READS/ERR011347_2.fastq -t 48 -o READ_QC/011347
...
real    31m13.819s
user    29m14.221s
sys     2m51.263s
[user@cn0911 ~]$ mkdir -p READ_QC/011348
[user@cn0911 ~]$ metawrap read_qc -1 RAW_READS/ERR011348_1.fastq -2 RAW_READS/ERR011348_2.fastq -t 48 -o READ_QC/011348
...
real    17m41.812s
user    17m25.779s
sys     1m35.917s
[user@cn0911 ~]$ mkdir -p READ_QC/011349
[user@cn0911 ~]$ metawrap read_qc -1 RAW_READS/ERR011349_1.fastq -2 RAW_READS/ERR011349_2.fastq -t 48 -o READ_QC/011349
...
real    39m34.887s
user    39m51.397s
sys     1m34.431s

Step 2: Assemble the metagenomes with the metaWRAP-Assembly module:
[user@cn0911 ~]$ mkdir -p CLEAN_READS ASSEMBLY
[user@cn0911 ~]$ for i in READ_QC/*; do
    b=${i#*/}
    mv ${i}/final_pure_reads_1.fastq CLEAN_READS/ERR${b}_1.fastq
    mv ${i}/final_pure_reads_2.fastq CLEAN_READS/ERR${b}_2.fastq
done
[user@cn0911 ~]$ cat CLEAN_READS/ERR*_1.fastq > CLEAN_READS/ALL_READS_1.fastq
[user@cn0911 ~]$ cat CLEAN_READS/ERR*_2.fastq > CLEAN_READS/ALL_READS_2.fastq
[user@cn0911 ~]$ metawrap assembly -1 CLEAN_READS/ALL_READS_1.fastq \
    -2 CLEAN_READS/ALL_READS_2.fastq \
    -m 200 -t 48 --metaspades -o ASSEMBLY
...
real    55m28.629s
user    966m36.557s
sys     19m15.560s

Step 3: Run the Kraken module on both the reads and the assembly:
[user@cn0911 ~]$ metawrap kraken -o KRAKEN -t 48 -s 1000000 CLEAN_READS/ERR*fastq ASSEMBLY/final_assembly.fasta
...
real    91m23.363s
user    10m51.474s
sys     57m11.215s

Step 4: Bin the co-assembly with three different algorithms using the Binning module:
[user@cn0911 ~]$ metawrap binning -o INITIAL_BINNING -t 48 -a ASSEMBLY/final_assembly.fasta \
    --metabat2 --maxbin2 --concoct CLEAN_READS/ERR*fastq
...
real    8m42.032s
user    17m53.586s
sys     9m8.455s

Step 5: Consolidate bin sets with the Bin_refinement module:
[user@cn0911 ~]$ metawrap bin_refinement -o BIN_REFINEMENT -t 48 \
    -A INITIAL_BINNING/metabat2_bins/ \
    -B INITIAL_BINNING/maxbin2_bins/ \
    -C INITIAL_BINNING/concoct_bins/ \
    -c 50 \
    -x 10
...
real    45m13.304s
user    116m11.825s
sys     5m22.803s

Step 6: Visualize the community and the extracted bins with the Blobology module:
[user@cn0911 ~]$ metawrap blobology -a ASSEMBLY/final_assembly.fasta \
    -t 48 -o BLOBOLOGY \
    --bins BIN_REFINEMENT/metawrap_50_10_bins CLEAN_READS/ERR*fastq
...
real    57m33.652s
user    185m26.082s
sys     166m38.753s

Step 7: Find the abundances of the draft genomes (bins) across the samples:
[user@cn0911 ~]$ metawrap quant_bins -b BIN_REFINEMENT/metawrap_50_10_bins \
    -o QUANT_BINS \
    -a ASSEMBLY/final_assembly.fasta CLEAN_READS/ERR*fastq
...
real    5m41.060s
user    6m58.301s
sys     0m7.729s

Step 8: Re-assemble the consolidated bin set with the Reassemble_bins module:
[user@cn0911 ~]$ metawrap reassemble_bins -o BIN_REASSEMBLY \
    -1 CLEAN_READS/ALL_READS_1.fastq \
    -2 CLEAN_READS/ALL_READS_2.fastq \
    -t 48 -m 800 -c 50 -x 10 \
    -b BIN_REFINEMENT/metawrap_50_10_bins
...
real    106m54.790s
user    496m15.997s
sys     45m18.069s

Step 9: Determine the taxonomy of each bin with the Classify_bins module:
[user@cn0911 ~]$ metawrap classify_bins -b BIN_REASSEMBLY/reassembled_bins \
    -o BIN_CLASSIFICATION -t 48

Step 10: Functionally annotate bins with the Annotate_bins module:
[user@cn0911 ~]$ metawrap annotate_bins -o FUNCT_ANNOT -t 48 -b BIN_REASSEMBLY/reassembled_bins/

End the interactive session:
[user@cn0911 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
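
The read_qc commands in the session above differ only in the sample accession, so they could be generated in a loop rather than typed once per sample. The following is a minimal sketch under stated assumptions, not part of metaWRAP or of the cluster setup: the function name read_qc_all and the DRY_RUN variable are hypothetical, and input files are assumed to follow the ERR<id>_1.fastq / ERR<id>_2.fastq naming used in RAW_READS. The metawrap read_qc flags are the ones already used in the session.

```shell
#!/usr/bin/env bash
# Sketch: run "metawrap read_qc" once per paired-end sample.
# read_qc_all and DRY_RUN are hypothetical names introduced for this example.

read_qc_all() {
    local raw_dir="${1:-RAW_READS}"   # folder holding <sample>_1.fastq / <sample>_2.fastq
    local out_dir="${2:-READ_QC}"     # one subfolder per sample is created here
    local threads="${3:-48}"          # matches -c48 in the sinteractive request
    local r1 r2 sample id

    for r1 in "$raw_dir"/*_1.fastq; do
        [ -e "$r1" ] || continue              # glob matched nothing
        r2="${r1%_1.fastq}_2.fastq"           # mate file of the pair
        sample=$(basename "$r1" _1.fastq)     # e.g. ERR011347
        id="${sample#ERR}"                    # e.g. 011347, as in the session's folder names
        if [ ! -e "$r2" ]; then
            echo "skipping $sample: missing $r2" >&2
            continue
        fi
        if [ "${DRY_RUN:-0}" = 1 ]; then
            # Only print the command that would be run.
            echo "metawrap read_qc -1 $r1 -2 $r2 -t $threads -o $out_dir/$id"
        else
            mkdir -p "$out_dir/$id"
            metawrap read_qc -1 "$r1" -2 "$r2" -t "$threads" -o "$out_dir/$id"
        fi
    done
}
```

After loading the metawrap module, calling read_qc_all with no arguments would process every pair in RAW_READS and write to READ_QC/<id>, which is the layout the Step 2 renaming loop expects. Setting DRY_RUN=1 first prints the commands without running them, which is a cheap way to check the file pairing.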