QIIME2 on Biowulf
QIIME 2 is a powerful, extensible, and decentralized microbiome analysis package with a focus on data and analysis transparency. QIIME 2 enables researchers to start an analysis with raw DNA sequence data and finish with publication-quality figures and statistical results.
References:
- Bolyen E, Rideout JR, Dillon MR, Bokulich NA, et al. Reproducible, interactive, scalable and extensible microbiome data science using QIIME 2. Nature Biotechnology 37: 852-857, 2019.
Documentation
Important Notes
- Module Name: qiime or QIIME (see the modules page for more information)
- Some QIIME2 commands can utilize multiple threads with --p-n-threads
- Newer releases of QIIME2 come in three distributions: amplicon, metagenome, and tiny. On Biowulf, starting with 2024.5, we provide the amplicon and metagenome distributions, the former being the default of the two. You can load the other with module load qiime/2-2024.5/metagenome
- QIIME2 is moving away from community-driven plugins. If you wish to use plugins that are not available by default, please install QIIME2 in a conda environment in your data directory (a minimal sketch follows this list)
- QIIME2 installations on Biowulf are container-based. To work with QIIME's Python API in a Python shell or a Python script, a dedicated executable called qiime2-python is provided (see the example after this list)
- Environment variables:
- $QIIME_EXAMPLES -- example files for tutorials
- Users connecting to Biowulf with X11 forwarding enabled may encounter an error in which matplotlib fails to write .qzv files (with qiime demux summarize, for example). The workaround documented here is to run the following before any qiime commands:
mkdir -p $HOME/.config/matplotlib/
echo "backend: Agg" >> $HOME/.config/matplotlib/matplotlibrc
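If you need plugins that are not in the Biowulf installation, a minimal sketch of a user-space conda install is shown below. It assumes you already have a personal conda (e.g. Miniconda or Mamba) set up under /data/$USER; the environment file name is a placeholder, so take the exact file for your release from the QIIME 2 installation documentation:
# create the environment under your data directory, not your home directory
conda env create -p /data/$USER/conda/envs/qiime2 \
    --file qiime2-environment.yml          # placeholder; download the real file from the QIIME 2 install docs
conda activate /data/$USER/conda/envs/qiime2
qiime --version                            # confirm the environment works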
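As a quick illustration of the qiime2-python executable mentioned above, the sketch below writes a small script that uses the QIIME 2 Artifact API and runs it through qiime2-python. The artifact name is only an example; any feature table .qza (such as one produced in the sessions below) will do:
cat > table_peek.py << 'EOF'
import pandas as pd
from qiime2 import Artifact                # QIIME 2 Artifact (Python) API

table = Artifact.load('table-dada2.qza')   # load a feature table (file name is only an example)
df = table.view(pd.DataFrame)              # view the FeatureTable[Frequency] as a pandas DataFrame
print(df.shape)                            # samples x features
EOF
qiime2-python table_peek.py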
Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.
Allocate an interactive session and run the program. Sample session (user input in bold):
[user@biowulf]$ sinteractive
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144 ~]$ module load QIIME
[user@cn3144 ~]$ cp -R $QIIME_EXAMPLES/qiime2-moving-pictures-tutorial .
[user@cn3144 ~]$ cd qiime2-moving-pictures-tutorial
[user@cn3144 ~]$ qiime tools import --type EMPSingleEndSequences \
    --input-path emp-single-end-sequences \
    --output-path emp-single-end-sequences.qza
[user@cn3144 ~]$ qiime demux emp-single \
    --i-seqs emp-single-end-sequences.qza \
    --m-barcodes-file sample-metadata.tsv \
    --m-barcodes-column BarcodeSequence \
    --output-dir demux-summarize-out \
    --o-per-sample-sequences demux.qza
[user@cn3144 ~]$ qiime demux summarize \
    --i-data demux.qza \
    --o-visualization demux.qzv
[user@cn3144 ~]$ qiime dada2 denoise-single \
    --i-demultiplexed-seqs demux.qza \
    --p-trim-left 0 \
    --p-trunc-len 120 \
    --o-representative-sequences rep-seqs-dada2.qza \
    --output-dir denoise-single-out \
    --o-table table-dada2.qza
[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
Batch job
Most jobs should be run as batch jobs.
Create a batch input file (e.g. QIIME.sh). For example:
#!/bin/bash
module load QIIME
cp -R $QIIME_EXAMPLES/qiime2-atacama-tutorial .
cd qiime2-atacama-tutorial
qiime tools import \
    --type EMPPairedEndSequences \
    --input-path emp-paired-end-sequences \
    --output-path emp-paired-end-sequences.qza
qiime demux emp-paired \
    --verbose \
    --m-barcodes-file sample-metadata.tsv \
    --m-barcodes-column BarcodeSequence \
    --i-seqs emp-paired-end-sequences.qza \
    --o-per-sample-sequences demux \
    --p-rev-comp-mapping-barcodes
qiime demux summarize \
    --verbose \
    --i-data demux.qza \
    --o-visualization demux.qzv
qiime dada2 denoise-paired \
    --verbose \
    --p-n-threads $SLURM_CPUS_PER_TASK \
    --i-demultiplexed-seqs demux.qza \
    --o-table table \
    --o-representative-sequences rep-seqs \
    --p-trim-left-f 13 \
    --p-trim-left-r 13 \
    --p-trunc-len-f 150 \
    --p-trunc-len-r 150
qiime feature-table summarize \
    --verbose \
    --i-table table.qza \
    --o-visualization table.qzv \
    --m-sample-metadata-file sample-metadata.tsv
qiime feature-table tabulate-seqs \
    --verbose \
    --i-data rep-seqs.qza \
    --o-visualization rep-seqs.qzv
Submit this job using the Slurm sbatch command.
sbatch --cpus-per-task=8 --mem=20g --gres=lscratch:20 --time=4:00:00 QIIME.sh
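The submission line above allocates 20 GB of node-local scratch (--gres=lscratch:20). A minimal sketch of how the top of QIIME.sh could stage the run in that space, following the usual Biowulf /lscratch/$SLURM_JOB_ID pattern (the results destination is only an example path):
#!/bin/bash
module load QIIME
cd /lscratch/$SLURM_JOB_ID                       # node-local scratch allocated by --gres=lscratch:20
cp -R $QIIME_EXAMPLES/qiime2-atacama-tutorial .
cd qiime2-atacama-tutorial
# ... qiime commands as in QIIME.sh above ...
cp -R . /data/$USER/qiime2-atacama-results       # copy outputs back to permanent storage before the job ends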
Swarm of Jobs
A swarm of jobs is an easy way to submit a set of independent commands requiring identical resources.
Create a swarmfile (e.g. QIIME.swarm). For example:
qiime dada2 denoise-single \
    --p-n-threads $SLURM_CPUS_PER_TASK \
    --p-trim-left 13 \
    --p-trunc-len 150 \
    --i-demultiplexed-seqs fmt-tutorial-demux-1.qza \
    --o-representative-sequences rep-seqs-1.qza \
    --o-table table-1.qza
qiime dada2 denoise-single \
    --p-n-threads $SLURM_CPUS_PER_TASK \
    --p-trim-left 13 \
    --p-trunc-len 150 \
    --i-demultiplexed-seqs fmt-tutorial-demux-2.qza \
    --o-representative-sequences rep-seqs-2.qza \
    --o-table table-2.qza
qiime dada2 denoise-single \
    --p-n-threads $SLURM_CPUS_PER_TASK \
    --p-trim-left 13 \
    --p-trunc-len 150 \
    --i-demultiplexed-seqs fmt-tutorial-demux-3.qza \
    --o-representative-sequences rep-seqs-3.qza \
    --o-table table-3.qza
qiime dada2 denoise-single \
    --p-n-threads $SLURM_CPUS_PER_TASK \
    --p-trim-left 13 \
    --p-trunc-len 150 \
    --i-demultiplexed-seqs fmt-tutorial-demux-4.qza \
    --o-representative-sequences rep-seqs-4.qza \
    --o-table table-4.qza
Submit this job using the swarm command.
swarm -f QIIME.swarm -g 10 -t 8 --module QIIME
where
-g # | Number of Gigabytes of memory required for each process (1 line in the swarm command file)
-t # | Number of threads/CPUs required for each process (1 line in the swarm command file)
--module QIIME | Loads the QIIME module for each subjob in the swarm