QIIME

QIIME (pronounced "chime") stands for Quantitative Insights Into Microbial Ecology. QIIME is an open source software package for comparison and analysis of microbial communities, primarily based on high-throughput amplicon sequencing data (such as SSU rRNA) generated on a variety of platforms, but also supporting analysis of other types of data (such as shotgun metagenomic data). QIIME takes users from their raw sequencing output through initial analyses such as OTU picking, taxonomic assignment, and construction of phylogenetic trees from representative sequences of OTUs, and through downstream statistical analysis, visualization, and production of publication-quality graphics. QIIME has been applied to single studies based on billions of sequences from thousands of samples.

NOTE: QIIME2 is now available on Biowulf!

There are multiple versions of QIIME available. An easy way of selecting the version is to use modules. To see the modules available, type

module avail qiime

To select a module, type

module load qiime/[ver]

where [ver] is the version of choice.
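
For example, to load a specific version (the version number below is illustrative; check the output of module avail qiime for the versions actually installed):

module load qiime/1.9.1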

Environment variables set:

$QIIME_CONFIG_FP -- the default QIIME configuration file

How To Use

Warning: QIIME is a complex pipelining application and is not fully tested on the Helix/Biowulf systems. Some features may not be implemented yet, so please report any anomalous or missing behavior to staff@hpc.nih.gov.

QIIME is a collection of many scripts and executables. You can run the scripts directly from the command line, but it is generally easier to put the commands into a script. A good place to start is the tutorial available at http://qiime.org/tutorials/.

The default configuration for QIIME is set via the file $QIIME_CONFIG_FP. You may override these defaults by creating your own version of this file in your /home directory:

cp $QIIME_CONFIG_FP ~/.qiime_config

and then edit the ~/.qiime_config file to your liking before running your QIIME scripts.
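
As a sketch, a minimal ~/.qiime_config might contain just the two settings discussed below, as tab-separated key/value pairs (the values shown are examples, not requirements):

$ cat ~/.qiime_config
jobs_to_start	4
temp_dir	/lscratch/$SLURM_JOBID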

Parallel QIIME:

A number of QIIME scripts can run their steps in parallel. The default number of parallel jobs is set by the jobs_to_start value in the qiime_config file. This value can be overridden (in most cases) with the -O option:

pick_closed_reference_otus.py -a -O 4 ...

It is important to allocate one extra CPU when running parallel QIIME jobs on Biowulf. For example, if you intend to run a parallelized script on 8 CPUs, allocate 9 CPUs. Otherwise, a leftover poller process may remain and the job will never end.
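
For example, a sketch of running the command above with 4 parallel jobs: place it in a batch script (qiime_parallel.sh below is a hypothetical name, and the input/output paths are illustrative), then submit with 5 CPUs so that one remains free for the poller:

#!/bin/bash
module load qiime
pick_closed_reference_otus.py -a -O 4 -i $PWD/seqs.fna -r $PWD/refseqs.fna -o $PWD/closed_otus/

$ sbatch --cpus-per-task=5 qiime_parallel.sh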

For more information, see http://qiime.org/tutorials/parallel_qiime.html.

Scratch Directory:

QIIME makes extensive use of temporary disk space. The location of this space is determined by the configuration directive temp_dir. By default, this is set to /scratch/$USER, where $USER is your username.

When running on Biowulf, it is advantageous to allocate local scratch space, rather than use the shared global /scratch space, as I/O operations are 2-3x faster using local scratch. This is done by first modifying your ~/.qiime_config file:

$ cat ~/.qiime_config
temp_dir	/lscratch/$SLURM_JOBID

then allocating the space with the sbatch command (for example allocating 50GB of local scratch space):

sbatch ... other options ... --gres=lscratch:50 [ batch script ]

See /docs/userguide.html#local for more information on using local scratch space on a node.
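
Combining this with the parallel guidance above, a submission that provides both the extra CPU and 50 GB of local scratch might look like the following (a sketch; qiime_job.sh is a hypothetical batch script):

$ sbatch --cpus-per-task=5 --gres=lscratch:50 qiime_job.sh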

Standard Reference Files:

Once the QIIME module is loaded and your environment is set, standard reference files can be found in:

$ ls /fdb/QIIME

These include Silva, OTU, and UNITE reference files.

USEARCH:

USEARCH is a sequence search and clustering tool. By default, QIIME uses an older, 32-bit version of USEARCH when the option -m usearch is given. We have a license for the 64-bit version of USEARCH. To use the newer version, include

-m usearch61

in your QIIME commands.
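
For example, a hypothetical OTU-picking command using the newer method (the input and output names are illustrative):

pick_otus.py -m usearch61 -i $PWD/seqs.fna -o $PWD/usearch61_otus/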

Newer, faster versions of USEARCH are not completely compatible with QIIME; however, USEARCH can be used stand-alone. See the USEARCH application page for information on using USEARCH on HPC @ NIH.

Visualizing Results:

Some of the output files will be in HTML format. These can be viewed most easily by mounting your /home or /data areas on your desktop machine and opening them in a web browser of your choice. Other file formats may require third-party applications for display.

On Helix

Sample session:

$ module load qiime
$ pick_de_novo_otus.py -i $PWD/seqs.fna -o $PWD/derep_uc/ -p $PWD/dereplication_params.txt
Batch job on Biowulf

Create a batch input file (e.g. qiime.sh). For example:

#!/bin/bash
module load qiime
compare_categories.py --method anosim \
-i unweighted_unifrac_dm.txt -m map.txt \
-c HOST_SUBJECT_ID -o anosim_out -n 999

Submit this job using the Slurm sbatch command.

$ sbatch --cpus-per-task=1 qiime.sh
Interactive job on Biowulf

Make sure to allocate at least one extra CPU when running parallel jobs. For example, if you intend to use 8 CPUs in a parallel job, allocate 9.

$ sinteractive --cpus-per-task=9 --mem-per-cpu=2g --gres=lscratch:50

Load the module and run the commands:

module load qiime
parallel_assign_taxonomy_blast.py -O 8 -i $PWD/inseqs.fasta -t $PWD/id_to_tax.txt -r $PWD/refseqs.fasta -o $PWD/blast_assigned_taxonomy/

NOTE: The command qiime tools view launches a web browser to display results. This does not function correctly when run on the cluster. Users are advised to only run this command from a login host, such as Helix or Biowulf.
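
If you do run QIIME2's viewer from a login host, the command takes a visualization artifact (.qzv file) as its argument; the file name below is illustrative:

qiime tools view taxa-bar-plots.qzv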

Documentation

QIIME home page: http://qiime.org/
QIIME tutorials: http://qiime.org/tutorials/