High-Performance Computing at the NIH
PICRUSt on Biowulf & Helix

PICRUSt (pronounced “pie crust”) is a bioinformatics software package designed to predict metagenome functional content from marker gene (e.g., 16S rRNA) surveys and full genomes.

Example files can be copied from /usr/local/apps/picrust/tutorials.

Running on Helix

$ module load picrust
$ cp -r /usr/local/apps/picrust/tutorials /data/$USER/
$ cd /data/$USER/tutorials 
$ unzip picrust_starting_files.zip
$ cd picrust_starting_files
$ format_tree_and_trait_table.py -t GG_tree.nwk -i IMG_16S_counts.tab -m GG_to_IMGv350.txt -o format/16S/
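Before running the formatting step it can help to confirm that the three input files are actually present in the working directory. The function below is a hypothetical helper (it is not part of PICRUSt); the file names are taken from the command above.

```shell
#!/bin/bash
# Hypothetical helper (not part of PICRUSt): check that the three tutorial
# input files exist in a directory before running
# format_tree_and_trait_table.py. Returns 0 if all exist, 1 otherwise.
check_picrust_inputs() {
  local dir="${1:-.}"   # directory to check; defaults to the current one
  local status=0
  local f
  for f in GG_tree.nwk IMG_16S_counts.tab GG_to_IMGv350.txt; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing input: $dir/$f" >&2
      status=1
    fi
  done
  return "$status"
}
```

For example, `check_picrust_inputs && format_tree_and_trait_table.py -t GG_tree.nwk -i IMG_16S_counts.tab -m GG_to_IMGv350.txt -o format/16S/` runs the formatting step only when all inputs are present, and `check_picrust_inputs /data/$USER/dir1` checks a different directory.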

Running a single batch job on Biowulf

1. Create a script file (for example, jobscript) containing lines similar to the following:

#!/bin/bash

module load picrust
cd /data/$USER/tutorials/picrust_starting_files
format_tree_and_trait_table.py -t GG_tree.nwk -i IMG_16S_counts.tab -m GG_to_IMGv350.txt -o format/16S/

2. Submit the script on Biowulf:

$ sbatch jobscript

The default memory allocation is 4 GB. If the job needs more, use the --mem flag:

$ sbatch --mem=10g jobscript
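The memory request can also be written into the script itself as an #SBATCH directive, which sbatch reads from comment lines at the top of the job script. The sketch below writes such a jobscript; the 10 GB value and the /data/$USER/tutorials/picrust_starting_files location (where the Helix steps above unpack the tutorial files) are assumptions to adjust for your own data.

```shell
#!/bin/bash
# Write a job script that requests 10 GB through an #SBATCH directive,
# so a plain "sbatch jobscript" behaves like "sbatch --mem=10g jobscript".
# The quoted 'EOF' keeps $USER literal in the file; it is expanded when the job runs.
# Assumption: the tutorial files were unpacked under /data/$USER/tutorials.
cat > jobscript <<'EOF'
#!/bin/bash
#SBATCH --mem=10g
set -e                     # stop at the first failing command
module load picrust
cd /data/$USER/tutorials/picrust_starting_files
format_tree_and_trait_table.py -t GG_tree.nwk -i IMG_16S_counts.tab -m GG_to_IMGv350.txt -o format/16S/
EOF
```

Submit it with `sbatch jobscript`. An option given on the sbatch command line, such as --mem, overrides the matching #SBATCH directive in the script.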

Running a swarm of jobs on Biowulf

Set up a swarm command file with one command per line:

  cd /data/$USER/dir1; format_tree_and_trait_table.py -t GG_tree.nwk -i IMG_16S_counts.tab -m GG_to_IMGv350.txt -o format/16S/
  cd /data/$USER/dir2; format_tree_and_trait_table.py -t GG_tree.nwk -i IMG_16S_counts.tab -m GG_to_IMGv350.txt -o format/16S/
  cd /data/$USER/dir3; format_tree_and_trait_table.py -t GG_tree.nwk -i IMG_16S_counts.tab -m GG_to_IMGv350.txt -o format/16S/
  [......]
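Writing many near-identical lines by hand is error-prone, so a short loop can generate the swarm command file instead. This is a sketch: the directory names dir1 to dir3 only mirror the example, and each directory is assumed to already hold GG_tree.nwk, IMG_16S_counts.tab and GG_to_IMGv350.txt.

```shell
#!/bin/bash
# Build a swarm command file with one PICRUSt command per directory.
# dir1..dir3 mirror the example; replace the list with your own directories
# (assumed to already contain the three input files).
cmd='format_tree_and_trait_table.py -t GG_tree.nwk -i IMG_16S_counts.tab -m GG_to_IMGv350.txt -o format/16S/'
: > swarmfile                      # start from an empty file
for d in dir1 dir2 dir3; do
  # $USER sits inside a single-quoted format string, so it is written
  # literally and expanded only when swarm runs the line.
  printf 'cd /data/$USER/%s; %s\n' "$d" "$cmd" >> swarmfile
done
```

The resulting file can then be submitted with `swarm -f swarmfile --module picrust`.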

Submit the swarm file:

  $ swarm -f swarmfile --module picrust

-f: name of the swarm command file
--module: module to load before running each command in the file

By default each command line is allocated 1.5 GB of memory. If more is needed, use the -g flag (value in GB):

  $ swarm -f swarmfile -g 10 --module picrust

For more information on running swarm, see swarm.html.

Running an interactive job on Biowulf

It may be useful for debugging purposes to run jobs interactively. Such jobs should not be run on the Biowulf login node. Instead allocate an interactive node as described below, and run the interactive job there.

biowulf$ sinteractive 
salloc.exe: Granted job allocation 16535

cn999$ module load picrust
cn999$ cd /data/$USER/dir
cn999$ format_tree_and_trait_table.py -t GG_tree.nwk -i IMG_16S_counts.tab -m GG_to_IMGv350.txt -o format/16S/

cn999$ exit

biowulf$

Make sure to exit the job once finished.

If more memory is needed, use the --mem flag. For example:

biowulf$ sinteractive --mem=10g

Documentation

http://picrust.github.io/picrust/tutorials/genome_prediction.html#genome-prediction-tutorial