Biowulf High Performance Computing at the NIH
PICRUSt on Biowulf

PICRUSt (pronounced “pie crust”) is a bioinformatics software package designed to predict metagenome functional content from marker gene (e.g., 16S rRNA) surveys and full genomes.

Example files can be copied from /usr/local/apps/picrust/tutorials.

Interactive Job on Biowulf


[user@biowulf]$ sinteractive
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144 ~]$ module load picrust

[user@cn3144 ~]$ cp -r /usr/local/apps/picrust/tutorials /data/$USER/

[user@cn3144 ~]$ cd /data/$USER/tutorials

[user@cn3144 ~]$ unzip picrust_starting_files.zip

[user@cn3144 ~]$ cd picrust_starting_files

[user@cn3144 ~]$ format_tree_and_trait_table.py -t GG_tree.nwk -i IMG_16S_counts.tab -m GG_to_IMGv350.txt -o format/16S/

[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$

Running a single batch job on Biowulf

1. Create a script file similar to the lines below.

#!/bin/bash

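# Load the PICRUSt environment
module load picrust
# Change to the directory holding the unzipped tutorial files
cd /data/$USER/tutorials/picrust_starting_files
format_tree_and_trait_table.py -t GG_tree.nwk -i IMG_16S_counts.tab -m GG_to_IMGv350.txt -o format/16S/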

2. Submit the script on Biowulf:

$ sbatch jobscript

If the job needs more than the default 4 GB of memory, use the --mem flag:

$ sbatch --mem=10g jobscript
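
A batch job that starts without its input files fails only after it has waited in the queue, so it can help to check the inputs before submitting. The function below is a minimal sketch of such a check. The name check_inputs is hypothetical and not part of PICRUSt or Biowulf; the three file names are the ones passed to format_tree_and_trait_table.py above, and the function assumes they all sit in one directory.

```shell
#!/bin/bash
# check_inputs DIR
# Hypothetical helper: returns 0 if the three tutorial input files exist
# and are non-empty in DIR, otherwise reports each problem and returns 1.
check_inputs() {
    local dir="$1" f missing=0
    for f in GG_tree.nwk IMG_16S_counts.tab GG_to_IMGv350.txt; do
        if [ ! -s "$dir/$f" ]; then
            echo "missing or empty: $dir/$f" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

One possible use is to chain it with the submission, so that sbatch runs only when the check passes: check_inputs /path/to/input_dir && sbatch jobscript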

Running a swarm of jobs on Biowulf

Set up a swarm command file:

  cd /data/$USER/dir1; format_tree_and_trait_table.py -t GG_tree.nwk -i IMG_16S_counts.tab -m GG_to_IMGv350.txt -o format/16S/
  cd /data/$USER/dir2; format_tree_and_trait_table.py -t GG_tree.nwk -i IMG_16S_counts.tab -m GG_to_IMGv350.txt -o format/16S/
  cd /data/$USER/dir3; format_tree_and_trait_table.py -t GG_tree.nwk -i IMG_16S_counts.tab -m GG_to_IMGv350.txt -o format/16S/
	[......]
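
When there are many directories, the command file can be generated rather than typed by hand. The sketch below writes one command line per directory, in the same form as the lines above. The function name make_swarmfile is hypothetical, and it assumes the directories match the pattern dir* under a single base directory and that each contains the three input files.

```shell
#!/bin/bash
# make_swarmfile BASE OUT
# Hypothetical helper: writes one swarm command line to OUT for every
# directory matching BASE/dir*. OUT is overwritten if it already exists.
make_swarmfile() {
    local base="$1" out="$2" d
    : > "$out"                       # start with an empty file
    for d in "$base"/dir*/; do
        [ -d "$d" ] || continue      # skip the literal pattern if nothing matched
        printf 'cd %s; format_tree_and_trait_table.py -t GG_tree.nwk -i IMG_16S_counts.tab -m GG_to_IMGv350.txt -o format/16S/\n' "${d%/}" >> "$out"
    done
}
```

For example, make_swarmfile /data/$USER swarmfile would write the command lines to a file named swarmfile in the current directory.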

Submit the swarm file:

  $ swarm -f swarmfile --module picrust

-f: specify the swarmfile name
--module: load the required module for each command line in the file

If each command line needs more than the default 1.5 GB of memory, use the -g flag (value in GB):

  $ swarm -f swarmfile -g 10 --module picrust

For more information on running swarm, see swarm.html

Documentation

http://picrust.github.io/picrust/tutorials/genome_prediction.html#genome-prediction-tutorial