High-Performance Computing at the NIH
PBSuite

The PBSuite contains two projects created for analysis of Pacific Biosciences long-read sequencing data: PBHoney and PBJelly.

PBHoney is an implementation of two variant-identification approaches designed to exploit the high mappability of long reads (i.e., greater than 10,000 bp). PBHoney considers both intra-read discordance and soft-clipped tails of long reads to identify structural variants.

PBJelly is a highly automated pipeline that aligns long sequencing reads (such as PacBio RS reads or long 454 reads in FASTA format) to high-confidence draft assemblies. PBJelly fills or reduces as many captured gaps as possible to produce upgraded draft genomes.

References:

There may be multiple versions of pbsuite available. An easy way of selecting the version is to use modules. To see the modules available, type

module avail pbsuite

To select a module, type

module load pbsuite/[ver]

where [ver] is the version of choice.

Environment variables set:

Interactive job on Biowulf

PBHoney

sinteractive
module load pbsuite
cp $PBSUITE_HOME/docs/honeyExample/* .
sh workflow.sh

PBJelly

sinteractive
module load pbsuite
cp -r $PBSUITE_HOME/docs/jellyExample/* .
sed -i "s|/__PATH__/_TO_/jellyExample|$PWD|g" Protocol.xml
for stage in setup mapping support extraction assembly output
do
    Jelly.py $stage Protocol.xml
done
# check your results
summarizeAssembly.py jelly.out.fasta
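
The sed step above rewrites the placeholder path in Protocol.xml to the current working directory. As a minimal sketch of what that substitution does, the snippet below applies it to a throwaway file and checks that no placeholder is left. The file contents and the no_placeholders helper are hypothetical, for illustration only, and sed -i as written here assumes GNU sed.

```shell
# Hypothetical stand-in for Protocol.xml; the element name and file path are made up.
demo=$(mktemp)
echo '<reference>/__PATH__/_TO_/jellyExample/data/ref.fasta</reference>' > "$demo"

# Hypothetical helper: succeed only if no placeholder path remains in the file.
no_placeholders() {
    ! grep -q '/__PATH__/_TO_/' "$1"
}

# Same substitution as in the steps above (GNU sed in-place edit).
# "|" is the sed delimiter, so the slashes in $PWD need no escaping.
sed -i "s|/__PATH__/_TO_/jellyExample|$PWD|g" "$demo"

cat "$demo"
no_placeholders "$demo" && echo "all placeholders replaced"
```

Running the same check against the real Protocol.xml before the first Jelly.py stage may catch a copy made from a different directory.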

Batch job on Biowulf

Create a batch input file (e.g. pbjelly.sh). For example:

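#!/bin/bash
# Load the default pbsuite module
module load pbsuite

# Copy the example data and point Protocol.xml at the current directory
cp -r "$PBSUITE_HOME"/docs/jellyExample/* .
sed -i "s|/__PATH__/_TO_/jellyExample|$PWD|g" Protocol.xml

# Run the PBJelly stages in order
for stage in setup mapping support extraction assembly output
do
    Jelly.py "$stage" Protocol.xml
done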

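In a batch job nobody is watching the output, and the loop above moves on to the next stage even if an earlier one fails. A minimal sketch of a stricter variant follows. It assumes that Jelly.py returns a non-zero exit status when a stage fails, which is not confirmed here; run_stages and JELLY_CMD are hypothetical names introduced for illustration.

```shell
# Hypothetical helper: run the PBJelly stages in order, stopping at the first failure.
# JELLY_CMD defaults to Jelly.py; override it to try the loop without PBSuite installed.
JELLY_CMD="${JELLY_CMD:-Jelly.py}"

run_stages() {
    protocol="$1"
    for stage in setup mapping support extraction assembly output
    do
        echo "running stage: $stage"
        # Assumption: the command exits non-zero when a stage fails.
        if ! "$JELLY_CMD" "$stage" "$protocol"; then
            echo "stage failed: $stage" >&2
            return 1
        fi
    done
}
```

In the batch file, the for loop could then be replaced by a single call such as run_stages Protocol.xml.
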
Submit this job using the Slurm sbatch command.

sbatch pbjelly.sh
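
Resource requests can also be written at the top of the batch file as #SBATCH directives, which sbatch reads only when they appear before the first command. The sketch below shows the form; the values are placeholders chosen for illustration, not recommendations for PBJelly.

```shell
#!/bin/bash
#SBATCH --cpus-per-task=4    # placeholder value (assumption)
#SBATCH --mem=8g             # placeholder value (assumption)
#SBATCH --time=04:00:00      # placeholder value (assumption)

# Slurm sets SLURM_CPUS_PER_TASK when --cpus-per-task is given; fall back to 1 elsewhere.
threads="${SLURM_CPUS_PER_TASK:-1}"
echo "allocated CPUs: $threads"
```

The same options can be passed on the command line instead, for example sbatch --cpus-per-task=4 --mem=8g pbjelly.sh.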

Documentation