Biowulf High Performance Computing at the NIH
SMRT Analysis on Biowulf

SMRT® Analysis is a bioinformatics software suite available for analysis of DNA sequencing data from Pacific Biosciences’ SMRT technology. Users can choose from a variety of analysis protocols that utilize PacBio® and third-party tools. Analysis protocols include de novo genome assembly, cDNA mapping, DNA base-modification detection, and long-amplicon analysis to determine phased consensus sequences.

Important Notes

Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

This is a sample interactive session that runs the lambda phage site acceptance test (SAT) on the local node using pbcromwell. User input is the text following each shell prompt:

[teacher@biowulf ~]$ sinteractive --cpus-per-task=12
salloc.exe: Pending job allocation 43027948
salloc.exe: job 43027948 queued and waiting for resources
salloc.exe: job 43027948 has been allocated resources
salloc.exe: Granted job allocation 43027948
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3109 are ready for job
srun: error: x11: no local DISPLAY defined, skipping
[teacher@cn3109 ~]$ module load smrtanalysis
[+] Loading smrtanalysis
[teacher@cn3109 ~]$ mkdir /data/$USER/smrtanalysis
[teacher@cn3109 ~]$ cd !$
[teacher@cn3109 smrtanalysis]$ pbcromwell configure
[INFO] 2019-10-30 18:32:18,797Z [pbcromwell.cli] Using pbcommand v1.9.2
[INFO] 2019-10-30 18:32:18,798Z [pbcromwell.cli] completed setting up logger with 
[INFO] 2019-10-30 18:32:18,798Z [pbcromwell.cli] log opts {'file_name': None, 'level': 20}
[WARNING] 2019-10-30 18:32:18,798Z [pbcromwell.cli] No database port specified - will run with in-memory DB
[INFO] 2019-10-30 18:32:18,800Z [pbcromwell.cli] Wrote config file to cromwell.conf
[INFO] 2019-10-30 18:32:18,801Z [pbcromwell.cli] exiting with return code 0 in 0.00 sec.
[teacher@cn3109 smrtanalysis]$ ls
[teacher@cn3109 smrtanalysis]$ pbcromwell show-workflows
[INFO] 2019-10-30 18:33:09,770Z [pbcromwell.cli] Using pbcommand v1.9.2
[INFO] 2019-10-30 18:33:09,770Z [pbcromwell.cli] completed setting up logger with 
[INFO] 2019-10-30 18:33:09,770Z [pbcromwell.cli] log opts {'file_name': None, 'level': 20}
[INFO] 2019-10-30 18:33:09,770Z [pbcromwell.cli] SMRT_PIPELINE_BUNDLE_DIR="/usr/local/apps/smrtanalysis/"

cromwell.workflows.pb_hgap4: Assembly (HGAP4)
cromwell.workflows.pb_basemods: Base Modification Analysis
cromwell.workflows.pb_ccs_mapping: CCS with Mapping
cromwell.workflows.pb_ccs: Circular Consensus Sequencing (CCS)
cromwell.workflows.pb_bam2fastx: Convert BAM to FASTX
cromwell.workflows.pb_demux_ccs: Demultiplex Barcodes
cromwell.workflows.pb_demux_subreads: Demultiplex Barcodes
cromwell.workflows.pb_isoseq3: Iso-Seq
cromwell.workflows.pb_isoseq3_ccsonly: Iso-Seq
cromwell.workflows.pb_laa: Long Amplicon Analysis (LAA)
cromwell.workflows.pb_align_ccs: Mapping
cromwell.workflows.pb_assembly_microbial: Microbial Assembly
cromwell.workflows.pb_mv_ccs: Minor Variants Analysis
cromwell.workflows.pb_resequencing: Resequencing
cromwell.workflows.pb_sat: Site Acceptance Test (SAT)
cromwell.workflows.pb_sv_ccs: Structural Variant Calling
cromwell.workflows.pb_sv_clr: Structural Variant Calling

Run 'pbcromwell show-workflow-details ' to display further
information about a workflow.  Note that the cromwell.workflows.
prefix is optional.
[INFO] 2019-10-30 18:33:09,840Z [pbcromwell.cli] exiting with return code 0 in 0.07 sec.
[teacher@cn3109 smrtanalysis]$ pbcromwell show-workflow-details pb_sat
[INFO] 2019-10-30 18:33:36,290Z [pbcromwell.cli] Using pbcommand v1.9.2
[INFO] 2019-10-30 18:33:36,290Z [pbcromwell.cli] completed setting up logger with 
[INFO] 2019-10-30 18:33:36,290Z [pbcromwell.cli] log opts {'file_name': None, 'level': 20}
[INFO] 2019-10-30 18:33:36,290Z [pbcromwell.cli] SMRT_PIPELINE_BUNDLE_DIR="/usr/local/apps/smrtanalysis/"

Pipeline Summary
Pipeline Id: cromwell.workflows.pb_sat
Name       : Site Acceptance Test (SAT)
Description: Cromwell workflow pb_sat
EntryPoints: 2
  eid_ref_dataset -> PacBio.DataSet.ReferenceSet
  eid_subread -> PacBio.DataSet.SubreadSet
Tags       : cromwell, mapping
Task Options:
  dataset_filters =
  downsample_factor = 0

[INFO] 2019-10-30 18:33:36,291Z [pbcromwell.cli] exiting with return code 0 in 0.00 sec.
[teacher@cn3109 smrtanalysis]$ pbcromwell run pb_sat \
 --entry /fdb/smrtanalysis/canneddata/lambdaTINY/m54026_181219_010936_tiny.subreadset.xml \
 --entry /fdb/smrtanalysis/canneddata/referenceset/lambdaNEB/referenceset.xml \
 --config $PWD/cromwell.conf \
 --output-dir sat 
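
When adapting the command above to your own data, it may help to confirm that every file passed with --entry is readable before starting the workflow. The following is a minimal sketch, not part of SMRT Analysis: the function name check_entries is hypothetical, and it only tests that each path is a readable regular file, not that the XML is a valid dataset.

```shell
# check_entries: hypothetical helper, not provided by pbcromwell.
# Prints each path that is missing or unreadable to stderr and
# returns 0 only if every path given is a readable regular file.
check_entries() {
    status=0
    for entry in "$@"; do
        if [ ! -f "$entry" ] || [ ! -r "$entry" ]; then
            echo "missing or unreadable: $entry" >&2
            status=1
        fi
    done
    return "$status"
}

# Example call with the canned SAT data used in the session above:
# check_entries \
#   /fdb/smrtanalysis/canneddata/lambdaTINY/m54026_181219_010936_tiny.subreadset.xml \
#   /fdb/smrtanalysis/canneddata/referenceset/lambdaNEB/referenceset.xml \
#   && echo "entries look readable"
```

A non-zero return value means at least one path needs attention before pbcromwell run is started.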

Batch job
Most jobs should be run as batch jobs.

pbcromwell can use the slurm backend to submit jobs while its supervisor process runs in an interactive session, but we recommend queuing the supervisor process itself as a batch job. Create a batch input file, for example as follows:

#!/bin/bash
set -e

module load smrtanalysis

### cromwell configuration
pbcromwell configure
# The generated cromwell.conf cannot be used as is. The following command makes the necessary adjustments.
sed -i -r \
  -e 's%(sbatch|scancel|squeue)%/usr/local/slurm/bin/\1%' `# use absolute paths for slurm commands (slurm directory is purged from $PATH by PacBio wrapper)` \
  -e '/job-id-regex/s/(job-id-regex\s+=\s+).*/\1 "(\\\\d+)"/' `# NIH HPC's sbatch has non-standard output` \
  cromwell.conf

### run workflow
pbcromwell run pb_sat \
 --entry /fdb/smrtanalysis/canneddata/lambdaTINY/m54026_181219_010936_tiny.subreadset.xml \
 --entry /fdb/smrtanalysis/canneddata/referenceset/lambdaNEB/referenceset.xml \
 --config $PWD/cromwell.conf \
 --backend slurm \
 --output-dir sat 
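
To see what the two sed expressions in the script above do, they can be tried on a small stand-in file before touching a real cromwell.conf. The sketch below assumes GNU sed (for -r, -i and \s). The four sample lines are hand-written for illustration and are not copied from a generated cromwell.conf, whose actual contents may differ.

```shell
# ASSUMPTION: illustrative stand-in for the Slurm-related lines of cromwell.conf.
sample=$(mktemp)
cat > "$sample" <<'EOF'
submit = "sbatch --job-name=${job_name}"
kill = "scancel ${job_id}"
check-alive = "squeue -j ${job_id}"
job-id-regex = "Submitted batch job (\\d+).*"
EOF

# Same two expressions as in the batch script:
#  1. prefix the first sbatch/scancel/squeue on each line with an absolute path
#  2. replace the job-id-regex value so it matches a bare job number
sed -i -r \
  -e 's%(sbatch|scancel|squeue)%/usr/local/slurm/bin/\1%' \
  -e '/job-id-regex/s/(job-id-regex\s+=\s+).*/\1 "(\\\\d+)"/' \
  "$sample"

# Show the rewritten lines.
cat "$sample"
```

After the edit, each Slurm command in the sample carries the /usr/local/slurm/bin/ prefix and the job-id-regex value is "(\\d+)". The first expression has no g flag, so only the first match on each line is rewritten.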

Submit this batch input file with the Slurm sbatch command, requesting a walltime that covers the expected run length of the whole pipeline.

sbatch [--time=#]
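
As a concrete illustration, the sketch below assembles one possible submission line and prints it rather than running it. Both the file name smrtanalysis.sh and the four-hour walltime are hypothetical placeholders. Choose a walltime based on how long your own pipeline is expected to run.

```shell
# ASSUMPTIONS: hypothetical batch file name and walltime, for illustration only.
batch_script="smrtanalysis.sh"
walltime="04:00:00"   # hours:minutes:seconds, one of the formats sbatch accepts

submit_cmd="sbatch --time=$walltime $batch_script"

# Print the command instead of executing it, so it can be inspected first.
echo "$submit_cmd"
```

Because the supervisor process must outlive every job it submits, the walltime requested here should cover the whole workflow, not a single task.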