SMRT Analysis on Biowulf

SMRT® Analysis is a bioinformatics software suite available for analysis of DNA sequencing data from Pacific Biosciences’ SMRT technology. Users can choose from a variety of analysis protocols that utilize PacBio® and third-party tools. Analysis protocols include de novo genome assembly, cDNA mapping, DNA base-modification detection, and long-amplicon analysis to determine phased consensus sequences.

Documentation
Important Notes

Platform/Version Compatibility

Support for older sequencing instruments is occasionally dropped. The following table lists, for each sequencer, the first and last smrtanalysis versions that support it.
Sequencer       First supporting version   Last supporting version
Revio           12.0                       (current)
Sequel II/IIe   7.0                        13.1
Sequel          4.0                        10.2
RS II           2.3                        7.0
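If your data come from an older instrument, you need a release inside its supported range rather than the default module. The sketch below turns the table into the corresponding `module load` command. It assumes that Biowulf installs releases as `smrtanalysis/<version>` modules and that the last supporting release from the table is actually installed. Neither is guaranteed, and installed version strings may carry a patch suffix, so check `module avail smrtanalysis` first. The function names are hypothetical.

```shell
#!/bin/bash
# Sketch only: function names are hypothetical; version numbers are copied
# from the compatibility table above.
set -euo pipefail

# Print the newest smrtanalysis release that supports a sequencer,
# or "current" if the default module still supports it.
last_supported_smrt() {
    case "$1" in
        Revio)                    echo "current" ;;
        "Sequel II"|"Sequel IIe") echo "13.1" ;;
        Sequel)                   echo "10.2" ;;
        "RS II")                  echo "7.0" ;;
        *) echo "unknown sequencer: $1" >&2; return 1 ;;
    esac
}

# Print the module command for a sequencer. ASSUMPTION: modules are named
# smrtanalysis/<version> and that version is installed.
module_cmd_for() {
    local v
    v=$(last_supported_smrt "$1") || return 1
    if [[ $v == current ]]; then
        echo "module load smrtanalysis"
    else
        echo "module load smrtanalysis/$v"
    fi
}

module_cmd_for "Sequel II"   # prints: module load smrtanalysis/13.1
module_cmd_for Revio         # prints: module load smrtanalysis
```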

Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

This is a sample interactive session that runs the site acceptance test (SAT) on the allocated compute node using pbcromwell. Commands typed by the user follow the shell prompt:

[teacher@biowulf ~]$ sinteractive --cpus-per-task=12
salloc.exe: Pending job allocation 43027948
salloc.exe: job 43027948 queued and waiting for resources
salloc.exe: job 43027948 has been allocated resources
salloc.exe: Granted job allocation 43027948
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3109 are ready for job
srun: error: x11: no local DISPLAY defined, skipping
[teacher@cn3109 ~]$ module load smrtanalysis
[+] Loading smrtanalysis 25.3
[teacher@cn3109 ~]$ mkdir /data/$USER/smrtanalysis
[teacher@cn3109 ~]$ cd !$
[teacher@cn3109 smrtanalysis]$ pbcromwell configure
[WARNING] 2024-04-14 21:24:09,447Z [pbcromwell.cli] No database port specified - will run with in-memory DB
[teacher@cn3109 smrtanalysis]$ ls
cromwell.conf
[teacher@cn3109 smrtanalysis]$ pbcromwell show-workflows


cromwell.workflows.pb_ccs: Circular Consensus Sequencing (CCS)
cromwell.workflows.pb_demux_ccs: Demultiplex Barcodes
cromwell.workflows.pb_export_ccs: Export Reads
cromwell.workflows.pb_align_ccs: HiFi Mapping
cromwell.workflows.pb_isoseq: Iso-Seq Analysis
cromwell.workflows.pb_mark_duplicates: Mark PCR Duplicates
cromwell.workflows.pb_microbial_analysis: Microbial Genome Analysis
pacbio.workflows.pb_puretarget_re_panel_v2: PureTarget repeat expansion
cromwell.workflows.pb_segment_reads: Read Segmentation
cromwell.workflows.pb_segment_reads_and_isoseq: Read Segmentation and Iso-Seq
cromwell.workflows.pb_segment_reads_and_sc_isoseq: Read Segmentation and Single-Cell Iso-Seq
cromwell.workflows.pb_sc_isoseq: Single-Cell Iso-Seq
cromwell.workflows.pb_target_enrichment_v2: Target Enrichment
cromwell.workflows.pb_trim_adapters: Trim Ultra-Low Adapters
cromwell.workflows.pb_undo_demux: Undo Demultiplexing
cromwell.workflows.pb_variant_calling: Variant Calling

Run 'pbcromwell show-workflow-details ' to display further
information about a workflow.  Note that the cromwell.workflows.
prefix is optional.

The full SMRT Tools documentation for this command and PacBio
analysis workflows is available online:
https://www.pacb.com/support/documentation

[teacher@cn3109 smrtanalysis]$ pbcromwell run pb_align_ccs \
 --entry $SMRT_DATA/canneddata/ecoli_tiny/*.consensusreadset.xml \
 --entry $SMRT_DATA/canneddata/referenceset/ecoli_pbi_Jan2021_majorStrain/ecoli_pbi_Jan2021_majorStrain.referenceset.xml \
 --config $PWD/cromwell.conf \
 --nproc $SLURM_CPUS_PER_TASK \
 --output-dir sat 

Batch job
Most jobs should be run as batch jobs.

pbcromwell can use its slurm backend to submit each workflow task as a separate Slurm job while the supervisor process runs in an interactive session. However, we recommend submitting the supervisor process itself as a batch job. Create a batch input file (e.g. smrtanalysis.sh) as follows:

#!/bin/bash
set -e

module load smrtanalysis

### cromwell configuration
pbcromwell configure
# The generated cromwell.conf cannot be used as is. The following command makes the necessary adjustments.
sed -i -r \
  -e 's%(sbatch|scancel|squeue)%/usr/local/slurm/bin/\1%' `# use absolute paths for slurm commands (slurm directory is purged from $PATH by PacBio wrapper)` \
  -e '/job-id-regex/s/(job-id-regex\s+=\s+).*/\1 "(\\\\d+)"/' `# NIH HPC's sbatch has non-standard output` \
  cromwell.conf

### run workflow
pbcromwell run pb_align_ccs \
 --entry $SMRT_DATA/canneddata/ecoli_tiny/*.consensusreadset.xml \
 --entry $SMRT_DATA/canneddata/referenceset/ecoli_pbi_Jan2021_majorStrain/ecoli_pbi_Jan2021_majorStrain.referenceset.xml \
 --config $PWD/cromwell.conf \
 --backend slurm \
 --output-dir sat 

Submit this job using the Slurm sbatch command. Request a walltime that covers the expected run time of the whole pipeline, because the supervisor process must keep running until the last workflow task has finished.

sbatch [--time=#] smrtanalysis.sh
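To see what the sed command in smrtanalysis.sh changes, the sketch below applies the same two substitutions to a small excerpt of a Slurm backend section. The excerpt is hypothetical: the `Submitted batch job (\\d+).*` regex is Cromwell's usual default, and the file that `pbcromwell configure` actually writes may look different. It also assumes, following the script's comment, that Biowulf's sbatch prints only the numeric job ID, which is why the regex is reduced to `(\\d+)`. GNU sed is required for `-r`, `\s`, and suffix-less `-i`.

```shell
#!/bin/bash
# Sketch: apply the sed edits from smrtanalysis.sh to a HYPOTHETICAL excerpt
# of a Slurm backend configuration. The real cromwell.conf may differ.
set -euo pipefail

conf=$(mktemp)
trap 'rm -f "$conf"' EXIT

# Quoted 'EOF' keeps ${...} and backslashes literal, as they appear in the file.
cat > "$conf" <<'EOF'
        submit = "sbatch -J ${job_name} -o ${out} -e ${err} ${script}"
        kill = "scancel ${job_id}"
        check-alive = "squeue -j ${job_id}"
        job-id-regex = "Submitted batch job (\\d+).*"
EOF

# The same two substitutions used in smrtanalysis.sh:
#  1. prefix sbatch/scancel/squeue with their absolute path
#  2. replace the job-id-regex value with one matching a bare number
sed -i -r \
  -e 's%(sbatch|scancel|squeue)%/usr/local/slurm/bin/\1%' \
  -e '/job-id-regex/s/(job-id-regex\s+=\s+).*/\1 "(\\\\d+)"/' \
  "$conf"

cat "$conf"
```

After the edits, each Slurm command is called by its absolute path and the job ID regex matches a bare number. To check the real file, run `grep -E 'slurm/bin|job-id-regex' cromwell.conf` after the sed step.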