SMRT Analysis on Biowulf

SMRT® Analysis is a bioinformatics software suite available for analysis of DNA sequencing data from Pacific Biosciences’ SMRT technology. Users can choose from a variety of analysis protocols that utilize PacBio® and third-party tools. Analysis protocols include de novo genome assembly, cDNA mapping, DNA base-modification detection, and long-amplicon analysis to determine phased consensus sequences.

Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

This sample interactive session runs the lambda phage site acceptance test on the local node using pbcromwell (user input follows the $ prompt):

[teacher@biowulf ~]$ sinteractive --cpus-per-task=12
salloc.exe: Pending job allocation 43027948
salloc.exe: job 43027948 queued and waiting for resources
salloc.exe: job 43027948 has been allocated resources
salloc.exe: Granted job allocation 43027948
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3109 are ready for job
srun: error: x11: no local DISPLAY defined, skipping
[teacher@cn3109 ~]$ module load smrtanalysis
[+] Loading smrtanalysis 8.0.0.79519
[teacher@cn3109 ~]$ mkdir /data/$USER/smrtanalysis
[teacher@cn3109 ~]$ cd !$
[teacher@cn3109 smrtanalysis]$ pbcromwell configure
[WARNING] 2024-04-14 21:24:09,447Z [pbcromwell.cli] No database port specified - will run with in-memory DB
[teacher@cn3109 smrtanalysis]$ ls
cromwell.conf
[teacher@cn3109 smrtanalysis]$ pbcromwell show-workflows


    cromwell.workflows.pb_detect_methyl: 5mC CpG Detection
    cromwell.workflows.pb_ccs: Circular Consensus Sequencing (CCS)
    cromwell.workflows.pb_demux_ccs: Demultiplex Barcodes
    cromwell.workflows.pb_export_ccs: Export Reads
    cromwell.workflows.pb_assembly_hifi: Genome Assembly
    cromwell.workflows.pb_align_ccs: HiFi Mapping
    cromwell.workflows.pb_target_enrichment: HiFi Target Enrichment
    cromwell.workflows.pb_sars_cov2_kit: HiFiViral SARS-CoV-2 Analysis
    cromwell.workflows.pb_isoseq: Iso-Seq Analysis
    cromwell.workflows.pb_mark_duplicates: Mark PCR Duplicates
    cromwell.workflows.pb_microbial_analysis: Microbial Genome Analysis
    cromwell.workflows.pb_puretarget_re_panel: PureTarget repeat expansion
    cromwell.workflows.pb_segment_reads: Read Segmentation
    cromwell.workflows.pb_segment_reads_and_isoseq: Read Segmentation and Iso-Seq
    cromwell.workflows.pb_segment_reads_and_sc_isoseq: Read Segmentation and Single-Cell Iso-Seq
    cromwell.workflows.pb_sc_isoseq: Single-Cell Iso-Seq
    cromwell.workflows.pb_sv_ccs: Structural Variant Calling
    cromwell.workflows.pb_trim_adapters: Trim Ultra-Low Adapters
    cromwell.workflows.pb_undo_demux: Undo Demultiplexing
    cromwell.workflows.pb_variant_calling: Variant Calling

    Run 'pbcromwell show-workflow-details <ID>' to display further
    information about a workflow.  Note that the cromwell.workflows.
    prefix is optional.

    The full SMRT Tools documentation for this command and PacBio
    analysis workflows is available online:
    https://www.pacb.com/support/documentation


[teacher@cn3109 smrtanalysis]$ pbcromwell run pb_align_subreads \
 --entry /fdb/smrtanalysis/canneddata/lambdaTINY/m54026_181219_010936_tiny.subreadset.xml \
 --entry /fdb/smrtanalysis/canneddata/referenceset/lambdaNEB/referenceset.xml \
 --config $PWD/cromwell.conf \
 --nproc $SLURM_CPUS_PER_TASK \
 --output-dir sat 
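
The workflow listing in the session above notes that the cromwell.workflows. prefix is optional and that show-workflow-details prints more information about a single workflow. A minimal sketch of both points follows. The workflow ID is taken from the listing; the variable names are illustrative only, and the pbcromwell call is guarded so that the snippet does nothing harmful where the module is not loaded.

```shell
# Full workflow ID as printed by 'pbcromwell show-workflows'
full_id="cromwell.workflows.pb_align_ccs"

# Strip the optional prefix with shell parameter expansion
short_id="${full_id#cromwell.workflows.}"
echo "$short_id"

# Query the workflow details only where pbcromwell is on the PATH
# (i.e. after 'module load smrtanalysis')
if command -v pbcromwell >/dev/null 2>&1; then
    pbcromwell show-workflow-details "$short_id"
fi
```

Either form of the ID should be accepted by pbcromwell, according to the note in its own output.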

Batch job
Most jobs should be run as batch jobs.

pbcromwell can submit jobs through its slurm backend while the supervisor process runs in an interactive session, but we recommend queuing the supervisor process itself as a batch job. Create a batch input file (e.g. smrtanalysis.sh) as follows:

#!/bin/bash
set -e

module load smrtanalysis

### cromwell configuration
pbcromwell configure
# The generated cromwell.conf cannot be used as is. The sed command below makes two adjustments:
#   1. use absolute paths for the slurm commands (the PacBio wrapper purges the slurm directory from $PATH)
#   2. replace job-id-regex, because NIH HPC's sbatch has non-standard output
sed -i -r \
  -e 's%(sbatch|scancel|squeue)%/usr/local/slurm/bin/\1%' \
  -e '/job-id-regex/s/(job-id-regex\s+=\s+).*/\1 "(\\\\d+)"/' \
  cromwell.conf

### run workflow
pbcromwell run pb_align_subreads \
 --entry /fdb/smrtanalysis/canneddata/lambdaTINY/m54026_181219_010936_tiny.subreadset.xml \
 --entry /fdb/smrtanalysis/canneddata/referenceset/lambdaNEB/referenceset.xml \
 --config $PWD/cromwell.conf \
 --backend slurm \
 --output-dir sat 
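
To see what the two sed expressions in the script do before submitting anything, they can be applied to a small mock file. This is only a sketch: the four configuration lines below are a hypothetical fragment written for illustration, and the real cromwell.conf produced by pbcromwell configure is longer and may be laid out differently. GNU sed is assumed (for -i, -r, and \s).

```shell
# Hypothetical fragment, NOT the real output of 'pbcromwell configure'
tmpconf=$(mktemp)
cat > "$tmpconf" <<'EOF'
submit = "sbatch --job-name=test"
kill = "scancel ${job_id}"
check-alive = "squeue -j ${job_id}"
job-id-regex = "Submitted batch job (\\d+).*"
EOF

# Same two expressions as in the batch script, applied to the mock file
sed -i -r \
  -e 's%(sbatch|scancel|squeue)%/usr/local/slurm/bin/\1%' \
  -e '/job-id-regex/s/(job-id-regex\s+=\s+).*/\1 "(\\\\d+)"/' \
  "$tmpconf"

cat "$tmpconf"
```

On the mock file, each slurm command gains the /usr/local/slurm/bin/ prefix and the job-id-regex value is reduced to a pattern that captures only the numeric job ID. Note that the first expression has no g flag, so it rewrites only the first match on each line.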

Submit this job with the Slurm sbatch command, requesting a walltime that covers the expected run time of the whole pipeline; the supervisor process must keep running until the last task it submitted has finished.

sbatch [--time=#] smrtanalysis.sh