High-Performance Computing at the NIH
SMRT Analysis on Biowulf

SMRT® Analysis is a bioinformatics software suite available for analysis of DNA sequencing data from Pacific Biosciences’ SMRT technology. Users can choose from a variety of analysis protocols that utilize PacBio® and third-party tools. Analysis protocols include de novo genome assembly, cDNA mapping, DNA base-modification detection, and long-amplicon analysis to determine phased consensus sequences.

Multiple versions of smrtanalysis may be available. The easiest way to select a version is with environment modules. To see the available modules, type

module avail smrtanalysis

To select a module, type

module load smrtanalysis/[ver]

where [ver] is the version of choice.

Environment variables set by the module include $SMRT_HOME, the SMRT Analysis installation directory (used in the example session below).

Example: Lambda Phage Resequencing

This is a sample interactive session running the lambda phage site acceptance test with pbsmrtpipe, PacBio's workflow manager:

[teacher@biowulf ~]$ sinteractive --cpus-per-task=12
salloc.exe: Pending job allocation 43027948
salloc.exe: job 43027948 queued and waiting for resources
salloc.exe: job 43027948 has been allocated resources
salloc.exe: Granted job allocation 43027948
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3109 are ready for job
srun: error: x11: no local DISPLAY defined, skipping
[teacher@cn3109 smrtanalysis]$ module load smrtanalysis
[+] Loading smrtanalysis
[teacher@cn3109 ~]$ cd /data/$USER
[teacher@cn3109 smrtanalysis]$ pbsmrtpipe pipeline-id pbsmrtpipe.pipelines.sa3_sat \
-e eid_subread:$SMRT_HOME/canneddata/lambdaTINY/m150404_101626_42267_c100807920800000001823174110291514_s1_p0.subreadset.xml \
-e eid_ref_dataset:$SMRT_HOME/canneddata/referenceset/lambdaNEB/referenceset.xml \
--preset-xml $SMRT_HOME/canneddata/preset_local.xml \
-o lambda
Registry Loaded. Number of ToolContracts:158 FileTypes:62 ChunkOperators:21 Pipelines:53
successfully initialized datastore.
Command: /usr/local/apps/smrtanalysis/ pipeline-id pbsmrtpipe.pipelines.sa3_sat -e eid_subread:/usr/local/apps/smrtanalysis/ -e eid_ref_dataset:/usr/local/apps/smrtanalysis/ --preset-xml /usr/local/apps/smrtanalysis/ -o lambda
Entry Points:

{   'eid_ref_dataset': '/usr/local/apps/smrtanalysis/',
    'eid_subread': '/usr/local/apps/smrtanalysis/'}
Workflow Options:

{   'pbsmrtpipe.options.chunk_mode': False,
    'pbsmrtpipe.options.cluster_manager': None,
    'pbsmrtpipe.options.debug_mode': False,
    'pbsmrtpipe.options.distributed_mode': False,
    'pbsmrtpipe.options.exit_on_failure': False,
    'pbsmrtpipe.options.max_nchunks': 12,
    'pbsmrtpipe.options.max_nproc': 12,
    'pbsmrtpipe.options.max_nworkers': 100,
    'pbsmrtpipe.options.max_total_nproc': 56,
    'pbsmrtpipe.options.progress_status_url': None,
    'pbsmrtpipe.options.tmp_dir': '/tmp'}
Task Options:

{   'genomic_consensus.task_options.algorithm': 'plurality',
    'genomic_consensus.task_options.diploid': False,
    'pbalign.task_options.algorithm_options': '--minMatch 12 --bestn 10 --minPctSimilarity 70.0 --refineConcordantAlignments',
    'pbalign.task_options.concordant': True}
completed setting up job directory resources and logs in /spin1/users/teacher/smrtanalysis/lambda
successfully created job resources.
starting to execute Distributed workflow with assigned job_id 308415
system Linux cn3109 nproc:56
exe'ing workflow Cluster renderer None
Service URI: None  (fqdn = cn3109.hpc.nih.gov)
Max number of Chunks  12 
Max number of nproc   12
Max number of workers 100
tmp dir               /tmp
Unable to find any chunkable tasks from 0 chunk operators.
validating binding graph
successfully validated binding graph.
pbsmrtpipe main process pid=35895 pgroupid=35895 ppid=33105
Starting worker pbcoretools.tasks.filterdataset-0 (1 workers running, 1 total proc in use)
Task was successful TaskResult(task_id='pbcoretools.tasks.filterdataset-0', state='successful', error_message='', run_time_sec=3.67)
Successfully validated outputs of 
Workflow status 3/15 completed/total tasks
Starting worker pbalign.tasks.pbalign-0 (1 workers running, 12 total proc in use)
Task was successful TaskResult(task_id='pbalign.tasks.pbalign-0', state='successful', error_message='', run_time_sec=27.69)
Successfully validated outputs of 
Workflow status 4/15 completed/total tasks
Starting worker pbalign.tasks.consolidate_alignments-0 (1 workers running, 1 total proc in use)
Starting worker pbreports.tasks.summarize_coverage-0 (2 workers running, 2 total proc in use)
Starting worker genomic_consensus.tasks.variantcaller-0 (3 workers running, 14 total proc in use)
Starting worker pbreports.tasks.mapping_stats-0 (4 workers running, 15 total proc in use)
Task was successful TaskResult(task_id='pbalign.tasks.consolidate_alignments-0', state='successful', error_message='', run_time_sec=3.64)
Successfully validated outputs of 
Workflow status 15/15 completed/total tasks
Shutting down.
Completed execution pbsmrtpipe v0.44.8. Workflow was Successful in 67.97 sec (1.13 min) with exit code 0
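The same pipeline can be run as a batch job instead of interactively. The sketch below writes a submission script using the canned lambda data from the session above; the script filename and the suggested CPU count are assumptions, not part of the original session, so adjust them for your own data and preset.

```shell
# Sketch of a batch script for the same pipeline. The filename
# smrtanalysis.sh is an assumption; the input paths are the canned
# lambda data shown in the interactive session above.
cat > smrtanalysis.sh <<'EOF'
#!/bin/bash
set -e
module load smrtanalysis
cd /data/$USER
pbsmrtpipe pipeline-id pbsmrtpipe.pipelines.sa3_sat \
  -e eid_subread:$SMRT_HOME/canneddata/lambdaTINY/m150404_101626_42267_c100807920800000001823174110291514_s1_p0.subreadset.xml \
  -e eid_ref_dataset:$SMRT_HOME/canneddata/referenceset/lambdaNEB/referenceset.xml \
  --preset-xml $SMRT_HOME/canneddata/preset_local.xml \
  -o lambda
EOF
```

The script could then be submitted with sbatch, matching the allocated CPUs to the pipeline's max_nproc setting (12 in the session above), e.g. sbatch --cpus-per-task=12 smrtanalysis.sh.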