High-Performance Computing at the NIH
Plastid on NIH HPC Systems

plastid is a Python library for genomics and sequencing. It includes tools for exploratory data analysis (EDA) as well as a handful of scripts that implement common tasks.

plastid differs from other packages in its design goals. Namely:

  • Its intended audience includes both bench and computational biologists. The authors aimed to make it easy to use and wrote many tutorials.

  • It is designed for analyses in which data at each position within a gene or transcript are of interest, such as analysis of ribosome profiling data. To this end, plastid:

    • uses read mapping functions to extract the biology of interest from read alignments (for example, a ribosomal P-site in ribosome profiling, or sites of nucleotide modification in DMS-seq) and turn these into quantitative data, usually numpy arrays of counts at each nucleotide position in a transcript.
    • encapsulates multi-segment features, such as spliced transcripts, as single objects. This facilitates many common tasks, such as converting coordinates between genome and feature-centric spaces.
  • It separates data from its representation on disk by providing consistent interfaces to many of the file formats found in the wild.

  • It is designed for expansion to new or unknown assays. Frequently, writing a new mapping rule is sufficient to enable all of plastid's tools to interpret data coming from a new assay.
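
To make the ideas of a mapping rule and a multi-segment feature concrete, here is a minimal conceptual sketch in plain Python. It deliberately does not use plastid's own API: the names five_prime_map and SplicedTranscript are hypothetical and exist only for illustration, reads are simplified to unspliced half-open (start, end, strand) intervals, and counts are kept in a plain list where plastid would usually return a numpy array.

```python
from functools import partial

# Conceptual sketch only: this does NOT use plastid's API.
# five_prime_map and SplicedTranscript are hypothetical names.


def five_prime_map(read_start, read_end, strand, offset=0):
    """Map a read (half-open [start, end)) to one genomic position,
    `offset` nucleotides in from its 5' end."""
    if strand == "+":
        return read_start + offset
    # On the minus strand the 5' end is the rightmost aligned base.
    return read_end - 1 - offset


class SplicedTranscript:
    """A multi-segment feature: exons as half-open genomic intervals."""

    def __init__(self, exons, strand="+"):
        self.exons = sorted(exons)
        self.strand = strand
        self.length = sum(end - start for start, end in self.exons)

    def genome_to_transcript(self, pos):
        """Return the transcript coordinate of a genomic position,
        or None if the position is intronic or outside the transcript."""
        covered = 0  # exonic bases seen to the left of the current exon
        for start, end in self.exons:
            if start <= pos < end:
                idx = covered + (pos - start)
                # Minus strand: transcript position 0 is the rightmost base.
                return idx if self.strand == "+" else self.length - 1 - idx
            covered += end - start
        return None

    def get_counts(self, reads, map_rule):
        """Apply a mapping rule to each read and count hits per position."""
        counts = [0] * self.length
        for start, end, strand in reads:
            if strand != self.strand:
                continue  # ignore reads on the opposite strand
            tpos = self.genome_to_transcript(map_rule(start, end, strand))
            if tpos is not None:
                counts[tpos] += 1
        return counts


# Toy data: two 5-nt exons and a few short reads.
tx = SplicedTranscript([(100, 105), (200, 205)], strand="+")
reads = [
    (100, 104, "+"),  # maps to genomic 101, transcript position 1
    (101, 105, "+"),  # maps to genomic 102, transcript position 2
    (101, 104, "+"),  # maps to genomic 102, transcript position 2
    (200, 204, "+"),  # maps to genomic 201, transcript position 6
    (150, 154, "+"),  # maps into the intron, not counted
    (100, 104, "-"),  # opposite strand, not counted
]
counts = tx.get_counts(reads, partial(five_prime_map, offset=1))
print(counts)  # [0, 1, 2, 0, 0, 0, 1, 0, 0, 0]
```

Swapping in a different map_rule function changes how reads are interpreted without touching the counting code, which is the sense in which a new mapping rule can be enough to support a new assay.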

plastid was written by Joshua Dunn in Jonathan Weissman’s lab at UCSF. Versions of it have been used in several publications ([DFB+13][FRJ+15]).

On Helix

Sample session:


[USER@helix ~]$ module load plastid
[USER@helix ~]$ metagene -h
usage: metagene [-h] {generate,count,chart} ...

Batch job on Biowulf

Create a batch input file (e.g. plastid.sh). For example:

#!/bin/bash

cd /data/$USER/dir
module load plastid
metagene ......

Then submit the file on Biowulf:

sbatch plastid.sh

Please read the user guide for more sbatch options.

Useful utilities for job monitoring and debugging

Swarm of Jobs on Biowulf

Create a swarmfile (e.g. plastid.swarm). For example:

# this file is called plastid.swarm
cd dir1; metagene command
cd dir2; metagene command
cd dir3; metagene command
[...]

Submit this job using the swarm command.

[USER@biowulf ~]$ swarm -f plastid.swarm --module plastid

More options for swarm jobs are described in the swarm documentation.
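
When there are many directories, the swarmfile can be generated rather than typed by hand. The following is a minimal sketch in Python, not an official Biowulf utility: build_swarm_lines is a hypothetical helper, the directory names are examples, and "metagene command" is the same placeholder used in the swarmfile example, to be replaced with the real command line.

```python
import shlex


def build_swarm_lines(directories, command):
    """Return one swarmfile line per directory: 'cd <dir>; <command>'.

    Directory names are shell-quoted so that paths containing spaces
    still work; `command` is passed through unchanged.
    """
    return [f"cd {shlex.quote(d)}; {command}" for d in directories]


# "metagene command" is a placeholder, not a real invocation.
lines = build_swarm_lines(["dir1", "dir2", "dir3"], "metagene command")
swarm_text = "\n".join(lines) + "\n"
print(swarm_text, end="")

# To save the result, write swarm_text to plastid.swarm, for example:
#     with open("plastid.swarm", "w") as handle:
#         handle.write(swarm_text)
```

The resulting file can then be submitted with the swarm command in the same way as a hand-written swarmfile.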

Interactive job on Biowulf

Allocate an interactive session. Sample session:

[USER@biowulf ~]$ sinteractive 
salloc.exe: Pending job allocation 15194042
salloc.exe: job 15194042 queued and waiting for resources
salloc.exe: job 15194042 has been allocated resources
salloc.exe: Granted job allocation 15194042
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn1719 are ready for job

[USER@cn1719 ~]$ module load plastid

[USER@cn1719 ~]$ metagene command ....
Documentation