Plastid on NIH HPC Systems

plastid is a Python library for genomics and sequencing. It includes tools for exploratory data analysis (EDA) as well as a handful of scripts that implement common tasks.

plastid differs from other packages in its design goals. Namely:

  • Its intended audience includes both bench and computational biologists. The authors aimed to make it easy to use and wrote many tutorials.

  • It is designed for analyses in which data at each position within a gene or transcript are of interest, such as analysis of ribosome profiling data. To this end, plastid

    • uses read mapping functions to extract the biology of interest from read alignments (e.g. a ribosomal P-site in ribosome profiling, or sites of nucleotide modification in DMS-seq) and turn it into quantitative data, usually numpy arrays of counts at each nucleotide position in a transcript.
    • encapsulates multi-segment features, such as spliced transcripts, as single objects. This facilitates many common tasks, such as converting coordinates between genome and feature-centric spaces.
  • It separates data from its representation on disk by providing consistent interfaces to many of the file formats found in the wild.

  • It is designed for expansion to new or unknown assays. Frequently, writing a new mapping rule is sufficient to enable all of plastid's tools to interpret data coming from a new assay.
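
The two ideas in the list above, a mapping rule that turns read alignments into per-position counts and a multi-segment feature that converts coordinates, can be illustrated with a small standalone sketch. This is not plastid's API: the function names, the plus-strand-only handling, the half-open coordinates, and the 14-nucleotide offset are all assumptions chosen for illustration, and a plain list stands in for the numpy array plastid would normally return.

```python
def count_at_offset(alignments, region_start, region_end, offset):
    """Hypothetical mapping rule: assign each read to the position
    `offset` nucleotides from its 5' end (plus strand only) and
    return per-position counts over [region_start, region_end)."""
    counts = [0] * (region_end - region_start)
    for start, end in alignments:  # half-open [start, end) genomic coordinates
        pos = start + offset
        # Skip reads shorter than the offset or mapping outside the region.
        if start <= pos < end and region_start <= pos < region_end:
            counts[pos - region_start] += 1
    return counts


def transcript_to_genome(exons, x):
    """Hypothetical coordinate conversion: map a 0-based position in a
    spliced, plus-strand transcript to its genomic position.
    `exons` is a list of half-open (start, end) pairs in 5'-to-3' order."""
    if x < 0:
        raise IndexError("position must be non-negative")
    for exon_start, exon_end in exons:
        length = exon_end - exon_start
        if x < length:
            return exon_start + x
        x -= length  # move past this exon and try the next one
    raise IndexError("position lies beyond the end of the transcript")


# Three reads whose 5' ends are at 100, 100 and 103; illustrative offset of 14.
reads = [(100, 128), (100, 130), (103, 131)]
counts = count_at_offset(reads, region_start=100, region_end=140, offset=14)

# A transcript with two exons: 20 nt and 30 nt long.
exons = [(100, 120), (200, 230)]
first_base_of_exon_two = transcript_to_genome(exons, 20)
```

Supporting a different assay in this sketch would mean swapping `count_at_offset` for another rule while leaving the rest unchanged, which is the design point the last bullet makes about plastid itself.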

plastid was written by Joshua Dunn in Jonathan Weissman’s lab at UCSF. Versions of it have been used in several publications ([DFB+13][FRJ+15]).

On Helix

Sample session:

[USER@helix ~]$ module load plastid
[USER@helix ~]$ metagene -h
usage: metagene [-h] {generate,count,chart} ...

Batch job on Biowulf

Create a batch input file (e.g. plastid.sh). For example:


#!/bin/bash
cd /data/$USER/dir
module load plastid
metagene ......

Then submit the file on Biowulf:

sbatch plastid.sh

Swarm of Jobs on Biowulf

Create a swarmfile (e.g. plastid.swarm). For example:

# this file is called plastid.swarm
cd dir1; metagene command
cd dir2; metagene command 
cd dir3; metagene command

Submit this job using the swarm command.

[USER@biowulf ~]$ swarm -f plastid.swarm --module plastid
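
When there are many directories, the swarmfile can be generated rather than typed by hand. The following is a minimal sketch, not part of plastid or swarm: the helper name `swarm_lines` is hypothetical, and the directory names and the `metagene command` placeholder are simply those of the example above.

```python
import shlex


def swarm_lines(directories, command):
    """Return one swarmfile line per directory: 'cd <dir>; <command>'.
    Directory names are shell-quoted in case they contain spaces."""
    return ["cd {}; {}".format(shlex.quote(d), command) for d in directories]


# Placeholder directories and command, as in the swarmfile shown above.
lines = swarm_lines(["dir1", "dir2", "dir3"], "metagene command")

with open("plastid.swarm", "w") as handle:
    handle.write("# this file is called plastid.swarm\n")
    handle.write("\n".join(lines) + "\n")
```

The resulting plastid.swarm can be submitted with the same swarm command.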

Interactive job on Biowulf

Allocate an interactive session. Sample session:

[USER@biowulf ~]$ sinteractive
salloc.exe: Pending job allocation 15194042
salloc.exe: job 15194042 queued and waiting for resources
salloc.exe: job 15194042 has been allocated resources
salloc.exe: Granted job allocation 15194042
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn1719 are ready for job

[USER@cn1719 ~]$ module load plastid

[USER@cn1719 ~]$ metagene command ....