plastid is a Python library for genomics and sequencing. It includes tools for exploratory data analysis (EDA) as well as a handful of scripts that implement common tasks.
plastid differs from other packages in its design goals. Namely:
- Its intended audience includes both bench and computational biologists. We tried to make it easy to use and wrote many tutorials.
- It is designed for analyses in which data at each position within a gene or transcript are of interest, such as analysis of ribosome profiling data. To this end, plastid:
  - uses read mapping functions to extract the biology of interest from read alignments (e.g. a ribosomal P-site in ribosome profiling, or sites of nucleotide modification in DMS-seq) and turn these into quantitative data, usually numpy arrays of counts at each nucleotide position in a transcript.
  - encapsulates multi-segment features, such as spliced transcripts, as single objects. This facilitates many common tasks, such as converting coordinates between genome and feature-centric spaces.
- It separates data from its representation on disk by providing consistent interfaces to many of the file formats found in the wild.
- It is designed for expansion to new or unknown assays. Frequently, writing a new mapping rule is sufficient to enable all of plastid's tools to interpret data coming from a new assay.
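The ideas above (a mapping rule that reduces each read to one position, a spliced feature treated as a single object, and per-nucleotide counts) can be illustrated with a small standalone sketch. It does not use plastid's API: `SplicedFeature`, `five_prime_rule`, and `count_over_feature` are hypothetical names, reads are simplified to unspliced `(start, length, strand)` tuples, the offset of 12 is only an illustrative value, and plain lists stand in for numpy arrays.

```python
from bisect import bisect_right
from itertools import accumulate


class SplicedFeature:
    """Hypothetical multi-segment feature (not plastid's own class)."""

    def __init__(self, strand, exons):
        # exons: 0-based, half-open (start, end) genomic intervals
        self.strand = strand
        self.exons = sorted(exons)
        lengths = [end - start for start, end in self.exons]
        # offsets[i] is the spliced length that precedes exon i
        self.offsets = [0] + list(accumulate(lengths))
        self.length = self.offsets[-1]

    def to_genome(self, pos):
        """Convert a 0-based feature-centric position to a genomic coordinate."""
        if not 0 <= pos < self.length:
            raise IndexError("position outside the feature")
        if self.strand == "-":
            pos = self.length - 1 - pos
        i = bisect_right(self.offsets, pos) - 1
        return self.exons[i][0] + (pos - self.offsets[i])

    def to_feature(self, genome_pos):
        """Convert a genomic coordinate to a feature-centric position.

        Returns None for intronic positions or positions outside the feature.
        """
        for i, (start, end) in enumerate(self.exons):
            if start <= genome_pos < end:
                pos = self.offsets[i] + (genome_pos - start)
                return self.length - 1 - pos if self.strand == "-" else pos
        return None


def five_prime_rule(offset):
    """Build a mapping rule: assign each read to the site `offset` nt from its 5' end."""
    def rule(read_start, read_length, strand):
        if offset >= read_length:
            return None  # read too short to contain the site
        if strand == "+":
            return read_start + offset
        return read_start + read_length - 1 - offset
    return rule


def count_over_feature(reads, rule, feature):
    """Count mapped sites at each position of the spliced feature."""
    counts = [0] * feature.length
    for start, length, strand in reads:
        if strand != feature.strand:
            continue
        genome_pos = rule(start, length, strand)
        if genome_pos is None:
            continue
        pos = feature.to_feature(genome_pos)
        if pos is not None:
            counts[pos] += 1
    return counts


transcript = SplicedFeature("+", [(100, 110), (200, 205)])
reads = [(95, 28, "+"), (95, 30, "+"), (190, 28, "+"), (102, 28, "+"), (95, 28, "-")]

# Same reads, two rules: only the mapping rule changes between "assays".
p_site_counts = count_over_feature(reads, five_prime_rule(12), transcript)
five_prime_counts = count_over_feature(reads, five_prime_rule(0), transcript)
```

Swapping `five_prime_rule(12)` for `five_prime_rule(0)` changes which position each read contributes to while the counting and coordinate code stays untouched, which is the sense in which a new mapping rule can be enough to support a new assay.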
plastid was written by Joshua Dunn in Jonathan Weissman’s lab at UCSF. Versions of it have been used in several publications ([DFB+13][FRJ+15]).
Create a batch input file (e.g. plastid.sh). For example:
#!/bin/bash
cd /data/$USER/dir
module load plastid
metagene ......
Then submit the file on Biowulf:
sbatch plastid.sh
Please read the user guide for more sbatch options.
Useful utilities for job monitoring and debugging
Create a swarmfile (e.g. plastid.swarm). For example:
# this file is called plastid.swarm
cd dir1; metagene command
cd dir2; metagene command
cd dir3; metagene command
[...]
Submit this job using the swarm command.
biowulf >$ swarm -f plastid.swarm --module plastid
More options for swarm jobs are described in the swarm documentation.
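When there are many directories, the swarm file can be generated rather than typed by hand. The following is a minimal sketch, assuming one command line per directory as in the example swarm file; `build_swarm_lines` and `write_swarmfile` are hypothetical helper names, and `metagene command` is the same placeholder used in that example, not a real invocation.

```python
from pathlib import Path


def build_swarm_lines(directories, command):
    """Return one swarm line per directory: change into it, then run the command."""
    return [f"cd {d}; {command}" for d in directories]


def write_swarmfile(lines, path="plastid.swarm"):
    """Write the swarm lines to a file, one command line per row."""
    Path(path).write_text("\n".join(lines) + "\n")


# "metagene command" is a placeholder; substitute the actual command line.
lines = build_swarm_lines(["dir1", "dir2", "dir3"], "metagene command")
print("\n".join(lines))
```

Calling `write_swarmfile(lines)` would then produce a `plastid.swarm` file that can be passed to `swarm -f`.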
[USER@biowulf ~]$ sinteractive
salloc.exe: Pending job allocation 15194042
salloc.exe: job 15194042 queued and waiting for resources
salloc.exe: job 15194042 has been allocated resources
salloc.exe: Granted job allocation 15194042
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn1719 are ready for job
[USER@cn1719 ~]$ module load plastid
[USER@cn1719 ~]$ metagene command ....