High-Performance Computing at the NIH
Scientific Applications on NIH HPC Systems

The NIH HPC staff maintains a large number of scientific programs, packages and databases for our users. Below is a list of system-installed software available on Biowulf and Helix. Click on the application name to get to site-specific instructions on how to run a given package on the cluster, including links to the original application documentation.

In almost all cases, applications are made available through the use of environment modules.
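
As a rough illustration of how a catalog heading relates to a module command, the sketch below turns a heading such as "AMBER (16)" into a `module load` line. It assumes, hypothetically, that each module is named after its heading and versioned as `name/version`; the actual module names on the cluster may differ, so check them with `module avail` first. The helper `module_load_command` is invented for this example.

```python
import re


def module_load_command(entry: str) -> str:
    """Build a 'module load' command from a heading like 'AMBER (16)'.

    Assumption (not confirmed by this page): the module is named after the
    heading and versioned as name/version.
    """
    # One name token, then a version in parentheses with no spaces inside.
    match = re.fullmatch(r"\s*(\S+)\s*\(([^()\s]+)\)\s*", entry)
    if match is None:
        raise ValueError(f"cannot parse catalog entry: {entry!r}")
    name, version = match.groups()
    return f"module load {name}/{version}"


if __name__ == "__main__":
    for heading in ("AMBER (16)", "Gromacs (2016.1)"):
        print(module_load_command(heading))
```

Headings with a missing or multi-word version are rejected rather than guessed at.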

Acemd (3212u1)

ACEMD is a high-performance molecular dynamics code for biomolecular systems, designed specifically for NVIDIA GPUs. Simple and fast, ACEMD uses commands and input files very similar to those of NAMD, and writes output files in the same formats as NAMD or Gromacs.

AMBER (16)

AMBER (Assisted Model Building with Energy Refinement) is a package of molecular simulation programs.

APBS (1.5)

APBS (Adaptive Poisson-Boltzmann Solver) is a software package for the numerical solution of the Poisson-Boltzmann equation (PBE), one of the most popular continuum models for describing electrostatic interactions between molecular solutes in salty, aqueous media.

Autodock (4.2.6)

Autodock is a suite of automated docking tools. It is designed to predict how small molecules, such as substrates or drug candidates, bind to a receptor of known 3D structure.

AutodockVina (1_1_2)

AutoDock Vina is a program for drug discovery, molecular docking and virtual screening, offering multi-core capability, high performance and enhanced accuracy and ease of use. It is closely tied to Autodock.

cactvs (

Cactvs is a general-purpose toolkit for chemical information processing. Its special strengths are a very powerful scripting environment with special Web support features, very good 2D structure layout and rendering functions, a rich set of high-quality I/O modules, extreme extensibility by means of external modules and data definitions, and a powerful lazy computation and data validity maintenance mechanism.

CHARMM (c39b2)

CHARMM is a general and flexible software application for modeling the structure and behavior of molecular systems.

GAMESS (20Apr17-R1-sockets)

GAMESS is a general ab initio quantum chemistry package.

Gaussian (G09 E.01)

Gaussian is a connected system of programs for performing semiempirical and ab initio molecular orbital (MO) calculations.

Gromacs (2016.1)

Gromacs is a versatile package to perform molecular dynamics, i.e. simulate the Newtonian equations of motion for systems with hundreds to millions of particles. It is primarily designed for biochemical molecules like proteins and lipids that have a lot of complicated bonded interactions, but since GROMACS is extremely fast at calculating the nonbonded interactions (that usually dominate simulations) many groups are also using it for research on non-biological systems, e.g. polymers.

LOOS (2.3.1)

LOOS (Lightweight Object-Oriented Structure library) is a code library for developing new molecular dynamics analysis applications. It also has a large number of stand-alone tools for manipulating and analyzing trajectories and molecules.

NAMD (2.10)

NAMD is a parallel molecular dynamics program for UNIX platforms designed for high-performance simulations in structural biology. VMD, the associated molecular visualization program, is also available on both Helix and Biowulf.

Psi4 (1.1)

Psi4 is an ab-initio electronic structure code that supports various methods for calculating energies and gradients of molecular systems.

Q-Chem (4.3)

Q-Chem is a comprehensive ab initio quantum chemistry package for accurate predictions of molecular structures, reactivities, and vibrational, electronic and NMR spectra.

Schrodinger (2017.1)

A limited number of Schrödinger applications are available on the Biowulf cluster through the Molecular Modeling Interest Group. Most are available through the Maestro GUI.

VinaLC (1.1.2)

VinaLC is a modified, parallelized version of AutoDock Vina, a very popular PC-based molecular docking program. It uses a hybrid MPI and multithreading scheme and could potentially be used on future exascale machines without sacrificing accuracy. The resulting program scales up to more than 15K CPUs with a very low overhead cost.

VMD (1.9.3)

VMD is a molecular visualization program for displaying, animating, and analyzing large biomolecular systems using 3-D graphics and built-in scripting. To use, type vmd at the prompt.

caffe (1.0.0)

Caffe is a deep learning framework made with expression, speed, and modularity in mind. It is developed by the Berkeley Vision and Learning Center (BVLC) and by community contributors.

cuDNN (5.0.5)

The NVIDIA CUDA Deep Neural Network library (cuDNN) is a GPU-accelerated library of primitives for deep neural networks. cuDNN provides highly tuned implementations for standard routines such as forward and backward convolution, pooling, normalization, and activation layers.

digits (5.0)

DIGITS simplifies common deep learning tasks such as managing data, designing and training neural networks on multi-GPU systems, monitoring performance in real time with advanced visualizations, and selecting the best performing model from the results browser for deployment. DIGITS is completely interactive so that data scientists can focus on designing and training networks rather than programming and debugging.

meka (1.9.1)

A Multi-label Extension to WEKA

3DSlicer (4.4.0)

A software platform for the analysis (including registration and interactive segmentation) and visualization (including volume rendering) of medical images and for research in image guided therapy.

AFNI (current)

AFNI (Analysis of Functional NeuroImages) is a set of C programs for processing, analyzing, and displaying functional MRI (FMRI) data - a technique for mapping human brain activity.

ANTs (2.1.0)

Advanced Normalization Tools (ANTs) extracts information from complex datasets that include imaging. Paired with ANTsR (answer), ANTs is useful for managing, interpreting and visualizing multidimensional data.

Bsoft (1.9.0)

Bsoft is a collection of programs and a platform for development of software for image and molecular processing in structural biology. Problems in structural biology are approached with a highly modular design, allowing fast development of new algorithms without the burden of issues such as file I/O. It provides an easily accessible interface, a resource that can be and has been used in other packages.

Caret (5.65)

Caret is a free, open-source, software package for structural and functional analyses of the cerebral and cerebellar cortex. It is largely deprecated by the Connectome Workbench but is needed for the full functionality of that program.

conn (17e)

CONN is a Matlab-based cross-platform software for the computation, display, and analysis of functional connectivity in fMRI (fcMRI).

Tools to browse, download, explore, and analyze data from the Human Connectome Project (HCP). Allows users to compare their own data to that of the HCP.

CTF (5.2.1)

The CTF MEG software has two main roles: (1) to provide a human-machine interface to the CTF MEG electronics for collecting MEG and/or EEG data, and (2) to provide a tool for reviewing and (to a limited extent) analyzing the MEG and/or EEG data acquired by the CTF MEG system.

ctffind (4.1.5)

Programs for finding CTFs of electron micrographs

cvu (0.5.2)

The Connectome Visualization Utility is a free and open source software project designed for the visualization of multi-modal, abstract brain networks.

elastix (4.8)

A toolbox for rigid and nonrigid registration of images.

EMAN (1.9)

EMAN is a suite of scientific image processing tools aimed primarily at the transmission electron microscopy community, though it is beginning to be used in other fields as well.

EMAN2 (2.2)

EMAN2 is a broadly based greyscale scientific image processing suite with a primary focus on processing data from transmission electron microscopes.

Fiji (1.51n)

Fiji Is Just ImageJ. It is a distribution of ImageJ (and ImageJ2) together with Java, Java3D and a lot of plugins.

Frealign (9.11_151031)

Frealign is a program for high-resolution refinement of 3D reconstructions from cryo-EM images of single particles.

Freesurfer (6.0.0)

Freesurfer is a set of automated tools for reconstruction of the brain's cortical surface from structural MRI data, and overlay of functional MRI data onto the reconstructed surface.

FSL (5.0.10)

FSL is a comprehensive library of image analysis and statistical tools for FMRI, MRI and DTI brain imaging data.

FSL_FIX (1.06)

FIX attempts to auto-classify ICA components into good vs bad components, so that the bad components can be removed from the 4D FMRI data. It is related to the FSL suite of image analysis tools.

Huygens (17.04.0-p6)

Huygens is an image restoration, deconvolution, resolution and noise reduction package. It can process images from all current optical microscopes, including wide-field, confocal, Nipkow (scanning disk confocal), multiple-photon, and 4Pi microscopes.

IMOD (4.9.4)

IMOD is a set of image processing, modeling and display programs used for tomographic reconstruction and for 3D reconstruction of EM serial sections and optical sections.

MIPAV (7.2.0)

The MIPAV (Medical Image Processing, Analysis, and Visualization) application enables quantitative analysis and visualization of medical images of numerous modalities such as PET, MRI, CT, or microscopy.

MotionCor2 (01302017)

MotionCor2 is a multi-GPU accelerated program that provides iterative, patch-based motion detection combining spatial and temporal constraints and dose weighting for both single particle and tomographic cryo-electron microscopy images.

PEET (1-10-1)

PEET (Particle Estimation for Electron Tomography) is an open-source package for aligning and averaging particles in 3-D subvolumes extracted from tomograms. It seeks the optimal alignment of each particle against a reference volume through several iterations. If PEET and IMOD are both installed, most PEET operations are available from the eTomo graphical user interface in IMOD.

RELION (2.0.6)

RELION (for REgularised LIkelihood OptimisatioN) is a stand-alone computer program for Maximum A Posteriori refinement of (multiple) 3D reconstructions or 2D class averages in cryo-electron microscopy.

ResMap (1.1.4)

ResMap (Resolution Map) is a Python (NumPy/SciPy) application with a Tkinter GUI and a command-line interface. It is a software package for computing the local resolution of 3D density maps studied in structural biology, primarily electron cryo-microscopy (cryo-EM).

SimNIBS (2.0.1)

SimNIBS 2.0 is a free software package for the Simulation of Non-invasive Brain Stimulation. It allows for realistic calculations of the electric field induced by transcranial magnetic stimulation (TMS) and transcranial direct current stimulation (tDCS).

situs (2.7.2)

Situs is a package for the modeling of atomic resolution structures into low-resolution density maps e.g. from electron microscopy, tomography, or small angle X-ray scattering.

SPM Standalone (12-v91)

The SPM software package has been designed for the analysis of brain imaging data sequences. The sequences can be a series of images from different cohorts, or time-series from the same subject. The current release is designed for the analysis of fMRI, PET, SPECT, EEG and MEG. This compiled version does not require a MATLAB license.

TORTOISE (2.5.0)

(Tolerably Obsessive Registration and Tensor Optimization Indolent Software Ensemble) The TORTOISE software package is for processing diffusion MRI data.

ViewBS (20161203)

Tools for exploring and visualizing bisulfite sequencing (BS-seq) data.

AdmixTools (4.1)

ADMIXTOOLS is a software package that supports formal tests of whether admixture occurred, and makes it possible to infer admixture proportions and dates.

AMOS (3.1.0)

AMOS is a collection of tools and class interfaces for the assembly of DNA reads. The package includes a robust infrastructure, modular assembly pipelines, and tools for overlapping, consensus generation, contigging, and assembly manipulation.


ARACNE (Algorithm for the Reconstruction of Accurate Cellular Networks) is an algorithm that uses microarray expression profiles. It is specifically designed to scale up to the complexity of regulatory networks in mammalian cells, yet is general enough to address a wider range of network deconvolution problems.

bali-phy (2.3.8)

BAli-Phy is MCMC software developed by Ben Redelings with Marc Suchard for simultaneous Bayesian estimation of alignment and phylogeny (and other parameters). It handles generic Bayesian modeling via probabilistic programming.

Beagle (4.1)

Beagle is a package for imputing genotypes, inferring haplotype phase, and performing genetic association analysis. BEAGLE is designed to analyze large-scale data sets with hundreds of thousands of markers genotyped on thousands of samples.

BEAST (2.4.4)

BEAST (Bayesian Evolutionary Analysis Sampling Trees) is a cross-platform program for Bayesian MCMC analysis of molecular sequences.

CD-HIT (4.6.1)

CD-HIT is a very widely used program for clustering and comparing protein or nucleotide sequences.

eigensoft (6.1.4)

The EIGENSOFT package combines functionality from population genetics methods and the EIGENSTRAT stratification correction method.

fastphylo (r131)

Fast tools for phylogenetics

FastTree (2.1.9)

FastTree infers approximately-maximum-likelihood phylogenetic trees from alignments of nucleotide or protein sequences. FastTree can handle alignments with up to a million sequences in a reasonable amount of time and memory. For large alignments, FastTree is 100-1,000 times faster than PhyML 3.0 or RAxML 7.

freebayes (1.1.0)

Bayesian haplotype-based polymorphism discovery and genotyping

GCTA (1.26.0)

GCTA (Genome-wide Complex Trait Analysis) is designed to estimate the proportion of phenotypic variance explained by genome- or chromosome-wide SNPs for complex traits.

LDpred (0.6)

LDpred is a Python based software package that adjusts GWAS summary statistics for the effects of linkage disequilibrium (LD).

loki (2.4.7_4)

Loki is a linkage analysis package, primarily for large and complex pedigrees, which uses Markov chain Monte Carlo (MCMC) techniques to avoid many of the computational problems that prevent exact computational methods being used for large pedigrees.

malder (1.0)

MALDER is a version of ALDER that has been modified to allow multiple admixture events. ALDER computes the weighted linkage disequilibrium (LD) statistic for making inferences about population admixture, as described in: Loh P-R, Lipson M, Patterson N, Moorjani P, Pickrell JK, Reich D, and Berger B. Inferring Admixture Histories of Human Populations Using Linkage Disequilibrium. Genetics, 2013.

merlin (1.1.2)

MERLIN uses sparse trees to represent gene flow in pedigrees and is one of the fastest pedigree analysis packages around.

mothur (1.39.5)

mothur is a tool for analyzing 16S rRNA gene sequences generated on multiple platforms as part of microbial ecology projects.

Pascal (2016-01-25)

Pascal (Pathway scoring algorithm) is a program for calculating gene score and pathway score p-values from GWAS-summary statistics.

pedcut (1.19)

A program for cutting complex pedigrees into computable sub-pedigrees with a user-specified MaxBit size.

phase (2.1.1)

infers haplotypes from population genotype data

Phylip (3.696)

Phylip is a package of programs for inferring phylogenies (evolutionary trees). It includes parsimony, distance matrix, and likelihood methods.

pplacer (1.1)

Pplacer places query sequences on a fixed reference phylogenetic tree to maximize phylogenetic likelihood or posterior probability according to a reference alignment. Pplacer is designed to be fast, to give useful information about uncertainty, and to offer advanced visualization and downstream analysis.

QIIME (1.9.1)

QIIME is an open source software package for comparison and analysis of microbial communities, primarily based on high-throughput amplicon sequencing data (such as SSU rRNA) generated on a variety of platforms, but also supporting analysis of other types of data (such as shotgun metagenomic data).

RAxML (8.2.4)

RAxML-VI-HPC (randomized axelerated maximum likelihood for high performance computing) is a sequential and parallel program for inference of large phylogenies with maximum likelihood (ML).

shapeit (2.r837)

SHAPEIT is a fast and accurate haplotype inference software

snphylo (20160204)

SNPhylo is a pipeline to generate a phylogenetic tree from huge SNP data.

Solar (8.2.0)

SOLAR is a program for multipoint, oligogenic, variance component linkage analysis in pedigrees of arbitrary size and complexity (Almasy L; Blangero J, 1998).

SPINGO (1.3)

SPINGO is a flexible and stand-alone software dedicated to high-resolution assignment of sequences to species level using partial 16S rRNA gene sequences from any environment.

STAMP (2.1.3)

STAMP is a software package for analyzing taxonomic or metabolic profiles that promotes ‘best practices’ in choosing appropriate statistical techniques and reporting results.

treemix (1.12)

TreeMix is a method for inferring the patterns of population splits and mixtures in the history of a set of populations.

JUMPg (2.3.1)

JUMPg is a proteogenomics software pipeline for analyzing large mass spectrometry (MS) and functional genomics datasets. The pipeline includes customized database building, tag-based database search, peptide-spectrum match filtering, and data visualization.

Mascot (2.5)

The Mascot search engine uses mass spectrometry data to identify proteins from primary sequence databases. Mascot searches can be run directly on the NIH Mascot server at http://biospec.nih.gov, or by using the Mascot daemon on your own desktop PC.

proteowizard (3.0.6994)

The ProteoWizard Library and Tools are a set of modular and extensible open-source, cross-platform tools and software libraries that facilitate proteomics data analysis.

armadillo (6.200.5)

Armadillo is an open source C++ linear algebra library that aims to have a good balance between speed and ease of use. The library supports integer, floating point and complex numbers, as well as a subset of trigonometric and statistics functions. Various matrix decompositions are provided through optional integration with LAPACK, or one of its high-performance drop-in replacements, such as MKL from Intel or ACML from AMD.

Comsol (5.3.a)

The COMSOL Multiphysics engineering simulation software environment facilitates all steps in the modeling process: defining your geometry, meshing, specifying your physics, solving, and then visualizing your results.


IBM ILOG CPLEX provides flexible, high-performance mathematical programming solvers for linear programming, mixed integer programming, quadratic programming, and quadratically constrained programming problems. It was originally a C implementation of simplex, but has since grown to include other algorithms. It includes APIs for C++, C#, Java, Python, MATLAB, and Microsoft Excel as well as an IDE based on Eclipse.

ESS (Emacs Speaks Statistics)

Emacs mode for interactive statistical programming and data analysis. Languages supported: the S family (S, S-PLUS and R), SAS, BUGS/JAGS, Stata and XLispStat.

GAUSS (10)

The GAUSS Mathematical and Statistical System is an easy-to-use data analysis environment based on the fast and powerful GAUSS Matrix Programming Language designed for computationally intensive tasks.

graph-tool (2.22)

Graph-tool is an efficient Python module for manipulation and statistical analysis of graphs (a.k.a. networks).

IDL/ENVI (8.4/5.2)

IDL and ENVI are a complete computing environment for the interactive analysis and visualization of data. IDL integrates an array-oriented language with mathematical analysis and graphical display techniques. ENVI is designed for extracting information from geospatial and medical imagery.

IMSL (7.1.0)

IMSL is a widely used library of mathematical and statistical routines in Fortran.

Mathematica (11.0)

Mathematica is an interactive system for doing mathematical computation. It performs numerical, symbolic and graphical computations, and incorporates a high-level programming language.

Matlab (

MATLAB is a high-performance interactive software package for scientific and engineering numeric computation. MATLAB integrates numerical analysis, matrix computation, signal processing, and graphics in an environment where problems and solutions are expressed just as they are written mathematically.

Meep (1.2.1)

Meep (or MEEP) is a free finite-difference time-domain (FDTD) simulation software package developed at MIT to model electromagnetic systems, along with the MPB eigenmode package.

Octave (4.0.3)

GNU Octave is a high-level language, primarily intended for numerical computations. It provides a convenient command line interface for solving linear and nonlinear problems numerically, and for performing other numerical experiments using a language that is mostly compatible with Matlab.

PARI/GP (2.7.4)

PARI/GP is a widely used computer algebra system designed for fast computations in number theory (factorizations, algebraic number theory, elliptic curves...). It also contains a large number of functions for computing with matrices, polynomials, power series, algebraic numbers etc., and a lot of transcendental functions.

R (3.4.0)

R (the R Project) is a language and environment for statistical computing and graphics. R is similar to S, and provides a wide variety of statistical and graphical techniques (linear and nonlinear modelling, statistical tests, time series analysis, classification, clustering, ...).

Rstudio (0.98)

RStudio is a set of integrated tools designed to help you be more productive with R. It includes a console, syntax-highlighting editor that supports direct code execution, as well as tools for plotting, history, debugging and workspace management.

S-Plus (8.0)

S-PLUS is an object-oriented language for data analysis, with many functions for statistical, numerical and graphical techniques.

SAS (9.4)

Base SAS provides a scalable, integrated software environment specially designed for data access, transformation and reporting.

Wolfram Workbench provides sophisticated code editing, navigation, and project management tools for enterprise-class development and deployment. Built on Eclipse, it is specialized for Mathematica and other Wolfram technologies.

Chimera (1.11.2)

Chimera is a highly extensible program for interactive visualization and analysis of molecular structures and related data, including density maps, supramolecular assemblies, sequence alignments, docking results, trajectories, and conformational ensembles.

Coot (0.8.1)

Coot is for macromolecular model building, model completion and validation, particularly suitable for protein modelling using X-ray data.

Cytoscape (3.3.0)

Cytoscape is an open source software platform for visualizing molecular interaction networks and biological pathways and integrating these networks with annotations, gene expression profiles and other state data.

lammps (30Jul16)

LAMMPS is a classical molecular dynamics code, and an acronym for Large-scale Atomic/Molecular Massively Parallel Simulator. It runs on a variety of different computer systems, including single processor systems, distributed-memory machines with MPI, and GPU and Xeon Phi systems. LAMMPS is open source software, released under the GNU General Public License.

OpenBabel (2.4.0)

Open Babel is a chemical toolbox designed to speak the many languages of chemical data.

Rosetta (2017.13)

The Rosetta++ software suite can perform de novo protein structure predictions, identify low free energy sequences for target protein backbones, predict the structure of a protein-protein complex from the individual structures of the monomer components, incorporate NMR data into the basic Rosetta protocol to accelerate the process of NMR structure prediction, and more...

TINKER (7.1.2)

TINKER molecular modeling software is a complete and general package for molecular mechanics and dynamics, with some special features for biopolymers. TINKER has the ability to use any of several common parameter sets, such as Amber, CHARMM, Allinger MM, OPLS, Merck Molecular Force Field, Liam Dang's polarizable model, and the AMOEBA polarizable atomic multipole force field.

abruijn (1.0)

ABruijn is an assembler for long reads from, for example, PacBio and Oxford Nanopore Technologies sequencers.

abyss (2.0.2)

ABySS (Assembly By Short Sequences) is a de novo, parallel, paired-end sequence assembler. The parallel version is implemented using MPI and is capable of assembling larger genomes.

annogesic (0.5.6)

ANNOgesic is a transcriptome annotation pipeline for RNA-seq.

ANNOVAR (2017-07-16)

ANNOVAR is an efficient software tool that uses up-to-date information to functionally annotate genetic variants detected from diverse genomes.

apt (1.19.0)

APT (Affymetrix Power Tools) is a set of cross-platform command line programs that implement algorithms for analyzing and working with Affymetrix GeneChip® arrays.

ascatNgs (3.0.2)

AscatNGS contains the Cancer Genome Projects workflow implementation of the ASCAT copy number algorithm for paired end sequencing.

asciigenome (1.0.0)

ASCIIGenome is a text-only command line genome browser.

Athlates (2014-04-26)

ATHLATES is a software package for determining HLA genotypes for individuals from Illumina exome sequencing data.

bam-matcher (2016-05-16)

A tool for determining whether two BAM files were sequenced from the same sample or individual.

bam2fastq (1.1.0)

This tool is used to extract raw sequences (with qualities) from bam files.

bam2mpg (1.0.1)

The program “bam2mpg” calls genotypes from sequence reads of haploid or diploid DNA aligned to a closely-related reference sequence. The program reads alignments in BAM format (http://samtools.sourceforge.net). The MPG (Most Probable Genotype) algorithm is based on a Bayesian model which simulates sampling from one or two alleles with sequencing error, and then calculates the likelihood of each possible genotype given the observed sequence data.
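
The idea of scoring each possible genotype against the observed bases can be sketched as follows. This is a generic textbook-style diploid model and not the MPG implementation: it assumes a uniform prior over genotypes, a single per-base error rate (`error`, a made-up parameter), and errors spread evenly over the three other bases. All function names are hypothetical.

```python
from itertools import combinations_with_replacement
from math import prod

BASES = "ACGT"


def base_prob(observed: str, allele: str, error: float) -> float:
    """Probability of reading `observed` when the true allele is `allele`."""
    return 1.0 - error if observed == allele else error / 3.0


def genotype_posteriors(observed_bases: str, error: float = 0.01) -> dict:
    """Posterior probability of each diploid genotype at one site.

    Each read base is assumed to be sampled from one of the two alleles
    with equal probability, then possibly miscalled (illustrative model).
    """
    likelihoods = {}
    for a1, a2 in combinations_with_replacement(BASES, 2):
        likelihoods[a1 + a2] = prod(
            0.5 * base_prob(b, a1, error) + 0.5 * base_prob(b, a2, error)
            for b in observed_bases
        )
    total = sum(likelihoods.values())
    # With a uniform prior, the posterior is the normalized likelihood.
    return {genotype: lik / total for genotype, lik in likelihoods.items()}


if __name__ == "__main__":
    posteriors = genotype_posteriors("AAAGAGGA")
    best = max(posteriors, key=posteriors.get)
    print(best, round(posteriors[best], 4))
```

For a pileup with a balanced mix of A and G, the heterozygous genotype AG receives almost all of the posterior mass, because either homozygous genotype would need several sequencing errors to explain the data.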

bamliquidator is a set of tools for analyzing the density of short DNA sequence read alignments in the BAM file format.

bamreadcount (0.7.4)

Bam-readcount generates metrics at single nucleotide positions. A number of these metrics can be useful for filtering out false positive calls.

bamsurgeon (2015.03.30)

Tools for adding mutations to existing .bam files; used for testing mutation callers

bamtools (2.4.1)

BamTools provides a fast, flexible C++ API & toolkit for reading, writing, and manipulating BAM files.

bamUtil (1.0.13)

bamUtil is a repository that contains several programs that perform operations on SAM/BAM files. All of these programs are built into a single executable, bam.

basespace_cli (0.8.12)

Command line interface for Illumina's BaseSpace

bbtools (37.36)

An extensive set of bioinformatics tools including bbmap (short read aligner), bbnorm (kmer based normalization), dedupe (deduplication and clustering of unaligned reads), reformat (formatting and trimming reads) and many more.

bcbio-nextgen (1.0.1)

Validated, scalable, community developed variant calling, RNA-seq and small RNA analysis

bcl2fastq (2.19.0)

a tool to handle bcl conversion and demultiplexing

bedops (2.4.26)

BEDOPS is a suite of tools to address common questions raised in genomic studies, mostly with regard to overlap and proximity relationships between data sets. BEDOPS aims to be scalable and flexible, facilitating the efficient and accurate analysis and management of large-scale genomic data.

bedtools (2.26.0-38-gf3db04e)

The BEDTools utilities allow one to address common genomics tasks such as finding feature overlaps and computing coverage. In addition, one can develop sophisticated pipelines that answer complicated research questions by "streaming" several BEDTools together.

bfast (0.7.0a)

BFAST facilitates the fast and accurate mapping of short reads to reference sequences, where mapping billions of short reads with variants is of utmost importance.

bfast+bwa (0.7.0a)

BFAST facilitates the fast and accurate mapping of short reads to reference sequences, where mapping billions of short reads with variants is of utmost importance.

bfc (v1)

High-performance error correction for Illumina resequencing data

bioawk (1.0)

Regular awk with support for several common biological data formats, including optionally gzip'ed BED, GFF, SAM, VCF, FASTA/Q and TAB-delimited formats with column names.

biom-format (2.1.5)

tool (and library) to manipulate Biological Observation Matrix (BIOM) Format files

bismark (0.17.0)

Bismark is a program to map bisulfite treated sequencing reads to a genome of interest and perform methylation calls in a single step. The output can be easily imported into a genome viewer, such as SeqMonk, and enables a researcher to analyse the methylation levels of their samples straight away.

bison (62bf61f7)

BISON is a bisulfite-converted short-read aligner that can natively utilize high-performance computing clusters to increase speed.

bowtie (1.2.0)

bowtie is an ultrafast, memory-efficient short read aligner geared toward quickly aligning large sets of short DNA sequences (reads) to large genomes.

bowtie2 (2.3.2)

A version of bowtie that's particularly good at aligning reads of about 50 up to 100s or 1,000s of characters, and particularly good at aligning to relatively long (e.g. mammalian) genomes

breakdancer (1.4.5)

BreakDancer provides genome-wide detection of structural variants from next generation paired-end sequencing reads.

breakseq (2.2)

Ultrafast and accurate nucleotide-resolution analysis of structural variants

BreakTrans (0.0.6)

BreakTrans maps predicted gene fusions to genomic structural rearrangements so as to validate both types of events and provide them with a mechanistic/functional interpretation.

bsmap (2.90)

BSMAP is short-read mapping software for bisulfite sequencing reads. Bisulfite treatment converts unmethylated cytosines into uracils (sequenced as thymine) and leaves methylated cytosines unchanged, and hence provides a way to study DNA cytosine methylation at single-nucleotide resolution. BSMAP aligns the Ts in the reads to both Cs and Ts in the reference.
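
The conversion and the asymmetric matching rule described above can be illustrated with a toy example. This is only a sketch of the principle, not BSMAP's algorithm: it handles the forward strand only, ignores mismatches and indels, and the function names and the `methylated_positions` parameter are invented for the illustration.

```python
def bisulfite_convert(sequence: str, methylated_positions=frozenset()) -> str:
    """Simulate bisulfite treatment followed by sequencing.

    Unmethylated C is read as T; C at a methylated position is unchanged.
    """
    return "".join(
        "T" if base == "C" and i not in methylated_positions else base
        for i, base in enumerate(sequence)
    )


def bisulfite_match(read: str, reference: str) -> bool:
    """True if the read matches the reference position by position,
    allowing a T in the read to pair with either C or T in the reference."""
    if len(read) != len(reference):
        return False
    return all(
        r == ref or (r == "T" and ref == "C")
        for r, ref in zip(read, reference)
    )


if __name__ == "__main__":
    reference = "ACGTCCGA"
    # Only the C at index 1 is methylated in this toy example.
    read = bisulfite_convert(reference, methylated_positions={1})
    print(read)                              # converted read
    print(read == reference)                 # exact match fails
    print(bisulfite_match(read, reference))  # bisulfite-aware match succeeds
```

A C that survives in the read at a reference C indicates methylation at that position, which is what makes the single-nucleotide readout possible.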

BuddySuite is a set of command-line tools for common biological data file manipulations.

bwa (0.7.15)

BWA is a fast, lightweight tool that aligns short sequences to a sequence database, such as the human reference genome.

Canu (1.5)

Canu is a fork of the Celera Assembler designed for high-noise single-molecule sequencing (such as the PacBio RSII or Oxford Nanopore MinION). Canu will correct the reads, then trim suspicious regions (such as remaining SMRTbell adapter), then assemble the corrected and cleaned reads into unitigs.

Canvas (1.25)

Canvas is a tool for calling copy number variants (CNVs) from human DNA sequencing data.

casper (0.8.2)

CASPER (Context-Aware Scheme for Paired-End Read) is a state-of-the-art merging tool in terms of accuracy and robustness. Using this merging method, elongated reads can be obtained from the forward and reverse reads.

ceas (1.0.2)

Cis-regulatory Element Annotation System is a tool designed to characterize genome-wide protein-DNA interaction patterns from ChIP-chip and ChIP-Seq of both sharp and broad binding factors.

cellranger (2.0.0)

Cell Ranger is a set of analysis pipelines that processes Chromium single cell 3’ RNA-seq output to align reads, generate gene-cell matrices and perform clustering and gene expression analysis.

CGAT (0.2.5)

CGAT is a collection of tools for the computational genomicist written in the Python language.

cgatools (

The Complete Genomics Analysis Tools (cgatools) is an open source project to provide tools for downstream analysis of Complete Genomics data. The general areas of functionality include genome comparison, format conversion, and reference tools.

changeo (0.3.2)

Change-O is a collection of tools for analyzing immunoglobulin sequences.

ChromHMM (1.12)

ChromHMM is software for learning and characterizing chromatin states.

ChunkChromosome (2012-08-28)

ChunkChromosome is a helper utility for minimac and MaCH. It can be used to facilitate analyses of very large datasets in overlapping slices.
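
As a rough illustration of what "overlapping slices" means, the sketch below splits a range of marker indices into fixed-size windows that share a number of markers with their neighbours. It is a generic windowing example, not ChunkChromosome's actual algorithm or command-line interface; the function name and parameters are hypothetical.

```python
def overlapping_chunks(n_markers: int, chunk_size: int, overlap: int):
    """Split indices 0..n_markers-1 into half-open (start, end) windows.

    Consecutive windows share `overlap` markers. Hypothetical helper
    for illustration only.
    """
    if chunk_size <= 0 or overlap < 0 or overlap >= chunk_size:
        raise ValueError("need chunk_size > overlap >= 0")
    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < n_markers:
        end = min(start + chunk_size, n_markers)
        chunks.append((start, end))
        if end == n_markers:
            break  # last window reached the end of the chromosome
        start += step
    return chunks
```

Each slice can then be analyzed as an independent job, with the shared markers at the edges available for stitching results back together.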

circexplorer (1.1.10)

A combined strategy to identify circular RNAs (circRNAs and ciRNAs)

circleseq (1.0)

Circleseq takes sample-specific paired end FASTQ files as input and produces a list of CIRCLE-seq detected off-target cleavage sites as output.

circos (0.69-5)

Circos is a program for the generation of publication-quality, circularly composited renditions of genomic data and related annotations. Circos is particularly suited for visualizing alignments, conservation and intra and inter-chromosomal relationships. Also, Circos is useful to visualize any type of information that benefits from a circular layout. Thus, although it has been designed for the field of genomics, it is sufficiently flexible to be used in other data domains.

clark (1.2.3)

A method based on a supervised sequence classification using discriminative k-mers

clinEff (1.0c)

ClinEff is a professional version of the SnpEff and SnpSift packages, suitable for production in clinical labs. ClinEff combines the flexibility of multiple SnpEff/SnpSift commands with the simplicity of running one program to perform all the annotations at once (i.e. in a single pass). It is highly customizable and can be tailored to specific pipeline needs in clinical production environments.

cnvkit (0.8.5)

Copy number variant detection from targeted DNA sequencing

cnvnator (0.3.3)

CNVnator is a tool for CNV discovery and genotyping from depth of read mapping.

coltron (1.0.2)

Coltron is an application designed to build transcriptional regulatory networks.

conifer (0.2.2)

CoNIFER uses exome sequencing data to find copy number variants (CNVs) and genotype the copy-number of duplicated genes.

conpair (10102016)

Concordance and contamination estimator for tumor–normal pairs

contest (1.0.24530)

ContEst is a tool (and method) for estimating the amount of cross-sample contamination in next generation sequencing data. Using a Bayesian framework, contamination levels are estimated from array based genotypes and sequencing reads.

CONTRA (2.0.6)

CONTRA is a tool for copy number variation (CNV) detection for targeted resequencing data such as those from whole-exome capture data.

CREST (1.0.1)

CREST (Clipping Reveals Structure) is an algorithm for detecting genomic structural variations at base-pair resolution using next-generation sequencing data.

crispresso (1.0.5)

Software pipeline for the analysis of CRISPR-Cas9 genome editing outcomes from sequencing data

crossmap (0.2.6)

CrossMap is a program for convenient conversion of genome coordinates between different assemblies (e.g. mm9->mm10). It can convert SAM, BAM, bed, GTF, GFF, wig/bigWig, and VCF files

csvkit (0.9.1)

csvkit is a suite of command-line tools for converting to and working with CSV, the king of tabular file formats.

cufflinks (2.2.1)

Cufflinks assembles transcripts and estimates their abundances in RNA-Seq samples. It accepts aligned RNA-Seq reads and assembles the alignments into a parsimonious set of transcripts. Cufflinks then estimates the relative abundances of these transcripts based on how many reads support each one.

cutadapt (1.14)

cutadapt removes adapter sequences from DNA high-throughput sequencing data. This is usually necessary when the read length of the machine is longer than the molecule that is sequenced, such as in microRNA data.
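
When the read is longer than the sequenced molecule, the read runs into the 3' adapter, so the adapter (or the beginning of it) appears at the end of the read. The sketch below shows the idea with exact string matching only. It is not cutadapt's algorithm, which performs error-tolerant alignment; the function name, the min_overlap parameter, and the sequences in the usage note are illustrative assumptions.

```python
def trim_3prime_adapter(read: str, adapter: str, min_overlap: int = 3) -> str:
    """Remove a 3' adapter from `read` using exact matching only.

    Handles two cases: the full adapter occurring inside the read, and
    a partial adapter (an adapter prefix) at the very end of the read.
    Hypothetical helper for illustration.
    """
    # Case 1: the whole adapter is present; cut at its first occurrence.
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    # Case 2: the read ends with a prefix of the adapter. Try the longest
    # possible overlap first and require at least `min_overlap` bases.
    for k in range(min(len(adapter), len(read)), min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[: len(read) - k]
    return read
```

For example, trim_3prime_adapter("ACGTACGTTGGAA", "TGGAATTCTCGGGTGCCAAGG") returns "ACGTACGT", because the last five bases of the read equal the first five bases of the adapter.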

danpos (2.2.2)

A toolkit for Dynamic Analysis of Nucleosome and Protein Occupancy by Sequencing, version 2

datamash (1.1.1)

datamash is a command-line program which performs basic numeric, textual and statistical operations on input textual data files.

deeptools (

deepTools is a suite of user-friendly tools for the visualization, quality control and normalization of data from deep-sequencing DNA sequencing experiments.

defuse (0.8.0)

deFuse is a software package for gene fusion discovery using RNA-Seq data. The software uses clusters of discordant paired end alignments to inform a split read alignment analysis for finding fusion boundaries. The software also employs a number of heuristic filters in an attempt to reduce the number of false positives and produces a fully annotated output for each predicted fusion.

delly (0.7.7)

DELLY is an integrated structural variant prediction method that can detect deletions, tandem duplications, inversions and translocations at single-nucleotide resolution in short-read massively parallel sequencing data. It uses paired-ends and split-reads to sensitively and accurately delineate genomic rearrangements throughout the genome.

Discovar (52488)

DISCOVAR is a new genome assembler and variant caller for state-of-the-art data. Currently it takes as input Illumina reads of length 250 or longer -- produced on MiSeq or HiSeq 2500 -- and from a single PCR-free library.

DosageConvertor is a C++ tool to convert dosage files (in VCF format) from Minimac3 to other formats such as MaCH or PLINK.

dropseq (1.12)

Drop-seq is a technology that allows biologists to analyze genome-wide gene expression in thousands of individual cells in a single experiment.

drseq (2.0.2)

Dr.seq is a QC pipeline for Drop-seq data

ea-utils (r822)

Command-line tools for processing biological sequencing data. Barcode demultiplexing, adapter trimming, etc.

EPACTS (3.2.6)

EPACTS (Efficient and Parallelizable Association Container Toolbox) is a versatile software pipeline to perform various statistical tests for identifying genome-wide association from sequence data through a user-friendly interface, both to scientific analysts and to method developers.

express (1.5.1)

eXpress is a streaming tool for quantifying the abundances of a set of target sequences from sampled subsequences.

fastqc (0.11.5)

FastQC provides quality control functions for next-generation sequencing data.

fastqtools (0.8)

fastq-tools is a collection of small and efficient programs for performing some common and uncommon tasks with FASTQ files.

fastq_screen (0.11.1)

FastQ Screen allows you to screen a library of sequences in FastQ format against a set of sequence databases so you can see if the composition of the library matches with what you expect.

fastStructure is an algorithm for inferring population structure from large SNP genotype data

fastxtoolkit (0.0.14)

The FASTX-Toolkit is a collection of command line tools for Short-Reads FASTA/FASTQ files preprocessing.

fcgene (1.0.7)

FCgene is a format converting tool for genotyped data (e.g. PLINK-MACH, MACH-PLINK).

Flexbar (2.5.0)

Flexbar preprocesses high-throughput sequencing data efficiently. It demultiplexes barcoded runs and removes adapter sequences. Moreover, trimming and filtering features are provided. Flexbar increases read mapping rates and improves genome and transcriptome assemblies. It supports next-generation sequencing data in fasta and fastq format, e.g. from Illumina and the Roche 454 platform

freec (9.5)

Control-FREEC is a tool for detection of copy-number changes and allelic imbalances (including LOH) using deep-sequencing data

fusioncatcher (0.99.7b)

FusionCatcher searches for novel/known somatic fusion genes, translocations, and chimeras in RNA-seq data (paired-end or single-end reads from Illumina NGS platforms like Solexa/HiSeq/NextSeq/MiSeq) from diseased samples.

fusionmap (2015-03-31)

FusionMap is an efficient fusion aligner which aligns reads spanning fusion junctions directly to the genome without prior knowledge of potential fusion regions.

FusionQ (v5)

A novel approach for gene fusion detection and quantification from paired-end RNA-Seq

fusionseq (0.7.0)

A computational framework to identify fusion transcripts from paired-end RNA-Seq data.

GATK (3.7)

GATK, from the Broad Institute, is two things: first, a structured software library that makes it very easy to write efficient analysis tools using next-generation sequencing data, and second, a suite of tools for working with human medical resequencing projects such as 1000 Genomes and The Cancer Genome Atlas. These tools include depth of coverage analyzers, a quality score recalibrator, a SNP/indel caller and a local realigner.

gdc-client (1.2.0)

The GDC Data Transfer Tool provides an optimized method of transferring data to and from the GDC, and enables resumption of interrupted transfers.

gem (3.0)

High resolution peak calling and motif discovery for ChIP-seq and ChIP-exo data

Gemini (0.20.0)

GEMINI (GEnome MINIng) is designed to be a flexible framework for exploring genetic variation in the context of the wealth of genome annotations available for the human genome. By placing genetic variants, sample genotypes, and useful genome annotations into an integrated database framework, GEMINI provides a simple, flexible, yet very powerful system for exploring genetic variation for disease and population genetics.

GEMMA (0.95)

GEMMA is the software implementing the Genome-wide Efficient Mixed Model Association algorithm for a standard linear mixed model and some of its close relatives for genome-wide association studies (GWAS).

GeneTorrent (3.8.6)

GeneTorrent is a client application for downloading BAM files from the Cancer Genetics Hub (CGHub) of UC Santa Cruz.

The Genome Browser Mirror Fragments at Helix Systems is a mirror of the UCSC Genome Browser. The URL is https://hpcnihapps.cit.nih.gov/genome. Users can also directly access the MySQL databases, the supporting files, and a large number of associated executables.

genometools (1.5.9)

collection of bioinformatic tools

GiniClust (2017-03-22)

GiniClust is a clustering method implemented in Python and R for detecting rare cell-types from large-scale single-cell gene expression data.

Gistic (2.0.22)

Facilitates sensitive and confident localization of the targets of focal somatic copy-number alteration in human cancers.

glu (1.0b3)

GLU is a framework and a software package that was designed to store, clean, and analyze data generated by whole-genome or candidate gene association scans.

gmap-gsnap (2017-06-20)

Genomic Mapping and Alignment Programs

gossamer (ac492a8)

A tool for de novo assembly of high throughput sequencing data.

gqt (1.0)

Genotype Query Tools (GQT) is command line software and a C API for indexing and querying large-scale genotype data sets like those produced by 1000 Genomes, the UK100K, and forthcoming datasets involving millions of genomes.

gtool (0.7.5)

GTOOL is a program for transforming sets of genotype data for use with the programs SNPTEST and IMPUTE. GTOOL can be used to (a) generate subsets of genotype data, and (b) convert genotype data between the PED file format and the file format used by SNPTEST and IMPUTE.

hap.py (0.3.7)

A set of programs based on htslib to benchmark variant calls against gold standard truth datasets.

hgvs (1.0.0)

The hgvs package provides a Python library to facilitate the use of genome, transcript, and protein variants that are represented using the Human Genome Variation Society (varnomen) recommendations. To use, type module load hgvs prior to calling python.

hifive (1.3)

Tools for handling HiC and 5C data

hisat (2.0.5)

HISAT is a fast and sensitive spliced alignment program which uses Hierarchical Indexing for Spliced Alignment of Transcripts.

hiseq (2.3.20-4)

HiSeq Analysis Software provides rapid and easy alignment and variant calling for Whole Human Genomes or libraries prepared with the Nextera Rapid Capture (NRC) exome enrichment kit.

HLA-PRG-LA (f0833ed)

Stands for HLA PRG, linear approximation. The basic idea is to seed graph alignments with linear alignments to the sequences that the graph consists of.

homer (4.8.2)

HOMER (Hypergeometric Optimization of Motif EnRichment) is a suite of tools for Motif Discovery and ChIP-Seq analysis.

hotnet2 (1.0.1)

HotNet2 is an algorithm for finding significantly altered subnetworks in a large gene interaction network.

hotspot (4.1.0)

Hotspot is a program for identifying regions of local enrichment of short-read sequence tags mapped to the genome using a binomial distribution model.

htgts (v2)

High-Throughput Genome-Wide Translocation Sequencing pipeline

htseq (0.7.2)

HTSeq is a Python package that provides infrastructure to process data from high-throughput sequencing assays.

humann2 (0.9.4)

HUMAnN is a pipeline for efficiently and accurately profiling the presence/absence and abundance of microbial pathways in a community from metagenomic or metatranscriptomic sequencing data (typically millions of short DNA/RNA reads).

IDR (2.0.3)

The IDR (Irreproducible Discovery Rate) framework is a unified approach to measure the reproducibility of findings identified from replicate experiments and provide highly stable thresholds based on reproducibility. The IDR method compares a pair of ranked lists of identifications (such as ChIP-seq peaks).

IGV (2.3.94)

The Integrative Genomics Viewer is a high-performance visualization tool for interactive exploration of large, integrated genomic datasets.

IGVTools (2.3.94)

IGVTools provides utilities for working with ASCII file formats used by the Integrative Genomics Viewer. The files can be sorted, tiled, indexed, and counted.

IMPUTE (2.3.2)

Impute is a program for estimating ("imputing") unobserved genotypes in SNP association studies.

integrative (default)

Software Pipeline for Integrative Genetic Association Analysis: Probabilistic Assessment of Enrichment and Colocalization

intervene (0.5.8)

a tool for intersection and visualization of multiple genomic region sets


iSAAC comprises an ultrafast DNA sequence aligner (Isaac Genome Alignment Software) that takes advantage of high-memory hardware (>48 GB) and a variant caller (Isaac Variant Caller).

iva (1.0.2)

IVA is a de novo assembler designed to assemble virus genomes that have no repeat sequences, using Illumina read pairs sequenced from mixed populations at extremely high and variable depth.


JAMM is a peak finder for NGS datasets (ChIP-Seq, ATAC-Seq, DNase-Seq, etc.) that can integrate replicates and assign peak boundaries accurately. JAMM is applicable to both broad and narrow datasets.

jannovar (0.12)

Jannovar: A Java library for Exome Annotation. Jannovar is a stand-alone Java application as well as a Java library designed to be used in larger software frameworks for exome analysis.

Juicer (1.5)

A One-Click System for Analyzing Loop-Resolution Hi-C Experiments

KMC (2.3.0)

KMC is a disk-based program for counting k-mers from (possibly gzipped) FASTQ/FASTA files.
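
For readers unfamiliar with the task itself, k-mer counting means tallying every length-k substring across a set of sequences. The sketch below does this naively in memory and is only meant to show the operation. It does not reflect how KMC works internally (KMC is disk-based and built for inputs far larger than memory), and the function name is hypothetical.

```python
from collections import Counter


def count_kmers(sequences, k: int) -> Counter:
    """Count all length-k substrings across an iterable of sequences.

    Naive in-memory version for illustration; no canonical (reverse
    complement) handling and no filtering of ambiguous bases.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        # A sequence of length n contains n - k + 1 k-mers (none if n < k).
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts
```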

kneaddata (0.5.4)

KneadData is a tool designed to perform quality control on metagenomic and metatranscriptomic sequencing data, especially data from microbiome experiments.

kraken (0.10.5-beta)

Kraken is a system for assigning taxonomic labels to short DNA sequences, usually obtained through metagenomic studies

laser (2.04)

LASER is a program to estimate individual ancestry by directly analyzing shotgun sequence reads without calling genotypes.

lefse (1.0.7)

LEfSe (Linear discriminant analysis Effect Size) determines the features (organisms, clades, operational taxonomic units, genes, or functions) most likely to explain differences between classes by coupling standard tests for statistical significance with additional tests encoding biological consistency and effect relevance.

lobstr (4.0.6)

a short tandem repeat profiler for next generation sequencing data

locuszoom (1.3)

LocusZoom is designed to facilitate viewing of local association results together with useful information about a locus, such as the location and orientation of the genes it includes, linkage disequilibrium coefficients and local estimates of recombination rates

LongRanger (2.1.3)

Long Ranger is a set of analysis pipelines that processes GemCode sequencing output to align reads and call and phase SNPs, indels, and structural variants. Loupe is a genome browser designed to visualize the Linked-Read data produced by the 10x Chromium Platform.

lumpy (0.2.13)

A probabilistic framework for structural variant discovery.

Mach 1.0 is a Markov Chain based haplotyper. It can resolve long haplotypes or infer missing genotypes in samples of unrelated individuals.

mach2dat (1.024)

mach2dat performs logistic regression, using imputed SNP dosage data and adjusting for covariates.

mach2qtl (1.1.3)

mach2qtl uses dosages/posterior probabilities inferred with MACH as predictors in a linear regression to test association with a quantitative trait

macs (

Model-based Analysis of ChIP-Seq (MACS) on short reads sequencers such as Genome Analyzer (Illumina / Solexa). MACS empirically models the length of the sequenced ChIP fragments, which tends to be shorter than sonication or library construction size estimates, and uses it to improve the spatial resolution of predicted binding sites. MACS also uses a dynamic Poisson distribution to effectively capture local biases in the genome sequence, allowing for more sensitive and robust prediction.

mafft (7.305)

Multiple alignment program for amino acid or nucleotide sequences

mageck-vispr (0.5.3)

MAGeCK-VISPR is a comprehensive quality control, analysis and visualization workflow for CRISPR/Cas9 screens.

MAJIQ (0.9.2)

Modeling Alternative Junction Inclusion Quantification. MAJIQ and Voila are two software packages that together define, quantify, and visualize local splicing variations (LSV) from RNA-Seq data.

manta (1.1.0)

Structural variant and indel caller for mapped sequencing data

mapDamage (2.0.6)

mapDamage profiles DNA damage patterns in next-generation sequencing analyses of ancient DNA samples.

mapsplice (2.1.8)

Accurate mapping of RNA-seq reads for splice junction discovery

maq (0.7.1)

MAQ is a software package that builds mapping assemblies from short reads generated by the next-generation sequencing machines. It is particularly designed for Illumina-Solexa 1G Genetic Analyzer, and has preliminary functions to handle ABI SOLiD data.

mash (1.1.1)

mash is a command line tool and library to provide fast genome and metagenome distance estimation using MinHash. Only the command line tool is installed.
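
The MinHash idea can be sketched in a few lines: hash every k-mer of each sequence, keep only the smallest hash values as a "sketch", and estimate the Jaccard similarity of the two k-mer sets from the sketches alone. The code below is a toy illustration under stated assumptions, not mash's implementation: it uses SHA-1 from the standard library as the hash, does not use canonical k-mers, and stops at the Jaccard estimate rather than converting it to a distance. All function names and default parameters are hypothetical.

```python
import hashlib


def kmer_hashes(seq: str, k: int) -> set:
    """Hash every k-mer of `seq` to a 64-bit integer (SHA-1 is an assumption)."""
    seq = seq.upper()
    return {
        int.from_bytes(hashlib.sha1(seq[i:i + k].encode()).digest()[:8], "big")
        for i in range(len(seq) - k + 1)
    }


def minhash_sketch(seq: str, k: int = 5, size: int = 100) -> list:
    """Bottom-`size` sketch: the `size` smallest distinct k-mer hashes."""
    return sorted(kmer_hashes(seq, k))[:size]


def jaccard_estimate(sketch_a: list, sketch_b: list, size: int = 100) -> float:
    """Estimate Jaccard similarity from two bottom-`size` sketches.

    Take the `size` smallest hashes of the merged sketches and report the
    fraction of them that occur in both input sketches.
    """
    merged = sorted(set(sketch_a) | set(sketch_b))[:size]
    if not merged:
        return 0.0
    shared = set(sketch_a) & set(sketch_b)
    return sum(1 for h in merged if h in shared) / len(merged)
```

The appeal of the approach is that the sketches are small and fixed in size, so comparing two genomes costs the same regardless of genome length.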

mats (3.2.5)

MATS is a computational tool to detect differential alternative splicing events from RNA-Seq data.

MEGAN (6.6.6)

MEtaGenome ANalyzer that takes a file of reads and a Blast output from comparison against a reference genome, and automatically calculates a taxonomic classification of the reads and, if desired, a functional classification.

metal (2011-03-25)

The METAL software is designed to facilitate meta-analysis of large datasets (such as several whole genome scans) in a convenient, rapid and memory efficient manner.

metAMOS (1.5rc3)

metAMOS is an integrated assembly and analysis pipeline for metagenomic data.

methylflow (0.1.0-pre)

Cell-specific methylation pattern reconstruction

methylQA (0.1.8)

methylQA is a methylation sequencing data quality assessment tool for MeDIP-seq and MRE-seq. It provides the basic mapping status of next generation sequencing data, such as the number of total reads, the number of mapped reads, etc. It also provides CpG status information, such as how many CpGs have been covered by one experiment, how many times one CpG has been covered, etc. methylQA can also process general ChIP-seq data such as Histone/TF ChIP-seq data, and generate read density and mapping statistics.

migec (1.2.4a)

Molecular Identifier Guided Error Correction pipeline (MIGEC)

minimac (3)

minimac is a low memory, computationally efficient implementation of the MaCH algorithm for genotype imputation. It is designed to work on phased genotypes and can handle very large reference panels with hundreds or thousands of haplotypes. 'mini' refers to the low amount of computational resources it needs.

miranda (3.3a)

an algorithm for finding genomic targets for microRNAs

mirdeep2 (

miRDeep2 is a completely overhauled tool which discovers microRNA genes by analyzing sequenced RNAs.

miso (0.5.3)

MISO (Mixture-of-Isoforms) is a probabilistic framework that quantitates the expression level of alternatively spliced genes from RNA-Seq data

mixcr (2.1.3)

MiXCR is a universal software for fast and accurate analysis of T- and B- cell receptor repertoire sequencing data.

mono (

mosaik (2.2.30)

MOSAIK is a reference-guided assembler that can work with FASTA, FASTQ, Illumina Bustard & Gerald, or SRF file formats and outputs phrap ace and GigaBayes gig formats.

MToolbox (1.0)

A bioinformatics pipeline aimed at the analysis of mitochondrial DNA (mtDNA) in high throughput sequencing studies.

multiqc (1.1)

aggregates results for various frequently used bioinformatics tools across multiple samples into a nice visual report

multiSNV (2.3)

multiSNV is a tool for calling somatic single-nucleotide variants (SNVs) using NGS data from a normal and multiple tumour samples of the same patient. Instead of performing multiple pairwise analyses of a single tumour sample and its matched normal, multiSNV jointly considers all available samples under a Bayesian framework to increase sensitivity of calling shared SNVs. multiSNV accepts BAM files (one BAM file for each sample) and produces a single VCF file with variant predictions for all samples.

muTect (1.1.7)

MuTect is a method developed at the Broad Institute for the reliable and accurate identification of somatic point mutations in next generation sequencing data of cancer genomes.

MutSig (1.41)

MutSig analyzes lists of mutations discovered in DNA sequencing, to identify genes that were mutated more often than expected by chance given background mutation processes.

NestedMICA (0.8.0)

NestedMICA is a method for discovering over-represented short motifs in large sets of strings. Typical applications include finding candidate transcription factor binding sites in DNA sequences.

netMHC (4.0a)

Prediction of peptide-MHC class I binding using artificial neural networks (ANNs).

ngCGH (0.4.4)

Tools for producing pseudo-cgh of next-generation sequencing data

ngsplot (2.61)

ngsplot is an easy-to-use global visualization tool for next-generation sequencing data.

ngsqctoolkit (2.3.3)

A toolkit for the quality control (QC) of next generation sequencing (NGS) data.

NGSUtils is a suite of software tools for working with next-generation sequencing datasets.

novocraft (3.07.01)

Package includes aligner for single-ended and paired-end reads from the Illumina Genome Analyser. Novoalign finds global optimum alignments using full Needleman-Wunsch algorithm with affine gap penalties.

nucleoatac (0.3.4)

package calling nucleosome positions and occupancy using ATAC-Seq data

oases (0.2.1)

oases is a de novo transcriptome assembler based on the Velvet genome assembler core.

pandaseq (2.10)

PANDASEQ is a program to align Illumina reads, optionally with PCR primers embedded in the sequence, and reconstruct an overlapping sequence.

parpipe (current)

Complete analysis pipeline for PAR-CLIP data

PartekFlow (

Web interface designed specifically for the analysis needs of next generation sequencing applications including RNA, small RNA, and DNA sequencing.

PartekGS (6.6-6.16.0812)

Partek GS provides rigorous and easy-to-use statistical tests for differential expression of genes or exons, and a flexible and powerful statistical test to detect alternative splicing based on a mixed model analysis of variance.

pbsuite (15.8.24)

The PBSuite contains two projects created for analysis of Pacific Biosciences long-read sequencing data: PBHoney and PBJelly. PBHoney is an implementation of two variant-identification approaches designed to exploit the high mappability of long reads (i.e., greater than 10,000 bp). PBHoney considers both intra-read discordance and soft-clipped tails of long reads to identify structural variants. PBJelly is a highly automated pipeline that aligns long sequencing reads (such as PacBio RS reads or long 454 reads in fasta format) to high-confidence draft assemblies. PBJelly fills or reduces as many captured gaps as possible to produce upgraded draft genomes.

pear (0.9.6)

PEAR is an ultrafast, memory-efficient and highly accurate paired-end read merger. It is fully parallelized and can run with as little as a few kilobytes of memory.

peddy (0.2.9)

peddy is used to compare sex and familial relationships given in a PED file with those inferred from a VCF file

penncnv (1.0.4)

kilobase-resolution detection of copy number variations (CNVs) from Illumina high-density SNP genotyping data

PePr (1.1.18)

PePr is a ChIP-Seq Peak-calling and Prioritization pipeline that uses a sliding window approach and models read counts across replicates and between groups with a negative binomial distribution.

picard (2.9.2)

Picard comprises Java-based command-line utilities that manipulate SAM files, and a Java API (SAM-JDK) for creating new programs that read and write SAM files. Both SAM text format and SAM binary (BAM) format are supported.

picrust (1.1.0)

PICRUSt is a bioinformatics software package designed to predict metagenome functional content from marker gene (e.g., 16S rRNA) surveys and full genomes.

PileOMeth (0.1.13)

PileOMeth (a temporary name derived due to it using a PILEup to extract METHylation metrics) will process a coordinate-sorted and indexed BAM or CRAM file containing some form of BS-seq alignments and extract per-base methylation metrics from them. PileOMeth requires an indexed fasta file containing the reference genome as well.

pindel (0.2.5b8)

Pindel can detect breakpoints of large deletions, medium sized insertions, inversions, tandem duplications and other structural variants at single-based resolution from next-gen sequence data. It uses a pattern growth approach to identify the breakpoints of these variants from paired-end short reads.

plastid (0.4.8)

Position-wise analysis of sequencing and genomics data

Platypus (0.8.1)

tool for variant-detection in high-throughput sequencing data.

plink (1.9.0-beta4.4)

PLINK is whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.

plinkseq (0.10)

library for the analysis of genetic variation data

popins (a4c6566)

A method for discovering and genotyping novel sequence insertions. PopIns takes as input a reads-to-reference alignment, assembles unaligned reads using a standard assembly tool, merges the contigs of different individuals into high-confidence sequences, anchors the merged sequences into the reference genome, and finally genotypes all individuals for the discovered insertions.

poretools (0.6.1a1)

Poretools is a toolkit for manipulating and exploring nanopore sequencing data sets. Poretools operates on individual FAST5 files, directory of FAST5 files, and tar archives of FAST5 files.

preseq (2.0.3)

predicting library complexity and genome coverage in high-throughput sequencing

presto (0.5.4)

A bioinformatics toolkit for processing high-throughput lymphocyte receptor sequencing data.

PRINSEQ (0.20.4)

PRINSEQ is a tool that generates summary statistics of sequence and quality data and that is used to filter, reformat and trim next-generation sequence data. It is particularly designed for 454/Roche data, but can also be used for other types of sequence data.

probabel (0.5.0)

ProbABEL is a tool for genome-wide association analysis of imputed genetic data. It was designed to perform such regression in a fast, memory-efficient and consequently genome-wide feasible manner. Currently, ProbABEL implements linear regression, logistic regression, and Cox proportional hazards models.

PROVEAN (1.1.5)

PROVEAN (Protein Variation Effect Analyzer) is a software tool which predicts whether an amino acid substitution or indel has an impact on the biological function of a protein.

pseudogenome (current)

Pseudogenome tools are a suite of tools that simplify the incorporation of our pseudogenomes into standard analysis and hiseq pipelines. It includes modtools, lapsels, and suspenders.

pvacseq (4.0.3)

pVAC-Seq offers epitope binding predictions for missense, inframe indel, and frameshift mutations.

pyclone (0.13.0-b1)

PyClone is a statistical model and software tool designed to infer the prevalence of point mutations in heterogeneous cancer samples.

pyDNase (0.2.4)

pyDNase is a suite of tools for analysing DNase-seq data. pyDNase comes with several analysis scripts covering several common use cases of DNase-seq analysis, and also an implementation of the Wellington, Wellington 1D, and Wellington-bootstrap footprinting algorithms.

PyLOH (1.4.3)

Deconvolving tumor purity and ploidy by integrating copy number alterations and loss of heterozygosity

qctool (1.4)

QCTOOL is a command-line utility program for basic quality control of GWAS datasets.

QoRTs (1.1.6)

The QoRTs software package is a fast, efficient, and portable multifunction toolkit designed to assist in the analysis, quality control, and data management of RNA-Seq datasets.

qualimap (2.2)

A platform-independent application written in Java and R that provides both a GUI and a command-line interface to facilitate the quality control of alignment sequencing data.

To detect rare or de novo copy number alterations in normal DNA samples.

quast (4.5)

QUAST stands for QUality ASsessment Tool. The tool evaluates genome assemblies by computing various metrics. The package includes the general QUAST tool for genome assemblies, MetaQUAST, the extension for metagenomic datasets, and Icarus, interactive visualizer for these tools.

rail-rna (0.2.3.b)

Spliced RNA-Seq aligner designed to take advantage of data from multiple samples.

raremetal (4.13.8)

RAREMETAL is a computationally efficient tool for meta-analysis of rare variants using sequencing or genotyping array data.

READemption (0.4.3)

RNA-Seq pipeline including alignment, coverage tracks, quantitation, and differential expression analysis.

reditools (1.0.4)

REDItools are python scripts developed with the aim to study RNA editing at genomic scale by next generation sequencing data.

RepeatMasker (4.0.7)

RepeatMasker is a program that screens DNA sequences for interspersed repeats and low complexity DNA sequences. The output of the program is a detailed annotation of the repeats that are present in the query sequence as well as a modified version of the query sequence in which all the annotated repeats have been masked (default: replaced by Ns). On average, almost 50% of a human genomic DNA sequence currently will be masked by the program.
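
The masked output described above can be pictured as the query sequence with every annotated repeat interval overwritten by Ns. The sketch below applies a list of intervals to a sequence and reports the masked fraction. It only illustrates the masking step: finding the repeats is the hard part and is not shown, the function names are hypothetical, and the 0-based half-open interval convention is an assumption of this example rather than RepeatMasker's output format.

```python
def mask_intervals(sequence: str, intervals, mask_char: str = "N") -> str:
    """Replace every position covered by an interval with `mask_char`.

    `intervals` holds (start, end) pairs, 0-based and half-open (an
    assumption of this sketch). Overlapping intervals are fine.
    """
    masked = list(sequence)
    for start, end in intervals:
        # Clamp to the sequence bounds so stray coordinates cannot fail.
        for i in range(max(start, 0), min(end, len(masked))):
            masked[i] = mask_char
    return "".join(masked)


def masked_fraction(sequence: str, mask_char: str = "N") -> float:
    """Fraction of positions equal to `mask_char` (0.0 for an empty string)."""
    if not sequence:
        return 0.0
    return sequence.count(mask_char) / len(sequence)
```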

rgt (0.9.9)

Regulatory Genomics Toolbox: Python library and set of tools for the integrative analysis of high throughput regulatory genomics data. http://www.regulatory-genomics.org

rilseq (0.49)

RILseq computational protocol

rmats (3.1.0)

MATS is a computational tool to detect differential alternative splicing events from RNA-Seq data.

rnaseqc (1.1.8)

RNA-SeQC is a java program which computes a series of quality control metrics for RNA-seq data.

roche454 (2.9)

The Roche 454 tools (GS Data Analysis Software package) includes the tools to investigate complex genomic variation in samples including de novo assembly, reference guided alignment and variant calling, and low abundance variant identification and quantification.

RSD (1.1.7)

Reciprocal Smallest Distance (RSD) is a pairwise orthology algorithm that uses global sequence alignment and maximum likelihood evolutionary distance between sequences to accurately detect orthologs between genomes.

rsem (1.3.0)

RSEM is a software package for estimating gene and isoform expression levels from RNA-Seq data.

rseqc (2.6.4)

Rseqc comprehensively evaluates RNA-seq datasets generated from clinical tissues or other well annotated organisms such as mouse, fly and yeast.

rvtests (20141006)

Rare Variant tests is a flexible software package for genetic association studies. It is designed to support unrelated individuals or related (family-based) individuals.

sailfish (0.10.0)

Sailfish is a tool for transcript quantification from RNA-seq data. It requires a set of target transcripts (either from a reference or de-novo assembly) to quantify. All that is needed to run sailfish is a fasta file containing your reference transcripts and a (set of) fasta/fastq file(s) containing your RNA-Seq reads.

salmon (0.8.2)

A tool for quantifying the expression of transcripts using RNA-seq data.

sambamba (0.6.6)

Sambamba is a high-performance, robust, and fast tool (and library), written in the D programming language, for working with SAM and BAM files. Current parallelised functionality is an important subset of samtools functionality, including view, index, sort, markdup, and depth.

samblaster (0.1.24)

samblaster is a program for marking duplicates and finding discordant/split read pairs in read-id grouped paired-end SAM files. When marking duplicates, samblaster will use about 20MB per 1M read pairs. In a read-id grouped SAM file all alignments for a read-id (QNAME) are continuous. Aligners naturally produce such files. They can also be created by sorting a SAM file by read-id.
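
The two properties mentioned above, the read-id grouping requirement and the rough memory figure, can be sketched in Python. This is only an illustration: the function names are hypothetical, the check reads QNAMEs from plain SAM text lines, and the memory estimate simply scales the "about 20MB per 1M read pairs" figure quoted in the description. None of this is samblaster's own code.

```python
def qnames_from_sam_lines(lines):
    """Yield the QNAME (first tab-separated field) of each alignment line."""
    for line in lines:
        if line.startswith("@") or not line.strip():
            continue  # skip SAM header lines and blank lines
        yield line.split("\t", 1)[0]


def is_read_id_grouped(qnames):
    """Return True if all alignments sharing a QNAME are adjacent."""
    seen = set()
    previous = None
    for name in qnames:
        if name != previous:
            if name in seen:
                return False  # this read-id already appeared in an earlier group
            seen.add(name)
            previous = name
    return True


def estimated_memory_mb(read_pairs, mb_per_million=20.0):
    """Scale the quoted figure of about 20MB per 1M read pairs."""
    return read_pairs / 1_000_000 * mb_per_million


if __name__ == "__main__":
    sam = ["@HD\tVN:1.0", "r1\t99\tchr1", "r1\t147\tchr1", "r2\t99\tchr1"]
    print(is_read_id_grouped(qnames_from_sam_lines(sam)))
    print(estimated_memory_mb(50_000_000), "MB (rough estimate)")
```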

samtools (1.5)

The samtools package now provides samtools, bcftools, tabix, and the underlying htslib library.

scallop (0.9.8)

Scallop is a reference-based transcript assembler.

ScanIndel (1.2)

ScanIndel is a Python program to detect indels (insertions and deletions) from NGS data by re-aligning and de novo assembling soft-clipped reads.

segemehl (0.2.0)

Segemehl is a short read aligner that allows local alignment and can align reads obtained after bisulfite treatment.

seqlinkage (1.0)

SEQLinkage implements a collapsed haplotype pattern (CHP) method to generate markers from sequence data for linkage analysis.

SeqMonk (1.38.1)

SeqMonk is a program to enable the visualization and analysis of mapped sequence data. It was written for use with mapped next generation sequence data but can in theory be used for any dataset which can be expressed as a series of genomic positions.

seqtk (1.2)

seqtk is a toolkit for processing sequences in FASTA/Q formats

shapeit (2.r837)

SHAPEIT is a fast and accurate haplotype inference software

shrimp (2_2_3)

SHRiMP is a software package for aligning genomic reads against a target genome. It was primarily developed with the multitudinous short reads of next generation sequencing machines in mind, as well as Applied Biosystem's colourspace genomic representation.

sicer (1.1)

A clustering approach for identification of enriched domains from histone modification ChIP-Seq data

sickle (1.33)

A windowed adaptive trimming tool for FASTQ files using quality

SIFT (6.2.1)

SIFT predicts whether an amino acid substitution affects protein function based on sequence homology and the physical properties of amino acids.

smalt (0.7.6)

SMALT efficiently aligns DNA sequencing reads with a reference genome.

smart (2.1.5)

Specific Methylation Analysis and Report Tool (SMART) uses the signal from bisulfite sequencing experiments across multiple samples to identify genome segments with similar methylation specificities.

smrtanalysis (

SMRT® Analysis is a bioinformatics software suite available for analysis of DNA sequencing data from Pacific Biosciences’ SMRT technology. Users can choose from a variety of analysis protocols that utilize PacBio® and third-party tools. Analysis protocols include de novo genome assembly, cDNA mapping, DNA base-modification detection, and long-amplicon analysis to determine phased consensus sequences.

snap (1.0Beta23)

A Scalable Nucleotide Alignment Program. SNAP is a new sequence aligner that is 3-20x faster and just as accurate as existing tools like BWA-mem, Bowtie2 and Novoalign.

snp2hla (1.0.3)

SNP2HLA is a tool to impute amino acid polymorphisms and single nucleotide polymorphisms in human leukocyte antigens (HLA) within the major histocompatibility complex (MHC) region in chromosome 6.

snpEff (4.3k)

snpEff is a variant annotation and effect prediction tool. It annotates and predicts the effects of variants on genes (such as amino acid changes).

snptest (2.5.2)

SNPTEST is a program for the analysis of single SNP association in genome-wide studies. The tests implemented include binary (case-control) phenotypes and single and multiple quantitative phenotypes; Bayesian and frequentist tests; the ability to condition upon an arbitrary set of covariates; and various methods for dealing with imputed SNPs. The program is designed to work seamlessly with the output of the genotype calling program CHIAMO, the genotype imputation program IMPUTE, and the program GTOOL.

SOAP3-dp (2.3.178+20170103)

SOAP3-dp is GPU-based software for aligning short reads to a reference sequence. It improves on SOAP3 in terms of both speed and sensitivity by exploiting whole-genome indexing and dynamic programming on a GPU. SOAP3 is limited to finding alignments with at most 4 mismatches, while SOAP3-dp can find alignments involving mismatches, INDELs, and small gaps. The number of reads aligned, especially for paired-end data, typically increases 5 to 10 percent from SOAP3 to SOAP3-dp.

SoftSearch (1.0)

For discovering structural variant breakpoints in Illumina paired-end next-generation sequencing data.

SomaticSeq (2.2.3)

SomaticSeq is an ensemble approach to accurately detect somatic mutations. It incorporates multiple somatic mutation caller(s) to obtain a combined call set, and then uses machine learning to distinguish true mutations from false positives from that call set.

somaticsniper (

The purpose of this program is to identify single nucleotide positions that are different between tumor and normal (or, in theory, any two bam files). It takes a tumor bam and a normal bam and compares the two to determine the differences. It outputs a file in a format very similar to Samtools consensus format.

sortmerna (2.1)

SortMeRNA is a biological sequence analysis tool for filtering, mapping and OTU-picking NGS reads.

spades (3.10.1)

SPAdes – St. Petersburg genome assembler – is intended for both standard isolates and single-cell MDA bacteria assemblies.

spats (1.0.0)

Spats processes reads and calculates SHAPE reactivities for SHAPE-Seq experiments on multiple RNAs.

splicemap (

SpliceMap is a de novo splice junction discovery and alignment tool. It offers high sensitivity and support for arbitrary RNA-seq read lengths.

sratoolkit (2.8.1-2)

The NCBI SRA Toolkit enables reading ("dumping") of sequencing files from the SRA database and writing ("loading") files into the .sra format.

stacks (1.46)

Stacks is a software pipeline for building loci from short-read sequences, such as those generated on the Illumina platform. Stacks was developed to work with restriction enzyme-based data, such as RAD-seq, for the purpose of building genetic maps and conducting population genomics and phylogeography.

stampy (1.0.31)

Short read aligner

STAR (2.5.3a)

Spliced Transcripts Alignment to a Reference

strelka (2.7.1)

Strelka is an analysis package designed to detect somatic SNVs and small indels from the aligned sequencing reads of matched tumor-normal samples.

stringtie (1.3.3)

StringTie is a fast and highly efficient assembler of RNA-Seq alignments into potential transcripts. It is primarily a genome-guided transcriptome assembler, although it can borrow algorithmic techniques from de novo genome assembly to help with transcript assembly.

subread (1.5.2)

High-performance read alignment, quantification and mutation discovery

supernova (1.2.0)

Supernova generates highly-contiguous, phased, whole-genome de novo assemblies from a Chromium-prepared library.

SVPV (1.01)

SVPV (Structural Variant Prediction Viewer) enables visualisation of predicted structural variant regions in paired-end whole genome sequencing alignments, and allows comparison of calls from different structural variant prediction algorithms.

SVseq (2_2)

SVseq2 takes a BAM file with soft-clip signatures as input, is faster than SVseq1, and calls both deletions and insertions.

svtyper (0.1.4)

Svtyper is a Bayesian genotyper for structural variants.

taco (0.7.2)

Multi-sample transcriptome assembly from RNA-Seq

telseq (0.0.1)

TelSeq is a software that estimates telomere length from whole genome sequencing data (BAMs).

tetoolkit (1.5.0)

A package for including transposable elements in differential enrichment analysis of sequencing datasets.

THetA (0.7-6-g4f12904)

Tumor Heterogeneity Analysis (THetA) is an algorithm used to estimate tumor purity and clonal/subclonal copy number aberrations simultaneously from high-throughput DNA sequencing data.

TMAP (3.4.1)

TMAP is a fast and accurate alignment software for short and long nucleotide sequences produced by Ion Torrent sequencing technologies.

tophat (2.1.1)

TopHat is a fast splice junction mapper for RNA-Seq reads. It aligns RNA-Seq reads to mammalian-sized genomes using the ultra high-throughput short read aligner Bowtie, and then analyzes the mapping results to identify splice junctions between exons.

transabyss (1.5.3)

Trans-ABySS is a software pipeline for analyzing ABySS-assembled contigs from shotgun transcriptome data.

TransDecoder (3.0.1)

TransDecoder identifies candidate coding regions within transcript sequences, such as those generated by de novo RNA-Seq transcript assembly using Trinity, or constructed based on RNA-Seq alignments to the genome using Tophat and Cufflinks.

transvar (2.1.17)

TransVar is a versatile annotator for 3-way conversion and annotation among genomic characterization(s) of mutations and transcript-dependent annotation(s).

trimgalore (0.4.2)

Consistent quality and adapter trimming for RRBS or standard FastQ files.

trimmomatic (0.36)

Trimmomatic performs a variety of useful trimming tasks for Illumina paired-end and single-end data.

trinity (2.4.0)

Trinity, developed at the Broad Institute and the Hebrew University of Jerusalem, represents a novel method for the efficient and robust de novo reconstruction of transcriptomes from RNA-seq data.

Trinotate (3.0.1)

Trinotate is a comprehensive annotation suite designed for automatic functional annotation of transcriptomes, particularly de novo assembled transcriptomes, from model or non-model organisms.

trup (01062017)

TRUP is a Tumor-specimen suited RNA-seq Unified Pipeline

TSSpredator (1.06)

A tool for comparative detection of transcription start sites

TVC (5.0.2)

TVC is the standalone Torrent Variant Caller, part of the Ion Torrent Suite.

useq (8.9.6)

USeq is a collection of software tools for both low and high level analysis of next generation, ultra high throughput signature sequencing data from the Solexa, SOLiD, and 454 platforms. Initial emphasis: ChIP-seq and RNA-Seq with FDR estimations.

VarDict (1.4.5)

VarDict is a sensitive variant caller for both single and paired sample variant calling from BAM files. VarDictJava is a faster Java version of VarDict.

varscan (2.4.3)

A platform-independent, technology-independent software tool for identifying SNPs and indels in massively parallel sequencing of individual and pooled samples.

vasttools (1.2.0)

A toolset for profiling alternative splicing events in RNA-Seq data.

vcf2maf (1.6.12)

A smarter, more reproducible, and more configurable tool for converting a VCF to a MAF.

vcfanno (0.1.1)

annotate a VCF with other VCFs/BEDs/tabixed files

vcflib (v1.0.0-rc0-279-gc71853a)

a simple C++ library for parsing and manipulating VCF files, + many command-line utilities

vcftools (0.1.15)

VCFtools contains a Perl API (Vcf.pm) and a number of Perl scripts that can be used to perform common tasks with VCF files such as file validation, file merging, intersecting, complements, etc.

Velvet (1.2.10)

Velvet is a de novo genomic assembler specially designed for short read sequencing technologies, such as Solexa or 454

VEP (89)

VEP (Variant Effect Predictor) determines the effect of your variants (SNPs, insertions, deletions, CNVs or structural variants) on genes, transcripts, and protein sequence, as well as regulatory regions.

verifybamid (1.1.3)

verifyBamID is software that verifies whether the reads in a particular file match previously known genotypes for an individual (or group of individuals), and checks whether the reads are contaminated as a mixture of two samples. verifyBamID can detect sample contamination and swaps when external genotypes are available. When external genotypes are not available, verifyBamID still robustly detects sample swaps.

ViralFusionSeq (20121130)

ViralFusionSeq (VFS) is a versatile high-throughput sequencing (HTS) tool for discovering viral integration events and reconstructing fusion transcripts at single-base resolution.

vt (0.577)

vt is a variant tool set that discovers short variants from Next Generation Sequencing data.

XHMM (2016-01-04)

XHMM uses principal component analysis (PCA) normalization and a hidden Markov model (HMM) to detect and genotype copy number variation (CNV) from normalized read-depth data from targeted sequencing experiments.

xwas (1.1)

XWAS (chromosome X-Wide Analysis toolSet) is a software suite for the analysis of the X chromosome in association analyses and similar studies.

Scientific Databases

A set of centrally-maintained and updated scientific databases is made available to users of Helix and Biowulf.

alleleCount (3.2.2)

Calculates genotype frequencies of a SNPMatrix. This component tests each SNP for its Hardy-Weinberg equilibrium. If there are NA values, the frequencies of missing value per sample in the input file are calculated.

Azimuth (2.0)

Machine Learning-Based Predictive Modelling of CRISPR/Cas9 guide efficiency.

Blast (2.5.0+)

NCBI's famous sequence database searching program which compares a nucleotide or protein query sequence against all sequences in a database.

BLAT (3.5)

BLAT is a DNA/Protein Sequence Analysis program that is designed to quickly find sequences of 95% and greater similarity of length 40 bases or more.

ChIA-PET (r261)

ChIA-PET is a software package for automatic processing of ChIA-PET sequence data, including linker filtering, mapping tags to reference genomes, identifying protein binding sites and chromatin interactions, and displaying the results on a graphical genome browser (not installed).

clustalo (1.2.1)

Clustal-Omega is a general purpose multiple sequence alignment (MSA) program for proteins and DNA/RNA. It produces high quality MSAs and is capable of handling data-sets of hundreds of thousands of sequences in reasonable time.

ClustalW (2.1)

ClustalW is a general-purpose multiple alignment program for DNA or protein sequences.

DNAWorks (3.2.4)

DNAWorks is a computer program that automates the design of oligonucleotides for gene synthesis by PCR-based gene assembly. The program requires simple input information: an amino acid sequence of the target protein or a DNA sequence, and a desired annealing temperature. It is a web-based tool available at https://hpcwebapps.cit.nih.gov/dnaworks/.


Formerly GCG-Lite, the same user-friendly web interface, updated and modified to use the EMBOSS suite of sequence analysis programs. The application is available at https://hpcnihapps.cit.nih.gov/emboss_lite.

exonerate (2.2.0)

Exonerate is a generic tool for pairwise sequence comparison.

freecontact (1.0.21)

FreeContact predicts interactions between pairs of residues in a protein based on information about correlated changes in multiple sequence alignments.

gffcompare (0.9.8)

gffcompare can be used to compare and evaluate the accuracy of RNA-Seq transcript assemblers (Cufflinks, Stringtie). It can collapse (merge) duplicate transcripts from multiple GTF/GFF3 files (e.g. resulting from assembly of different samples) and classify transcripts from one or multiple GTF/GFF3 files as they relate to reference transcripts provided in an annotation file (also in GTF/GFF3 format).

hhsuite (3.0-beta.2)

The HH-suite is an open-source software package for sensitive protein sequence searching based on the pairwise alignment of hidden Markov models (HMMs).

HMMER (3.1b2)

HMMER is used for searching sequence databases for homologs of protein sequences, and for making protein sequence alignments. It implements methods using probabilistic models called "profile hidden Markov models" (profile HMMs). Compared to BLAST, FASTA, and other sequence alignment and database search tools based on older scoring methodology, HMMER aims to be significantly more accurate and more able to detect remote homologs because of the strength of its underlying mathematical models.

IgBlast (1.7.0)

IgBlast is a sequence analysis tool for immunoglobulin variable domains.

INRICH (1.1)

INRICH is a pathway analysis tool for genome wide association studies, designed for detecting enriched association signals of LD-independent genomic regions within biologically relevant gene sets.

interproscan (5.22-61.0)

InterProScan is the software package that allows sequences (protein and nucleic) to be scanned against InterPro's signatures. Signatures are predictive models, provided by several different databases, that make up the InterPro consortium.

jellyfish (2.2.6)

Jellyfish is a tool for fast, memory-efficient counting of k-mers in DNA.
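
To make "counting of k-mers" concrete, here is a minimal Python sketch that counts every overlapping substring of length k in a DNA string. It uses a plain in-memory `collections.Counter` and is meant only to show what is being counted; it says nothing about how Jellyfish achieves its speed or memory efficiency, and the function name `count_kmers` is hypothetical.

```python
from collections import Counter


def count_kmers(sequence, k):
    """Count all overlapping k-mers in `sequence` (case-insensitive)."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    sequence = sequence.upper()
    # A sequence of length n contains n - k + 1 overlapping k-mers.
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


if __name__ == "__main__":
    for kmer, count in sorted(count_kmers("ACGTACGT", 3).items()):
        print(kmer, count)
```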

kallisto (0.42.4)

kallisto is a program for quantifying abundances of transcripts from RNA-Seq data, or more generally of target sequences using high-throughput sequencing reads. It is based on the novel idea of pseudoalignment for rapidly determining the compatibility of reads with targets, without the need for alignment.

khmer (2.1.2)

Library and suite of command line tools for working with short-read DNA sequences, taking a k-mer-centric approach to sequence analysis.

kplogo (1.1)

k-mer probability logo (kpLogo) is a probability-based logo tool for integrated detection and visualization of position-specific ultra-short motifs from a set of aligned sequences

LASTZ (1.03.66)

LASTZ is a tool for (1) aligning two DNA sequences, and (2) inferring appropriate scoring parameters automatically. LASTZ is a drop-in replacement for BLASTZ, and is backward compatible with BLASTZ's command-line syntax.

Meme (4.11.2)

MEME is used to discover motifs in groups of DNA/protein sequences or databases.

mfold (3.6)

MFOLD predicts DNA and RNA secondary structure.

MUMmer (3.23)

Mummer is a system for aligning entire genomes extremely rapidly.

MUSCLE (3.8.31)

Fast Multiple Sequence Alignment program.

ncbi-toolkit (12.0.0)

The NCBI C++ Toolkit is a set of executables and libraries for a multitude of sequence analysis functions.

OligoArray (2.1)

OligoArray computes gene specific oligonucleotides that are free of secondary structure for genome-scale oligonucleotide microarray construction.

oncotator (

Tool for annotating information onto genomic point mutations (SNPs/SNVs) and indels.

ORFfinder (0.4.0)

ORF finder searches for open reading frames (ORFs) in the DNA sequence you enter. The program returns the range of each ORF, along with its protein translation. Use ORF finder to search newly sequenced DNA for potential protein encoding segments, verify predicted protein using newly developed SMART BLAST or regular BLASTP.
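
The idea of reporting "the range of each ORF" can be sketched with a simple scan. The Python below is a deliberately simplified illustration, not the ORFfinder program: it assumes the standard ATG start codon and the TAA/TAG/TGA stop codons, scans only the three forward reading frames, and omits the protein translation. The name `find_orfs` and the `min_codons` parameter are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=2):
    """Return (start, end) ranges of forward-strand ORFs, 0-based half-open.

    An ORF here runs from the first ATG in a frame to the next in-frame
    stop codon (inclusive). Reverse-strand frames are not examined.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(dna) - 2, 3):
            codon = dna[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos  # open a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (pos + 3 - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3))
                start = None  # look for the next start codon in this frame
    return orfs


if __name__ == "__main__":
    seq = "CCATGAAATAGCC"
    for start, end in find_orfs(seq):
        print(start, end, seq[start:end])
```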

PRANK (150803)

PRANK is a probabilistic multiple alignment program for DNA, codon and amino-acid sequences. PRANK is based on a novel algorithm that treats insertions correctly and avoids over-estimation of the number of deletion events.

prokka (1.12)

Prokka is a software tool for the rapid annotation of prokaryotic genomes.


PSIPRED is a simple and accurate secondary structure prediction method, incorporating two feed-forward neural networks which perform an analysis on output obtained from PSI-BLAST (Position Specific Iterated - BLAST).

randfold (2.0)

RandFold computes the probability that, for a given RNA sequence, the Minimum Free Energy (MFE) of the secondary structure is different from a distribution of MFE computed with random sequences.

RNAmmer (1.2)

RNAmmer predicts ribosomal RNA genes in full genome sequences by utilising two levels of Hidden Markov Models: An initial spotter model searches both strands. The spotter model is constructed from highly conserved loci within a structural alignment of known rRNA sequences. Once the spotter model detects an approximate position of a gene, flanking regions are extracted and parsed to the full model which matches the entire gene.

roary (3.7.0)

Roary is a high speed stand alone pan genome pipeline, which takes annotated assemblies in GFF3 format (produced by Prokka) and calculates the pan genome.

SeqMonk (1.38.1)

SeqMonk is a program to enable the visualization and analysis of mapped sequence data. It was written for use with mapped next generation sequence data but can in theory be used for any dataset which can be expressed as a series of genomic positions.

SIFT4G (2.4)

SIFT4G searches for similar sequences, chooses closely related sequences that may share similar function to the query sequences, obtains the alignment of these chosen sequences, and calculates normalized probabilities for all possible substitutions from the alignment. Positions with normalized probabilities less than 0.05 are predicted to be deleterious, those greater than or equal to 0.05 are predicted to be tolerated.
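
The decision rule stated above can be written down directly. The following Python sketch only encodes the 0.05 cutoff from the description; the function name is hypothetical and the normalized probabilities themselves would come from SIFT4G's output, which is not modelled here.

```python
def classify_substitution(normalized_probability, threshold=0.05):
    """Apply the cutoff described above to one normalized probability."""
    if not 0.0 <= normalized_probability <= 1.0:
        raise ValueError("normalized probability must lie in [0, 1]")
    # Strictly below the threshold: deleterious; at or above it: tolerated.
    if normalized_probability < threshold:
        return "deleterious"
    return "tolerated"


if __name__ == "__main__":
    for p in (0.01, 0.05, 0.80):
        print(p, classify_substitution(p))
```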

signalp (4.1)

SignalP predicts the presence and location of signal peptide cleavage sites in amino acid sequences from different organisms: Gram-positive bacteria, Gram-negative bacteria, and eukaryotes. The method incorporates a prediction of cleavage sites and a signal peptide/non-signal peptide prediction based on a combination of several artificial neural networks.

steme (1.9.1)

An efficient accurate motif finder based on MEME and implemented using suffix arrays.

svviz (1.5.1)

svviz visualizes high-throughput sequencing data relevant to a structural variant. Only reads supporting the variant or the reference allele will be shown. svviz can operate in both an interactive web browser view to closely inspect individual variants, or in batch mode, allowing multiple variants (annotated in a VCF file) to be analyzed simultaneously.

tantan (13)

A tool to mask low complexity and short period tandem repeats

TMHMM (2.0c)

TMHMM predicts transmembrane helices in proteins.

TRF (4.09)

A tandem repeat in DNA is two or more adjacent, approximate copies of a pattern of nucleotides. Tandem Repeats Finder is a program to locate and display tandem repeats in DNA sequences.
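
The definition above can be illustrated with a small Python sketch that finds runs of adjacent copies of a short pattern. Note the simplification: this version finds only exact copies, whereas the description refers to approximate copies, so it is not the method used by Tandem Repeats Finder. The function name and the `max_period` and `min_copies` parameters are hypothetical.

```python
def find_exact_tandem_repeats(sequence, max_period=6, min_copies=2):
    """Return (start, unit, copies) for runs of exact adjacent copies.

    Only exact repeats are detected; approximate copies are ignored.
    """
    repeats = []
    n = len(sequence)
    for period in range(1, max_period + 1):
        i = 0
        while i + period * min_copies <= n:
            unit = sequence[i:i + period]
            copies = 1
            # Extend the run while the next block equals the unit.
            while sequence[i + copies * period:i + (copies + 1) * period] == unit:
                copies += 1
            if copies >= min_copies:
                repeats.append((i, unit, copies))
                i += copies * period  # jump past the run just reported
            else:
                i += 1
    return repeats


if __name__ == "__main__":
    for hit in find_exact_tandem_repeats("TTACGACGACGTT", max_period=3, min_copies=3):
        print(hit)
```

A unit that is itself a repeat of a shorter unit (for example CACA versus CA) can be reported at more than one period; a fuller implementation would merge such hits.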

Two Sample Logo is a procedure for discovery of statistically significant position-specific differences in residue compositions between two multiple sequence alignments, as well as for graphical representation of those differences.

unafold (3.8)

UNAFold is a comprehensive software package for nucleic acid folding and hybridization prediction.

usearch (9.0.2132)

USEARCH is a unique sequence analysis tool with thousands of users world-wide. USEARCH offers search and clustering algorithms that are often orders of magnitude faster than BLAST.

viennarna (2.3.5)

RNA Secondary Structure Prediction and Comparison

weblogo (3.5)

contains seqlogo utility to create sequence logo summarizing sequence alignments

CCP4 (7.0)

CCP4 is a suite of programs for protein crystallography and structural biology.

CSD (5.38)

The Cambridge Structural Database is the world repository of small molecule crystal structures. Available on Helix only.

DSSP (2.2.1)

The DSSP program was designed by Wolfgang Kabsch and Chris Sander to standardize secondary structure assignment. DSSP is a database of secondary structure assignments (and much more) for all protein entries in the Protein Data Bank (PDB). DSSP is also the program that calculates DSSP entries from PDB entries. DSSP does not predict secondary structure.

emspring (0.84)

SPRING (Single Particle Reconstruction from Images of kNown Geometry) is a single-particle based helical reconstruction package for electron cryo-micrographs and has been used to determine 3D structures of a variety of highly ordered and less ordered specimens.

I-TASSER (5.1)

I-TASSER (Iterative Threading ASSEmbly Refinement) is a hierarchical approach to protein structure and function prediction.

Jackal (2002)

Jackal is a collection of programs designed for the modeling and analysis of protein structures. Its core program is a versatile homology modeling package nest.

lammps (30Jul16)

LAMMPS is a classical molecular dynamics code, and an acronym for Large-scale Atomic/Molecular Massively Parallel Simulator. It runs on a variety of different computer systems, including single processor systems, distributed-memory machines with MPI, and GPU and Xeon Phi systems. LAMMPS is open source software, released under the GNU General Public License.

LOOS (2.3.1)

LOOS (Lightweight Object-Oriented Structure library) is a code library for developing new molecular dynamics analysis applications. It also has a large number of stand-alone tools for manipulating and analyzing trajectories and molecules.

Marvin (16.4.11)

MarvinBeans is an intuitive application suite and API for chemical sketching, visualization, and data exploration.

mdtraj (1.6.1)

MDTraj is a python library that allows users to manipulate molecular dynamics (MD) trajectories and perform a variety of analyses, including fast RMSD, solvent accessible surface area, hydrogen bonding, etc.

moabs (1.3.2)

A comprehensive, accurate and efficient solution for analysis of large scale base-resolution DNA methylation data, bisulfite sequencing or single molecule direct sequencing.

Phenix (1.12-2829)

PHENIX is a software suite for the automated determination of macromolecular structures using X-ray crystallography and other methods.

PROCHECK (3.5.4)

PROCHECK checks the stereochemical quality of a protein structure, producing a number of PostScript plots analysing its overall and residue-by-residue geometry. It includes PROCHECK-NMR for checking the quality of structures solved by NMR.

prody (1.8.2)

ProDy is a free and open-source Python package for protein structural dynamics analysis. It is designed as a flexible and responsive API suitable for interactive usage and application development.

ProFit (3.1)

ProFit is designed to be the ultimate protein least squares fitting program. It has many features including flexible specification of fitting zones and atoms, calculation of RMS over different zones or atoms, RMS-by-residue calculation, on-line help facility, etc.

Pymol (1.8.4)

A comprehensive molecular visualization product for rendering and animating 3D molecular structures.

Schrodinger (2017.1)

A limited number of Schrödinger applications are available on the Biowulf cluster through the Molecular Modeling Interest Group. Most are available through the Maestro GUI.

Scipion (1.1)

Scipion is an image processing framework to obtain 3D models of macromolecular complexes using Electron Microscopy (3DEM). It integrates several software packages and presents a unified interface for both biologists and developers. Scipion allows users to execute workflows combining different software tools, while taking care of formats and conversions. Additionally, all steps are tracked and can be reproduced later on.

SHELX (2016)

SHELX is a set of programs for the determination of small (SM) and macromolecular (MM) crystal structures by single crystal X-ray and neutron diffraction.

VMD (1.9.3)

VMD is a molecular visualization program for displaying, animating, and analyzing large biomolecular systems using 3-D graphics and built-in scripting. To use, type vmd at the prompt.

wimsi (0.3.0)

a new approach to methylation analysis based on shape-similarity

Xplor-NIH (2.42)

Xplor-NIH is a structure determination program which builds on the X-PLOR v3.851 program, including additional tools developed at the NIH.

genesis (2.4)

GENESIS (GEneral NEural SImulation System) is a software platform for the simulation of neural systems ranging from subcellular components and biochemical reactions to complex models of single neurons, large networks, and systems-level processes.

kilosort (8738ef7)

a Matlab-based program for identifying and sorting neuronal spikes from multi-channel electrophysiological recording data sets.

Neuron (7.4)

NEURON is a simulation environment for modeling individual neurons and networks of neurons. It provides tools for conveniently building, managing, and using models in a way that is numerically sound and computationally efficient. It is particularly well-suited to problems that are closely linked to experimental data, especially those that involve cells with complex anatomical and biophysical properties.

phy (1.0.9)

phy is an open source neurophysiological data analysis package in Python. It provides features for sorting, analyzing, and visualizing extracellular recordings made with high-density multielectrode arrays containing hundreds to thousands of recording sites.

agrep (0.8.0-6fb7206)

approximate GREP for fast fuzzy string searching. This is the TRE implementation of the tool. TRE is a lightweight, robust, and efficient POSIX compliant regexp matching library with some special features such as approximate (fuzzy) matching.

aria2 (1.23.0)

multiprotocol download utility

asciinema (1.3.0)

asciinema [as-kee-nuh-muh] is a free and open source solution for recording terminal sessions and sharing them.

Aspera (

High-speed fasp-powered file transfers. Mostly used to download data from NCBI, which has an Aspera server. See the data transfer page for details.

AWS (Jul2015)

Command-line tools for Amazon Web Services. Use 'module load python; aws -help' to see the command-line help, or http://aws.amazon.com/cli/.

Bats (0.4.0)

Bats is a TAP-compliant testing framework for Bash. It provides a simple way to verify that the UNIX programs you write behave as expected.

coreutils (8.27)

The GNU Core Utilities are the basic file, shell and text manipulation utilities of the GNU operating system.

EDirect (7.00)

Entrez Direct (EDirect) is an advanced method for accessing the NCBI's set of interconnected databases (publication, sequence, structure, gene, variation, expression, etc.) from a UNIX terminal window.

Text Editors available on the systems

Eye Of GNOME (2.28.2)

Eye of GNOME is the GNOME image viewer. You can use it to view images through a running X Windows session. To start it, type eog at the prompt.

gdrive (2.1.0)

gdrive is a command line utility for interacting with Google Drive.

geomview (1.9.5)

Geomview is an interactive program for viewing and manipulating geometric objects. It can be used as a standalone viewer for static objects or as a display engine for other programs which produce dynamically changing geometry

The GNOME desktop is available via NX.

gnuplot (5.0.4)

Gnuplot is a portable command-line driven graphing utility to visualize mathematical functions and data interactively, and can support many non-interactive uses such as web scripting.

Google Cloud SDK (143.0.1)

Google Cloud SDK is a set of tools that you can use to manage resources and applications hosted on Google Cloud Platform. These include the gcloud, gsutil, and bq command line tools. See docs at https://cloud.google.com/sdk/docs/how-to

Grace (5.1.25)

Grace is a WYSIWYG 2D plotting tool for the X-Window system. It is a successor to Xmgr.

graphviz (2.38.0)

Graphviz is open source graph visualization software. Graph visualization is a way of representing structural information as diagrams of abstract graphs and networks. It has important applications in networking, bioinformatics, software engineering, database and web design, machine learning, and in visual interfaces for other technical domains.
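As a rough illustration of what "representing structural information as diagrams" means in practice, the sketch below writes a small directed graph in the DOT language that Graphviz reads. The helper name `to_dot` and the example step names are hypothetical and chosen only for illustration.

```python
def to_dot(edges, name="pipeline"):
    """Return DOT source for a directed graph given (source, target) pairs."""
    lines = [f"digraph {name} {{"]
    for src, dst in edges:
        # Quote node names so that spaces or punctuation do not break the syntax.
        lines.append(f'    "{src}" -> "{dst}";')
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical three-step workflow used purely as sample data.
    print(to_dot([("align", "sort"), ("sort", "call variants")]))
```

Saving the printed text to a file such as pipeline.dot and running `dot -Tpng pipeline.dot -o pipeline.png` should render the diagram, assuming the Graphviz module has been loaded.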

groff (1.22.3)

Groff (GNU troff) is a typesetting system that reads plain text mixed with formatting commands and produces formatted output. Output may be PostScript, PDF, HTML, or ASCII/UTF-8 for display at the terminal. Formatting commands may be either low-level typesetting requests ("primitives") or macros from a supplied set. Users may also write their own macros. All three may be combined.

ImageMagick (6.8.9)

ImageMagick is a software suite to create, edit, compose, or convert bitmap images. It can read and write images in a variety of formats.

jq (1.5)

jq is a command-line JSON processor.

kronatools (2.7)

Krona allows hierarchical data to be explored with zooming, multi-layered pie charts. Krona charts can be created using an Excel template or KronaTools, which includes support for several bioinformatics tools and raw data formats. The interactive charts are self-contained and can be viewed with any modern web browser.

mc (4.8.19)

GNU Midnight Commander is a visual file manager. It is a feature-rich, full-screen, text-mode application that allows you to copy, move and delete files and whole directory trees, search for files, and run commands in the subshell. Type module load mc and then the command mc to get started.

mercurial (3.7.3)

Mercurial is a distributed version control system written in Python.

MySQL (5.5.54)

MySQL is an open-source relational database management system.

niftilib (2.0.0)

Niftilib is a set of I/O libraries for reading and writing files in the NIfTI-1 data format. NIfTI-1 is a binary file format for storing medical image data, e.g. magnetic resonance image (MRI) and functional MRI (fMRI) brain images.


7-Zip is a file archiver with a high compression ratio. The program supports the 7z (which implements the LZMA compression algorithm), ZIP, CAB, ARJ, GZIP, BZIP2, TAR, CPIO, RPM and DEB formats. The compression ratio in the 7z format is 30-50% better than the ratio in ZIP format. 7za is a stand-alone executable. It handles fewer archive formats than 7z, but does not need any other files.

parallel (20170422)

GNU parallel is a shell tool for executing jobs in parallel using one or more computers.

pigz (2.3.4)

pigz (parallel implementation of gzip) is a fully functional replacement for gzip that exploits multiple processors and multiple cores to the hilt when compressing data.

POVRay (3.7)

POVRAY (Persistence of Vision RAYtracer) is a high-quality tool for creating three-dimensional graphics. Raytraced images are publication-quality and 'photo-realistic', but are computationally expensive so that large images can take many hours to create.

rdfind (1.3.4)

rdfind is a program that finds duplicate files. It is useful for compressing backup directories or just finding duplicate files. It compares files based on their content, NOT on their file names. After typing module load rdfind, type man rdfind for more information.

shunit2 (2.1.6)

shUnit2 is an xUnit unit test framework for Bourne-based shell scripts, and it is designed to work in a similar manner to JUnit, PyUnit, etc. If you have ever had the desire to write a unit test for a shell script, shUnit2 can do the job.

singularity (2.3.1)

Singularity is a container platform focused on supporting "Mobility of Compute". It allows users to emulate and share custom Linux environments, enabling the creation of self-contained development stacks.

Spark (2.1.1)

Apache Spark is a fast and general engine for large-scale data processing. It is commonly used as an in-memory alternative to Hadoop MapReduce.

SQLite (3.15.0)

SQLite is a software library that implements a self-contained, serverless, zero-configuration, transactional SQL database engine.
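To make "serverless, zero-configuration, transactional" concrete, here is a minimal sketch using Python's standard sqlite3 module. The table and sample names are hypothetical. It opens a database with no server or setup, then shows a failed transaction being rolled back as a unit.

```python
import sqlite3


def demo_transaction():
    # ":memory:" keeps the database in RAM; a file path works the same way.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE samples (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
    )
    conn.execute("INSERT INTO samples (name) VALUES (?)", ("control",))
    conn.commit()

    try:
        # As a context manager, the connection commits on success and
        # rolls back if an exception escapes the block.
        with conn:
            conn.execute("INSERT INTO samples (name) VALUES (?)", ("treated",))
            # NULL violates the NOT NULL constraint and aborts the transaction.
            conn.execute("INSERT INTO samples (name) VALUES (?)", (None,))
    except sqlite3.IntegrityError:
        pass

    rows = [name for (name,) in conn.execute("SELECT name FROM samples ORDER BY id")]
    conn.close()
    return rows


if __name__ == "__main__":
    print(demo_transaction())
```

Only the row committed before the failed transaction remains, because the "treated" insert is rolled back together with the invalid one.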

Swarm is a script designed to simplify submitting a group of commands to the Biowulf cluster.

synapseclient (1.6.2)

The synapseclient package provides an interface to Synapse, a collaborative workspace for reproducible, data-intensive research projects.

TAU (2.26)

TAU - an acronym for Tuning And Analysis Utilities - is a suite of software tools for measuring performance of software packages running on a High Performance Computing resource such as the Biowulf cluster. It has the capability to measure CPU, memory, and I/O performance throughout the execution of an application.

tmux (2.3)

tmux is a terminal multiplexer.

vcf2db (7dfc48a)

vcf2db creates a gemini-compatible database from a VCF.

wuzz (0.3.0)

wuzz is an interactive CLI tool for HTTP inspection.

bds (0.99999l)

BDS, or Big Data Script, is a cross-system workflow language for working with big data pipelines in computer systems of different sizes and capabilities.

nextflow (0.23.3)

Nextflow is a framework for data-driven computational pipelines.

snakemake (3.13.3)

Snakemake aims to reduce the complexity of creating workflows by providing a fast and comfortable execution environment, together with a clean and modern domain-specific language (DSL) in Python style. It is well suited for bioinformatics workflows.