Biowulf High Performance Computing at the NIH
Application updates in the last 3 months
To see all versions available for any application, use 'module avail application_name'.
All centrally-installed applications are listed on the Applications page.
Updated Application
24 May 2019 guppy updated to version 3.1.5
Local accelerated basecalling for Nanopore data
23 May 2019 turbovnc updated to version 2.2.2
TurboVNC is a derivative of VNC (Virtual Network Computing) that is tuned to provide peak performance for 3D and video workloads.
22 May 2019 xcpengine updated to version 1.0
xcpEngine performs denoising and estimation of functional connectivity on fMRI datasets.
21 May 2019 QIIME updated to version 2-2019.4
QIIME is an open source software package for comparison and analysis of microbial communities, primarily based on high-throughput amplicon sequencing data (such as SSU rRNA) generated on a variety of platforms, but also supporting analysis of other types of data (such as shotgun metagenomic data).
20 May 2019 circtools updated to version
Circtools is a modular, Python3-based framework for circRNA-related tools that unifies several functionalities in a single command-line-driven program. Its command line follows the subcommand convention used by samtools and bedtools. Currently, circtools includes modules for detecting and reconstructing circRNAs, a quick check of circRNA mapping results, RBP enrichment screenings, circRNA primer design, statistical testing, and an exon usage module.
17 May 2019 RepeatMasker updated to version 4.0.9-p2
RepeatMasker is a program that screens DNA sequences for interspersed repeats and low-complexity DNA sequences. The output of the program is a detailed annotation of the repeats present in the query sequence, as well as a modified version of the query sequence in which all the annotated repeats have been masked (by default, replaced by Ns). On average, almost 50% of a human genomic DNA sequence is currently masked by the program.
17 May 2019 PhyloBayes updated to version 4.1c
PhyloBayes is a software package which can be used for conducting Bayesian phylogenetic reconstruction and molecular dating analyses, using a large variety of amino acid replacement and nucleotide substitution models, including empirical mixtures or non-parametric models, as well as alternative clock relaxation processes.
17 May 2019 DNAnexus updated to version 0.276.0
DNAnexus is a cloud-based commercial solution for next-generation sequence analysis and visualization. It has a command-line interface (CLI) which can be used to log in to the DNAnexus platform, upload and navigate data, and launch analyses.
16 May 2019 Rstudio updated to version 1.2.1335
RStudio is a set of integrated tools designed to help you be more productive with R. It includes a console, a syntax-highlighting editor that supports direct code execution, and tools for plotting, history, debugging and workspace management.
16 May 2019 king updated to version 2.2.1
KING is a toolset to explore genotype data from a genome-wide association study (GWAS) or a sequencing project. KING can be used to check family relationships and flag pedigree errors by estimating kinship coefficients and inferring IBD segments for all pairwise relationships.
16 May 2019 vsearch updated to version 2.13.4
VSEARCH supports de novo and reference based chimera detection, clustering, full-length and prefix dereplication, rereplication, reverse complementation, masking, all-vs-all pairwise global alignment, exact and global alignment searching, shuffling, subsampling and sorting. It also supports FASTQ file analysis, filtering, conversion and merging of paired-end reads.
14 May 2019 singularity updated to version 3.2.0
Singularity is a container platform focused on supporting "Mobility of Compute". It allows users to emulate and share custom Linux environments, enabling the creation of self-contained development stacks.
14 May 2019 trimgalore updated to version 0.6.2
Consistent quality and adapter trimming for RRBS or standard FastQ files.
7 May 2019 scanpy updated to version 1.4.2
Scanpy is a scalable toolkit for analyzing single-cell gene expression data. It includes preprocessing, visualization, clustering, pseudotime and trajectory inference and differential expression testing. The Python-based implementation efficiently deals with datasets of more than one million cells.
7 May 2019 minimap2 updated to version 2.17
Minimap2 is a fast sequence mapping and alignment program that can find overlaps between long noisy reads, or map long reads or their assemblies to a reference genome optionally with detailed alignment (i.e. CIGAR).
7 May 2019 RELION updated to version 3.0.5
RELION (for REgularised LIkelihood OptimisatioN) is a stand-alone computer program for Maximum A Posteriori refinement of (multiple) 3D reconstructions or 2D class averages in cryo-electron microscopy.
7 May 2019 SpliceAI updated to version 20190507
SpliceAI is a deep neural network that accurately predicts splice junctions from an arbitrary pre-mRNA transcript sequence, enabling precise prediction of noncoding genetic variants that cause cryptic splicing.
2 May 2019 TORTOISE updated to version 3.1.4
TORTOISE (Tolerably Obsessive Registration and Tensor Optimization Indolent Software Ensemble) is a software package for processing diffusion MRI data.
2 May 2019 tagit updated to version 1.0.8
Tag(ging) It(erative) of SNVs in multiple populations.
2 May 2019 Rosetta updated to version 2019.14
The Rosetta++ software suite can perform de novo protein structure predictions, identify low free energy sequences for target protein backbones, predict the structure of a protein-protein complex from the individual structures of the monomer components, incorporate NMR data into the basic Rosetta protocol to accelerate the process of NMR structure prediction, and more...
2 May 2019 bowtie2 updated to version
A version of bowtie that is particularly good at aligning reads from about 50 up to 100s or 1,000s of characters long, and particularly good at aligning to relatively long (e.g. mammalian) genomes.
2 May 2019 bcbio-nextgen updated to version 1.1.5
Validated, scalable, community developed variant calling, RNA-seq and small RNA analysis
1 May 2019 encode-atac-seq-pipeline updated to version 1.3.0
This pipeline is designed for automated end-to-end quality control and processing of ATAC-seq or DNase-seq data.
29 Apr 2019 rgt updated to version 0.12.1
Regulatory Genomics Toolbox: Python library and set of tools for the integrative analysis of high throughput regulatory genomics data.
29 Apr 2019 DeepLabCut updated to version 2.0.5
DeepLabCut is an open source toolbox that builds on a state-of-the-art human pose estimation algorithm. It allows a deep neural network to be trained on limited training data to precisely track user-defined features, with accuracy that matches human labeling.
26 Apr 2019 STAR updated to version 2.7.0f
Spliced Transcripts Alignment to a Reference
26 Apr 2019 GATK updated to version
GATK, from the Broad Institute, is both a structured software library that makes it easy to write efficient analysis tools for next-generation sequencing data and a suite of tools for working with human medical resequencing projects such as 1000 Genomes and The Cancer Genome Atlas. These tools include a depth-of-coverage analyzer, a quality score recalibrator, a SNP/indel caller and a local realigner.
25 Apr 2019 cromwell updated to version 40
A Workflow Management System geared towards scientific workflows.
25 Apr 2019 golang updated to version 1.12.4
The Go programming language
25 Apr 2019 cutadapt updated to version 2.3
cutadapt removes adapter sequences from DNA high-throughput sequencing data. This is usually necessary when the read length of the machine is longer than the molecule that is sequenced, such as in microRNA data.
25 Apr 2019 Phenix updated to version 1.15.2-3472
PHENIX is a software suite for the automated determination of macromolecular structures using X-ray crystallography and other methods.
24 Apr 2019 parallel updated to version 20190422
GNU parallel is a shell tool for executing jobs in parallel using one or more computers.
24 Apr 2019 IGVTools updated to version 2.5.2
IGVTools provides utilities for working with ASCII file formats used by the Integrative Genomics Viewer. The files can be sorted, tiled, indexed, and counted.
24 Apr 2019 IGV updated to version 2.5.2
The Integrative Genomics Viewer is a high-performance visualization tool for interactive exploration of large, integrated genomic datasets.
24 Apr 2019 xpdf updated to version 4.01.01
Xpdf is a free PDF viewer and toolkit, including a text extractor, image converter, HTML converter, and more. Most of the tools are available as open source.
23 Apr 2019 FSL updated to version 6.0.1
FSL is a comprehensive library of image analysis and statistical tools for FMRI, MRI and DTI brain imaging data.
23 Apr 2019 osca updated to version 0.43
OSCA (OmicS-data-based Complex trait Analysis) is a software tool written in C/C++ for the analysis of complex traits using multi-omics data.
22 Apr 2019 maxquant updated to version
MaxQuant is a quantitative proteomics software package designed for analyzing large mass-spectrometric data sets. It is specifically aimed at high-resolution MS data. Several labeling techniques as well as label-free quantification are supported.
22 Apr 2019 MrBayes updated to version 3.2.7
MrBayes is a program for Bayesian inference and model choice across a wide range of phylogenetic and evolutionary models. MrBayes uses Markov chain Monte Carlo (MCMC) methods to estimate the posterior distribution of model parameters.
22 Apr 2019 beagle-lib updated to version 3.1.2
BEAGLE is a high-performance library that can perform the core calculations at the heart of most Bayesian and Maximum Likelihood phylogenetics packages. It can make use of highly-parallel processors such as those in graphics cards (GPUs) found in many PCs.
module name: beagle-lib
18 Apr 2019 vartrix updated to version 1.1.3
VarTrix is a software tool for extracting single cell variant information from 10x Genomics single cell data.
18 Apr 2019 mbin updated to version 1.1.1
The mBin pipeline is designed to discover the unique signals of DNA methylation in metagenomic SMRT sequencing reads and leverage them for organism binning of assembled contigs or unassembled reads. Because all cellular DNA is modified by the same set of methyltransferases encoded in the genome, DNA methylation signals can be used for binning not just chromosomal sequences, but also extrachromosomal mobile genetic elements like plasmids.
18 Apr 2019 dcm2niix updated to version 1.0.20190410
DICOM to NIfTI converter
18 Apr 2019 Mathematica updated to version 12.0
Mathematica is an interactive system for doing mathematical computation. It performs numerical, symbolic and graphical computations, and incorporates a high-level programming language.
17 Apr 2019 Matlab updated to version 2019a
MATLAB is an interactive software package for scientific and engineering numeric computation. MATLAB integrates numerical analysis, matrix computation, signal processing, and graphics in an environment where problems and solutions are expressed just as they are written mathematically.
16 Apr 2019 opera-lg updated to version 2.0.6
OPERA (Optimal Paired-End Read Assembler) is a sequence assembly program. It uses information from paired-end, mate-pair, or long reads to order and orient the intermediate contigs/scaffolds assembled in a genome assembly project, a process known as scaffolding.
16 Apr 2019 sga updated to version 0.10.15
SGA (String Graph Assembler) is a de novo genome assembler based on the concept of string graphs. The major goal of SGA is to be very memory efficient, which is achieved by using a compressed representation of DNA sequence reads. It is based on Gene Myers' string graph formulation of assembly and uses the FM-index/Burrows-Wheeler transform to efficiently find overlaps between sequence reads.
16 Apr 2019 jemalloc updated to version 5.2.0
jemalloc is a general purpose malloc(3) implementation that emphasizes fragmentation avoidance and scalable concurrency support.
15 Apr 2019 blender updated to version 2.79
Blender is the free and open source 3D creation suite. Blender on Biowulf is meant for command-line rendering.
15 Apr 2019 pvactools updated to version 1.3.5
pVACtools is a cancer immunotherapy suite consisting of pVACseq, pVACfuse, and pVACvector.
12 Apr 2019 mriqc updated to version 0.15.0
MRIQC is an MRI quality control tool
11 Apr 2019 gffcompare updated to version 0.11.2
gffcompare can be used to compare and evaluate the accuracy of RNA-Seq transcript assemblers (Cufflinks, StringTie). It can collapse (merge) duplicate transcripts from multiple GTF/GFF3 files (e.g. resulting from assembly of different samples) and classify transcripts from one or multiple GTF/GFF3 files as they relate to reference transcripts provided in an annotation file (also in GTF/GFF3 format).
10 Apr 2019 VEP updated to version 96
VEP (Variant Effect Predictor) determines the effect of your variants (SNPs, insertions, deletions, CNVs or structural variants) on genes, transcripts, and protein sequence, as well as regulatory regions.
8 Apr 2019 FitHiChIP updated to version 6.0
FitHiChIP is a computational method for identifying chromatin contacts among regulatory regions such as enhancers and promoters from HiChIP/PLAC-seq data. FitHiChIP jointly models the non-uniform coverage and genomic distance scaling of HiChIP data, captures previously validated enhancer interactions for several genes including MYC and TP53, and recovers contacts genome-wide that are supported by ChIA-PET, promoter capture Hi-C and Hi-C data.
5 Apr 2019 fmriprep updated to version 1.3.2
A Robust Preprocessing Pipeline for fMRI Data
3 Apr 2019 SEACR updated to version 1.0
SEACR is intended to call peaks and enriched regions from sparse Cleavage Under Targets and Release Using Nuclease (CUT&RUN) or chromatin profiling data in which background is dominated by "zeroes" (i.e. regions with no read coverage).
3 Apr 2019 Coot updated to version
Coot is for macromolecular model building, model completion and validation, particularly suitable for protein modelling using X-ray data.
3 Apr 2019 Blast updated to version 2.9.0+
NCBI's widely used sequence database search program, which compares a nucleotide or protein query sequence against all sequences in a database.
2 Apr 2019 ncbi-toolkit updated to version 22.0.0
The NCBI C++ Toolkit is a set of executables and libraries for a multitude of sequence analysis functions.
29 Mar 2019 mc updated to version 4.8.22
GNU Midnight Commander is a visual file manager: a feature-rich, full-screen text-mode application that allows you to copy, move and delete files and whole directory trees, search for files, and run commands in the subshell. Type module load mc and then the command mc to get started.
28 Mar 2019 Schrodinger updated to version 2019.1
A limited number of Schrödinger applications are available on the Biowulf cluster through the Molecular Modeling Interest Group. Most are available through the Maestro GUI.
26 Mar 2019 genoml updated to version 1.0.3
GenoML is an Automated Machine Learning tool that optimizes machine learning pipelines for genomic data. GenoML will automate the most tedious part of machine learning by intelligently exploring thousands of possible models to find the best one for your data.
26 Mar 2019 libjpeg-turbo updated to version 2.0.2
libjpeg-turbo is a JPEG image codec that uses SIMD instructions (MMX, SSE2, NEON) to accelerate baseline JPEG compression and decompression on x86, x86-64, and ARM systems.
26 Mar 2019 circlator updated to version 1.5.5
A tool to circularize genome assemblies
26 Mar 2019 pysamstats updated to version 1.1.2
Pysamstats is a fast Python and command-line utility for extracting simple statistics against genome positions based on sequence alignments from a SAM or BAM file.
25 Mar 2019 nasm updated to version 2.14.02
Assembler/disassembler for the Intel x86 architecture.
25 Mar 2019 novocraft updated to version 3.09.02
The package includes an aligner for single-end and paired-end reads from the Illumina Genome Analyzer. Novoalign finds globally optimal alignments using the full Needleman-Wunsch algorithm with affine gap penalties.
25 Mar 2019 cnvkit updated to version 0.9.6
Copy number variant detection from targeted DNA sequencing
22 Mar 2019 miniasm updated to version 0.3.r179
Ultrafast de novo assembly for long noisy reads (though having no consensus step)
22 Mar 2019 racon updated to version 1.3.2
Ultrafast consensus module for raw de novo genome assembly of long uncorrected reads.
21 Mar 2019 mercurial updated to version 4.5.3
Mercurial is a version control system that runs on Python. To use it, type ml python/2.7. For help, type hg --help.
21 Mar 2019 rdfind updated to version 1.4.1
rdfind is a program that finds duplicate files. It is useful for compressing backup directories or just finding duplicate files. It compares files based on their content, NOT on their file names. After typing module load rdfind, type man rdfind for more information.
21 Mar 2019 cellranger-dna updated to version 1.0.0
Cell Ranger DNA is a set of analysis pipelines that process Chromium single cell DNA sequencing output to align reads, identify copy number variation (CNV), and compare heterogeneity among cells.
19 Mar 2019 netOglyc updated to version 3.1d
NetOglyc produces neural network predictions of mucin type GalNAc O-glycosylation sites in mammalian proteins.
19 Mar 2019 vt updated to version 0.577
vt is a variant tool set that discovers short variants from Next Generation Sequencing data.
19 Mar 2019 nvchecker updated to version 1.4
nvchecker (short for new version checker) is for checking if a new version of some software has been released.
19 Mar 2019 ncbi-vdb updated to version 2.9.6
The SRA Toolkit and SDK from NCBI is a collection of tools and libraries for using data in the INSDC Sequence Read Archives.
19 Mar 2019 ncbi-ngs updated to version 2.9.6
NCBI's NGS is a new, domain-specific API for accessing reads, alignments and pileups produced from Next Generation Sequencing
19 Mar 2019 sratoolkit updated to version 2.9.6
The NCBI SRA Toolkit enables reading ("dumping") of sequencing files from the SRA database and writing ("loading") files into the .sra format.
18 Mar 2019 viennarna updated to version 2.4.11
RNA Secondary Structure Prediction and Comparison
18 Mar 2019 globus-cli updated to version 1.10.0
Globus command line interface
18 Mar 2019 BEAST updated to versions 1.10.4 and 2.5.2
BEAST (Bayesian Evolutionary Analysis Sampling Trees) is a cross-platform program for Bayesian MCMC analysis of molecular sequences.
18 Mar 2019 whippet updated to version 0.11
Lightweight and fast RNA-seq quantification at the event level.
14 Mar 2019 salmon updated to version 0.13.0
A tool for quantifying the expression of transcripts using RNA-seq data.
14 Mar 2019 mosdepth updated to version 0.2.5
Fast BAM/CRAM depth calculation for WGS, exome, or targeted sequencing.
11 Mar 2019 genrich updated to version 0.5
Genrich is a peak-caller for genomic enrichment assays (e.g. ChIP-seq, ATAC-seq). It analyzes alignment files generated following the assay and produces a file detailing peaks of significant enrichment.
11 Mar 2019 CCP4 updated to version 7.0.071
CCP4 is a suite of programs for protein crystallography and structural biology.
11 Mar 2019 Maven updated to version 3.6.0
Apache Maven is a software project management and comprehension tool. Based on the concept of a project object model (POM), Maven can manage a project's build, reporting and documentation from a central piece of information.
6 Mar 2019 UNet updated to version 20190225
U-Net is an image segmentation tool. It relies on the strong use of data augmentation to use the available annotated samples more efficiently. The architecture consists of a contracting path to capture context and a symmetric expanding path that enables precise localization.
6 Mar 2019 idep updated to version 0.81
iDEP (integrated Differential Expression and Pathway analysis) is a Shiny application for analyzing RNA-seq data.
5 Mar 2019 speedseq updated to version 0.1.2-20180208-4e60002
SpeedSeq is a genome analysis platform designed for rapid whole-genome variant detection and interpretation
1 Mar 2019 quast updated to version 5.0.2
QUAST stands for QUality ASsessment Tool. The tool evaluates genome assemblies by computing various metrics. The package includes the general QUAST tool for genome assemblies; MetaQUAST, the extension for metagenomic datasets; and Icarus, an interactive visualizer for these tools.
1 Mar 2019 nodejs updated to version 10.15.2
Node.js is a JavaScript runtime built on Chrome's V8 JavaScript engine. module name: nodejs
27 Feb 2019 qcat updated to version 1.0.1
qcat is a Python command-line tool for demultiplexing Oxford Nanopore reads from FASTQ files.
27 Feb 2019 3DSlicer updated to version 4.10.1
A software platform for the analysis (including registration and interactive segmentation) and visualization (including volume rendering) of medical images and for research in image guided therapy.
27 Feb 2019 neovim updated to version 0.3.4
Neovim is a refactor, and sometimes redactor, in the tradition of Vim (which itself derives from Stevie). It is not a rewrite but a continuation and extension of Vim.
26 Feb 2019 AWS updated to version 1.16.111
Command-line tools for Amazon Web Services. Use 'module load python; aws -help' to see the command-line help, or
25 Feb 2019 icgc-get updated to version 0.6.1
icgc-get provides a unified interface to the many sources of data from the International Cancer Genome Consortium.
Scientific Databases updated in the last 3 months
For a full list of scientific databases available on the NIH HPC systems, see this page.

Updated | Database | Format | Location
05 May 2019 | Rat Genome (Rattus norvegicus) rn4 | MySQL | NIH mirror of UCSC Genome Browser
05 Apr 2019 | Mouse Genome (Mus musculus) mm8 | MySQL | NIH mirror of UCSC Genome Browser