Biowulf High Performance Computing at the NIH
Application updates in the last 3 months
To see all available versions of an application, use module avail application_name (for example, module avail fmriprep).
All centrally installed applications are listed on the Applications page.
Updated Application
27 Feb 2020 fmriprep updated to version 20.0.0
A Robust Preprocessing Pipeline for fMRI Data
26 Feb 2020 hisat updated to version
HISAT is a fast and sensitive spliced alignment program which uses Hierarchical Indexing for Spliced Alignment of Transcripts.
26 Feb 2020 sratoolkit updated to version 2.10.4
The NCBI SRA Toolkit enables reading ("dumping") of sequencing files from the SRA database and writing ("loading") files into the .sra format.
26 Feb 2020 ncbi-vdb updated to version 2.10.4
The SRA Toolkit and SDK from NCBI is a collection of tools and libraries for using data in the INSDC Sequence Read Archives.
26 Feb 2020 ncbi-ngs updated to version 2.10.4
NCBI's NGS is a new, domain-specific API for accessing reads, alignments and pileups produced by Next Generation Sequencing.
26 Feb 2020 vasttools updated to version 2.3.0
A toolset for profiling alternative splicing events in RNA-Seq data.
25 Feb 2020 GATK updated to version
GATK, from the Broad Institute, is both a structured software library that makes it easy to write efficient analysis tools for next-generation sequencing data and a suite of tools for working with human medical resequencing projects such as 1000 Genomes and The Cancer Genome Atlas. These tools include a depth-of-coverage analyzer, a quality score recalibrator, a SNP/indel caller and a local realigner.
25 Feb 2020 SimNIBS updated to version 3.1.1
SimNIBS is a free software package for the Simulation of Non-invasive Brain Stimulation. It allows for realistic calculations of the electric field induced by transcranial magnetic stimulation (TMS) and transcranial direct current stimulation (tDCS).
25 Feb 2020 DeepLabCut updated to version 2.1
DeepLabCut is an open source toolbox that builds on a state-of-the-art human pose estimation algorithm. It allows a deep neural network to be trained on limited training data to precisely track user-defined features, matching human labeling accuracy.
25 Feb 2020 htgts updated to version 2
High-Throughput Genome-Wide Translocation Sequencing pipeline
25 Feb 2020 uropa updated to version 3.5.0
UROPA is a command-line tool for genomic region annotation.
24 Feb 2020 sysbench updated to version 1.0.11
sysbench is a scriptable multi-threaded benchmark tool based on LuaJIT. It is most frequently used for database benchmarks, but can also be used to create arbitrarily complex workloads that do not involve a database server.
24 Feb 2020 PRSice updated to version 2.2.12
PRSice is polygenic risk score software for calculating, applying, evaluating and plotting the results of polygenic risk score (PRS) analyses.
24 Feb 2020 TORTOISE updated to version 3.2.0
TORTOISE (Tolerably Obsessive Registration and Tensor Optimization Indolent Software Ensemble) is a software package for processing diffusion MRI data.
24 Feb 2020 boost updated to version 1.72
Boost provides free peer-reviewed portable C++ source libraries. Boost libraries are intended to be widely useful, and usable across a broad spectrum of applications.
24 Feb 2020 parallel updated to version 20200222
GNU parallel is a shell tool for executing jobs in parallel using one or more computers.
20 Feb 2020 DNAnexus updated to version 0.290.1
DNAnexus is a cloud-based commercial solution for next-generation sequence analysis and visualization. It has a command-line interface (CLI) which can be used to log in to the DNAnexus platform, upload and navigate data, and launch analyses.
20 Feb 2020 YOLO updated to version 20200211
YOLO is a new approach to object detection. Prior work on object detection repurposed classifiers to perform detection. Instead, YOLO frames object detection as a regression problem to spatially separated bounding boxes and associated class probabilities.
20 Feb 2020 AdmixTools updated to version 6.0
ADMIXTOOLS is a software package that supports formal tests of whether admixture occurred, and makes it possible to infer admixture proportions and dates.
20 Feb 2020 OpenCRAVAT updated to version 1.7.0
OpenCRAVAT is a new open source, scalable decision support system for variant and gene prioritization. It includes a modular resource catalog designed to maximize community and developer involvement; as a result, the catalog is actively developed and grows every month. Resources made available via the store are well suited for analysis of cancer, as well as Mendelian and complex diseases.
19 Feb 2020 singularity updated to version 3.5.3
Singularity is a container platform focused on supporting "Mobility of Compute". It allows users to emulate and share custom Linux environments, enabling the creation of self-contained development stacks.
19 Feb 2020 WISExome updated to version 20180814
WISExome is a tool that implements a within-sample comparison approach to CNV detection. It correctly identifies known pathogenic CNVs.
18 Feb 2020 fusioninspector updated to version 2.2.1
In silico Validation of Fusion Transcript Predictions
15 Feb 2020 nodejs updated to version 12.16.0
Node.js is a JavaScript runtime built on Chrome's V8 JavaScript engine. Module name: nodejs.
13 Feb 2020 Comsol updated to version 55
The COMSOL Multiphysics engineering simulation software environment facilitates all steps in the modeling process: defining your geometry, meshing, specifying your physics, solving, and then visualizing your results.
13 Feb 2020 VEP updated to version 99
VEP (Variant Effect Predictor) determines the effect of your variants (SNPs, insertions, deletions, CNVs or structural variants) on genes, transcripts, and protein sequence, as well as regulatory regions.
12 Feb 2020 mocat2 updated to version current
a package for analyzing metagenomics datasets
12 Feb 2020 Genome Browser updated to version 393
The Genome Browser Mirror Fragments is a mirror of the UCSC Genome Browser. Users can also directly access the MySQL databases and supporting files, as well as a large number of associated executables.
11 Feb 2020 ORCA updated to version 4.2.1
ORCA is an ab initio, DFT, and semi-empirical SCF-MO package.
11 Feb 2020 encode-atac-seq-pipeline updated to version 1.6.1
This pipeline is designed for automated end-to-end quality control and processing of ATAC-seq or DNase-seq data.
10 Feb 2020 vt updated to version 0.57721
vt is a variant tool set that discovers short variants from Next Generation Sequencing data.
6 Feb 2020 diamond updated to version 0.9.30
DIAMOND is a new high-throughput program for aligning DNA reads or protein sequences against a protein reference database such as NR, at up to 20,000 times the speed of BLAST, with high sensitivity.
6 Feb 2020 crystfel updated to version 0.9.0
CrystFEL is a suite of programs for processing diffraction data acquired serially in a snapshot manner, such as when using the technique of Serial Femtosecond Crystallography (SFX) with a free-electron laser source.
6 Feb 2020 dashing updated to version 0.4.2
Fast and accurate genomic distances using HyperLogLog
6 Feb 2020 cmake updated to version 3.16.4
CMake is a family of tools designed to build, test and package software.
6 Feb 2020 hint updated to version 2.27
A computational method to detect CNVs and translocations from Hi-C data.
5 Feb 2020 pdf2svg updated to version 0.2.3
A simple PDF to SVG converter using the Poppler and Cairo libraries.
4 Feb 2020 biom-format updated to version 2.1.8
tool (and library) to manipulate Biological Observation Matrix (BIOM) Format files
4 Feb 2020 baracus updated to version 1.1.4
Baracus predicts brain age based on data from FreeSurfer. It combines cortical thickness, cortical surface area and subcortical information.
3 Feb 2020 guppy updated to version 3.4.5
Local accelerated basecalling for Nanopore data
31 Jan 2020 busco updated to version 4.0.2
BUSCO completeness assessments employ sets of Benchmarking Universal Single-Copy Orthologs from OrthoDB to provide quantitative measures of the completeness of genome assemblies, annotated gene sets, and transcriptomes in terms of expected gene content.
31 Jan 2020 netpbm updated to version 10.86.8
Netpbm is a toolkit for manipulation of graphic images, including conversion of images between a variety of different formats. There are over 300 separate tools in the package, including converters for about 100 graphics formats. Examples of the kinds of image manipulation it supports are: shrinking an image by 10%; cutting the top half off an image; making a mirror image; creating a sequence of images that fade from one image to another.
30 Jan 2020 Hail updated to version 0.2.31
Hail is an open-source, scalable framework for exploring and analyzing genomic data.
30 Jan 2020 vsearch updated to version 2.14.2
VSEARCH supports de novo and reference based chimera detection, clustering, full-length and prefix dereplication, rereplication, reverse complementation, masking, all-vs-all pairwise global alignment, exact and global alignment searching, shuffling, subsampling and sorting. It also supports FASTQ file analysis, filtering, conversion and merging of paired-end reads.
26 Jan 2020 BioGANs updated to version 20191230
BioGANs is a novel application of Generative Adversarial Networks (GANs) to the synthesis of cells imaged by fluorescence microscopy. It makes it possible to infer the correlation between the spatial patterns of different fluorescent proteins, which reflects important biological functions. The synthesized images capture these relationships, which are relevant for biological applications.
24 Jan 2020 deepsea updated to version 0.94c
DeepSEA is a deep learning-based algorithmic framework for predicting the chromatin effects of sequence alterations with single nucleotide sensitivity. DeepSEA can accurately predict the epigenetic state of a sequence, including transcription factor binding, DNase I sensitivity and histone marks in multiple cell types, and can further use this capability to predict the chromatin effects of sequence variants and prioritize regulatory variants.
23 Jan 2020 gurobi updated to version 9.0.0
Gurobi is a mathematical optimization solver and a commercial product. On Biowulf, Gurobi is licensed for use by members of the CDSL_Gurobi_users group only. It is installed in /data/CDSL_Gurobi_users and is not accessible to any other users. A token license server running on Biowulf manages the Gurobi license.
22 Jan 2020 repeatmodeler updated to version 2.0.1
RepeatModeler is a de novo transposable element (TE) family identification and modeling package. RepeatModeler assists in automating the runs of the various algorithms given a genomic database, clustering redundant results, refining and classifying the families, and producing a high-quality library of TE families suitable for use with RepeatMasker and ultimately for submission to the Dfam database.
22 Jan 2020 MAKER updated to version 2.31.10
MAKER is a portable and easily configurable genome annotation pipeline. Its purpose is to allow smaller eukaryotic and prokaryotic genome projects to independently annotate their genomes and to create genome databases. MAKER identifies repeats, aligns ESTs and proteins to a genome, produces ab initio gene predictions and automatically synthesizes these data into gene annotations having evidence-based quality values.
21 Jan 2020 guidance updated to version 2.02
GUIDANCE is meant to be used for weighting, filtering or masking unreliably aligned positions in sequence alignments before subsequent analysis.
17 Jan 2020 Gaussian updated to version G16-C01
Gaussian is a connected system of programs for performing semiempirical and ab initio molecular orbital (MO) calculations.
16 Jan 2020 Xvfb updated to version 1.19.6
X virtual frame buffer.
16 Jan 2020 breseq updated to version 0.35.0
breseq is a computational pipeline for finding mutations relative to a reference sequence in short-read DNA re-sequencing data. It is intended for haploid microbial genomes (<20 Mb).
15 Jan 2020 LDpred updated to version 1.0.11
LDpred is a Python based software package that adjusts GWAS summary statistics for the effects of linkage disequilibrium (LD).
15 Jan 2020 medaka updated to version 0.11.4
medaka is a tool to create a consensus sequence from nanopore sequencing data. This task is performed using neural networks applied to a pileup of individual sequencing reads against a draft assembly.
15 Jan 2020 cromwell updated to version 48
A Workflow Management System geared towards scientific workflows.
13 Jan 2020 trinity updated to version 2.9.0
Trinity, developed at the Broad Institute and the Hebrew University of Jerusalem, represents a novel method for the efficient and robust de novo reconstruction of transcriptomes from RNA-seq data.
13 Jan 2020 cutadapt updated to version 2.8
cutadapt removes adapter sequences from DNA high-throughput sequencing data. This is usually necessary when the read length of the machine is longer than the molecule that is sequenced, such as in microRNA data.
10 Jan 2020 cgpBattenberg updated to version 3.5.3
Detect subclonality and copy number in matched NGS data
10 Jan 2020 freebayes updated to version 1.3.2
Bayesian haplotype-based polymorphism discovery and genotyping
10 Jan 2020 mosdepth updated to version 0.2.8
Fast BAM/CRAM depth calculation for WGS, exome, or targeted sequencing.
10 Jan 2020 scallop updated to version 0.10.4
Scallop is a reference-based transcript assembler.
10 Jan 2020 Connectome Workbench updated to version 1.4.2
Tools to browse, download, explore, and analyze data from the Human Connectome Project (HCP). Allows users to compare their own data to that of the HCP.
10 Jan 2020 bbtools updated to version 38.75
An extensive set of bioinformatics tools including bbmap (short read aligner), bbnorm (kmer based normalization), dedupe (deduplication and clustering of unaligned reads), reformat (formatting and trimming reads) and many more.
10 Jan 2020 atom updated to version 1.42.0
A hackable text editor for the 21st Century.
8 Jan 2020 Qt updated to version 5.14.0
Qt is a cross-platform application framework that is used for developing application software that can be run on various software and hardware platforms with little or no change in the underlying codebase, while still being a native application with native capabilities and speed.
8 Jan 2020 Schrodinger updated to version 2019.4
A limited number of Schrödinger applications are available on the Biowulf cluster through the Molecular Modeling Interest Group. Most are available through the Maestro GUI.
6 Jan 2020 PartekFlow updated to version
Web interface designed specifically for the analysis needs of next generation sequencing applications including RNA, small RNA, and DNA sequencing.
3 Jan 2020 pigz updated to version 2.4
pigz (parallel implementation of gzip) is a fully functional replacement for gzip that exploits multiple processors and multiple cores to the hilt when compressing data.
3 Jan 2020 metal updated to version 2018-08-28
The METAL software is designed to facilitate meta-analysis of large datasets (such as several whole genome scans) in a convenient, rapid and memory efficient manner.
2 Jan 2020 racon updated to version 1.4.3
Ultrafast consensus module for raw de novo genome assembly of long uncorrected reads.
31 Dec 2019 spades updated to version 3.14.0
SPAdes (St. Petersburg genome assembler) is intended for both standard isolates and single-cell MDA bacteria assemblies.
30 Dec 2019 cuDNN updated to version 7.6.5
The NVIDIA CUDA Deep Neural Network library (cuDNN) is a GPU-accelerated library of primitives for deep neural networks. cuDNN provides highly tuned implementations for standard routines such as forward and backward convolution, pooling, normalization, and activation layers.
30 Dec 2019 sambamba updated to version 0.7.1
Sambamba is a high-performance, modern, robust and fast tool (and library), written in the D programming language, for working with SAM and BAM files. Its current parallelised functionality is an important subset of samtools functionality, including view, index, sort, markdup, and depth.
30 Dec 2019 git updated to version 2.24.1
Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency.
30 Dec 2019 rgt updated to version 0.12.3
Regulatory Genomics Toolbox: Python library and set of tools for the integrative analysis of high throughput regulatory genomics data.
27 Dec 2019 presto updated to version 0.5.13
pRESTO performs all stages of raw sequence processing prior to alignment against reference germline sequences.
27 Dec 2019 qcat updated to version 1.1.0
qcat is a Python command-line tool for demultiplexing Oxford Nanopore reads from FASTQ files.
27 Dec 2019 novocraft updated to version 4.01.00
The package includes an aligner for single-end and paired-end reads from the Illumina Genome Analyser. Novoalign finds global optimum alignments using the full Needleman-Wunsch algorithm with affine gap penalties.
26 Dec 2019 minialign updated to version 0.6.0
fast and accurate alignment tool for PacBio and Nanopore long reads
26 Dec 2019 mega2 updated to version 6.0.0
Mega2 is a data-handling program for facilitating genetic linkage and association analyses.
26 Dec 2019 khmer updated to version 3.0.0a3
Library and suite of command line tools for working with short-read DNA sequences, taking a k-mer-centric approach to sequence analysis.
23 Dec 2019 nccl updated to version 2.5.6
The NVIDIA Collective Communications Library (NCCL) implements multi-GPU and multi-node collective communication primitives that are performance optimized for NVIDIA GPUs. NCCL provides routines such as all-gather, all-reduce, broadcast, reduce, reduce-scatter, that are optimized to achieve high bandwidth over PCIe and NVLink high-speed interconnect.
23 Dec 2019 ImReP updated to version 0.8
ImReP is a novel computational method for rapid and accurate profiling of the adaptive immune repertoire from regular RNA-Seq data. It is able to efficiently extract TCR- and BCR-derived reads from RNA-Seq data. ImReP can also accurately assemble the complementary determining regions 3 (CDR3s), the most variable regions of B and T cell receptors, and determine their antigen specificity.
23 Dec 2019 meme updated to version 5.1.0
MEME is used to discover motifs in groups of DNA/protein sequences or databases.
23 Dec 2019 genrich updated to version 0.6
Genrich is a peak-caller for genomic enrichment assays (e.g. ChIP-seq, ATAC-seq). It analyzes alignment files generated following the assay and produces a file detailing peaks of significant enrichment.
23 Dec 2019 conpair updated to version 0.2
Concordance and contamination estimator for tumor–normal pairs
23 Dec 2019 pandoc updated to version 2.9.1
Pandoc is a Haskell library for converting from one markup format to another, and a command-line tool that uses this library.
20 Dec 2019 circexplorer2 updated to version 2.3.8
A combined strategy to identify circular RNAs (circRNAs and ciRNAs)
18 Dec 2019 Blast updated to version 2.10.0+
NCBI's well-known sequence database searching program which compares a nucleotide or protein query sequence against all sequences in a database.
17 Dec 2019 trimgalore updated to version 0.6.5
Consistent quality and adapter trimming for RRBS or standard FastQ files.
17 Dec 2019 umitools updated to version 1.0.1
tools for dealing with Unique Molecular Identifiers (UMIs)/Random Molecular Tags (RMTs) and single cell RNA-Seq cell barcodes
17 Dec 2019 ascatNgs updated to version 4.3.3
AscatNGS contains the Cancer Genome Project's workflow implementation of the ASCAT copy number algorithm for paired-end sequencing.
17 Dec 2019 shapeit updated to version 4.1
SHAPEIT is a fast and accurate haplotype inference software
16 Dec 2019 mafft updated to version 7.453
Multiple alignment program for amino acid or nucleotide sequences
16 Dec 2019 GRNBoost updated to version 20191216
GRNBoost is a library built on top of Apache Spark that implements a scalable strategy for gene regulatory network (GRN) inference. GRNBoost was inspired by GENIE3, a popular algorithm for GRN inference. GRNBoost adopts GENIE3's algorithmic blueprint and aims at improving its runtime performance and data size capability.
13 Dec 2019 rclone updated to version 1.50.2
Rclone is a utility for synchronizing directories on a file-based storage system (e.g. /home or /data) with an object store such as Amazon S3. It uses the S3 protocol, and it can be used with the HPC object storage system.
13 Dec 2019 R updated to version 3.6.1
R (the R Project) is a language and environment for statistical computing and graphics. R is similar to S, and provides a wide variety of statistical and graphical techniques (linear and nonlinear modelling, statistical tests, time series analysis, classification, clustering, ...).
11 Dec 2019 dropest updated to version 0.8.6
Pipeline for estimating molecular count matrices for droplet-based single-cell RNA-seq measurements.
10 Dec 2019 scalpel updated to version 0.5.4
Bioinformatics pipeline for discovery of genetic variants from NGS reads.
10 Dec 2019 Maven updated to version 3.6.3
Apache Maven is a software project management and comprehension tool. Based on the concept of a project object model (POM), Maven can manage a project's build, reporting and documentation from a central piece of information.
10 Dec 2019 CCP4 updated to version 7.0.078
CCP4 is a suite of programs for protein crystallography and structural biology.
10 Dec 2019 LASTZ updated to version 1.04.03
LASTZ is a tool for (1) aligning two DNA sequences, and (2) inferring appropriate scoring parameters automatically. LASTZ is a drop-in replacement for BLASTZ, and is backward compatible with BLASTZ's command-line syntax.
5 Dec 2019 CUDA updated to version 10.2.89
CUDA is a parallel computing platform and programming model invented by NVIDIA. It enables dramatic increases in computing performance by harnessing the power of the graphics processing unit (GPU).
5 Dec 2019 spaceranger updated to version 1.0.0
10x pipeline for processing Visium spatial RNA-seq data
3 Dec 2019 cnvkit updated to version 0.9.7.b1
Copy number variant detection from targeted DNA sequencing
3 Dec 2019 Octave updated to version 5.1.0
GNU Octave is a high-level language, primarily intended for numerical computations. It provides a convenient command line interface for solving linear and nonlinear problems numerically, and for performing other numerical experiments using a language that is mostly compatible with Matlab.
2 Dec 2019 plink updated to version 2.0-dev-20191128
PLINK is a whole-genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.
2 Dec 2019 gem updated to version 3.4
High resolution peak calling and motif discovery for ChIP-seq and ChIP-exo data
Scientific Databases updated in the last 3 months
For a full list of scientific databases available on the NIH HPC systems, see this page.

Updated | Database | Format | Type | Location
25 Feb 2020 | NCBI Taxonomy | taxonomy | | /fdb/taxonomy
16 Feb 2020 | Mouse Genome (Mus musculus) mm8 | MySQL | | NIH mirror of UCSC Genome Browser
16 Feb 2020 | NCBI nt | Blast | Nuc | /fdb/blastdb/nt
16 Feb 2020 | NCBI nr | Blast | Prot | /fdb/blastdb/nr
16 Feb 2020 | Protein Data Bank | Blast | Prot | /fdb/blastdb/pdbaa
12 Feb 2020 | Mouse Genome GRCm38.p6 proteins | Blast | Prot | /fdb/blastdb/GRCm38.p6.prot
12 Feb 2020 | Mouse Genome GRCm38.p6 | Blast | Nuc | /fdb/blastdb/GRCm38.p6
12 Feb 2020 | Human Genome GRCh38.p13 proteins | Blast | Prot | /fdb/blastdb/GRCh38.p13.prot
12 Feb 2020 | Human Genome GRCh38.p13 | Blast | Nuc | /fdb/blastdb/GRCh38.p13
12 Feb 2020 | Mouse Genome GRCm38.p6 proteins | Fasta | Prot | /fdb/genome/GRCm38.p6
12 Feb 2020 | Mouse Genome GRCm38.p6 | Fasta | Nuc | /fdb/genome/GRCm38.p6
12 Feb 2020 | Human Genome hg19 | Fasta | Nuc | /fdb/genome/human-feb2009/
10 Feb 2020 | NCBI nr | Blast_v4 | Prot | /fdb/blastdb/v4/nr
09 Feb 2020 | SwissProt | Blast | Prot | /fdb/blastdb/swissprot
29 Jan 2020 | NCBI nt | Blast_v4 | Nuc | /fdb/blastdb/v4/nt
21 Jan 2020 | ANNOVAR | ANNOVAR | | /fdb/annovar/current
22 Dec 2019 | Rat Genome (Rattus norvegicus) rn5 | MySQL | | NIH mirror of UCSC Genome Browser
19 Dec 2019 | TCGA DREAM SMC synthetic data | BAM | | /fdb/DREAM/SMC
17 Dec 2019 | Protein Data Bank | Fasta | Nuc | /fdb/fastadb/pdb.nt.fas
17 Dec 2019 | NCBI nt | Fasta | Nuc | /fdb/fastadb/nt.fas
17 Dec 2019 | Mito | Fasta | Nuc | /fdb/fastadb/mito.nt.fas
17 Dec 2019 | SwissProt | Fasta | Prot | /fdb/fastadb/swissprot.aa.fas
17 Dec 2019 | Protein Data Bank | Fasta | Prot | /fdb/fastadb/pdb.aa.fas
17 Dec 2019 | Mito | Fasta | Prot | /fdb/fastadb/mito.aa.fas
17 Dec 2019 | NCBI nr | Fasta | Prot | /fdb/fastadb/nr.aa.fas
15 Dec 2019 | 16S Microbial | Blast_v4 | Nuc | /fdb/blastdb/v4/16SMicrobial
13 Dec 2019 | Protein Data Bank | Blast_v4 | Prot | /fdb/blastdb/v4/pdbaa
07 Dec 2019 | Protein Data Bank | Blast | Nuc | /fdb/blastdb/pdbnt