Biowulf High Performance Computing at the NIH
Application updates in the last 3 months
To see all versions available for any application, use module avail application_name
All centrally-installed applications are listed on the Applications page
Updated Application
19 Jan 2021 libgit2 updated to version 1.1.0
Libgit2 is an implementation of the Git version control software protocols that can be embedded in other software.
13 Jan 2021 MMARGE updated to version 1.0
MMARGE: Motif Mutation Analysis for Regulatory Genomic Elements
12 Jan 2021 seqkit updated to version 0.15.0
A cross-platform toolkit for FASTA/Q file manipulation
11 Jan 2021 cnvkit updated to version 0.9.8
Copy number variant detection from targeted DNA sequencing
8 Jan 2021 anvio updated to version 7
Anvi’o is an open-source, community-driven analysis and visualization platform for microbial ‘omics. It brings together many aspects of today’s cutting-edge strategies including genomics, metagenomics, metatranscriptomics, pangenomics, metapangenomics, phylogenomics, and microbial population genetics in an integrated and easy-to-use fashion through extensive interactive visualization capabilities.
7 Jan 2021 igblast updated to version 1.17.0
IgBlast is a sequence analysis tool for immunoglobulin variable domains.
7 Jan 2021 scratch1d updated to version 1.3
SCRATCH-1D is a suite of one-dimensional predictors included in the long-established and widely used SCRATCH suite of predictors developed by the Institute for Genomics and Bioinformatics (IGB) of the University of California, Irvine (UCI).
7 Jan 2021 delly updated to version 0.8.7
DELLY is an integrated structural variant prediction method that can detect deletions, tandem duplications, inversions and translocations at single-nucleotide resolution in short-read massively parallel sequencing data. It uses paired-ends and split-reads to sensitively and accurately delineate genomic rearrangements throughout the genome.
7 Jan 2021 mafft updated to version 7.475
Multiple alignment program for amino acid or nucleotide sequences
6 Jan 2021 pandoc updated to version
Pandoc is a Haskell library for converting from one markup format to another, and a command-line tool that uses this library.
6 Jan 2021 git updated to version 2.30.0
Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency.
6 Jan 2021 mriqc updated to version 0.16.0
MRIQC is an MRI quality control tool
4 Jan 2021 deepvariant updated to version 1.1.0
DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data.
4 Jan 2021 PartekFlow updated to version
Web interface designed specifically for the analysis needs of next generation sequencing applications including RNA, small RNA, and DNA sequencing.
3 Jan 2021 DeepHit updated to version 20201211
The DeepHit application uses a deep neural network to learn the distribution of survival times directly. DeepHit makes no assumptions about the underlying stochastic process and allows for the possibility that the relationship between covariates and risk(s) changes over time. Most importantly, DeepHit smoothly handles competing risks; i.e. settings in which there is more than one possible event of interest.
31 Dec 2020 cellranger updated to version 5.0.1
Cell Ranger is a set of analysis pipelines that processes Chromium single cell 3’ RNA-seq output to align reads, generate gene-cell matrices and perform clustering and gene expression analysis.
30 Dec 2020 screen updated to version 4.8.0
Screen is a full-screen window manager that multiplexes a physical terminal between several processes, typically interactive shells.
28 Dec 2020 crumble updated to version 0.8.3
Crumble is a tool to explore controlled loss of quality values for compressing CRAM files. Crumble can read a SAM/BAM/CRAM file, compute which confidence values to keep and which to omit, and emit a new file with most qualities removed.
21 Dec 2020 Huygens updated to version 20.10
Huygens is software for image restoration, deconvolution, resolution and noise reduction. It can process images from all current optical microscopes, including wide-field, confocal, Nipkow (scanning disk confocal), multiple-photon, and 4Pi microscopes.
21 Dec 2020 parallel updated to version 20201222
GNU parallel is a shell tool for executing jobs in parallel using one or more computers.
18 Dec 2020 megalodon updated to version 2.2.9
Megalodon is a research command line tool to extract high accuracy modified base and sequence variant calls from raw nanopore reads by anchoring the information rich basecalling neural network output to a reference genome/transcriptome.
18 Dec 2020 cactus updated to version 1.2.3
Cactus is a reference-free whole-genome multiple alignment program.
17 Dec 2020 ncbi-vdb updated to version 2.10.9
The SRA Toolkit and SDK from NCBI is a collection of tools and libraries for using data in the INSDC Sequence Read Archives.
17 Dec 2020 ncbi-ngs updated to version 2.10.9
NCBI's NGS is a new, domain-specific API for accessing reads, alignments and pileups produced from Next Generation Sequencing
17 Dec 2020 hisat updated to version
HISAT is a fast and sensitive spliced alignment program which uses Hierarchical Indexing for Spliced Alignment of Transcripts.
17 Dec 2020 sratoolkit updated to version 2.10.9
The NCBI SRA Toolkit enables reading ("dumping") of sequencing files from the SRA database and writing ("loading") files into the .sra format.
17 Dec 2020 ascatNgs updated to version 4.5.0
AscatNGS contains the Cancer Genome Projects workflow implementation of the ASCAT copy number algorithm for paired end sequencing.
15 Dec 2020 diamond updated to version 2.0.5
DIAMOND is a new high-throughput program for aligning DNA reads or protein sequences against a protein reference database such as NR, at up to 20,000 times the speed of BLAST, with high sensitivity.
15 Dec 2020 Hail updated to version 0.2.61
Hail is an open-source, scalable framework for exploring and analyzing genomic data.
14 Dec 2020 sonicparanoid updated to version 1.3.5
A stand-alone software tool for the identification of orthologous relationships among multiple species.
14 Dec 2020 libjpeg-turbo updated to version 2.0.6
libjpeg-turbo is a JPEG image codec that uses SIMD instructions (MMX, SSE2, NEON) to accelerate baseline JPEG compression and decompression on x86, x86-64, and ARM systems.
14 Dec 2020 cellphonedb updated to version 2.1.4
A publicly available repository of curated receptors, ligands and their interactions. Subunit architecture is included for both ligands and receptors, representing heteromeric complexes accurately.
11 Dec 2020 rnaseqc updated to version 2.3.6
RNA-SeQC is a java program which computes a series of quality control metrics for RNA-seq data.
11 Dec 2020 crispresso updated to version 2.0.44
Software pipeline for the analysis of CRISPR-Cas9 genome editing outcomes from sequencing data
11 Dec 2020 biom-format updated to version 2.1.10
A tool (and library) for manipulating Biological Observation Matrix (BIOM) Format files
11 Dec 2020 humann updated to version 3.0.0-alpha.3
HUMAnN is a pipeline for efficiently and accurately profiling the presence/absence and abundance of microbial pathways in a community from metagenomic or metatranscriptomic sequencing data (typically millions of short DNA/RNA reads).
11 Dec 2020 CCP4 updated to version 7.1.009
CCP4 is a suite of programs for protein crystallography and structural biology.
10 Dec 2020 chipseq_pipeline updated to version 1.6.1
AQUAS Transcription Factor and Histone ChIP-Seq processing pipeline. The AQUAS pipeline is based off the ENCODE (phase-3) transcription factor and histone ChIP-seq pipeline specifications (by Anshul Kundaje)
10 Dec 2020 Emacs updated to version 27.1
Emacs is a text and source code editor for text terminals and X. It has a vast set of features and is well suited for doing everything from reading mail and simple text editing to managing and editing large programming projects. It has its own help and tutorial which can be accessed by typing Ctrl-h i and Ctrl-h t respectively. Type emacs [filename] to edit a file.
10 Dec 2020 lofreq updated to version 2.1.5
LoFreq is a fast and sensitive variant-caller for inferring SNVs and indels from next-generation sequencing data.
10 Dec 2020 exomiser updated to version 12.1.0
The Exomiser is a Java program that functionally annotates variants from whole-exome sequencing data starting from a VCF file.
10 Dec 2020 telseq updated to version 0.0.2
TelSeq is software that estimates telomere length from whole genome sequencing data (BAMs).
10 Dec 2020 unafold updated to version 4.0
UNAFold is a comprehensive software package for nucleic acid folding and hybridization prediction.
9 Dec 2020 bwa-mem2 updated to version 2-2.1
The next version of the bwa-mem algorithm in bwa.
8 Dec 2020 metamorpheus updated to version 2020.11.23
MetaMorpheus is a bottom-up proteomics database search software with integrated post-translational modification (PTM) discovery capability.
7 Dec 2020 singularity updated to version 3.7.1
Singularity is a container platform focused on supporting "Mobility of Compute". It allows users to emulate and share custom Linux environments, which makes it possible to create self-contained development stacks.
7 Dec 2020 metaphlan updated to version 3.0.6
MetaPhlAn is a computational tool for profiling the composition of microbial communities (Bacteria, Archaea, Eukaryotes and Viruses) from metagenomic shotgun sequencing data (i.e. not 16S) with species-level resolution. With the newly added StrainPhlAn module, it is now possible to perform accurate strain-level microbial profiling.
2 Dec 2020 sambamba updated to version 0.8.0
Sambamba is a high-performance, modern, robust, and fast tool (and library), written in the D programming language, for working with SAM and BAM files. Current parallelised functionality is an important subset of samtools functionality, including view, index, sort, markdup, and depth.
2 Dec 2020 ExpansionHunterDenovo updated to version 0.9.0
ExpansionHunter Denovo (EHdn) is a suite of tools for detecting novel expansions of short tandem repeats (STRs). EHdn is intended for analysis of a collection of BAM/CRAM files containing alignments of short (100-200bp) reads.
2 Dec 2020 MELT updated to version 2.2.2
MELT is an application for identifying mobile elements in genomic data
2 Dec 2020 VEP updated to version 102.0
VEP (Variant Effect Predictor) determines the effect of your variants (SNPs, insertions, deletions, CNVs or structural variants) on genes, transcripts, and protein sequence, as well as regulatory regions.
2 Dec 2020 ExpansionHunter updated to version 4.0.1
Expansion Hunter: a tool for estimating repeat sizes. There are a number of regions in the human genome consisting of repetitions of a short unit sequence (commonly a trimer). Such repeat regions can expand to a size much larger than the read length and thereby cause a disease. Expansion Hunter aims to estimate sizes of such repeats by performing a targeted search through a BAM/CRAM file for reads that span, flank, and are fully contained in each repeat.
27 Nov 2020 cnvnator updated to version 0.4.1
CNVnator is a tool for CNV discovery and genotyping from depth of read mapping.
27 Nov 2020 salmon updated to version 1.4.0
A tool for quantifying the expression of transcripts using RNA-seq data.
25 Nov 2020 rsem updated to version 1.3.3
RSEM is a software package for estimating gene and isoform expression levels from RNA-Seq data.
25 Nov 2020 viennarna updated to version 2.4.17
RNA Secondary Structure Prediction and Comparison
25 Nov 2020 Comsol updated to version 56
The COMSOL Multiphysics engineering simulation software environment facilitates all steps in the modeling process − defining your geometry, meshing, specifying your physics, solving, and then visualizing your results.
25 Nov 2020 bowtie2 updated to version 2.4.2
A version of bowtie that's particularly good at aligning reads of about 50 up to 100s or 1,000s of characters, and particularly good at aligning to relatively long (e.g. mammalian) genomes
25 Nov 2020 bismark updated to version 0.23.0
Bismark is a program to map bisulfite treated sequencing reads to a genome of interest and perform methylation calls in a single step. The output can be easily imported into a genome viewer, such as SeqMonk, and enables a researcher to analyse the methylation levels of their samples straight away.
25 Nov 2020 nextflow updated to version 20.10.0
Data-driven computational pipelines
24 Nov 2020 philosopher updated to version 3.3.12
Philosopher is fast, easy-to-use, scalable, and versatile data analysis software for mass spectrometry-based proteomics. Philosopher is dependency-free and can analyze both traditional database searches and open searches for post-translational modification (PTM) discovery.
24 Nov 2020 fragpipe updated to version 14.0
FragPipe is a Java Graphical User Interface (GUI) for a suite of computational tools enabling comprehensive analysis of mass spectrometry-based proteomics data. It is powered by MSFragger.
24 Nov 2020 msfragger updated to version 3.1.1
An ultrafast database search tool for peptide identification in mass spectrometry-based proteomics.
23 Nov 2020 Rosetta updated to version 2020.46
The Rosetta++ software suite can perform de novo protein structure predictions, identify low free energy sequences for target protein backbones, predict the structure of a protein-protein complex from the individual structures of the monomer components, incorporate NMR data into the basic Rosetta protocol to accelerate the process of NMR structure prediction, and more...
23 Nov 2020 Coot updated to version 0.9.3
Coot is for macromolecular model building, model completion and validation, particularly suitable for protein modelling using X-ray data.
23 Nov 2020 vcf2maf updated to version 1.6.19
A smarter, more reproducible, and more configurable tool for converting a VCF to a MAF.
23 Nov 2020 patchelf updated to version 0.12
patchelf is a small utility to modify the dynamic linker and RPATH of ELF executables.
19 Nov 2020 Scipion updated to version 3.0.6
Scipion is an image processing framework to obtain 3D models of macromolecular complexes using Electron Microscopy (3DEM). It integrates several software packages and presents a unified interface for both biologists and developers. Scipion allows users to execute workflows combining different software tools, while taking care of formats and conversions. Additionally, all steps are tracked and can be reproduced later on.
19 Nov 2020 crystfel updated to version 0.9.1
CrystFEL is a suite of programs for processing diffraction data acquired serially in a snapshot manner, such as when using the technique of Serial Femtosecond Crystallography (SFX) with a free-electron laser source.
18 Nov 2020 seqlinkage updated to version 1.0
SEQLinkage implements a collapsed haplotype pattern (CHP) method to generate markers from sequence data for linkage analysis.
18 Nov 2020 fastqtools updated to version 0.8.3
fastq-tools is a collection of small and efficient programs for performing some common and uncommon tasks with FASTQ files.
17 Nov 2020 cutadapt updated to version 3.0
cutadapt removes adapter sequences from DNA high-throughput sequencing data. This is usually necessary when the read length of the machine is longer than the molecule that is sequenced, such as in microRNA data.
13 Nov 2020 Cytoscape updated to version 3.8.2
Cytoscape is an open source software platform for visualizing molecular interaction networks and biological pathways and integrating these networks with annotations, gene expression profiles and other state data.
11 Nov 2020 medaka updated to version 1.2.0
medaka is a tool to create a consensus sequence from nanopore sequencing data. This task is performed using neural networks applied from a pileup of individual sequencing reads against a draft assembly.
10 Nov 2020 ReLeaSE updated to version 20200516
ReLeaSE (Reinforcement Learning for Structural Evolution) is an application for de-novo Drug Design based on Reinforcement Learning. It integrates two deep neural networks, generative and predictive, that are trained separately but are used jointly to generate novel targeted chemical libraries. ReLeaSE uses a simple representation of molecules by their simplified molecular input line entry specification (SMILES) strings only.
9 Nov 2020 datalad updated to version 0.13.0rc2
Datalad is a tool for uploading and downloading public up-to-date neuroimaging datasets.
9 Nov 2020 fitlins updated to version 0.8.0
Fitlins fits linear models to BIDS neuroimaging datasets.
9 Nov 2020 RELION updated to version 3.1.1
RELION (for REgularised LIkelihood OptimisatioN) is a stand-alone computer program for Maximum A Posteriori refinement of (multiple) 3D reconstructions or 2D class averages in cryo-electron microscopy.
9 Nov 2020 Chimera updated to version 1.15.0
Chimera is a highly extensible program for interactive visualization and analysis of molecular structures and related data, including density maps, supramolecular assemblies, sequence alignments, docking results, trajectories, and conformational ensembles.
6 Nov 2020 ChromHMM updated to version 1.22
ChromHMM is software for learning and characterizing chromatin states.
5 Nov 2020 Blast updated to version 2.11.0+
NCBI's well-known sequence database searching program which compares a nucleotide or protein query sequence against all sequences in a database.
3 Nov 2020 R updated to version 4.0.3
R (the R Project) is a language and environment for statistical computing and graphics. R is similar to S, and provides a wide variety of statistical and graphical techniques (linear and nonlinear modelling, statistical tests, time series analysis, classification, clustering, ...).
2 Nov 2020 MySQL updated to version 8.0.22
MySQL is an open-source relational database management system.
2 Nov 2020 hhsuite updated to version 3.3.0
The HH-suite is an open-source software package for sensitive protein sequence searching based on the pairwise alignment of hidden Markov models (HMMs).
2 Nov 2020 MotionCor2 updated to version 1.4.0
MotionCor2 is a multi-GPU accelerated program that provides iterative, patch-based motion detection combining spatial and temporal constraints and dose weighting for both single particle and tomographic cryo-electron microscopy images.
2 Nov 2020 ctffind updated to version 4.1.14
Programs for finding CTFs of electron micrographs
2 Nov 2020 netpbm updated to version 10.86.17
Netpbm is a toolkit for manipulation of graphic images, including conversion of images between a variety of different formats. There are over 300 separate tools in the package including converters for about 100 graphics formats. Examples of the sort of image manipulation we're talking about are: Shrinking an image by 10%; Cutting the top half off of an image; Making a mirror image; Creating a sequence of images that fade from one image to another.
29 Oct 2020 guppy updated to version 4.2.2
Local accelerated basecalling for Nanopore data
28 Oct 2020 STAR updated to version 2.7.6a
Spliced Transcripts Alignment to a Reference
28 Oct 2020 MEGAN updated to version 6_20_8
MEtaGenome ANalyzer takes a file of reads and the Blast output from a comparison against a reference genome, and automatically calculates a taxonomic classification of the reads and, if desired, a functional classification.
27 Oct 2020 Scramble updated to version 1.0.1
Scramble is a mobile element insertion (MEI) detection tool. It identifies clusters of soft clipped reads in a BAM file, builds consensus sequences, aligns to representative L1Ta, AluYa5, and SVA-E sequences, and outputs MEI calls.
27 Oct 2020 jvarkit updated to version 20200713
Java tools for bioinformatics
26 Oct 2020 bbtools updated to version 38.87
An extensive set of bioinformatics tools including bbmap (short read aligner), bbnorm (kmer based normalization), dedupe (deduplication and clustering of unaligned reads), reformat (formatting and trimming reads) and many more.
26 Oct 2020 topaz updated to version 0.2.4
topaz is a pipeline for particle detection in cryo-electron microscopy images using convolutional neural networks trained from positive and unlabeled examples. Topaz also includes methods for micrograph and tomogram denoising using deep denoising models.
26 Oct 2020 iqtree updated to version 2.1.2
Efficient phylogenomic software by maximum likelihood
26 Oct 2020 cromwell updated to version 53.1
A Workflow Management System geared towards scientific workflows.
23 Oct 2020 rclone updated to version 1.53.1
Rclone is a utility for synchronizing directories on a file-based storage system (e.g. /home or /data) with an object store such as Amazon S3. It uses the S3 protocol, and it can be used with the HPC object storage system.
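The entries above follow a regular "date, application, updated to version X" pattern, so the list can be processed by script if needed. The following is only a minimal sketch in Python: the names `Update`, `ENTRY`, and `parse_entry` are hypothetical, the pattern is inferred from the lines on this page rather than from any published format, and it assumes English month abbreviations (the default C locale). Entries that list no version number are kept with the version set to None.

```python
import re
from datetime import date, datetime
from typing import NamedTuple, Optional


class Update(NamedTuple):
    day: date
    name: str
    version: Optional[str]  # None when the listing leaves the version blank


# Hypothetical pattern inferred from entries such as
# "12 Jan 2021 seqkit updated to version 0.15.0".
ENTRY = re.compile(r"^(\d{1,2} \w{3} \d{4}) (\S+) updated to version\s*(\S*)$")


def parse_entry(line: str) -> Optional[Update]:
    """Return an Update for an entry line, or None for any other line."""
    m = ENTRY.match(line.strip())
    if m is None:
        return None  # description lines and headings do not match
    when, name, version = m.groups()
    # %b assumes English month abbreviations (C locale).
    day = datetime.strptime(when, "%d %b %Y").date()
    return Update(day, name, version or None)
```

Calling `parse_entry` on each line and discarding the None results yields one record per update; the description line that follows each entry is skipped because it does not match the pattern.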
Scientific Databases updated in the last 3 months
For a full list of scientific databases available on the NIH HPC systems, see this page

Updated      Database                            Format       Location
19 Jan 2021  NCBI Taxonomy                       taxonomy     /fdb/taxonomy
18 Jan 2021  Betacoronavirus                     Blast        /fdb/blastdb/Betacoronavirus
12 Jan 2021  NCBI nr                             Blast        /fdb/blastdb/nr
12 Jan 2021  Protein Data Bank                   Blast        /fdb/blastdb/pdbaa
12 Jan 2021  SwissProt                           Blast        /fdb/blastdb/swissprot
10 Jan 2021  NCBI nt                             Blast        /fdb/blastdb/nt
05 Jan 2021  Protein Data Bank                   Blast        /fdb/blastdb/pdbnt
23 Dec 2020  Uniclust                            Annotations  /fdb/hhsuite
15 Dec 2020  SwissProt                           Fasta        /fdb/fastadb/swissprot.aa.fas
15 Dec 2020  Protein Data Bank                   Fasta        /fdb/fastadb/pdb.aa.fas
06 Dec 2020  Mouse Genome (Mus musculus) mm8     MySQL        NIH mirror of UCSC Genome Browser
06 Dec 2020  Chicken Genome (Gallus gallus)      MySQL        NIH mirror of UCSC Genome Browser
18 Nov 2020  ANNOVAR                             ANNOVAR      /fdb/annovar/current
18 Nov 2020  Human Genome hg19                   Fasta        /fdb/genome/human-feb2009/
25 Oct 2020  Rat Genome (Rattus norvegicus) rn4  MySQL        NIH mirror of UCSC Genome Browser
25 Oct 2020  Dog Genome (Canis familiaris)       MySQL        NIH mirror of UCSC Genome Browser