Biowulf High Performance Computing at the NIH
Application updates in the last 3 months
To see all versions available for any application, use module avail application_name
All centrally-installed applications are listed on the Applications page
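As a minimal sketch of the module avail lookup mentioned above, the Python snippet below builds the command string for a few applications without running it. The helper name module_avail_command is hypothetical, and the application names are simply taken from the list on this page; the actual module names on the system may differ in spelling or case.

```python
import shlex


def module_avail_command(application_name):
    """Build the shell command that lists all installed versions of one application."""
    # shlex.quote protects names that contain spaces or shell metacharacters.
    return "module avail " + shlex.quote(application_name)


if __name__ == "__main__":
    # Example names taken from the update list; they are assumptions, not verified module names.
    for name in ("samtools", "snakemake", "cutadapt"):
        print(module_avail_command(name))
```

The printed commands can be pasted into a shell session on the cluster.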
Updated Application
7 Aug 2020 fastsurfer updated to version c5e9677
Fastsurfer is a neuroimaging pipeline based on deep learning.
7 Aug 2020 Freesurfer updated to version 7.1.1
Freesurfer is a set of automated tools for reconstruction of the brain's cortical surface from structural MRI data, and overlay of functional MRI data onto the reconstructed surface.
6 Aug 2020 PartekFlow updated to version 9.0.20.0804
Web interface designed specifically for the analysis needs of next generation sequencing applications including RNA, small RNA, and DNA sequencing.
4 Aug 2020 mrtrix updated to version 3.0.1
MRtrix provides a large suite of tools for image processing, analysis and visualisation, with a focus on the analysis of white matter using diffusion-weighted MRI.
3 Aug 2020 Spinal Cord Toolbox (SCT) updated to version 4.3
Spinal Cord Toolbox (SCT) is a comprehensive software package dedicated to the processing of spinal cord MRI data. SCT builds on previously validated methods and includes state-of-the-art MRI templates and atlases of the spinal cord, algorithms to segment and register new data to the templates, and motion correction methods for diffusion and functional time series.
3 Aug 2020 VEP updated to version 100.4
VEP (Variant Effect Predictor) determines the effect of your variants (SNPs, insertions, deletions, CNVs or structural variants) on genes, transcripts, and protein sequence, as well as regulatory regions.
29 Jul 2020 regenie updated to version 20200624
regenie is a C++ program for whole genome regression modelling of large genome-wide association studies. It is developed and supported by a team of scientists at the Regeneron Genetics Center. regenie employs the BGEN library.
29 Jul 2020 git updated to version 2.28.0
Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency.
29 Jul 2020 pandoc updated to version 2.10.1
Pandoc is a Haskell library for converting from one markup format to another, and a command-line tool that uses this library.
29 Jul 2020 FLAIR updated to version 20200707
FLAIR (Full-Length Alternative Isoform analysis of RNA) is a workflow leveraging the full-length transcript sequencing data that nanopore affords. It uses multiple alignment steps and splice site filters to increase confidence in the set of isoforms defined from noisy data.
28 Jul 2020 m2clust updated to version 1.1.3
m2clust provides an elegant clustering approach to find clusters in data sets with different density and resolution.
27 Jul 2020 spacy updated to version 2.2.3
Advanced Natural Language Processing (NLP) in Python.
27 Jul 2020 bwa-mem2 updated to version 2-2.0
The next version of the bwa-mem algorithm in bwa.
27 Jul 2020 diamond updated to version 2.0.0
DIAMOND is a new high-throughput program for aligning DNA reads or protein sequences against a protein reference database such as NR, at up to 20,000 times the speed of BLAST, with high sensitivity.
24 Jul 2020 bowtie updated to version 1.3.0
bowtie is an ultrafast, memory-efficient short read aligner geared toward quickly aligning large sets of short DNA sequences (reads) to large genomes.
23 Jul 2020 fmriprep updated to version 20.1.1
A Robust Preprocessing Pipeline for fMRI Data
23 Jul 2020 parallel updated to version 20200722
GNU parallel is a shell tool for executing jobs in parallel using one or more computers.
23 Jul 2020 Genome Browser updated to version 401
The Genome Browser is a mirror of the UCSC Genome Browser. The URL is https://hpcnihapps.cit.nih.gov/genome. Users can also directly access the MySQL databases, the supporting files, and a large number of associated executables.
23 Jul 2020 datamash updated to version 1.7
GNU datamash is a command-line program which performs basic numeric, textual and statistical operations on input textual data files.
23 Jul 2020 cromwell updated to version 52
A Workflow Management System geared towards scientific workflows.
23 Jul 2020 xengsort updated to version 28762aac
A fast xenograft read sorter based on space-efficient k-mer hashing.
22 Jul 2020 GATK updated to version 4.1.8.1
GATK, from the Broad Institute, is first a structured software library that makes it easy to write efficient analysis tools for next-generation sequencing data, and second a suite of tools for working with human medical resequencing projects such as 1000 Genomes and The Cancer Genome Atlas. These tools include depth of coverage analyzers, a quality score recalibrator, a SNP/indel caller and a local realigner.
16 Jul 2020 B-SOiD updated to version 1.3
B-SOiD (Behavioral Segmentation in DeepLabCut) is an unsupervised learning algorithm that serves to discover and classify behaviors that are not pre-defined by users. It segregates statistically different, sub-second rodent behaviors with a single bottom-up perspective video camera by performing a novel expectation maximization fitting of Gaussian mixture models on t-Distributed Stochastic Neighbor Embedding (t-SNE).
14 Jul 2020 TeraStitcher updated to version 1.11.10
TeraStitcher is a free tool that enables the stitching of Teravoxel-sized tiled microscopy images even on workstations with relatively limited resources of memory (<8 GB) and processing power.
14 Jul 2020 ctk updated to version 1.1.3
The CLIP Tool Kit (CTK) is a software package that provides a set of tools for analysis of CLIP data starting from the raw reads generated by the sequencer.
14 Jul 2020 freec updated to version 11.6
Control-FREEC is a tool for detection of copy-number changes and allelic imbalances (including LOH) using deep-sequencing data
14 Jul 2020 morgan updated to version 3.4
MORGAN (Monte Carlo Genetic Analysis)
13 Jul 2020 spaceranger updated to version 1.1.0
10x pipeline for processing Visium spatial RNA-seq data
13 Jul 2020 snakemake updated to version 5.19.0
Snakemake aims to reduce the complexity of creating workflows by providing a fast and comfortable execution environment, together with a clean and modern domain-specific language (DSL) in Python style. It is well suited for bioinformatic workflows.
13 Jul 2020 seqkit updated to version 0.13.2
A cross-platform toolkit for FASTA/Q file manipulation
10 Jul 2020 cellranger updated to version 4.0.0
Cell Ranger is a set of analysis pipelines that processes Chromium single cell 3’ RNA-seq output to align reads, generate gene-cell matrices and perform clustering and gene expression analysis.
8 Jul 2020 gffcompare updated to version 0.11.6
gffcompare can be used to compare and evaluate the accuracy of RNA-Seq transcript assemblers (Cufflinks, Stringtie). It can collapse (merge) duplicate transcripts from multiple GTF/GFF3 files (e.g. resulting from assembly of different samples) and classify transcripts from one or multiple GTF/GFF3 files as they relate to reference transcripts provided in an annotation file (also in GTF/GFF3 format).
7 Jul 2020 svtk updated to version 0.1
Utilities for consolidating, filtering, resolving, and annotating structural variants.
7 Jul 2020 ROSE updated to version 20200707
ROSE (Rank Ordering of Super-Enhancers) is a tool for (1) creating stitched enhancers, and (2) separating super-enhancers from typical enhancers, given sequencing data (.bam) and a file of previously identified constituent enhancers (.gff).
6 Jul 2020 ChromHMM updated to version 1.21
ChromHMM is software for learning and characterizing chromatin states.
6 Jul 2020 nvchecker updated to version 1.7
nvchecker (short for new version checker) is for checking if a new version of some software has been released.
5 Jul 2020 smrtanalysis updated to version 9.0
SMRT® Analysis is a bioinformatics software suite available for analysis of DNA sequencing data from Pacific Biosciences’ SMRT technology. Users can choose from a variety of analysis protocols that utilize PacBio® and third-party tools. Analysis protocols include de novo genome assembly, cDNA mapping, DNA base-modification detection, and long-amplicon analysis to determine phased consensus sequences.
5 Jul 2020 nodejs updated to version 12.18.2
Node.js is a JavaScript runtime built on Chrome's V8 JavaScript engine. module name: nodejs
5 Jul 2020 nasm updated to version 2.15.02
Assembler/disassembler for the Intel x86 architecture.
4 Jul 2020 RELION updated to version 3.1.0
RELION (for REgularised LIkelihood OptimisatioN) is a stand-alone computer program for Maximum A Posteriori refinement of (multiple) 3D reconstructions or 2D class averages in cryo-electron microscopy.
29 Jun 2020 hisat updated to version 2.2.1.0-ngs2.10.8
HISAT is a fast and sensitive spliced alignment program which uses Hierarchical Indexing for Spliced Alignment of Transcripts.
29 Jun 2020 sratoolkit updated to version 2.10.8
The NCBI SRA Toolkit enables reading ("dumping") of sequencing files from the SRA database and writing ("loading") files into the .sra format.
29 Jun 2020 ncbi-vdb updated to version 2.10.8
The SRA Toolkit and SDK from NCBI is a collection of tools and libraries for using data in the INSDC Sequence Read Archives.
29 Jun 2020 ncbi-ngs updated to version 2.10.8
NCBI's NGS is a new, domain-specific API for accessing reads, alignments and pileups produced from Next Generation Sequencing
29 Jun 2020 star-seqr updated to version 0.6.7
Star-seqr is a tool for identifying gene fusions.
29 Jun 2020 arriba updated to version 1.2.0
Arriba identifies gene fusions in RNA-Seq data. It also can detect other structural variants in genomic data, such as intron duplications and gene truncations.
29 Jun 2020 mdtraj updated to version 1.9.4
MDTraj is a python library that allows users to manipulate molecular dynamics (MD) trajectories and perform a variety of analyses, including fast RMSD, solvent accessible surface area, hydrogen bonding, etc.
26 Jun 2020 augustus updated to version 3.3.3
AUGUSTUS is a program that predicts genes in eukaryotic genomic sequences.
25 Jun 2020 Huygens updated to version 20.04
Huygens is an image restoration, deconvolution, resolution and noise reduction program. It can process images from all current optical microscopes, including wide-field, confocal, Nipkow (scanning disk confocal), multiple-photon, and 4Pi microscopes.
25 Jun 2020 bamgineer updated to version 2
Bamgineer introduces simulated allele-specific copy number variants into exome and targeted sequence data sets
23 Jun 2020 vg updated to version 1.25.0
Tools for working with genome variation graphs
23 Jun 2020 golang updated to version 1.14.4
The Go programming language
22 Jun 2020 OpenMPI updated to version 4.0.4
OpenMPI is a popular implementation of Ethernet MPI with very active support and development.
18 Jun 2020 gdc-client updated to version 1.5.0
The GDC Data Transfer Tool provides an optimized method of transferring data to and from the GDC, and enables resumption of interrupted transfers.
18 Jun 2020 SurvivalGWAS_SV updated to version 1.3.2
SurvivalGWAS_SV is easy-to-use software that can handle large scale genome-wide data, allowing for imputed genotypes by modelling time to event outcomes under a dosage model. The software can adjust for multiple covariates and incorporate SNP-covariate interaction effects.
17 Jun 2020 gephi updated to version 0.9.2
gephi is a scientific application for computation of a number of metrics in graph/network analysis.
16 Jun 2020 pychopper updated to version 2.4.0
Pychopper v2 is a tool to identify, orient and trim full-length Nanopore cDNA reads. The tool is also able to rescue fused reads.
16 Jun 2020 atom updated to version 1.46.0
A hackable text editor for the 21st Century.
15 Jun 2020 mafft updated to version 7.467
Multiple alignment program for amino acid or nucleotide sequences
15 Jun 2020 metaphlan updated to version 3.0
MetaPhlAn is a computational tool for profiling the composition of microbial communities (Bacteria, Archaea, Eukaryotes and Viruses) from metagenomic shotgun sequencing data (i.e. not 16S) with species-level resolution. With the newly added StrainPhlAn module, it is now possible to perform accurate strain-level microbial profiling.
12 Jun 2020 vasttools updated to version 2.5.0
A toolset for profiling alternative splicing events in RNA-Seq data.
12 Jun 2020 qsiprep updated to version 0.8.0
qsiprep configures pipelines for processing diffusion-weighted MRI (dMRI) data.
11 Jun 2020 selscan updated to version 1.3.0
selscan is a tool for haplotype-based scans to detect natural selection, which are useful to identify recent or ongoing positive selection in genomes. It is an efficient multithreaded application that implements Extended Haplotype Homozygosity (EHH), Integrated Haplotype Score (iHS), and Cross-population EHH (XPEHH). selscan accepts phased genotypes in multiple formats, including TPED.
11 Jun 2020 sidesplitter updated to version 1.2
Sidesplitter reduces over-fitting in both idealised and experimental settings, while maintaining independence between the two sides of a split refinement. It can improve the final resolution in refinements of structures prone to severe over-fitting, such as membrane proteins in detergent micelles.
10 Jun 2020 Perl updated to version 5.24.3
Perl is a highly capable, feature-rich programming language with over 23 years of development.
10 Jun 2020 tesseract updated to version 4.1.1
tesseract is an open source, commercial-quality OCR engine developed at Google.
10 Jun 2020 spm12 updated to version 7870
The (S)tatistical (P)ara(M)etric application analyzes brain imaging data.
9 Jun 2020 CCP4 updated to version 7.1.001
CCP4 is a suite of programs for protein crystallography and structural biology.
4 Jun 2020 sonicparanoid updated to version 1.3.2
A stand-alone software tool for the identification of orthologous relationships among multiple species.
4 Jun 2020 sicer updated to version 2-1.0.2
A clustering approach for identification of enriched domains from histone modification ChIP-Seq data
4 Jun 2020 Gromacs updated to version 2020.2
Gromacs is a versatile package to perform molecular dynamics, i.e. simulate the Newtonian equations of motion for systems with hundreds to millions of particles. It is primarily designed for biochemical molecules like proteins and lipids that have a lot of complicated bonded interactions, but since GROMACS is extremely fast at calculating the nonbonded interactions (that usually dominate simulations) many groups are also using it for research on non-biological systems, e.g. polymers.
3 Jun 2020 multiqc updated to version 1.9
aggregates results for various frequently used bioinformatics tools across multiple samples into a nice visual report
2 Jun 2020 xcpengine updated to version 1.2.1
xcpEngine performs denoising and estimation of functional connectivity on fMRI datasets.
2 Jun 2020 rilseq updated to version 0.75
RILseq computational protocol
2 Jun 2020 PRSice updated to version 2.3.1
PRSice is a Polygenic Risk Score software for calculating, applying, evaluating and plotting the results of polygenic risk scores (PRS) analyses.
1 Jun 2020 fusioninspector updated to version 2.3.0
In silico Validation of Fusion Transcript Predictions
1 Jun 2020 HMMRATAC updated to version 1.2.10
HMMRATAC peak caller for ATAC-seq data
1 Jun 2020 Hail updated to version 0.2.43
Hail is an open-source, scalable framework for exploring and analyzing genomic data.
1 Jun 2020 stringtie updated to version 2.1.4
StringTie is a fast and highly efficient assembler of RNA-Seq alignments into potential transcripts. It is primarily a genome-guided transcriptome assembler, although it can borrow algorithmic techniques from de novo genome assembly to help with transcript assembly.
1 Jun 2020 vscode updated to version 1.45.1
Free source code editor with many utilities for Python, Julia and others.
1 Jun 2020 cutadapt updated to version 2.10
cutadapt removes adapter sequences from DNA high-throughput sequencing data. This is usually necessary when the read length of the machine is longer than the molecule that is sequenced, such as in microRNA data.
28 May 2020 ChIPseeqer updated to version 2.1
ChIPseeqer is an integrative, comprehensive, fast and user-friendly computational framework for in-depth analysis of ChIP-seq datasets. It combines several computational tools in order to create easily customized workflows that can be adapted to the user’s needs and objectives.
28 May 2020 TelomereHunter updated to version 1.1.0
TelomereHunter is software for the detailed characterization of telomere maintenance mechanism footprints in the genome. The tool is implemented for the analysis of large cancer genome cohorts and provides a variety of diagnostic diagrams as well as machine-readable output for subsequent analysis.
26 May 2020 rscape updated to version 1.5.2
RNA Significant Covariation Above Phylogenetic Expectation is a program that given a multiple sequence alignment of RNA sequences
26 May 2020 boost updated to version 1.73
Boost provides free peer-reviewed portable C++ source libraries. Boost libraries are intended to be widely useful, and usable across a broad spectrum of applications.
21 May 2020 EVcouplings updated to version 0.0.5
Helps to predict protein structure, function and mutations using evolutionary sequence covariation.
21 May 2020 intarna updated to version 3.2.0
IntaRNA is a program for the fast and accurate prediction of interactions between two RNA molecules.
21 May 2020 rnaview updated to version current
The RNAView program generates 2-dimensional displays of RNA/DNA secondary structures with tertiary interactions.
21 May 2020 rnastructure updated to version 6.2
RNAstructure is a complete package for RNA and DNA secondary structure prediction and analysis. It includes algorithms for secondary structure prediction, including facility to predict base pairing probabilities. It also can be used to predict bimolecular structures and can predict the equilibrium binding affinity of an oligonucleotide to a structured RNA target.
20 May 2020 vcflib updated to version 1.0.1
a simple C++ library for parsing and manipulating VCF files, + many command-line utilities
20 May 2020 slamdunk updated to version 0.4.3
SlamDunk is a novel, fully automated software tool for automated, robust, scalable and reproducible SLAMseq data analysis.
18 May 2020 PyCharm updated to version 2018.3.5
A Python IDE
18 May 2020 nanopolish updated to version 0.13.2
nanopolish is a software package for signal-level analysis of Oxford Nanopore sequencing data. Nanopolish can calculate an improved consensus sequence for a draft genome assembly, detect base modifications, call SNPs and indels with respect to a reference genome and more.
15 May 2020 gen3-client updated to version 2020.05
The gen3-client provides an easy-to-use, command-line interface for uploading and downloading data files to and from a Gen3 data commons from the terminal or command prompt.
14 May 2020 fusioncatcher updated to version 1.20
FusionCatcher searches for novel/known somatic fusion genes, translocations, and chimeras in RNA-seq data (paired-end or single-end reads from Illumina NGS platforms like Solexa/HiSeq/NextSeq/MiSeq) from diseased samples.
14 May 2020 samtools updated to version 1.10
The samtools package now provides samtools, bcftools, tabix, and the underlying htslib library.
14 May 2020 crystfel updated to version 0.9.0.5ae3043d
CrystFEL is a suite of programs for processing diffraction data acquired serially in a snapshot manner, such as when using the technique of Serial Femtosecond Crystallography (SFX) with a free-electron laser source.
13 May 2020 tvb updated to version 1.5.8
The Virtual Brain (TVB) scientific library has the purpose of offering modern tools to the Neurosciences community, for computing, simulating and analyzing functional and structural data of human brains
13 May 2020 AMON updated to version 1.0.0
AMON (Annotation of Metabolite Origins via Networks) is an open-source bioinformatics application that can be used to (1) annotate which compounds in the metabolome could have been produced by the bacteria present or by the host; (2) evaluate the pathway enrichment of host versus microbial metabolites; and (3) visualize which compounds may have been produced by host versus microbial enzymes in KEGG pathway maps.
11 May 2020 Rosetta updated to version 2020.11
The Rosetta++ software suite can perform de novo protein structure predictions, identify low free energy sequences for target protein backbones, predict the structure of a protein-protein complex from the individual structures of the monomer components, incorporate NMR data into the basic Rosetta protocol to accelerate the process of NMR structure prediction, and more...
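The update entries above all follow the pattern "date, application, updated to version, version". As a rough sketch, assuming that pattern holds for every entry, the Python snippet below splits one such line into its parts. The names ENTRY and parse_entry are hypothetical and are not part of any Biowulf tooling.

```python
import re
from datetime import datetime

# Matches lines such as "14 May 2020 samtools updated to version 1.10".
ENTRY = re.compile(r"^(\d{1,2} [A-Z][a-z]{2} \d{4}) (.+) updated to version (\S+)$")


def parse_entry(line):
    """Return (date, application, version) for an update line, or None if it does not match."""
    match = ENTRY.match(line.strip())
    if match is None:
        return None  # description lines and headings fall through here
    date_text, application, version = match.groups()
    # %b expects English month abbreviations, which assumes the default C/English locale.
    updated = datetime.strptime(date_text, "%d %b %Y").date()
    return updated, application, version


if __name__ == "__main__":
    print(parse_entry("14 May 2020 samtools updated to version 1.10"))
    print(parse_entry("3 Aug 2020 Spinal Cord Toolbox (SCT) updated to version 4.3"))
```

Description lines return None, so the function can be applied to every line of the list and the None results filtered out.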
Scientific Databases updated in last 3 months
For a full list of scientific databases available on the NIH HPC systems, see this page

Updated      Database           Format          Location
04 Aug 2020  NCBI Taxonomy      taxonomy        /fdb/taxonomy
03 Aug 2020  Betacoronavirus    Blast           /fdb/blastdb/Betacoronavirus
30 Jul 2020  Protein Data Bank  Blast           /fdb/blastdb/pdbaa
30 Jul 2020  SwissProt          Blast           /fdb/blastdb/swissprot
29 Jul 2020  NCBI nr            Blast           /fdb/blastdb/nr
28 Jul 2020  ANNOVAR            ANNOVAR         /fdb/annovar/current
28 Jul 2020  NCBI nt            Blast           /fdb/blastdb/nt
10 Jun 2020  tessdata_fast      Tesseract data  /fdb/tesseract/tessdata_fast
20 May 2020  Human Genome hg19  Fasta           /fdb/genome/human-feb2009/
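A script that depends on these databases may want to confirm that the locations are visible before it starts work. The Python sketch below is one way to do that; the paths are copied from the listing above, while the names DATABASES and missing_databases are hypothetical. On any machine that does not mount /fdb, every entry will be reported as missing.

```python
from pathlib import Path

# Database name -> location, copied from the listing above.
DATABASES = {
    "NCBI Taxonomy": "/fdb/taxonomy",
    "Betacoronavirus": "/fdb/blastdb/Betacoronavirus",
    "Protein Data Bank": "/fdb/blastdb/pdbaa",
    "SwissProt": "/fdb/blastdb/swissprot",
    "NCBI nr": "/fdb/blastdb/nr",
    "ANNOVAR": "/fdb/annovar/current",
    "NCBI nt": "/fdb/blastdb/nt",
    "tessdata_fast": "/fdb/tesseract/tessdata_fast",
    "Human Genome hg19": "/fdb/genome/human-feb2009/",
}


def missing_databases(locations):
    """Return the names whose location does not exist on this machine."""
    return [name for name, path in locations.items() if not Path(path).exists()]


if __name__ == "__main__":
    for name in missing_databases(DATABASES):
        print("not found:", name, DATABASES[name])
```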