Biowulf High Performance Computing at the NIH
Application updates in the last 3 months
To see all versions available for any application, use module avail application_name
All centrally-installed applications are listed on the Applications page
Updated Application
23 Nov 2022 VEP updated to version 108
VEP (Variant Effect Predictor) determines the effect of your variants (SNPs, insertions, deletions, CNVs or structural variants) on genes, transcripts, and protein sequence, as well as regulatory regions.
22 Nov 2022 decifer updated to version 2.1.3
DeCiFer is an algorithm that simultaneously selects mutation multiplicities and clusters somatic single-nucleotide variants (SNVs) by their corresponding descendant cell fractions (DCF), a statistic that quantifies the proportion of cells which acquired the SNV or whose ancestors acquired the SNV. DCF is related to the commonly used cancer cell fraction (CCF) but further accounts for SNVs which are lost due to deleterious somatic copy-number aberrations (CNAs), identifying clusters of SNVs which occur in the same phylogenetic branch of tumour evolution.
21 Nov 2022 PEPATAC updated to version 0.10.3
PEPATAC is a robust pipeline for Assay for Transposase-Accessible Chromatin using sequencing (ATAC-seq) built on a loosely coupled modular framework. It can be applied to ATAC-seq projects of any size, from one-off experiments to large-scale sequencing projects. It is optimized for the distinctive features of ATAC-seq data to be fast and accurate, and it provides several unique analytical approaches.
21 Nov 2022 GLIMPSE updated to version 1.1.1
GLIMPSE is a phasing and imputation method for large-scale low-coverage sequencing studies. It produces accurate imputed genotype calls and outperforms SNP arrays.
21 Nov 2022 MINTIE updated to version 0.4.1
MINTIE is a tool for identifying novel, rare transcriptional variants in cancer RNA-seq data. MINTIE detects gene fusions, transcribed structural variants, novel splice variants and complex variants, and annotates all novel transcriptional variants.
18 Nov 2022 PyTom updated to version 1.0
PyTom is a software package for the analysis of volumetric data obtained by cryo-electron tomography (cryo-ET). It covers a complete pipeline of processing steps for tomogram reconstruction, localization of macromolecular complexes in tomograms, fine alignment of subtomograms extracted at these locations, and their classification.
18 Nov 2022 Gaussian updated to version G16-C02
Gaussian is a connected system of programs for performing semiempirical and ab initio molecular orbital (MO) calculations.
16 Nov 2022 R updated to version 4.2.2
R (the R Project) is a language and environment for statistical computing and graphics. R is similar to S, and provides a wide variety of statistical and graphical techniques (linear and nonlinear modelling, statistical tests, time series analysis, classification, clustering, ...).
16 Nov 2022 GCN_Cancer updated to version 20221105
The GCN_Cancer application employs graph convolutional network (GCN) models to classify gene expression samples from The Cancer Genome Atlas (TCGA) as one of 33 designated tumor types or as normal. It has been trained on 10,340 cancer samples and 731 normal tissue samples from the TCGA dataset.
16 Nov 2022 ncbi-vdb updated to version 3.0.1
The SRA Toolkit and SDK from NCBI is a collection of tools and libraries for using data in the INSDC Sequence Read Archives.
16 Nov 2022 ncbi-ngs updated to version 3.0.1
NCBI's NGS is a new, domain-specific API for accessing reads, alignments and pileups produced by Next Generation Sequencing.
16 Nov 2022 hisat updated to version
HISAT is a fast and sensitive spliced alignment program which uses Hierarchical Indexing for Spliced Alignment of Transcripts.
16 Nov 2022 sratoolkit updated to version 3.0.1
The NCBI SRA Toolkit enables reading ("dumping") of sequencing files from the SRA database and writing ("loading") files into the .sra format.
15 Nov 2022 Comsol updated to version
The COMSOL Multiphysics engineering simulation software environment facilitates all steps in the modeling process: defining your geometry, meshing, specifying your physics, solving, and then visualizing your results.
15 Nov 2022 Genome Browser updated to version 439
The Genome Browser is a mirror of the UCSC Genome Browser. The URL is … Users can also directly access the MySQL databases and supporting files, as well as a large number of associated executables.
10 Nov 2022 jupyter updated to version 5.0.0
Jupyter is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and explanatory text. Uses include: data cleaning and transformation, numerical simulation, statistical modeling, machine learning and much more.
10 Nov 2022 MinkowskiEngine updated to version 0.5.4
MinkowskiEngine is a Python toolchain that enables efficient auto-differentiation of sparse tensors and allows sparse tensors to be used in standard neural network layers such as convolution, pooling, unpooling, and broadcasting.
8 Nov 2022 KAT updated to version 2.4.2
KAT (K-mer Analysis Toolkit) is a suite of tools that analyse Jellyfish hashes or sequence files (FASTA or FASTQ) using k-mer counts.
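For readers unfamiliar with k-mer counting, the minimal Python sketch below shows the general idea: count canonical k-mers (a k-mer and its reverse complement share one key) and build the count histogram, or "spectrum", that k-mer tools typically plot. It is an illustration only, not how KAT or Jellyfish are implemented; the function names are hypothetical.

```python
from collections import Counter

# Translation table mapping each uppercase DNA base to its complement.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_canonical_kmers(seq: str, k: int) -> Counter:
    """Count canonical k-mers in seq (hypothetical helper).

    Each k-mer and its reverse complement are counted under a single key,
    the lexicographically smaller of the two. Windows containing any
    non-ACGT character (for example N) are skipped.
    """
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[min(kmer, reverse_complement(kmer))] += 1
    return counts


def kmer_spectrum(counts: Counter) -> Counter:
    """Histogram: how many distinct k-mers occur once, twice, and so on."""
    return Counter(counts.values())
```

For example, `count_canonical_kmers("ACGTTGCA", 3)` counts ACG and GCA twice each and AAC and CAA once each. Production tools use compact hash tables or sorted arrays rather than a Python dictionary.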
8 Nov 2022 hwloc updated to version 1.11.13
The Hardware Locality (hwloc) software project aims at easing the process of discovering hardware resources in parallel architectures. It offers command-line tools and a C API for consulting these resources, their locality, attributes, and interconnection.
8 Nov 2022 READemption updated to version 2.0.3
RNA-Seq pipeline including alignment, coverage tracks, quantitation, and differential expression analysis.
6 Nov 2022 Matlab updated to version 2022b
MATLAB is an interactive software package for scientific and engineering numeric computation. MATLAB integrates numerical analysis, matrix computation, signal processing, and graphics in an environment where problems and solutions are expressed just as they are written mathematically.
2 Nov 2022 agat updated to version 0.8.0
Another Gtf/Gff Analysis Toolkit
2 Nov 2022 verifybamid updated to version 2.0.1
verifyBamID is software that verifies whether the reads in a particular file match previously known genotypes for an individual (or group of individuals), and checks whether the reads are contaminated as a mixture of two samples. verifyBamID can detect sample contamination and swaps when external genotypes are available. When external genotypes are not available, verifyBamID still robustly detects sample swaps.
2 Nov 2022 ohana updated to version current
Ohana is a suite of software for analyzing population structure and admixture history using unsupervised learning methods.
1 Nov 2022 patchelf updated to version 0.16
patchelf is a small utility to modify the dynamic linker and RPATH of ELF executables.
1 Nov 2022 metaphlan updated to version 4.0.3
MetaPhlAn is a computational tool for profiling the composition of microbial communities (Bacteria, Archaea, Eukaryotes and Viruses) from metagenomic shotgun sequencing data (i.e. not 16S) with species-level resolution. With the newly added StrainPhlAn module, it is now possible to perform accurate strain-level microbial profiling.
1 Nov 2022 fgbio updated to version 2.0.2
The Fulcrum Genomics tools are a set of utilities for working with BAM files, VCF files, and Unique Molecular IDs. They are accessed as subprograms from a Java jar, like GATK or Picard.
1 Nov 2022 boost updated to version 1.80
Boost provides free peer-reviewed portable C++ source libraries. Boost libraries are intended to be widely useful, and usable across a broad spectrum of applications.
31 Oct 2022 EMAN2 updated to version 2.99.35
EMAN2 is a broadly based greyscale scientific image processing suite with a primary focus on processing data from transmission electron microscopes.
30 Oct 2022 OptiType updated to version 1.3.5
OptiType is an HLA genotyping algorithm based on integer linear programming, capable of producing accurate 4-digit HLA genotyping predictions from NGS data by simultaneously selecting all major and minor HLA Class I alleles.
27 Oct 2022 git updated to version 2.38.1
Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency.
26 Oct 2022 merqury updated to version 1.3
Evaluate genome assemblies with k-mers and more
26 Oct 2022 verkko updated to version 1.1
Verkko is a hybrid genome assembly pipeline developed for telomere-to-telomere assembly of PacBio HiFi and Oxford Nanopore reads.
26 Oct 2022 mbg updated to version 1.0.11
Minimizer-based sparse de Bruijn Graph constructor.
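The minimizer idea can be sketched in a few lines: within every window of w consecutive k-mers, keep only the smallest k-mer under some ordering, so that overlapping windows tend to share the same representative. This generic Python sketch uses plain lexicographic order for readability; real tools usually order k-mers by a hash, and nothing here reflects MBG's actual code. The function name is hypothetical.

```python
from __future__ import annotations


def minimizers(seq: str, k: int, w: int) -> list[tuple[int, str]]:
    """Return the sorted, de-duplicated (position, k-mer) minimizers of seq.

    For each window of w consecutive k-mers, the smallest k-mer is chosen,
    with ties broken by the leftmost position. Lexicographic order is a
    hypothetical stand-in for the hash order a real tool would use.
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    selected = set()
    for start in range(len(kmers) - w + 1):
        # Index of the smallest k-mer in this window (leftmost on ties).
        best = min(range(start, start + w), key=lambda i: (kmers[i], i))
        selected.add((best, kmers[best]))
    return sorted(selected)
```

For example, `minimizers("GATTACA", 3, 2)` keeps ATT, TAC and ACA at positions 1, 3 and 4 out of five k-mers. Because consecutive windows often pick the same k-mer, only a subset of k-mers is retained, which is the general reason minimizer-based graphs can be sparse.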
25 Oct 2022 graphaligner updated to version 1.0.16
Seed-and-extend program for aligning long error-prone reads to genome graphs.
25 Oct 2022 telseq updated to version 0.0.2
TelSeq is software that estimates telomere length from whole-genome sequencing data (BAM files).
24 Oct 2022 circexplorer2 updated to version 2.3.8
A combined strategy to identify circular RNAs (circRNAs and ciRNAs)
24 Oct 2022 VIRTUS updated to version 2.0.1
Bioinformatics pipeline for viral transcriptome detection.
24 Oct 2022 sicer updated to version 2-1.0.3
A clustering approach for identification of enriched domains from histone modification ChIP-Seq data
24 Oct 2022 parallel updated to version 20221022
GNU parallel is a shell tool for executing jobs in parallel using one or more computers.
21 Oct 2022 mantis updated to version 1.0.5
MANTIS (Microsatellite Analysis for Normal-Tumor InStability) is a program developed for detecting microsatellite instability from paired-end BAM files.
21 Oct 2022 magma updated to version 1.10
MAGMA is a tool for gene analysis and generalized gene-set analysis of GWAS data. It can be used to analyse both raw genotype data as well as summary SNP p-values from a previous GWAS or meta-analysis.
21 Oct 2022 Scipion updated to version 3.0.12
Scipion is an image processing framework to obtain 3D models of macromolecular complexes using Electron Microscopy (3DEM). It integrates several software packages and presents a unified interface for both biologists and developers. Scipion can execute workflows that combine different software tools, while taking care of formats and conversions. Additionally, all steps are tracked and can be reproduced later on.
20 Oct 2022 samtools updated to version 1.16.1
The samtools package now provides samtools, bcftools, tabix, and the underlying htslib library.
20 Oct 2022 globus-cli updated to version 3.9.0
Globus command line interface
20 Oct 2022 Huygens updated to version 22.04.0-p6
Huygens is a software package for image restoration, deconvolution, resolution improvement and noise reduction. It can process images from all current optical microscopes, including wide-field, confocal, Nipkow (scanning disk confocal), multiple-photon, and 4Pi microscopes.
20 Oct 2022 iva updated to version 1.0.11
IVA is a de novo assembler designed to assemble virus genomes that have no repeat sequences, using Illumina read pairs sequenced from mixed populations at extremely high and variable depth.
20 Oct 2022 parsnp updated to version 1.7.4
Parsnp is a command-line tool for efficient microbial core genome alignment and SNP detection. Parsnp was designed to work in tandem with Gingr, a flexible platform for visualizing genome alignments and phylogenetic trees.
19 Oct 2022 cmdstan updated to version 2.30.1
Command-line interface to Stan
19 Oct 2022 randfold updated to version 2.0.1
RandFold computes the probability that, for a given RNA sequence, the Minimum Free Energy (MFE) of the secondary structure is different from a distribution of MFE computed with random sequences.
19 Oct 2022 wuzz updated to version 0.5.0
Interactive CLI tool for HTTP inspection
19 Oct 2022 deepsignal updated to version 2-0.1.3
A deep-learning method for detecting DNA methylation state from Oxford Nanopore sequencing reads.
18 Oct 2022 kb-python updated to version 0.27.3
kb-python is a Python package for processing single-cell RNA-sequencing data. It wraps the kallisto | bustools single-cell RNA-seq command line tools in order to unify multiple processing workflows.
18 Oct 2022 pomoxis updated to version 0.3.10
Pomoxis comprises a set of basic bioinformatic tools tailored to nanopore sequencing. Notably, tools are included for generating and analysing draft assemblies. Many of these tools are used by the research data analysis group at Oxford Nanopore Technologies.
18 Oct 2022 GAMESS updated to version 30Sep22-R2
GAMESS is a general ab initio quantum chemistry package.
18 Oct 2022 gsea updated to version 4.3.2
Gene Set Enrichment Analysis (GSEA) is a computational method that determines whether an a priori defined set of genes shows statistically significant, concordant differences between two biological states (e.g. phenotypes).
17 Oct 2022 qpdf updated to version 1.11.1
QPDF is a command-line tool and C++ library that performs content-preserving transformations on PDF files
17 Oct 2022 scallop updated to version 0.10.5
Scallop is a reference-based transcript assembler.
17 Oct 2022 basespace_cli updated to version 1.5.2
Command line interface for Illumina's BaseSpace
17 Oct 2022 tantan updated to version 40
A tool to mask low complexity and short period tandem repeats
17 Oct 2022 foldseek updated to version 3-915ef7d
Fast structural similarity search
17 Oct 2022 RepeatMasker updated to version 4.1.3-p1
RepeatMasker is a program that screens DNA sequences for interspersed repeats and low complexity DNA sequences. The output of the program is a detailed annotation of the repeats that are present in the query sequence as well as a modified version of the query sequence in which all the annotated repeats have been masked (default: replaced by Ns). On average, almost 50% of a human genomic DNA sequence will currently be masked by the program.
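The masking step itself is easy to illustrate. The minimal Python sketch below takes a list of repeat intervals (hypothetical input here; RepeatMasker derives them from its own annotation) and either replaces the covered bases with N, the default behaviour described above, or lowercases them, a "soft-masking" style that many downstream tools also accept. The function name and the half-open [start, end) interval convention are assumptions for illustration.

```python
from __future__ import annotations


def mask_intervals(seq: str, intervals: list[tuple[int, int]], soft: bool = False) -> str:
    """Mask half-open [start, end) intervals of seq (hypothetical helper).

    Hard masking (the default) replaces every covered base with 'N';
    soft masking lowercases the covered bases instead. Interval ends that
    fall outside the sequence are clipped to its bounds.
    """
    chars = list(seq)
    for start, end in intervals:
        for i in range(max(start, 0), min(end, len(chars))):
            chars[i] = chars[i].lower() if soft else "N"
    return "".join(chars)
```

For example, `mask_intervals("ACGTACGT", [(2, 5)])` returns `"ACNNNCGT"`, and with `soft=True` it returns `"ACgtaCGT"`.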
14 Oct 2022 bonito updated to version 0.6.1
A PyTorch Basecaller for Oxford Nanopore Reads
14 Oct 2022 GATK updated to version
GATK, from the Broad Institute, is both a structured software library that makes it easy to write efficient analysis tools for next-generation sequencing data, and a suite of tools for working with human medical resequencing projects such as 1000 Genomes and The Cancer Genome Atlas. These tools include a depth-of-coverage analyzer, a quality score recalibrator, a SNP/indel caller and a local realigner.
13 Oct 2022 peddy updated to version 0.4.8
peddy is used to compare sex and familial relationships given in a PED file with those inferred from a VCF file
13 Oct 2022 guppy updated to version 6.3.8
Local accelerated basecalling for Nanopore data
13 Oct 2022 THetA updated to version 0.7-20-g94fd772
Tumor Heterogeneity Analysis (THetA) is an algorithm used to estimate tumor purity and clonal/subclonal copy number aberrations simultaneously from high-throughput DNA sequencing data.
13 Oct 2022 fribidi updated to version 1.0.12
The Free Implementation of the Unicode Bidirectional Algorithm.
13 Oct 2022 hicpro updated to version 3.1.0
HiC-Pro: An optimized and flexible pipeline for Hi-C data processing
12 Oct 2022 MySQL updated to version 8.0.31
MySQL is an open-source relational database management system.
12 Oct 2022 SAIGE updated to version
R package for large-scale genetic association studies.
11 Oct 2022 kneaddata updated to version 0.11.0
KneadData is a tool designed to perform quality control on metagenomic and metatranscriptomic sequencing data, especially data from microbiome experiments.
11 Oct 2022 SnapATAC2 updated to version 2.1.2
SnapATAC is a software package for analyzing scATAC-seq datasets. Conventional assays that map regulatory elements via open chromatin analysis of primary tissues are hindered by sample heterogeneity. SnapATAC dissects cellular heterogeneity in an unbiased manner and maps the trajectories of cellular states.
7 Oct 2022 quarto updated to version 1.1.179
Quarto is an open-source scientific and technical publishing system built on Pandoc
7 Oct 2022 IMOD updated to version 4.12.25
IMOD is a set of image processing, modeling and display programs used for tomographic reconstruction and for 3D reconstruction of EM serial sections and optical sections.
7 Oct 2022 Julia updated to version 1.8.2
A high-level, dynamic language for technical computing
7 Oct 2022 fastsurfer updated to version 1.1.1
Fastsurfer is a neuroimaging pipeline based on deep learning.
7 Oct 2022 golang updated to version 1.19.2
The Go programming language
6 Oct 2022 cogentap updated to version 1.5.0
Cogent NGS Analysis Pipeline (CogentAP) is bioinformatic software for analyzing RNA-seq NGS data generated using various Takara kits.
6 Oct 2022 apptainer updated to version 1.1.2
Apptainer allows you to build and run Linux containers with emphasis on use in HPC. Apptainer is the Linux Foundation variant of and successor to the widely popular Singularity.
4 Oct 2022 fmriprep updated to version 22.0.2
A Robust Preprocessing Pipeline for fMRI Data
4 Oct 2022 mriqc updated to version 22.0.6
MRIQC is an MRI quality control tool
3 Oct 2022 medaka updated to version 1.7.2
medaka is a tool to create a consensus sequence from nanopore sequencing data. This task is performed using neural networks applied to a pileup of individual sequencing reads against a draft assembly.
3 Oct 2022 TensorQTL updated to version 1.0.7
TensorQTL leverages general-purpose libraries and graphics processing units (GPUs) to achieve high computational efficiency at low cost. Using PyTorch or TensorFlow, it achieves >200-fold decreases in runtime and ~5–10-fold reductions in cost when running on GPUs relative to CPUs.
30 Sep 2022 ont-fast5-api updated to version 4.1.0
Tools to manipulate HDF5 files of the Oxford Nanopore .fast5 file format
29 Sep 2022 flye updated to version 2.9.1
Fast and accurate de novo assembler for single molecule sequencing reads
29 Sep 2022 breseq updated to version 0.37.1
breseq is a computational pipeline for finding mutations relative to a reference sequence in short-read DNA re-sequencing data. It is intended for haploid microbial genomes (<20 Mb).
29 Sep 2022 RELION updated to version 4.0.0
RELION (for REgularised LIkelihood OptimisatioN) is a stand-alone computer program for Maximum A Posteriori refinement of (multiple) 3D reconstructions or 2D class averages in cryo-electron microscopy.
28 Sep 2022 spark updated to version 3.2.2
Apache Spark is a fast and general engine for large-scale data processing. It is commonly used as an in-memory alternative to Hadoop MapReduce.
28 Sep 2022 rclone updated to version 1.59.2
Rclone is a utility for synchronizing directories on a file-based storage system (e.g. /home or /data) with an object store such as Amazon S3. It uses the S3 protocol, and it can be used with the HPC object storage system.
28 Sep 2022 mmseqs updated to version 2-13-45111-219-gaabc78c
MMseqs2 (Many-against-Many sequence searching) is a software suite to search and cluster huge protein and nucleotide sequence sets.
28 Sep 2022 GEM1 updated to version 1.4.3
GEM (Gene-Environment interaction analysis for Millions of samples) is a software program for large-scale gene-environment interaction testing in samples from unrelated individuals. It enables genome-wide association studies in up to millions of samples while allowing for multiple exposures, control for genotype-covariate interactions, and robust inference.
28 Sep 2022 interproscan updated to version 5.57-90.0
InterProScan is the software package that allows protein and nucleic acid sequences to be scanned against InterPro's signatures. Signatures are predictive models, provided by several different databases, that make up the InterPro consortium.
28 Sep 2022 cellphonedb updated to version 3.1.0
A publicly available repository of curated receptors, ligands and their interactions. Subunit architecture is included for both ligands and receptors, representing heteromeric complexes accurately.
27 Sep 2022 colabfold updated to version 1.3.0-226-g2267166
ColabFold batch scripts
26 Sep 2022 STAR-Fusion updated to version 1.11.0
Transcript fusion detection
25 Sep 2022 ldsc updated to version 1.0.1-20200724
ldsc is a command line tool for estimating heritability and genetic correlation from GWAS summary statistics. ldsc also computes LD Scores.
23 Sep 2022 Hail updated to version 0.2.99
Hail is an open-source, scalable framework for exploring and analyzing genomic data.
22 Sep 2022 rosettafoldna updated to version e6053f7
Accurate prediction of nucleic acid and protein-nucleic acid complexes using RoseTTAFoldNA
21 Sep 2022 parabricks updated to version 4.0.0
The Clara Parabricks toolkit is a set of GPU-accelerated genome analysis tools for secondary analysis of next generation sequencing data.
21 Sep 2022 Solar updated to version 9.0.1
SOLAR-Eclipse is an extensive, flexible software package for genetic variance components analysis, including linkage analysis, quantitative genetic analysis, SNP association analysis (QTN and QTLD), and covariate screening.
20 Sep 2022 Beagle updated to version 5.4_22Jul22
Beagle is a package for imputing genotypes, inferring haplotype phase, and performing genetic association analysis. BEAGLE is designed to analyze large-scale data sets with hundreds of thousands of markers genotyped on thousands of samples.
16 Sep 2022 cromwell updated to version 84
A Workflow Management System geared towards scientific workflows.
16 Sep 2022 pydockrmsd updated to version 1.0.0
DockRMSD is capable of deterministically identifying the minimum symmetry-corrected RMSD and is able to do so without significant loss of computational efficiency compared to other methods.
15 Sep 2022 pychopper updated to version 2.7.1
Pychopper v2 is a tool to identify, orient and trim full-length Nanopore cDNA reads. The tool is also able to rescue fused reads.
14 Sep 2022 bakta updated to version 1.5
Rapid & standardized annotation of bacterial genomes, MAGs & plasmids
14 Sep 2022 rsat updated to version 2020.02.29-1
Regulatory sequence analysis tools
13 Sep 2022 pyem updated to version 220913
UCSF pyem is a collection of Python modules and command-line utilities for electron microscopy of biological samples.
13 Sep 2022 plink updated to version 3.6-alpha
PLINK is a whole-genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.
8 Sep 2022 deepconsensus updated to version 0.3.1
DeepConsensus uses gap-aware sequence transformers to correct errors in Pacific Biosciences (PacBio) Circular Consensus Sequencing (CCS) data
7 Sep 2022 OmegaFold updated to version 1.1.0
OmegaFold is the first computational method to successfully predict high-resolution protein structure from a single primary sequence alone. Using a new combination of a protein language model that allows us to make predictions from single sequences and a geometry-inspired transformer model trained on protein structures, OmegaFold outperforms RoseTTAFold and achieves similar prediction accuracy to AlphaFold2 on recently released structures.
2 Sep 2022 cellranger-arc updated to version 2.0.2
Cell Ranger ARC is a set of analysis pipelines that process Chromium Single Cell Multiome ATAC + Gene Expression sequencing data to generate a variety of analyses pertaining to gene expression, chromatin accessibility and their linkage. Furthermore, since the ATAC and gene expression measurements are on the very same cell, we are able to perform analyses that link chromatin accessibility and gene expression.
2 Sep 2022 cellranger updated to version 7.0.1
Cell Ranger is a set of analysis pipelines that processes Chromium single cell 3’ RNA-seq output to align reads, generate gene-cell matrices and perform clustering and gene expression analysis.
31 Aug 2022 schicexplorer updated to version 7
scHiCExplorer is a set of programs to process, normalize, analyse and visualize single-cell Hi-C data
30 Aug 2022 trimAl updated to version 1.2rev59
trimAl is a tool for the automated removal of spurious sequences or poorly aligned regions from a multiple sequence alignment. It can consider several parameters, alone or in multiple combinations, in order to select the most reliable positions in the alignment. These include the proportion of sequences with a gap, the level of residue similarity and, if several alignments for the same set of sequences are provided, the consistency level of columns among alignments. trimAl also allows users to manually select a set of columns to be removed from the alignment.
30 Aug 2022 FEBio updated to version 3.7
The FEBio software suite implements a nonlinear implicit finite element (FE) framework, designed specifically for analysis in computational solid biomechanics. FEBio offers modeling scenarios, constitutive models, and boundary conditions that are relevant to numerous applications in biomechanics. The open-source FEBio software is written in C++, with particular attention to scalar and parallel performance on modern computer architectures.
30 Aug 2022 coverageMaster updated to version 20220706
CoverageMaster (CoM) is a copy number variation (CNV) calling algorithm based on depth-of-coverage maps, designed to detect CNVs of any size in exome [whole exome sequencing (WES)] and genome [whole genome sequencing (WGS)] data. The core of the algorithm is the compression of sequencing coverage data in a multiscale wavelet space and its analysis through an iterative Hidden Markov Model.
Scientific Databases updated in last 3 months
For a full list of scientific databases available on the NIH HPC systems, see this page

Updated      Database           Format    Location
26 Nov 2022  NCBI Taxonomy      taxonomy  /fdb/taxonomy
22 Nov 2022  NCBI nt            Fasta     /fdb/fastadb/nt.fas
17 Nov 2022  NCBI nr            Blast     /fdb/blastdb/nr
17 Nov 2022  SwissProt          Blast     /fdb/blastdb/swissprot
17 Nov 2022  Protein Data Bank  Blast     /fdb/blastdb/pdbaa
16 Nov 2022  NCBI nt            Blast     /fdb/blastdb/nt
15 Nov 2022  Protein Data Bank  Blast     /fdb/blastdb/pdbnt
15 Nov 2022  NCBI nr            Fasta     /fdb/fastadb/nr.fas
15 Nov 2022  SwissProt          Fasta     /fdb/fastadb/swissprot.aa.fas
15 Nov 2022  Protein Data Bank  Fasta     /fdb/fastadb/pdb.aa.fas
02 Nov 2022  ANNOVAR            ANNOVAR   /fdb/annovar/current