High-Performance Computing at the NIH
Application updates in the last 3 months
To see all versions available for any application, use module avail application_name
All centrally-installed applications are listed on the Applications page
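As a minimal sketch of the module workflow mentioned above, the following shell snippet lists the installed versions of one application and then loads it. The choice of hgvs is only an example taken from the list below, and the fallback message for systems without the module command is an assumption added for illustration.

```shell
# Example application name; any name from the list below could be substituted.
app=hgvs

if command -v module >/dev/null 2>&1; then
    # Show every installed version of the application.
    module avail "$app"
    # Load the default version into the current shell session.
    module load "$app"
else
    # Assumption: shells outside the cluster have no module command.
    echo "module command not found; run this on a system that provides environment modules"
fi
```

After the module is loaded, the application's commands (or, for hgvs, its Python library) are available in the same shell session.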
Updated Application
16 Oct 2017 novocraft updated to version 3.08.02
The package includes an aligner for single-end and paired-end reads from the Illumina Genome Analyser. Novoalign finds globally optimal alignments using the full Needleman-Wunsch algorithm with affine gap penalties.
13 Oct 2017 singularity updated to version 2.4
Singularity is a container platform focused on supporting "Mobility of Compute". It lets users emulate and share custom Linux environments, allowing for the creation of self-contained development stacks.
10 Oct 2017 hgvs updated to version 1.1.0
The hgvs package provides a Python library to facilitate the use of genome, transcript, and protein variants that are represented using the Human Genome Variation Society (varnomen) recommendations. To use, type module load hgvs prior to calling python.
7 Oct 2017 pyDNase updated to version 0.2.5
pyDNase is a suite of tools for analysing DNase-seq data. It comes with several analysis scripts covering common use cases of DNase-seq analysis, as well as an implementation of the Wellington, Wellington 1D, and Wellington-bootstrap footprinting algorithms.
7 Oct 2017 manta updated to version 1.2.0
Structural variant and indel caller for mapped sequencing data
7 Oct 2017 mapDamage updated to version 2.0.8
mapDamage profiles DNA damage patterns in next-generation sequencing analyses of ancient DNA samples.
7 Oct 2017 bali-phy updated to version 3.0-beta3
BAli-Phy is MCMC software developed by Ben Redelings with Marc Suchard for simultaneous Bayesian estimation of alignment and phylogeny (and other parameters). It handles generic Bayesian modeling via probabilistic programming.
7 Oct 2017 clark updated to version
A method based on a supervised sequence classification using discriminative k-mers
6 Oct 2017 exomiser updated to version 8.0.1
The Exomiser is a Java program that functionally annotates variants from whole-exome sequencing data starting from a VCF file.
6 Oct 2017 MAJIQ updated to version 1.0.5
Modeling Alternative Junction Inclusion Quantification. MAJIQ and Voila are two software packages that together define, quantify, and visualize local splicing variations (LSV) from RNA-Seq data.
6 Oct 2017 hotnet2 updated to version 1.0.1-125-g29fe555
HotNet2 is an algorithm for finding significantly altered subnetworks in a large gene interaction network.
6 Oct 2017 annogesic updated to version 0.6.25
ANNOgesic is a transcriptome annotation pipeline for RNA-seq.
5 Oct 2017 Rosetta updated to version 2017.36
The Rosetta++ software suite can perform de novo protein structure predictions, identify low free energy sequences for target protein backbones, predict the structure of a protein-protein complex from the individual structures of the monomer components, incorporate NMR data into the basic Rosetta protocol to accelerate the process of NMR structure prediction, and more...
5 Oct 2017 hicpro updated to version 2.9.0
HiC-Pro: An optimized and flexible pipeline for Hi-C data processing
5 Oct 2017 ncbi-toolkit updated to version 18.0.0
The NCBI C++ Toolkit is a set of executables and libraries for a multitude of sequence analysis functions.
2 Oct 2017 lancet updated to version 1.0.1
Lancet is a somatic variant caller (SNVs and indels) for short read data.
29 Sep 2017 snakemake updated to version 4.1.0
Snakemake aims to reduce the complexity of creating workflows by providing a fast and comfortable execution environment, together with a clean and modern domain-specific specification language (DSL) in Python style. It is well suited for bioinformatic workflows.
26 Sep 2017 minimap2 updated to version 2.2
Minimap2 is a fast sequence mapping and alignment program that can find overlaps between long noisy reads, or map long reads or their assemblies to a reference genome optionally with detailed alignment (i.e. CIGAR).
26 Sep 2017 maker updated to version 2.31.9
MAKER is an easy-to-configure, portable genome annotation pipeline.
26 Sep 2017 boost updated to version 1.65
Boost provides free peer-reviewed portable C++ source libraries. Boost libraries are intended to be widely useful, and usable across a broad spectrum of applications.
26 Sep 2017 miarma updated to version 1.7.1
miARma-Seq, which stands for miRNA-Seq And RNA-Seq Multiprocess Analysis, is a suite designed to study mRNAs, miRNAs and circRNAs.
25 Sep 2017 CAVIAR updated to version a97e614
CAVIAR (CAusal Variants Identification in Associated Regions) is a statistical framework that quantifies the probability of each variant being causal while allowing for an arbitrary number of causal variants.
25 Sep 2017 PAINTOR updated to version 3.0-2c614ef
PAINTOR (Probabilistic Annotation INtegraTOR) is a probabilistic framework that integrates association strength with genomic functional annotation data to improve accuracy in selecting plausible causal variants for functional validation.
22 Sep 2017 agfusion updated to version 0.149
Annotate Gene Fusion (AGFusion) is a package for annotating gene fusions from the human or mouse genomes.
21 Sep 2017 paraview updated to version 5.4.1
ParaView is an open-source, multi-platform data analysis and visualization application.
21 Sep 2017 cellranger updated to version 2.0.2
Cell Ranger is a set of analysis pipelines that processes Chromium single cell 3’ RNA-seq output to align reads, generate gene-cell matrices and perform clustering and gene expression analysis.
18 Sep 2017 QIIME updated to version 2.2017.8
QIIME is an open source software package for comparison and analysis of microbial communities, primarily based on high-throughput amplicon sequencing data (such as SSU rRNA) generated on a variety of platforms, but also supporting analysis of other types of data (such as shotgun metagenomic data).
14 Sep 2017 align2rawsignal updated to version 2.0
Also known as Wiggler, align2rawsignal creates genome-wide raw or normalized signal tracks from aligned sequencing reads (BAM/tagAlign). Wiggler can generate genome-wide signal coverage tracks for ChIP-seq, DNase-seq, FAIRE-seq and MNase-seq datasets.
14 Sep 2017 SNAP updated to version 2013-11-29
SNAP (Semi-HMM-based Nucleic Acid Parser) is a gene prediction tool.
14 Sep 2017 LongRanger updated to version 2.1.6
Long Ranger is a set of analysis pipelines that processes GemCode sequencing output to align reads and call and phase SNPs, indels, and structural variants. Loupe is a genome browser designed to visualize the Linked-Read data produced by the 10x Chromium Platform.
11 Sep 2017 smrtanalysis updated to version
SMRT® Analysis is a bioinformatics software suite available for analysis of DNA sequencing data from Pacific Biosciences’ SMRT technology. Users can choose from a variety of analysis protocols that utilize PacBio® and third-party tools. Analysis protocols include de novo genome assembly, cDNA mapping, DNA base-modification detection, and long-amplicon analysis to determine phased consensus sequences.
11 Sep 2017 gmap-gsnap updated to version 2017-09-05
Genomic mapping and alignment programs.
8 Sep 2017 cromwell updated to version 28.2
A Workflow Management System geared towards scientific workflows.
7 Sep 2017 Qt updated to version 5.9.1
Qt is a cross-platform application framework that is used for developing application software that can be run on various software and hardware platforms with little or no change in the underlying codebase, while still being a native application with native capabilities and speed.
7 Sep 2017 KMC updated to version 3.0.0
KMC is a disk-based program for counting k-mers from (possibly gzipped) FASTQ/FASTA files.
6 Sep 2017 Gemini updated to version 0.20.1
GEMINI (GEnome MINIng) is designed to be a flexible framework for exploring genetic variation in the context of the wealth of genome annotations available for the human genome. By placing genetic variants, sample genotypes, and useful genome annotations into an integrated database framework, GEMINI provides a simple, flexible, yet very powerful system for exploring genetic variation for disease and population genetics.
6 Sep 2017 bwtool updated to version 1.0
bwtool is a command-line utility for bigWig files
6 Sep 2017 WGSA updated to version 07
WGSA is an annotation pipeline for human genome re-sequencing studies, to facilitate the functional annotation step of whole genome sequencing (WGS). Currently WGSA supports the annotation of SNVs and indels locally without remote database requests, allowing it to scale up for large WGS studies.
31 Aug 2017 PartekFlow updated to version
Web interface designed specifically for the analysis needs of next generation sequencing applications including RNA, small RNA, and DNA sequencing.
31 Aug 2017 cnvkit updated to version 0.9.0
Copy number variant detection from targeted DNA sequencing
31 Aug 2017 Beagle updated to version 4.1_08Jun17
Beagle is a package for imputing genotypes, inferring haplotype phase, and performing genetic association analysis. BEAGLE is designed to analyze large-scale data sets with hundreds of thousands of markers genotyped on thousands of samples.
30 Aug 2017 Gaussian updated to version G16-A03
Gaussian is a connected system of programs for performing semiempirical and ab initio molecular orbital (MO) calculations.
30 Aug 2017 GCC updated to version 7.2.0
The GNU Compiler Collection includes front ends for C, C++, Objective-C, Fortran, and Go, as well as libraries for these languages (libstdc++, libgfortran,...)
30 Aug 2017 VEP updated to version 90
VEP (Variant Effect Predictor) determines the effect of your variants (SNPs, insertions, deletions, CNVs or structural variants) on genes, transcripts, and protein sequence, as well as regulatory regions.
30 Aug 2017 biobambam2 updated to version 2.0.76
Tools for early stage alignment file processing.
29 Aug 2017 usearch updated to version 10.0.240
USEARCH is a unique sequence analysis tool with thousands of users world-wide. USEARCH offers search and clustering algorithms that are often orders of magnitude faster than BLAST.
29 Aug 2017 defuse updated to version 0.8.1
deFuse is a software package for gene fusion discovery using RNA-Seq data. The software uses clusters of discordant paired end alignments to inform a split read alignment analysis for finding fusion boundaries. The software also employs a number of heuristic filters in an attempt to reduce the number of false positives and produces a fully annotated output for each predicted fusion.
29 Aug 2017 EricScript updated to version 0.5.5
EricScript is a computational framework for the discovery of gene fusions in paired end RNA-seq data.
28 Aug 2017 albacore updated to version 2.0.2
ONT basecaller
22 Aug 2017 PePr updated to version 1.1.20
PePr is a ChIP-Seq Peak-calling and Prioritization pipeline that uses a sliding window approach and models read counts across replicates and between groups with a negative binomial distribution.
18 Aug 2017 RELION updated to version 2.1-beta-1
RELION (for REgularised LIkelihood OptimisatioN) is a stand-alone computer program for Maximum A Posteriori refinement of (multiple) 3D reconstructions or 2D class averages in cryo-electron microscopy.
17 Aug 2017 admixture updated to version 1.3.0
ADMIXTURE is a software tool for maximum likelihood estimation of individual ancestries from multilocus SNP genotype datasets. It uses the same statistical model as STRUCTURE but calculates estimates much more rapidly using a fast numerical optimization algorithm.
17 Aug 2017 lammps updated to version 31Mar17
LAMMPS is a classical molecular dynamics code, and an acronym for Large-scale Atomic/Molecular Massively Parallel Simulator. It runs on a variety of different computer systems, including single processor systems, distributed-memory machines with MPI, and GPU and Xeon Phi systems. LAMMPS is open source software, released under the GNU General Public License.
16 Aug 2017 taxtastic updated to version 0.7.1
Build and maintain reference packages, i.e. collections of reference trees, reference alignments, profiles, and associated taxonomic information.
16 Aug 2017 svviz updated to version 1.6.1
svviz visualizes high-throughput sequencing data relevant to a structural variant. Only reads supporting the variant or the reference allele will be shown. svviz can operate either in an interactive web browser view, to closely inspect individual variants, or in batch mode, allowing multiple variants (annotated in a VCF file) to be analyzed simultaneously.
15 Aug 2017 ctffind updated to version 4.1.8
Programs for finding CTFs of electron micrographs
9 Aug 2017 AfterQC updated to version 0.9.6
Automatic Filtering, Trimming, Error Removing and Quality Control for fastq data.
9 Aug 2017 minc-toolkit updated to version 1.9.15
This metaproject bundles multiple MINC-based packages that historically have been developed somewhat independently
9 Aug 2017 nextflow updated to version 0.25.5
Data-driven computational pipelines
7 Aug 2017 diamond updated to version 0.9.9
DIAMOND is a new high-throughput program for aligning DNA reads or protein sequences against a protein reference database such as NR, at up to 20,000 times the speed of BLAST, with high sensitivity.
4 Aug 2017 atropos updated to version 1.1.4
An NGS read trimming tool that is specific, sensitive, and speedy.
3 Aug 2017 crystfel updated to version 0.6.3
CrystFEL is a suite of programs for processing diffraction data acquired serially in a snapshot manner, such as when using the technique of Serial Femtosecond Crystallography (SFX) with a free-electron laser source.
3 Aug 2017 adapterremoval updated to version 2.2.2
Rapid adapter trimming, identification, and read merging.
3 Aug 2017 varsim updated to version 0.8.1
A high-fidelity simulation validation framework for high-throughput genome sequencing with cancer applications
2 Aug 2017 squid updated to version 1.0
SQUID is designed to detect transcriptomic structural variations from RNA-seq alignment.
2 Aug 2017 openjdk updated to version 1.8.0_121
OpenJDK (Open Java Development Kit) is a free and open source implementation of the Java Platform, Standard Edition (Java SE). It has been the official reference implementation of Java SE since version 7.
2 Aug 2017 presto updated to version
A bioinformatics toolkit for processing high-throughput lymphocyte receptor sequencing data.
2 Aug 2017 THetA updated to version 0.7-7-g8f93e6c
Tumor Heterogeneity Analysis (THetA) is an algorithm used to estimate tumor purity and clonal/subclonal copy number aberrations simultaneously from high-throughput DNA sequencing data.
1 Aug 2017 ldsc updated to version 1.0.0-92-gcf1707e
ldsc is a command line tool for estimating heritability and genetic correlation from GWAS summary statistics. ldsc also computes LD Scores.
1 Aug 2017 dogpicker updated to version 0.2.1
Particle picker that uses difference of Gaussians (DoG) for picking particles.
1 Aug 2017 GATK updated to version 3.8-0
GATK, from the Broad Institute, is first a structured software library that makes it easy to write efficient analysis tools for next-generation sequencing data, and second a suite of tools for working with human medical resequencing projects such as 1000 Genomes and The Cancer Genome Atlas. These tools include depth-of-coverage analyzers, a quality score recalibrator, a SNP/indel caller, and a local realigner.
1 Aug 2017 Genome Browser updated to version 352
The Genome Browser Mirror Fragments at Helix Systems is a mirror of the UCSC Genome Browser. The URL is https://hpcnihapps.cit.nih.gov/genome. Users can also directly access the MySQL databases, supporting files, and a large number of associated executables.
31 Jul 2017 magic updated to version b2103fd
A diffusion-based imputation method that reveals gene-gene interactions in single-cell RNA-sequencing data.
31 Jul 2017 DEPICT updated to version 140721
DEPICT is an integrative tool that, based on predicted gene functions, systematically prioritizes the most likely causal genes at associated loci, highlights enriched pathways, and identifies tissues/cell types where genes from associated loci are highly expressed.
31 Jul 2017 MAGMA updated to version 1.06
MAGMA is a tool for gene analysis and generalized gene-set analysis of GWAS data. It can be used to analyse both raw genotype data as well as summary SNP p-values from a previous GWAS or meta-analysis.
31 Jul 2017 BOLT-LMM updated to version 2.2
The BOLT-LMM algorithm computes statistics for testing association between phenotype and genotypes using a linear mixed model (LMM)
31 Jul 2017 smc++ updated to version 1.9.4
SMC++ is a program for estimating the size history of populations from whole genome sequence data.
31 Jul 2017 art updated to version 20160605
ART is a set of simulation tools to generate synthetic next-generation sequencing reads.
29 Jul 2017 fqtools updated to version 2.0
Tools for manipulating fastq files
28 Jul 2017 SGA-ICE updated to version 20170728
The script SGA-ICE (SGA-Iteratively Correcting Errors) implements iterative error correction by using modules from the String Graph Assembler (SGA).
26 Jul 2017 ANNOVAR updated to version 2017-07-16
ANNOVAR is an efficient software tool that uses up-to-date information to functionally annotate genetic variants detected from diverse genomes.
26 Jul 2017 EDirect updated to version 7.00
Entrez Direct (EDirect) is an advanced method for accessing the NCBI's set of interconnected databases (publication, sequence, structure, gene, variation, expression, etc.) from a UNIX terminal window.
26 Jul 2017 pigz updated to version 2.3.4
pigz (parallel implementation of gzip) is a fully functional replacement for gzip that exploits multiple processors and multiple cores to the hilt when compressing data.
25 Jul 2017 Phenix updated to version 1.12-2829
PHENIX is a software suite for the automated determination of macromolecular structures using X-ray crystallography and other methods.
24 Jul 2017 circos updated to version 0.69-5
Circos is a program for the generation of publication-quality, circularly composited renditions of genomic data and related annotations. Circos is particularly suited for visualizing alignments, conservation and intra and inter-chromosomal relationships. Also, Circos is useful to visualize any type of information that benefits from a circular layout. Thus, although it has been designed for the field of genomics, it is sufficiently flexible to be used in other data domains.
24 Jul 2017 plink updated to version 1.9.0-beta4.4
PLINK is a whole-genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.
21 Jul 2017 pbsuite updated to version 15.8.24
The PBSuite contains two projects created for analysis of Pacific Biosciences long-read sequencing data: PBHoney and PBJelly. PBHoney is an implementation of two variant-identification approaches designed to exploit the high mappability of long reads (i.e., greater than 10,000 bp). PBHoney considers both intra-read discordance and soft-clipped tails of long reads to identify structural variants. PBJelly is a highly automated pipeline that aligns long sequencing reads (such as PacBio RS reads or long 454 reads in fasta format) to high-confidence draft assemblies. PBJelly fills or reduces as many captured gaps as possible to produce upgraded draft genomes.
20 Jul 2017 oases updated to version 0.2.1
oases is a de novo transcriptome assembler based on the Velvet genome assembler core.
20 Jul 2017 rclone updated to version 1.36
Rclone is a utility for synchronizing directories on a file-based storage system (e.g. /home or /data) with an object store such as Amazon S3. It uses the S3 protocol, and it can be used with the HPC object storage system.
19 Jul 2017 Canu updated to version 1.5
Canu is a fork of the Celera Assembler designed for high-noise single-molecule sequencing (such as the PacBio RSII or Oxford Nanopore MinION). Canu will correct the reads, then trim suspicious regions (such as remaining SMRTbell adapter), then assemble the corrected and cleaned reads into unitigs.
19 Jul 2017 kplogo updated to version 1.1
k-mer probability logo (kpLogo) is a probability-based logo tool for integrated detection and visualization of position-specific ultra-short motifs from a set of aligned sequences
Scientific Databases updated in the last 3 months
For a full list of scientific databases available on the NIH HPC systems, see this page

Updated | Database | Format | Location
17 Oct 2017 | Protein Data Bank | PDB | /pdb/pdb
16 Oct 2017 | Mito | Blast | /fdb/blastdb/mito.aa
15 Oct 2017 | 16S Microbial | Blast | /fdb/blastdb/16SMicrobial
14 Oct 2017 | Protein Data Bank | Blast | /fdb/blastdb/pdbaa
14 Oct 2017 | SwissProt | Blast | /fdb/blastdb/swissprot
13 Oct 2017 | Rat Genome (Rattus norvegicus) rn5 | MySQL | NIH mirror of UCSC Genome Browser
13 Oct 2017 | Protein Data Bank | Blast | /fdb/blastdb/pdbnt
12 Oct 2017 | EST - others | Blast | /fdb/blastdb/est_others
12 Oct 2017 | NCBI nr | Blast | /fdb/blastdb/nr
11 Oct 2017 | NCBI nt | Blast | /fdb/blastdb/nt
10 Oct 2017 | Refseq Other Genomic | Fasta | /fdb/fastadb/ref.other.genomic.fas
10 Oct 2017 | Protein Data Bank | Fasta | /fdb/fastadb/pdb.nt.fas
10 Oct 2017 | Mito | Fasta | /fdb/fastadb/mito.nt.fas
10 Oct 2017 | SwissProt | Fasta | /fdb/fastadb/swissprot.aa.fas
10 Oct 2017 | Protein Data Bank | Fasta | /fdb/fastadb/pdb.aa.fas
10 Oct 2017 | Mito | Fasta | /fdb/fastadb/mito.aa.fas
10 Oct 2017 | NCBI nr | Fasta | /fdb/fastadb/nr.aa.fas
06 Oct 2017 | Rat Genome (Rattus norvegicus) rn4 | MySQL | NIH mirror of UCSC Genome Browser
06 Oct 2017 | Chicken Genome (Gallus gallus) | MySQL | NIH mirror of UCSC Genome Browser
06 Oct 2017 | Drosophila genome (Drosophila melanogaster) fb5 | MySQL | NIH mirror of UCSC Genome Browser
03 Oct 2017 | NCBI nt | Fasta | /fdb/fastadb/nt.fas
29 Sep 2017 | Mouse Genome (Mus musculus) mm8 | MySQL | NIH mirror of UCSC Genome Browser
31 Aug 2017 | Human Genome hg19 | dbNSFP | /fdb/dbNSFP/
29 Aug 2017 | Human Genome hg19 | VEP data | /fdb/VEP/81/cache/homo_sapiens/81_GRCh37
29 Aug 2017 | Mouse Genome (Mus musculus) mm10 | VEP data | /fdb/VEP/81/cache/mus_musculus/81_GRCm38
24 Aug 2017 | Human Genome hg19 | Fasta | /fdb/genome/human-feb2009/
23 Aug 2017 | HTGs | Blast | /fdb/blastdb/htgs
10 Aug 2017 | DSSP | DSSP | /fdb/DSSP/
25 Jul 2017 | Refseq Human Genomic | Fasta | /fdb/fastadb/ref.human.genomic.fas
23 Jul 2017 | Refseq Human Genomic | Blast | /fdb/blastdb/human_genomic