High-Performance Computing at the NIH
Application updates in the last 3 months
To see all versions available for any application, use module avail application_name
All centrally-installed applications are listed on the Applications page
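For scripted use, the same version query can be issued from Python. This is only a sketch: it assumes that the module command is set up as a shell function in bash login shells, which is the common arrangement but is not stated on this page, and the helper names are hypothetical.

```python
import shlex
import subprocess
from typing import List


def module_avail_command(application_name: str) -> List[str]:
    """Build a bash command line that runs `module avail` for one application."""
    # `module` is usually a shell function rather than an executable, so it is
    # run inside a login shell (assumption: bash login shells initialise modules).
    return ["bash", "-lc", "module avail " + shlex.quote(application_name)]


def list_versions(application_name: str) -> str:
    """Run the query and return whatever the module system printed."""
    result = subprocess.run(
        module_avail_command(application_name),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # module listings are often written to stderr
        universal_newlines=True,
        check=False,
    )
    return result.stdout
```

On a system where the module command is available, print(list_versions("samtools")) would show the installed samtools versions.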
Updated Application
26 Jun 2017 singularity updated to version 2.3.1
Singularity is a container platform focused on supporting "Mobility of Compute". It allows users to emulate and share custom Linux environments, allowing for the creation of self-contained development stacks.
23 Jun 2017 circexplorer updated to version 1.1.10
A combined strategy to identify circular RNAs (circRNAs and ciRNAs)
22 Jun 2017 icgc-get updated to version 0.5.8
ICGC data resides in many data repositories and compute clouds around the world. A coordinated mechanism to bootstrap and streamline the data access process is highly desirable. This is the problem the icgc-get tool helps to solve.
22 Jun 2017 VEP updated to version 89
VEP (Variant Effect Predictor) determines the effect of your variants (SNPs, insertions, deletions, CNVs or structural variants) on genes, transcripts, and protein sequence, as well as regulatory regions.
22 Jun 2017 integrative updated to version default
Software Pipeline for Integrative Genetic Association Analysis: Probabilistic Assessment of Enrichment and Colocalization
21 Jun 2017 stampy updated to version 1.0.31
Short read aligner
19 Jun 2017 IgBlast updated to version 1.7.0
IgBlast is a sequence analysis tool for immunoglobulin variable domains.
19 Jun 2017 preseq updated to version 2.0.3
predicting library complexity and genome coverage in high-throughput sequencing
19 Jun 2017 mothur updated to version 1.39.5
mothur is a tool for analyzing 16S rRNA gene sequences generated on multiple platforms as part of microbial ecology projects.
17 Jun 2017 rmats updated to version 3.1.0
MATS is a computational tool to detect differential alternative splicing events from RNA-Seq data.
16 Jun 2017 smrtanalysis updated to version 4.0.0.190159
SMRT® Analysis is a bioinformatics software suite available for analysis of DNA sequencing data from Pacific Biosciences’ SMRT technology. Users can choose from a variety of analysis protocols that utilize PacBio® and third-party tools. Analysis protocols include de novo genome assembly, cDNA mapping, DNA base-modification detection, and long-amplicon analysis to determine phased consensus sequences.
16 Jun 2017 prokka updated to version 1.12
Prokka is a software tool for the rapid annotation of prokaryotic genomes.
15 Jun 2017 SeqMonk updated to version 1.38.1
SeqMonk is a program to enable the visualization and analysis of mapped sequence data. It was written for use with mapped next generation sequence data but can in theory be used for any dataset which can be expressed as a series of genomic positions.
13 Jun 2017 Spark updated to version 2.1.1
Apache Spark is a fast and general engine for large-scale data processing. It is commonly used as an in-memory alternative to Hadoop MapReduce.
13 Jun 2017 Scala updated to version 2.12.2
General purpose language; multiparadigm (object-oriented, functional, concurrent elements); statically typed, type-safe
12 Jun 2017 Comsol updated to version 5.3.a
The COMSOL Multiphysics engineering simulation software environment facilitates all steps in the modeling process: defining your geometry, meshing, specifying your physics, solving, and then visualizing your results.
12 Jun 2017 plastid updated to version 0.4.8
Position-wise analysis of sequencing and genomics data
6 Jun 2017 bds updated to version 0.99999l
BDS, or Big Data Script, is a cross-system workflow language for working with big data pipelines in computer systems of different sizes and capabilities.
6 Jun 2017 picard updated to version 2.9.2
Picard comprises Java-based command-line utilities that manipulate SAM files, and a Java API (SAM-JDK) for creating new programs that read and write SAM files. Both SAM text format and SAM binary (BAM) format are supported.
5 Jun 2017 synapseclient updated to version 1.6.2
The synapseclient package provides an interface to Synapse, a collaborative workspace for reproducible, data intensive research projects
1 Jun 2017 quast updated to version 4.5
QUAST stands for QUality ASsessment Tool. The tool evaluates genome assemblies by computing various metrics. The package includes the general QUAST tool for genome assemblies; MetaQUAST, the extension for metagenomic datasets; and Icarus, an interactive visualizer for these tools.
1 Jun 2017 svtyper updated to version 0.1.4
Svtyper is a Bayesian genotyper for structural variants.
1 Jun 2017 subread updated to version 1.5.2
High-performance read alignment, quantification and mutation discovery
1 Jun 2017 Schrodinger updated to version 2017.1
A limited number of Schrödinger applications are available on the Biowulf cluster through the Molecular Modeling Interest Group. Most are available through the Maestro GUI.
1 Jun 2017 alleleCount updated to version 3.2.2
Calculates genotype frequencies of a SNPMatrix. This component tests each SNP for its Hardy-Weinberg equilibrium. If there are NA values, the frequencies of missing value per sample in the input file are calculated.
31 May 2017 smart updated to version 2.1.5
Specific Methylation Analysis and Report Tool (SMART) uses the signal from bisulfite sequencing experiments across multiple samples to identify genome segments with similar methylation specificities.
31 May 2017 RepeatMasker updated to version 4.0.7
RepeatMasker is a program that screens DNA sequences for interspersed repeats and low complexity DNA sequences. The output of the program is a detailed annotation of the repeats that are present in the query sequence as well as a modified version of the query sequence in which all the annotated repeats have been masked (default: replaced by Ns). On average, almost 50% of a human genomic DNA sequence currently will be masked by the program.
31 May 2017 Julia updated to version 0.5.2
high level, dynamic language for technical computing
31 May 2017 genometools updated to version 1.5.9
collection of bioinformatic tools
30 May 2017 TRF updated to version 4.09
A tandem repeat in DNA is two or more adjacent, approximate copies of a pattern of nucleotides. Tandem Repeats Finder is a program to locate and display tandem repeats in DNA sequences.
30 May 2017 agrep updated to version 0.8.0-6fb7206
approximate GREP for fast fuzzy string searching. This is the TRE implementation of the tool. TRE is a lightweight, robust, and efficient POSIX compliant regexp matching library with some special features such as approximate (fuzzy) matching.
30 May 2017 EMAN2 updated to version 2.2
EMAN2 is a broadly based greyscale scientific image processing suite with a primary focus on processing data from transmission electron microscopes.
30 May 2017 PartekFlow updated to version 6.0.17.0514
Web interface designed specifically for the analysis needs of next generation sequencing applications including RNA, small RNA, and DNA sequencing.
25 May 2017 htseq updated to version 0.7.2
HTSeq is a Python package that provides infrastructure to process data from high-throughput sequencing assays.
24 May 2017 GAMESS updated to version 20Apr17-R1-sockets
GAMESS is a general ab initio quantum chemistry package.
24 May 2017 SVPV updated to version 1.01
SVPV (Structural Variant Prediction Viewer) enables visualisation of predicted structural variant regions in paired-end whole genome sequencing alignments, and allows comparison of calls from different structural variant prediction algorithms.
23 May 2017 parpipe updated to version current
Complete analysis pipeline for PAR-CLIP data
22 May 2017 jq updated to version 1.5
Command-line JSON processor
22 May 2017 apt updated to version 1.19.0
apt - Affymetrix Power Tools - is a set of cross-platform command line programs that implement algorithms for analyzing and working with Affymetrix GeneChip® arrays.
17 May 2017 gem updated to version 3.0
High resolution peak calling and motif discovery for ChIP-seq and ChIP-exo data
16 May 2017 crossmap updated to version 0.2.6
CrossMap is a program for convenient conversion of genome coordinates between different assemblies (e.g. mm9->mm10). It can convert SAM, BAM, BED, GTF, GFF, wig/bigWig, and VCF files.
15 May 2017 SOAP3-dp updated to version 2.3.178+20170103
SOAP3-dp is GPU-based software for aligning short reads to a reference sequence. It improves on SOAP3 in both speed and sensitivity by exploiting whole-genome indexing and dynamic programming on a GPU. SOAP3 is limited to finding alignments with at most 4 mismatches, while SOAP3-dp can find alignments involving mismatches, INDELs, and small gaps. The number of reads aligned, especially for paired-end data, typically increases 5 to 10 percent from SOAP3 to SOAP3-dp.
15 May 2017 cellranger updated to version 2.0.0
Cell Ranger is a set of analysis pipelines that processes Chromium single cell 3’ RNA-seq output to align reads, generate gene-cell matrices and perform clustering and gene expression analysis.
12 May 2017 I-TASSER updated to version 5.1
I-TASSER (Iterative Threading ASSEmbly Refinement) is a hierarchical approach to protein structure and function prediction.
10 May 2017 Matlab updated to version 9.2.0.556344
MATLAB is a high-performance interactive software package for scientific and engineering numeric computation. MATLAB integrates numerical analysis, matrix computation, signal processing, and graphics in an environment where problems and solutions are expressed just as they are written mathematically.
10 May 2017 deeptools updated to version 2.5.0.1
deepTools is a suite of user-friendly tools for the visualization, quality control and normalization of data from deep-sequencing DNA sequencing experiments.
9 May 2017 RELION updated to version 2.0.6
RELION (for REgularised LIkelihood OptimisatioN) is a stand-alone computer program for Maximum A Posteriori refinement of (multiple) 3D reconstructions or 2D class averages in cryo-electron microscopy.
9 May 2017 bison updated to version 62bf61f7
BISON is a bisulfite-converted short-read aligner that can natively utilize high-performance computing clusters to increase speed.
8 May 2017 fastq_screen updated to version 0.11.1
FastQ Screen allows you to screen a library of sequences in FastQ format against a set of sequence databases so you can see if the composition of the library matches with what you expect.
4 May 2017 DNAWorks updated to version 3.2.4
DNAWorks is a computer program that automates the design of oligonucleotides for gene synthesis by PCR-based gene assembly. The program requires simple input information: an amino acid sequence of the target protein or a DNA sequence, and a desired annealing temperature. It is a web-based tool available at https://hpcwebapps.cit.nih.gov/dnaworks/.
4 May 2017 vasttools updated to version 1.2.0
A toolset for profiling alternative splicing events in RNA-Seq data.
3 May 2017 meka updated to version 1.9.1
A Multi-label Extension to WEKA
3 May 2017 GCC updated to version 7.1.0
The GNU Compiler Collection includes front ends for C, C++, Objective-C, Fortran, and Go, as well as libraries for these languages (libstdc++, libgfortran,...)
3 May 2017 methylflow updated to version 0.1.0-pre
Cell-specific methylation pattern reconstruction
3 May 2017 multiqc updated to version 0.9
aggregates results for various frequently used bioinformatics tools across multiple samples into a nice visual report
2 May 2017 parallel updated to version 20170422
GNU parallel is a shell tool for executing jobs in parallel using one or more computers.
2 May 2017 emspring updated to version 0.84
SPRING (Single Particle Reconstruction from Images of kNown Geometry) is a single-particle based helical reconstruction package for electron cryo-micrographs and has been used to determine 3D structures of a variety of highly ordered and less ordered specimens.
28 Apr 2017 bedops updated to version 2.4.26
Bedops is a suite of tools to address common questions raised in genomic studies, mostly with regard to overlap and proximity relationships between data sets. BEDOPS aims to be scalable and flexible, facilitating the efficient and accurate analysis and management of large-scale genomic data.
27 Apr 2017 STAR updated to version 2.5.3a
Spliced Transcripts Alignment to a Reference
27 Apr 2017 freebayes updated to version 1.1.0
Bayesian haplotype-based polymorphism discovery and genotyping
27 Apr 2017 hap.py updated to version 0.3.7
A set of programs based on htslib to benchmark variant calls against gold standard truth datasets.
27 Apr 2017 rilseq updated to version 0.47
RILseq computational protocol
26 Apr 2017 graph-tool updated to version 2.22
Graph-tool is an efficient Python module for manipulation and statistical analysis of graphs (a.k.a. networks).
26 Apr 2017 viennarna updated to version 2.3.5
RNA Secondary Structure Prediction and Comparison
26 Apr 2017 Gemini updated to version 0.20.0
GEMINI (GEnome MINIng) is designed to be a flexible framework for exploring genetic variation in the context of the wealth of genome annotations available for the human genome. By placing genetic variants, sample genotypes, and useful genome annotations into an integrated database framework, GEMINI provides a simple, flexible, yet very powerful system for exploring genetic variation for disease and population genetics.
26 Apr 2017 FSL updated to version 5.0.10
FSL is a comprehensive library of image analysis and statistical tools for FMRI, MRI and DTI brain imaging data.
25 Apr 2017 vcf2db updated to version 7dfc48a
vcf2db creates a gemini-compatible database from a VCF.
25 Apr 2017 annogesic updated to version 0.5.6
ANNOgesic is a transcriptome annotation pipeline for RNA-seq.
24 Apr 2017 plink updated to version 1.9.0-beta4.1
PLINK is a whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.
21 Apr 2017 Phenix updated to version 1.11.1-2575
PHENIX is a software suite for the automated determination of macromolecular structures using X-ray crystallography and other methods.
20 Apr 2017 snakemake updated to version 3.11.2
Snakemake aims to reduce the complexity of creating workflows by providing a fast and comfortable execution environment, together with a clean and modern domain-specific language (DSL) in Python style. It is well suited for bioinformatic workflows.
20 Apr 2017 JUMPg updated to version 2.3.1
JUMPg is a proteogenomics software pipeline for analyzing large mass spectrometry (MS) and functional genomics datasets. The pipeline includes customized database building, tag-based database search, peptide-spectrum match filtering, and data visualization.
20 Apr 2017 stacks updated to version 1.46
Stacks is a software pipeline for building loci from short-read sequences, such as those generated on the Illumina platform. Stacks was developed to work with restriction enzyme-based data, such as RAD-seq, for the purpose of building genetic maps and conducting population genomics and phylogeography.
19 Apr 2017 Rosetta updated to version 2017.13
The Rosetta++ software suite can perform de novo protein structure predictions, identify low free energy sequences for target protein backbones, predict the structure of a protein-protein complex from the individual structures of the monomer components, incorporate NMR data into the basic Rosetta protocol to accelerate the process of NMR structure prediction, and more...
19 Apr 2017 samtools updated to version 1.4
The samtools package now provides samtools, bcftools, tabix, and the underlying htslib library.
19 Apr 2017 cellprofiler updated to version 2.2.0
An open-source application for biological image analysis
18 Apr 2017 IDR updated to version 2.0.3
The IDR (Irreproducible Discovery Rate) framework is a unified approach to measure the reproducibility of findings identified from replicate experiments and provide highly stable thresholds based on reproducibility. The IDR method compares a pair of ranked lists of identifications (such as ChIP-seq peaks).
18 Apr 2017 scallop updated to version 0.9.8
Scallop is a reference-based transcript assembler.
17 Apr 2017 crispresso updated to version 1.0.5
Software pipeline for the analysis of CRISPR-Cas9 genome editing outcomes from sequencing data
17 Apr 2017 bcl2fastq updated to version 2.19.0
a tool to handle bcl conversion and demultiplexing
17 Apr 2017 PRANK updated to version 150803
PRANK is a probabilistic multiple alignment program for DNA, codon and amino-acid sequences. PRANK is based on a novel algorithm that treats insertions correctly and avoids over-estimation of the number of deletion events.
13 Apr 2017 gffcompare updated to version 0.9.8
gffcompare can be used to compare and evaluate the accuracy of RNA-Seq transcript assemblers (Cufflinks, Stringtie). It can collapse (merge) duplicate transcripts from multiple GTF/GFF3 files (e.g. resulting from assembly of different samples) and classify transcripts from one or multiple GTF/GFF3 files as they relate to reference transcripts provided in an annotation file (also in GTF/GFF3 format).
13 Apr 2017 manta updated to version 1.1.0
Structural variant and indel caller for mapped sequencing data
13 Apr 2017 strelka updated to version 2.7.1
Strelka is an analysis package designed to detect somatic SNVs and small indels from the aligned sequencing reads of matched tumor-normal samples.
13 Apr 2017 genesis updated to version 2.4
GENESIS (GEneral NEural SImulation System) is a software platform for the simulation of neural systems ranging from subcellular components and biochemical reactions to complex models of single neurons, large networks, and systems-level processes.
12 Apr 2017 fuma updated to version 3.0.5
Fuma reports overlap in fusion genes detected by RNA-seq.
11 Apr 2017 PRINSEQ updated to version 0.20.4
PRINSEQ is a tool that generates summary statistics of sequence and quality data and that is used to filter, reformat and trim next-generation sequence data. It is particularly designed for 454/Roche data, but can also be used for other types of sequence data.
11 Apr 2017 RNAmmer updated to version 1.2
RNAmmer predicts ribosomal RNA genes in full genome sequences by utilising two levels of Hidden Markov Models: An initial spotter model searches both strands. The spotter model is constructed from highly conserved loci within a structural alignment of known rRNA sequences. Once the spotter model detects an approximate position of a gene, flanking regions are extracted and parsed to the full model which matches the entire gene.
11 Apr 2017 PePr updated to version 1.1.18
PePr is a ChIP-Seq Peak-calling and Prioritization pipeline that uses a sliding window approach and models read counts across replicates and between groups with a negative binomial distribution.
11 Apr 2017 TMHMM updated to version 2.0c
TMHMM predicts transmembrane helices in proteins.
10 Apr 2017 signalp updated to version 4.1
SignalP predicts the presence and location of signal peptide cleavage sites in amino acid sequences from different organisms: Gram-positive bacteria, Gram-negative bacteria, and eukaryotes. The method incorporates a prediction of cleavage sites and a signal peptide/non-signal peptide prediction based on a combination of several artificial neural networks.
10 Apr 2017 hgvs updated to version 1.0.0
The hgvs package provides a Python library to facilitate the use of genome, transcript, and protein variants that are represented using the Human Genome Variation Society (varnomen) recommendations. To use, type module load hgvs prior to calling python.
10 Apr 2017 Solar updated to version 8.2.0
SOLAR is a program for multipoint, oligogenic, variance component linkage analysis in pedigrees of arbitrary size and complexity (Almasy L; Blangero J, 1998).
10 Apr 2017 methylQA updated to version 0.1.8
methylQA is a methylation sequencing data quality assessment tool for MeDIP-seq and MRE-seq. It provides the basic mapping status of next generation sequencing data, such as the number of total reads and the number of mapped reads. It also provides CpG status information, such as how many CpGs have been covered by one experiment and how many times each CpG has been covered. methylQA can also process general ChIP-seq data, such as Histone/TF ChIP-seq data, and generate read density and mapping statistics.
10 Apr 2017 ORFfinder updated to version 0.4.0
ORF finder searches for open reading frames (ORFs) in the DNA sequence you enter. The program returns the range of each ORF, along with its protein translation. Use ORF finder to search newly sequenced DNA for potential protein encoding segments, verify predicted protein using newly developed SMART BLAST or regular BLASTP.
7 Apr 2017 THetA updated to version 0.7-6-g4f12904
Tumor Heterogeneity Analysis (THetA) is an algorithm used to estimate tumor purity and clonal/subclonal copy number aberrations simultaneously from high-throughput DNA sequencing data.
7 Apr 2017 mc updated to version 4.8.19
GNU Midnight Commander is a visual file manager, with a feature rich full-screen text mode application that allows you to copy, move and delete files and whole directory trees, search for files and run commands in the subshell. Type module load mc and then the command mc to get started.
6 Apr 2017 stringtie updated to version 1.3.3
StringTie is a fast and highly efficient assembler of RNA-Seq alignments into potential transcripts. It is primarily a genome-guided transcriptome assembler, although it can borrow algorithmic techniques from de novo genome assembly to help with transcript assembly.
4 Apr 2017 HLA-PRG-LA updated to version f0833ed
Stands for HLA PRG, linear approximation. The basic idea is to seed graph alignments with linear alignments to the sequences that the graph consists of.
4 Apr 2017 CSD updated to version 5.38
The Cambridge Structural Database is the world repository of small molecule crystal structures. Available on Helix only.
4 Apr 2017 tantan updated to version 13
A tool to mask low complexity and short period tandem repeats
4 Apr 2017 GiniClust updated to version 2017-03-22
GiniClust is a clustering method implemented in Python and R for detecting rare cell-types from large-scale single-cell gene expression data.
3 Apr 2017 MySQL updated to version 5.5.54
MySQL is an open-source relational database management system.
31 Mar 2017 iSAAC updated to version 02.16.03.09
iSAAC is an ultrafast DNA sequence aligner (Isaac Genome Alignment Software) that takes advantage of high-memory hardware (>48 GB), together with a variant caller (Isaac Variant Caller).
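The TRF entry in the list above defines a tandem repeat as two or more adjacent, approximate copies of a pattern of nucleotides. As a rough illustration of that definition only, the Python sketch below finds exact adjacent copies. It is not the algorithm used by Tandem Repeats Finder, it ignores approximate copies entirely, and the function name and parameters are made up for this example.

```python
def find_exact_tandem_repeats(seq, max_unit=6, min_copies=2):
    """Return (start, unit, copies) for runs of perfectly repeated units.

    Simplifications: only exact copies are detected, and a unit that is
    itself a repeat of a shorter unit (e.g. "ATAT") is not filtered out.
    """
    hits = []
    n = len(seq)
    for unit_len in range(1, max_unit + 1):
        i = 0
        # Stop once fewer than min_copies full units can fit after position i.
        while i + unit_len * min_copies <= n:
            unit = seq[i:i + unit_len]
            copies = 1
            # Extend the run while the next block is identical to the unit.
            while seq[i + copies * unit_len:i + (copies + 1) * unit_len] == unit:
                copies += 1
            if copies >= min_copies:
                hits.append((i, unit, copies))
                i += copies * unit_len  # skip past the run just reported
            else:
                i += 1
    return hits
```

For example, find_exact_tandem_repeats("GGCATCATCATTT", max_unit=3, min_copies=3) returns [(10, 'T', 3), (2, 'CAT', 3)]: three copies of T starting at position 10 and three copies of CAT starting at position 2 (0-based).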
Scientific Databases updated in last 3 months
For a full list of scientific databases available on the NIH HPC systems, see this page

Updated      Database                            Format  Location
27 Jun 2017  Protein Data Bank                   Fasta   /fdb/fastadb/pdb.nt.fas
27 Jun 2017  NCBI nt                             Fasta   /fdb/fastadb/nt.fas
27 Jun 2017  Mito                                Fasta   /fdb/fastadb/mito.nt.fas
27 Jun 2017  SwissProt                           Fasta   /fdb/fastadb/swissprot.aa.fas
27 Jun 2017  Protein Data Bank                   Fasta   /fdb/fastadb/pdb.aa.fas
27 Jun 2017  Mito                                Fasta   /fdb/fastadb/mito.aa.fas
27 Jun 2017  NCBI nr                             Fasta   /fdb/fastadb/nr.aa.fas
27 Jun 2017  Protein Data Bank                   PDB     /pdb/pdb
26 Jun 2017  Protein Data Bank                   Blast   /fdb/blastdb/pdbaa
23 Jun 2017  SwissProt                           Blast   /fdb/blastdb/swissprot
23 Jun 2017  Protein Data Bank                   Blast   /fdb/blastdb/pdbnt
20 Jun 2017  Refseq Human Genomic                Fasta   /fdb/fastadb/ref.human.genomic.fas
20 Jun 2017  Refseq Other Genomic                Fasta   /fdb/fastadb/ref.other.genomic.fas
19 Jun 2017  NCBI nt                             Blast   /fdb/blastdb/nt
18 Jun 2017  HTGs                                Blast   /fdb/blastdb/htgs
18 Jun 2017  16S Microbial                       Blast   /fdb/blastdb/16SMicrobial
16 Jun 2017  Refseq Human Genomic                Blast   /fdb/blastdb/human_genomic
09 Jun 2017  Mouse Genome (Mus musculus) mm8     MySQL   NIH mirror of UCSC Genome Browser
09 Jun 2017  NCBI nr                             Blast   /fdb/blastdb/nr
08 Jun 2017  EST - others                        Blast   /fdb/blastdb/est_others
24 May 2017  Refseq Other Genomic                Blast   /fdb/blastdb/other_genomic
14 Apr 2017  Rat Genome (Rattus norvegicus) rn5  MySQL   NIH mirror of UCSC Genome Browser
31 Mar 2017  Rat Genome (Rattus norvegicus) rn4  MySQL   NIH mirror of UCSC Genome Browser
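
The Fasta entries above are plain-text files, so a quick check after an update can be done without any special tooling. The Python sketch below counts records by counting header lines. The function name is hypothetical, and it assumes the file is readable from the node where the script runs.

```python
def count_fasta_records(path):
    """Count records in a FASTA file by counting lines that start with '>'."""
    count = 0
    # Read line by line so that very large databases are never held in memory.
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            if line.startswith(">"):
                count += 1
    return count
```

A call such as count_fasta_records("/fdb/fastadb/swissprot.aa.fas") uses one of the locations listed in the table. Expect the larger databases to take a while to scan.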