High-Performance Computing at the NIH
Application updates in the last 3 months
To see all versions available for any application, type 'module avail application_name'.
All centrally installed applications are listed on the Applications page.
Updated Application
19 May 2018 paraview updated to version 5.4.1
ParaView is an open-source, multi-platform data analysis and visualization application.
19 May 2018 steme updated to version 1.9.1
An efficient, accurate motif finder based on MEME and implemented using suffix arrays.
18 May 2018 treemix updated to version 1.12
TreeMix is a method for inferring the patterns of population splits and mixtures in the history of a set of populations.
17 May 2018 Mendel updated to version 16.0
Mendel is a comprehensive package for exact statistical genetic analysis of qualitative and quantitative traits.
17 May 2018 meka updated to version 1.9.2
A Multi-label Extension to WEKA
17 May 2018 mfold updated to version 3.6
MFOLD predicts DNA and RNA secondary structure.
17 May 2018 PyLOH updated to version 1.4.3
Deconvolving tumor purity and ploidy by integrating copy number alterations and loss of heterozygosity
17 May 2018 loki updated to version 2.4.7_4
Loki is a linkage analysis package, primarily for large and complex pedigrees, which uses Markov chain Monte Carlo (MCMC) techniques to avoid many of the computational problems that prevent exact computational methods from being used for large pedigrees.
16 May 2018 OpenCV updated to version 3.4.1
OpenCV (Open Source Computer Vision Library) is an open source computer vision and machine learning software library. OpenCV was built to provide a common infrastructure for computer vision applications and to accelerate the use of machine perception in commercial products.
16 May 2018 lefse updated to version 1.0.7
LEfSe (Linear discriminant analysis Effect Size) determines the features (organisms, clades, operational taxonomic units, genes, or functions) most likely to explain differences between classes by coupling standard tests for statistical significance with additional tests encoding biological consistency and effect relevance.
16 May 2018 kneaddata updated to version 0.7.0
KneadData is a tool designed to perform quality control on metagenomic and metatranscriptomic sequencing data, especially data from microbiome experiments.
16 May 2018 kaiju updated to version 1.6.2
Kaiju is a program for the taxonomic classification of high-throughput sequencing reads, e.g., Illumina or Roche/454, from whole-genome sequencing of metagenomic DNA.
16 May 2018 fqtools updated to version 2.0
Tools for manipulating fastq files
16 May 2018 SimNIBS updated to version 2.0.1
SimNIBS 2.0 is a free software package for the Simulation of Non-invasive Brain Stimulation. It allows for realistic calculations of the electric field induced by transcranial magnetic stimulation (TMS) and transcranial direct current stimulation (tDCS).
15 May 2018 fusioncatcher updated to version 1.00
FusionCatcher searches for novel/known somatic fusion genes, translocations, and chimeras in RNA-seq data (paired-end or single-end reads from Illumina NGS platforms like Solexa/HiSeq/NextSeq/MiSeq) from diseased samples.
15 May 2018 bioawk updated to version 1.0
Regular awk with support for several common biological data formats, including optionally gzipped BED, GFF, SAM, VCF, FASTA/Q and TAB-delimited formats with column names.
15 May 2018 CAVIAR updated to version 2.2
CAVIAR (CAusal Variants Identification in Associated Regions) is a statistical framework that quantifies the probability of each variant being causal while allowing an arbitrary number of causal variants.
15 May 2018 elastix updated to version 4.9
a toolbox for rigid and nonrigid registration of images.
15 May 2018 samtools updated to version 1.8
The samtools package now provides samtools, bcftools, tabix, and the underlying htslib library.
14 May 2018 minimap2 updated to version 2.10
Minimap2 is a fast sequence mapping and alignment program that can find overlaps between long noisy reads, or map long reads or their assemblies to a reference genome optionally with detailed alignment (i.e. CIGAR).
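Minimap2 reports detailed alignments as CIGAR strings (for example in its SAM output). As an illustration only, the sketch below parses a CIGAR string and derives how many reference and query bases it covers, using the operation semantics of the SAM specification. The helper names (parse_cigar, reference_span, query_length) are hypothetical and not part of minimap2.

```python
import re

# Operation codes from the SAM specification.
# M, =, X consume query and reference; I and S consume query only;
# D and N consume reference only; H and P consume neither.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
CONSUMES_REF = set("MDN=X")
CONSUMES_QUERY = set("MIS=X")


def parse_cigar(cigar):
    """Split a CIGAR string such as '5S10M2I8M' into (length, op) pairs."""
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]
    # Reject strings containing characters the regex did not consume.
    if "".join(f"{n}{op}" for n, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    return ops


def reference_span(cigar):
    """Number of reference bases covered by the alignment."""
    return sum(n for n, op in parse_cigar(cigar) if op in CONSUMES_REF)


def query_length(cigar):
    """Number of read bases accounted for (soft clips included, hard clips not)."""
    return sum(n for n, op in parse_cigar(cigar) if op in CONSUMES_QUERY)
```

For example, reference_span("5S10M2I8M3D") gives 21 and query_length of the same string gives 25.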
14 May 2018 weblogo updated to version 3.6
Contains the seqlogo utility, which creates sequence logos summarizing sequence alignments.
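A sequence logo stacks letters in each alignment column to a height proportional to the column's information content. The sketch below shows that calculation under simplifying assumptions: gaps are ignored and no small-sample correction is applied (WebLogo itself may apply one). The function names are hypothetical and not WebLogo's API.

```python
import math
from collections import Counter


def column_information(column, alphabet="ACGT"):
    """Information content in bits: R = log2(N) - H.

    N is the alphabet size and H the Shannon entropy of the observed letter
    frequencies. Gaps and unknown letters are ignored, and no small-sample
    correction is applied (an assumption of this sketch).
    """
    counts = Counter(c for c in column.upper() if c in alphabet)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return math.log2(len(alphabet)) - entropy


def logo_heights(alignment, alphabet="ACGT"):
    """Per-column letter heights: letter frequency times column information."""
    heights = []
    for column in zip(*alignment):
        col = "".join(column).upper()
        counts = Counter(c for c in col if c in alphabet)
        total = sum(counts.values())
        info = column_information(col, alphabet)
        heights.append({b: counts[b] / total * info for b in alphabet if counts[b]})
    return heights
```

A fully conserved DNA column scores 2 bits and a uniform one scores 0; for proteins, pass the 20-letter amino acid alphabet instead.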
14 May 2018 tRNAscan-SE updated to version 2.0.0
tRNAscan-SE 2.0 has advanced the state-of-the-art methodology in tRNA gene detection and functional prediction, captured by rich new content of the companion Genomic tRNA Database
12 May 2018 freebayes updated to version 1.2.0
Bayesian haplotype-based polymorphism discovery and genotyping
11 May 2018 IDL/ENVI updated to version 8.5/5.3
IDL and ENVI are a complete computing environment for the interactive analysis and visualization of data. IDL integrates an array-oriented language with mathematical analysis and graphical display techniques. ENVI is designed for extracting information from geospatial and medical imagery.
11 May 2018 mrtrix updated to version 3.0_RC2
MRtrix provides a large suite of tools for image processing, analysis and visualisation, with a focus on the analysis of white matter using diffusion-weighted MRI.
10 May 2018 kallisto updated to version 0.44.0
kallisto is a program for quantifying abundances of transcripts from RNA-Seq data, or more generally of target sequences using high-throughput sequencing reads. It is based on the novel idea of pseudoalignment for rapidly determining the compatibility of reads with targets, without the need for alignment.
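The idea of pseudoalignment can be sketched as follows: rather than aligning a read base by base, find the set of transcripts that contain every one of the read's k-mers. This toy version uses a plain dictionary from k-mer to transcript IDs; kallisto's real index is far more compact, and skipping unindexed k-mers and ignoring reverse complements are simplifying assumptions here.

```python
from collections import defaultdict


def build_kmer_index(transcripts, k):
    """Map every k-mer to the set of transcript IDs that contain it."""
    index = defaultdict(set)
    for tid, seq in transcripts.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(tid)
    return dict(index)


def pseudoalign(read, index, k):
    """Return the set of transcripts compatible with the read.

    The result is the intersection of the transcript sets of the read's
    k-mers. K-mers missing from the index are skipped and reverse
    complements are ignored (simplifying assumptions for this sketch).
    """
    compatible = None
    for i in range(len(read) - k + 1):
        hits = index.get(read[i:i + k])
        if hits is None:
            continue
        compatible = set(hits) if compatible is None else compatible & hits
        if not compatible:
            break  # no transcript is consistent with every k-mer seen so far
    return compatible or set()
```

A read shared by two isoforms returns both IDs; the abundance step then distributes such ambiguous reads among the compatible transcripts.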
10 May 2018 rmblast updated to version 2.6.0.2
RMBlast is a RepeatMasker-compatible version of the standard NCBI blastn program. RMBlast supports RepeatMasker searches by adding a few necessary features to the stock NCBI blastn program.
10 May 2018 CD-HIT updated to version 4.6.8
CD-HIT is a very widely used program for clustering and comparing protein or nucleotide sequences.
10 May 2018 FastQTL updated to version 2.184
In order to discover quantitative trait loci (QTLs), multi-dimensional genomic datasets combining DNA-seq and ChIP-/RNA-seq require methods that rapidly correlate tens of thousands of molecular phenotypes with millions of genetic variants while appropriately controlling for multiple testing. FastQTL implements a popular cis-QTL mapping strategy in a user- and cluster-friendly tool. FastQTL also proposes an efficient permutation procedure to control for multiple testing.
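To show why permutations control for multiple testing across many cis variants, here is a brute-force sketch: shuffle the phenotype, record the best association over all variants, and compare the observed best hit against that null. The choice of Pearson correlation as the statistic and the absence of covariates are assumptions; FastQTL's own procedure is designed to be much cheaper than this plain loop.

```python
import random
import statistics


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def best_association(genotypes, phenotype):
    """Strongest |r| between one phenotype and any of the cis variants."""
    return max(abs(pearson(g, phenotype)) for g in genotypes)


def permutation_pvalue(genotypes, phenotype, n_perm=1000, seed=0):
    """Empirical p-value for the best variant, adjusted for testing many variants.

    Shuffling the phenotype breaks any genotype-phenotype link while keeping
    the correlation structure among the variants, so the maximum statistic
    over permutations gives a null distribution for the best hit.
    """
    rng = random.Random(seed)
    observed = best_association(genotypes, phenotype)
    shuffled = list(phenotype)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if best_association(genotypes, shuffled) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)
```

The "+1" in numerator and denominator keeps the empirical p-value from ever being exactly zero.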
9 May 2018 Juicer updated to version 1.5.6
A One-Click System for Analyzing Loop-Resolution Hi-C Experiments
9 May 2018 Matlab updated to version 9.4.0.813654
MATLAB is a high-performance interactive software package for scientific and engineering numeric computation. MATLAB integrates numerical analysis, matrix computation, signal processing, and graphics in an environment where problems and solutions are expressed just as they are written mathematically.
8 May 2018 mosdepth updated to version 0.2.3
Fast BAM/CRAM depth calculation for WGS, exome, or targeted sequencing.
8 May 2018 cromwell updated to version 31
A Workflow Management System geared towards scientific workflows.
8 May 2018 Clairvoyante updated to version 0.1
The accurate identification of DNA sequence variants is particularly difficult for single molecule sequencing, which has a high per-nucleotide error rate (~5%-15%). Clairvoyante implements a multitask five-layer convolutional neural network model for predicting variant type (SNP or indel), zygosity, alternative allele and indel length from aligned reads. Using well-characterized testing data, Clairvoyante achieved 99.73%, 97.68% and 95.36% precision on known variants, and 98.65%, 92.57% and 77.89% F1-scores for whole-genome analysis, using Illumina, PacBio, and Oxford Nanopore data, respectively.
8 May 2018 phase updated to version 2.1.1
infers haplotypes from population genotype data
8 May 2018 clark updated to version 1.2.5
A method based on a supervised sequence classification using discriminative k-mers
7 May 2018 casper updated to version 0.8.2
CASPER (Context-Aware Scheme for Paired-End Read) is a state-of-the-art merging tool in terms of accuracy and robustness. It merges the forward and reverse reads of a pair into elongated reads.
7 May 2018 PyCharm updated to version 2018.1.2
A Python IDE
7 May 2018 mageck-vispr updated to version 0.5.4
MAGeCK-VISPR is a comprehensive quality control, analysis and visualization workflow for CRISPR/Cas9 screens.
7 May 2018 pyclone updated to version 0.13.1
PyClone is a statistical model and software tool designed to infer the prevalence of point mutations in heterogeneous cancer samples.
7 May 2018 ChromHMM updated to version 1.15
ChromHMM is software for learning and characterizing chromatin states.
7 May 2018 sailfish updated to version 0.10.0
Sailfish is a tool for transcript quantification from RNA-seq data. It requires a set of target transcripts (either from a reference or de-novo assembly) to quantify. All that is needed to run sailfish is a fasta file containing your reference transcripts and a (set of) fasta/fastq file(s) containing your RNA-Seq reads.
6 May 2018 roary updated to version 3.12.0
Roary is a high-speed, standalone pan-genome pipeline, which takes annotated assemblies in GFF3 format (produced by Prokka) and calculates the pan genome.
4 May 2018 lofreq updated to version 2.1.3.1
LoFreq is a fast and sensitive variant-caller for inferring SNVs and indels from next-generation sequencing data.
4 May 2018 bali-phy updated to version 3.1
BAli-Phy is MCMC software developed by Ben Redelings with Marc Suchard for simultaneous Bayesian estimation of alignment and phylogeny (and other parameters). It handles generic Bayesian modeling via probabilistic programming.
4 May 2018 pyDNase updated to version 0.2.6
pyDNase is a suite of tools for analysing DNase-seq data. pyDNase comes with several analysis scripts covering several common use cases of DNase-seq analysis, and also an implementation of the Wellington, Wellington 1D, and Wellington-bootstrap footprinting algorithms.
3 May 2018 fastqtools updated to version 0.8
fastq-tools is a collection of small and efficient programs for performing some common and uncommon tasks with FASTQ files.
3 May 2018 atac_dnase_pipelines updated to version 0.3.4-19-gcbd2a00
This pipeline is designed for automated end-to-end quality control and processing of ATAC-seq or DNase-seq data
3 May 2018 leveldb updated to version 1.20
LevelDB is a fast key-value storage library written at Google that provides an ordered mapping from string keys to string values.
2 May 2018 lancet updated to version 1.0.7
Lancet is a somatic variant caller (SNVs and indels) for short read data.
2 May 2018 snappy updated to version 1.1.7
Snappy is a compression/decompression library. It does not aim for maximum compression, or compatibility with any other compression library; instead, it aims for very high speeds and reasonable compression.
2 May 2018 bison updated to version 0.4.0
BISON is a bisulfite-converted short-read aligner that can natively utilize high-performance computing clusters to increase speed.
2 May 2018 locuszoom updated to version 1.3
LocusZoom is designed to facilitate viewing of local association results together with useful information about a locus, such as the location and orientation of the genes it includes, linkage disequilibrium coefficients and local estimates of recombination rates
2 May 2018 tetoolkit updated to version 1.5.1
A package for including transposable elements in differential enrichment analysis of sequencing datasets.
2 May 2018 Octave updated to version 4.4.0
GNU Octave is a high-level language, primarily intended for numerical computations. It provides a convenient command line interface for solving linear and nonlinear problems numerically, and for performing other numerical experiments using a language that is mostly compatible with Matlab.
2 May 2018 poretools updated to version 0.6.1a1
Poretools is a toolkit for manipulating and exploring nanopore sequencing data sets. Poretools operates on individual FAST5 files, directories of FAST5 files, and tar archives of FAST5 files.
2 May 2018 jq updated to version 1.5
Command-line JSON processor
2 May 2018 cisTEM updated to version 1.0.0-beta
cisTEM is user-friendly software to process cryo-EM images of macromolecular complexes and obtain high-resolution 3D reconstructions from them.
1 May 2018 iva updated to version 1.0.3
IVA is a de novo assembler designed to assemble virus genomes that have no repeat sequences, using Illumina read pairs sequenced from mixed populations at extremely high and variable depth.
1 May 2018 smalt updated to version 0.7.6
SMALT efficiently aligns DNA sequencing reads with a reference genome.
1 May 2018 pindel updated to version 0.2.5b8
Pindel can detect breakpoints of large deletions, medium-sized insertions, inversions, tandem duplications and other structural variants at single-base resolution from next-gen sequence data. It uses a pattern growth approach to identify the breakpoints of these variants from paired-end short reads.
1 May 2018 KMC updated to version 3.0.0
KMC is a disk-based program for counting k-mers from (possibly gzipped) FASTQ/FASTA files.
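The quantity KMC computes can be shown with a small in-memory sketch. KMC's point is doing this on disk for data sets too large for RAM, which this dictionary-based version does not attempt. Counting canonical k-mers (a k-mer and its reverse complement counted together) and skipping k-mers containing N are assumptions made here for illustration, not a description of KMC's defaults.

```python
from collections import Counter

# Translation table used to build reverse complements.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_VALID = set("ACGT")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(_COMPLEMENT)[::-1])


def count_kmers(reads, k, use_canonical=True):
    """Count k-mers across reads, skipping any k-mer that contains N or other codes."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if not set(kmer) <= _VALID:
                continue
            counts[canonical(kmer) if use_canonical else kmer] += 1
    return counts
```

With canonical counting, "AC" and its reverse complement "GT" share one counter, so count_kmers(["ACGT"], 2) reports AC twice and CG once.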
1 May 2018 porechop updated to version 0.2.3
Trim/demultiplex Oxford Nanopore reads
30 Apr 2018 dcm2niix updated to version 1.0.20171215
DICOM to NIfTI converter
30 Apr 2018 Rstudio updated to version 0.98
RStudio is a set of integrated tools designed to help you be more productive with R. It includes a console, syntax-highlighting editor that supports direct code execution, as well as tools for plotting, history, debugging and workspace management.
30 Apr 2018 somaticsniper updated to version 1.0.5.0
The purpose of this program is to identify single nucleotide positions that are different between tumor and normal (or, in theory, any two bam files). It takes a tumor bam and a normal bam and compares the two to determine the differences. It outputs a file in a format very similar to Samtools consensus format.
28 Apr 2018 STAR-Fusion updated to version 1.3.2
Transcript fusion detection
27 Apr 2018 genesis updated to version 2.4
GENESIS (GEneral NEural SImulation System) is a software platform for the simulation of neural systems ranging from subcellular components and biochemical reactions to complex models of single neurons, large networks, and systems-level processes.
27 Apr 2018 fcgene updated to version 1.0.7
FCgene is a format-converting tool for genotype data (e.g. PLINK-MACH, MACH-PLINK).
27 Apr 2018 LMDB updated to version 0.9.14
LMDB is an ultra-fast, ultra-compact key-value embedded data store developed by Symas for the OpenLDAP Project. It uses memory-mapped files, so it has the read performance of a pure in-memory database while still offering the persistence of standard disk-based databases. Its size is limited only by the size of the virtual address space, not by the size of physical RAM.
27 Apr 2018 EDirect updated to version 8.60
Entrez Direct (EDirect) is an advanced method for accessing the NCBI's set of interconnected databases (publication, sequence, structure, gene, variation, expression, etc.) from a UNIX terminal window.
27 Apr 2018 singularity updated to version 2.5.0
Singularity is a container platform focused on supporting "Mobility of Compute". It allows users to emulate and share custom Linux environments, enabling the creation of self-contained development stacks.
27 Apr 2018 asciinema updated to version 2.0.1
asciinema [as-kee-nuh-muh] is a free and open source solution for recording terminal sessions and sharing them.
Type 'module load asciinema' then 'asciinema' to run.
27 Apr 2018 Grace updated to version 5.1.25
Grace is a WYSIWYG 2D plotting tool for the X-Window system. It is a successor to Xmgr.
Type 'module load grace', then 'xmgrace' or 'gracebat' to run.
25 Apr 2018 bcbio-nextgen updated to version 1.0.9
Validated, scalable, community developed variant calling, RNA-seq and small RNA analysis
24 Apr 2018 VEP updated to version 92
VEP (Variant Effect Predictor) determines the effect of your variants (SNPs, insertions, deletions, CNVs or structural variants) on genes, transcripts, and protein sequence, as well as regulatory regions.
24 Apr 2018 GEMMA updated to version 0.96
GEMMA is the software implementing the Genome-wide Efficient Mixed Model Association algorithm for a standard linear mixed model and some of its close relatives for genome-wide association studies (GWAS).
24 Apr 2018 crystfel updated to version 0.6.3
CrystFEL is a suite of programs for processing diffraction data acquired serially in a snapshot manner, such as when using the technique of Serial Femtosecond Crystallography (SFX) with a free-electron laser source.
23 Apr 2018 glog updated to version 0.3.5
The glog library implements application-level logging. This library provides logging APIs based on C++-style streams and various helper macros.
23 Apr 2018 gflags updated to version 2.2.1
The gflags package contains a library that implements commandline flags processing. It includes built-in support for C++ types like string and the ability to define flags in the source file in which they're used.
18 Apr 2018 smrtanalysis updated to version 5.1.0.26412
SMRT® Analysis is a bioinformatics software suite available for analysis of DNA sequencing data from Pacific Biosciences’ SMRT technology. Users can choose from a variety of analysis protocols that utilize PacBio® and third-party tools. Analysis protocols include de novo genome assembly, cDNA mapping, DNA base-modification detection, and long-amplicon analysis to determine phased consensus sequences.
18 Apr 2018 clustalo updated to version 1.2.4
Clustal-Omega is a general purpose multiple sequence alignment (MSA) program for proteins and DNA/RNA. It produces high quality MSAs and is capable of handling data-sets of hundreds of thousands of sequences in reasonable time.
18 Apr 2018 FFmpeg updated to version 3.4.2
A complete, cross-platform solution to record, convert and stream audio and video.
17 Apr 2018 emspring updated to version 0-86-1661
SPRING (Single Particle Reconstruction from Images of kNown Geometry) is a single-particle based helical reconstruction package for electron cryo-micrographs and has been used to determine 3D structures of a variety of highly ordered and less ordered specimens.
17 Apr 2018 MEGAN updated to version 6.11.1
MEGAN (MEtaGenome ANalyzer) takes a file of reads and a BLAST output from comparison against a reference genome, and automatically calculates a taxonomic classification of the reads and, if desired, a functional classification.
17 Apr 2018 Meep updated to version 1.4.3
Meep (or MEEP) is a free finite-difference time-domain (FDTD) simulation software package developed at MIT to model electromagnetic systems, along with the MPB eigenmode package.
17 Apr 2018 exomiser updated to version 10.0.1
The Exomiser is a Java program that functionally annotates variants from whole-exome sequencing data starting from a VCF file.
17 Apr 2018 Keras updated to version 2.1.5
Keras is a high-level neural networks API, written in Python and capable of running on top of TensorFlow, CNTK, or Theano. It was developed with a focus on enabling fast experimentation. Being able to go from idea to result with the least possible delay is key to doing good research.
16 Apr 2018 prokka updated to version 1.13
Prokka is a software tool for the rapid annotation of prokaryotic genomes.
16 Apr 2018 TeraStitcher updated to version 1.10.12
TeraStitcher is a free tool that enables the stitching of Teravoxel-sized tiled microscopy images even on workstations with relatively limited resources of memory (<8 GB) and processing power.
16 Apr 2018 supernova updated to version 2.0.1
Supernova generates highly-contiguous, phased, whole-genome de novo assemblies from a Chromium-prepared library.
16 Apr 2018 wuzz updated to version 0.4.0
Interactive cli tool for HTTP inspection
16 Apr 2018 Quantum Espresso (QE) updated to version 6.2.0-CPU
Quantum Espresso (QE) is an integrated suite of computer codes for electronic-structure calculations and materials modeling, based on density-functional theory, plane waves, and pseudopotentials (norm-conserving, ultrasoft, and projector-augmented wave). The Quantum Espresso distribution contains the core packages PWscf (Plane-Wave Self-Consistent Field) and CP (Car-Parrinello) for the calculation of electronic-structure properties within Density-Functional Theory (DFT), using a Plane-Wave (PW) basis set and pseudopotentials.
13 Apr 2018 Neuron updated to version 7.5
NEURON is a simulation environment for modeling individual neurons and networks of neurons. It provides tools for conveniently building, managing, and using models in a way that is numerically sound and computationally efficient. It is particularly well-suited to problems that are closely linked to experimental data, especially those that involve cells with complex anatomical and biophysical properties.
13 Apr 2018 protobuf updated to version 3.5.1
Protocol buffers are Google's language-neutral, platform-neutral, extensible mechanism for serializing structured data. Think XML, but smaller, faster, and simpler.
13 Apr 2018 Scipion updated to version 1.2
Scipion is an image processing framework to obtain 3D models of macromolecular complexes using Electron Microscopy (3DEM). It integrates several software packages and presents a unified interface for both biologists and developers. Scipion allows users to execute workflows combining different software tools, while taking care of formats and conversions. Additionally, all steps are tracked and can be reproduced later on.
12 Apr 2018 EPACTS updated to version 3.2.6
EPACTS (Efficient and Parallelizable Association Container Toolbox) is a versatile software pipeline to perform various statistical tests for identifying genome-wide association from sequence data through a user-friendly interface, both to scientific analysts and to method developers.
12 Apr 2018 FastTree updated to version 2.1.10
FastTree infers approximately-maximum-likelihood phylogenetic trees from alignments of nucleotide or protein sequences. FastTree can handle alignments with up to a million sequences in a reasonable amount of time and memory. For large alignments, FastTree is 100-1,000 times faster than PhyML 3.0 or RAxML 7.
11 Apr 2018 FSL updated to version 5.0.11
FSL is a comprehensive library of image analysis and statistical tools for FMRI, MRI and DTI brain imaging data.
10 Apr 2018 mirdeep2 updated to version 2.0.0.8
miRDeep2 is a completely overhauled tool which discovers microRNA genes by analyzing sequenced RNAs.
10 Apr 2018 randfold updated to version 2.0
RandFold computes the probability that, for a given RNA sequence, the Minimum Free Energy (MFE) of the secondary structure is different from a distribution of MFE computed with random sequences.
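The randomization test behind RandFold can be sketched as: score the real sequence, score many shuffled copies with the same base composition, and report the fraction of shuffles that do at least as well. Computing a true MFE needs a thermodynamic model (as in ViennaRNA), so this sketch swaps in the much simpler Nussinov base-pair maximization as a stand-in score, where more pairs means more structure. That substitution, the minimum hairpin loop of 3, and the mononucleotide shuffle are all assumptions, not RandFold's implementation.

```python
import random

# Watson-Crick and G-U wobble pairs allowed in this toy model.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def max_base_pairs(seq, min_loop=3):
    """Nussinov maximum number of nested base pairs (a stand-in for MFE)."""
    n = len(seq)
    if n == 0:
        return 0
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: position j is unpaired
            # case 2: j pairs with some t, leaving at least min_loop bases inside
            for t in range(i, j - min_loop):
                if (seq[t], seq[j]) in PAIRS:
                    left = dp[i][t - 1] if t > i else 0
                    best = max(best, left + 1 + dp[t + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]


def randomization_pvalue(seq, n_shuffles=200, seed=0, min_loop=3):
    """Fraction of shuffled sequences that pair at least as well as the original."""
    rng = random.Random(seed)
    seq = seq.upper().replace("T", "U")
    observed = max_base_pairs(seq, min_loop)
    letters = list(seq)
    at_least_as_good = 0
    for _ in range(n_shuffles):
        rng.shuffle(letters)  # mononucleotide shuffle: preserves base composition
        if max_base_pairs("".join(letters), min_loop) >= observed:
            at_least_as_good += 1
    return (at_least_as_good + 1) / (n_shuffles + 1)
```

A small value means few random sequences of the same composition fold as well, which is the kind of evidence RandFold uses to flag structured RNAs.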
9 Apr 2018 cutadapt updated to version 1.16
cutadapt removes adapter sequences from DNA high-throughput sequencing data. This is usually necessary when the read length of the machine is longer than the molecule that is sequenced, such as in microRNA data.
9 Apr 2018 nextflow updated to version 0.28.2
Data-driven computational pipelines
9 Apr 2018 mash updated to version 2.0
mash is a command-line tool and library that provides fast genome and metagenome distance estimation using MinHash. Only the command-line tool is installed.
9 Apr 2018 aria2 updated to version 1.33.1
multiprotocol download utility
Type 'module load aria2', then 'aria2c --help' for more info.
7 Apr 2018 asciigenome updated to version 1.13.0
ASCIIGenome is a text-only command line genome browser.
7 Apr 2018 humann2 updated to version 0.11.1
HUMAnN is a pipeline for efficiently and accurately profiling the presence/absence and abundance of microbial pathways in a community from metagenomic or metatranscriptomic sequencing data (typically millions of short DNA/RNA reads).
6 Apr 2018 abyss updated to version 2.0.3
ABySS (Assembly By Short Sequences) is a de novo, parallel, paired-end sequence assembler. The parallel version is implemented using MPI and is capable of assembling larger genomes.
5 Apr 2018 mitosuite updated to version 1.0.9b
mitosuite is a graphical tool for human mitochondrial genome profiling in massively parallel sequencing
4 Apr 2018 ResMap updated to version 1.9
ResMap (Resolution Map) is a Python (NumPy/SciPy) application with a Tkinter GUI and a command-line interface. It is a software package for computing the local resolution of 3D density maps studied in structural biology, primarily electron cryo-microscopy (cryo-EM).
4 Apr 2018 xHLA updated to version 2018-04-04
The HLA gene complex on human chromosome 6 is one of the most polymorphic regions in the human genome and contributes in large part to the diversity of the immune system. Accurate typing of HLA genes with short-read sequencing data has historically been difficult due to the sequence similarity between the polymorphic alleles. xHLA iteratively refines the mapping results at the amino acid level to achieve high typing accuracy for both class I and II HLA genes.
3 Apr 2018 golang updated to version 1.10.1
The Go programming language
3 Apr 2018 gzip updated to version 1.9
the data compression program
3 Apr 2018 flye updated to version 2.3.3
Fast and accurate de novo assembler for single molecule sequencing reads
3 Apr 2018 abruijn updated to version 1.0
ABruijn is an assembler for long reads from, for example, PacBio and Oxford Nanopore Technologies sequencers.
2 Apr 2018 defuse updated to version 0.8.1
deFuse is a software package for gene fusion discovery using RNA-Seq data. The software uses clusters of discordant paired end alignments to inform a split read alignment analysis for finding fusion boundaries. The software also employs a number of heuristic filters in an attempt to reduce the number of false positives and produces a fully annotated output for each predicted fusion.
30 Mar 2018 Arioc updated to version 1.24
A GPU-accelerated short read aligner
30 Mar 2018 vcfanno updated to version 0.2.9
annotate a VCF with other VCFs/BEDs/tabixed files
28 Mar 2018 merlin updated to version 1.1.2
MERLIN uses sparse trees to represent gene flow in pedigrees and is one of the fastest pedigree analysis packages around.
27 Mar 2018 PRSice updated to version 2.1.1.beta
PRSice is a Polygenic Risk Score software for calculating, applying, evaluating and plotting the results of polygenic risk scores (PRS) analyses.
27 Mar 2018 TVC updated to version 5.8.0
TVC is the standalone Torrent Variant Caller, part of the Ion Torrent Suite.
26 Mar 2018 LongRanger updated to version 2.2.1
Long Ranger is a set of analysis pipelines that process GemCode sequencing output to align reads and to call and phase SNPs, indels, and structural variants. Loupe is a genome browser designed to visualize the Linked-Read data produced by the 10x Chromium Platform.
26 Mar 2018 Q-Chem updated to version 5.0.1
Q-Chem is a comprehensive ab initio quantum chemistry package for accurate predictions of molecular structures, reactivities, and vibrational, electronic and NMR spectra.
26 Mar 2018 QIIME updated to version 2-2018.2
QIIME is an open source software package for comparison and analysis of microbial communities, primarily based on high-throughput amplicon sequencing data (such as SSU rRNA) generated on a variety of platforms, but also supporting analysis of other types of data (such as shotgun metagenomic data).
26 Mar 2018 Canu updated to version 1.7
Canu is a fork of the Celera Assembler designed for high-noise single-molecule sequencing (such as the PacBio RSII or Oxford Nanopore MinION). Canu will correct the reads, then trim suspicious regions (such as remaining SMRTbell adapter), then assemble the corrected and cleaned reads into unitigs.
26 Mar 2018 pear updated to version 0.9.11
PEAR is an ultrafast, memory-efficient and highly accurate paired-end read merger. It is fully parallelized and can run with as low as just a few kilobytes of memory.
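Paired-end merging works when the sequenced fragment is shorter than the two reads combined: the forward read and the reverse complement of the reverse read then overlap, and the overlap can be stitched into one longer read. The sketch below picks the longest overlap whose mismatch rate is under a fixed threshold. PEAR itself scores overlaps statistically using base qualities, so the threshold rule, keeping the forward base at mismatches, and not handling fragments shorter than a single read are all simplifying assumptions.

```python
_COMP = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMP)[::-1]


def merge_pair(fwd, rev, min_overlap=10, max_mismatch_frac=0.1):
    """Merge a read pair whose fragment is shorter than the combined read length.

    Tries every overlap length from longest to shortest and keeps the first one
    whose mismatch rate is acceptable. Returns None if no overlap qualifies.
    """
    rc = revcomp(rev)
    for olen in range(min(len(fwd), len(rc)), min_overlap - 1, -1):
        tail = fwd[len(fwd) - olen:]  # end of the forward read
        head = rc[:olen]              # start of the reverse-complemented mate
        mismatches = sum(1 for x, y in zip(tail, head) if x != y)
        if mismatches <= max_mismatch_frac * olen:
            # Keep forward bases in the overlap; real mergers use qualities here.
            return fwd + rc[olen:]
    return None
```

Trying the longest overlap first favours the merge that explains the most bases, which also reduces the chance of accepting a short, spurious overlap.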
24 Mar 2018 ACFS updated to version 20180316
ACFS is an Accurate CircRNA Finder Suite for discovering circRNAs from RNA-Seq data. CircRNAs are generated through splicing, or to be precise, back-splicing, where the downstream splice donor attacks an upstream splice acceptor. Identifying the exact site of back-splicing lies at the heart of circRNA discovery. No prior knowledge of gene annotation is needed for circRNA prediction. ACFS is designed for single-end RNA-Seq reads. Paired-end data is also supported, albeit with lower sensitivity.
23 Mar 2018 pvactools updated to version 1.0.2
pVACtools is a cancer immunotherapy suite consisting of pVACseq, pVACfuse, pVACvector
23 Mar 2018 cnvnator updated to version 0.3.3
CNVnator is a tool for CNV discovery and genotyping from depth of read mapping.
23 Mar 2018 ROOT updated to version 6.13.02
The ROOT system provides a set of Object-Oriented frameworks with all the functionality needed to handle and analyse large amounts of data in a very efficient way.
23 Mar 2018 nirvana updated to version 2.0.4
Nirvana provides clinical-grade annotation of genomic variants (SNVs, MNVs, insertions, deletions, indels, and SVs, including CNVs). It can be run as a stand-alone package or integrated into larger software tools that require variant annotation.
23 Mar 2018 Ruby updated to version 2.5.0
A dynamic, open source programming language with a focus on simplicity and productivity
22 Mar 2018 conpair updated to version 10102016
Concordance and contamination estimator for tumor–normal pairs
22 Mar 2018 PePr updated to version 1.1.21
PePr is a ChIP-Seq Peak-calling and Prioritization pipeline that uses a sliding window approach and models read counts across replicates and between groups with a negative binomial distribution.
21 Mar 2018 AdmixTools updated to version 4.1
ADMIXTOOLS is a software package that supports formal tests of whether admixture occurred, and makes it possible to infer admixture proportions and dates.
21 Mar 2018 bamreadcount updated to version 0.8.0
Bam-readcount generates metrics at single nucleotide positions. It generates a number of metrics that can be useful for filtering out false-positive calls.
21 Mar 2018 Julia updated to version 0.6.2
high level, dynamic language for technical computing
20 Mar 2018 rdfind updated to version 1.3.5
rdfind is a program that finds duplicate files. It is useful for compressing backup directories or just finding duplicate files. It compares files based on their content, NOT on their file names. After typing 'module load rdfind', type 'man rdfind' for more information.
20 Mar 2018 squashfs-tools updated to version 4.3
Squashfs is a highly compressed read-only filesystem for Linux. Squashfs compresses files, inodes and directories, and supports block sizes up to 1 MB for greater compression.
20 Mar 2018 ngsqctoolkit updated to version 2.3.3
A toolkit for the quality control (QC) of next generation sequencing (NGS) data.
20 Mar 2018 breakdancer updated to version 1.4.5
provides genome-wide detection of structural variants from next generation paired-end sequencing reads.
19 Mar 2018 viennarna updated to version 2.4.4
RNA Secondary Structure Prediction and Comparison
19 Mar 2018 Ghostscript updated to version 9.22
Ghostscript is an interpreter for the PostScript language and for PDF.
19 Mar 2018 IDBA updated to version 1.1.3
IDBA is a practical iterative De Bruijn Graph De Novo Assembler for sequence assembly in bioinformatics. Most assemblers based on de Bruijn graphs build a de Bruijn graph with a specific k to perform the assembly task, so for all of them it is crucial to find a suitable value of k. If k is too large, there will be a lot of gap problems in the graph. If k is too small, there will be a lot of branch problems. IDBA uses not just one specific k but a range of k values to build the iterative de Bruijn graph, and it can keep all the information from graphs with different k values.
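The k trade-off described for IDBA can be seen on a toy example. The sketch builds a simple k-mer graph (nodes are k-mers, edges join consecutive k-mers within a read) and counts branching nodes and dead ends. It does not implement IDBA's iteration over k; the graph construction, genome and reads are hypothetical and chosen only to make the effect visible.

```python
from collections import defaultdict


def kmer_graph(reads, k):
    """Nodes are k-mers; an edge joins consecutive k-mers within a read."""
    nodes = set()
    succ = defaultdict(set)
    for read in reads:
        kmers = [read[i:i + k] for i in range(len(read) - k + 1)]
        nodes.update(kmers)
        for a, b in zip(kmers, kmers[1:]):
            succ[a].add(b)
    return nodes, succ


def graph_stats(reads, k):
    """Return (branching nodes, dead-end nodes) for the k-mer graph."""
    nodes, succ = kmer_graph(reads, k)
    branches = sum(1 for n in nodes if len(succ.get(n, ())) > 1)
    dead_ends = sum(1 for n in nodes if not succ.get(n))
    return branches, dead_ends


# Toy genome with the repeat "ACGT"; two reads overlapping by 4 bases.
genome = "TTACGTGGACGTCC"
reads = [genome[:10], genome[6:]]
```

With k=3 the repeat creates one branch but the reads join into a single path (one dead end, at the genome end); with k=5 the branch disappears, but the 4-base read overlap is too short to connect, leaving an extra dead end, that is, a gap.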
19 Mar 2018 vt updated to version 0.577
vt is a variant tool set that discovers short variants from Next Generation Sequencing data.
19 Mar 2018 PyMOL updated to version 2.1.0
A comprehensive molecular visualization product for rendering and animating 3D molecular structures.
19 Mar 2018 miso updated to version 0.5.4
MISO (Mixture-of-Isoforms) is a probabilistic framework that quantitates the expression level of alternatively spliced genes from RNA-Seq data
15 Mar 2018 homer updated to version 4.9.1
HOMER (Hypergeometric Optimization of Motif EnRichment) is a suite of tools for Motif Discovery and ChIP-Seq analysis.
15 Mar 2018 shapeit updated to version 2.r904
SHAPEIT is a fast and accurate haplotype inference software package.
15 Mar 2018 minc-toolkit updated to version 1.9.16
This metaproject bundles multiple MINC-based packages that historically have been developed somewhat independently
15 Mar 2018 mafft updated to version 7.394
Multiple alignment program for amino acid or nucleotide sequences
14 Mar 2018 plinkseq updated to version 0.10
A library for the analysis of genetic variation data.
14 Mar 2018 spades updated to version 3.11.0
SPAdes – St. Petersburg genome assembler – is intended for both standard isolates and single-cell MDA bacteria assemblies.
14 Mar 2018 bam2fastq updated to version 1.1.0
This tool is used to extract raw sequences (with qualities) from bam files.
13 Mar 2018 KAT updated to version 2.4.0-h2
KAT (K-mer Analysis Toolkit) is a suite of tools that analyse Jellyfish hashes or sequence files (fasta or fastq) using kmer counts.
13 Mar 2018 ORFfinder updated to version 0.4.0
ORF finder searches for open reading frames (ORFs) in a DNA sequence. The program returns the range of each ORF along with its protein translation. Use ORF finder to search newly sequenced DNA for potential protein-coding segments, and verify predicted proteins using SMART BLAST or regular BLASTP.
13 Mar 2018 KmerGenie updated to version 1.7044
KmerGenie estimates the best k-mer length for genome de novo assembly.
13 Mar 2018 cnvkit updated to version 0.9.3
Copy number variant detection from targeted DNA sequencing
13 Mar 2018 Maven updated to version 3.5.3
Apache Maven is a software project management and comprehension tool. Based on the concept of a project object model (POM), Maven can manage a project's build, reporting and documentation from a central piece of information.
12 Mar 2018 stringtie updated to version 1.3.4
StringTie is a fast and highly efficient assembler of RNA-Seq alignments into potential transcripts. It is primarily a genome-guided transcriptome assembler, although it can borrow algorithmic techniques from de novo genome assembly to help with transcript assembly.
12 Mar 2018 trimgalore updated to version 0.4.5
Consistent quality and adapter trimming for RRBS or standard FastQ files.
12 Mar 2018 OpenBabel updated to version 2.4.1
Open Babel is a chemical toolbox designed to speak the many languages of chemical data.
12 Mar 2018 oases updated to version 0.2.09
oases is a de novo transcriptome assembler based on the Velvet genome assembler core.
12 Mar 2018 PyPy updated to version 5.10.0
PyPy is a fast, compliant alternative implementation of the Python language.
12 Mar 2018 EukRep updated to version 20180308
Microbial eukaryotes are integral components of natural microbial communities, and their inclusion is critical for many ecosystem studies, yet the majority of published metagenome analyses ignore eukaryotes. To include eukaryotes in environmental studies, eukaryotic genomes should be recovered from complex metagenomic samples. A key step in genome recovery is separating eukaryotic from prokaryotic fragments. EukRep is a k-mer- and SVM-based strategy for identifying eukaryotic sequences in environmental samples.
9 Mar 2018 matio updated to version 1.5.12
Matio is an open-source C library for reading and writing binary MATLAB MAT files. This library is designed for use by programs and libraries that do not have access to, or do not want to rely on, MATLAB's shared libraries.
9 Mar 2018 summovie updated to version 1.0.2
Summovie calculates movie frame sums, using the alignment results from a prior run of Unblur.
9 Mar 2018 unblur updated to version 1.0.2
Unblur is used to align the frames of movies recorded on an electron microscope to reduce image blurring due to beam-induced motion.
9 Mar 2018 Gromacs updated to version 2018
Gromacs is a versatile package to perform molecular dynamics, i.e. simulate the Newtonian equations of motion for systems with hundreds to millions of particles. It is primarily designed for biochemical molecules like proteins and lipids that have a lot of complicated bonded interactions, but since GROMACS is extremely fast at calculating the nonbonded interactions (that usually dominate simulations) many groups are also using it for research on non-biological systems, e.g. polymers.
8 Mar 2018 deeptools updated to version 3.0.1
deepTools is a suite of user-friendly tools for the visualization, quality control, and normalization of data from deep-sequencing experiments.
8 Mar 2018 delly updated to version 0.7.8
DELLY is an integrated structural variant prediction method that can detect deletions, tandem duplications, inversions and translocations at single-nucleotide resolution in short-read massively parallel sequencing data. It uses paired-ends and split-reads to sensitively and accurately delineate genomic rearrangements throughout the genome.
8 Mar 2018 Gctf updated to version 1.06
Gctf provides accurate estimation of the contrast transfer function (CTF) for near-atomic resolution cryo electron microscopy (cryoEM) reconstruction using GPUs.
8 Mar 2018 velvet updated to version 1.2.10
Velvet is a de novo genomic assembler specially designed for short read sequencing technologies, such as Solexa or 454
8 Mar 2018 GAMESS updated to version 14Feb18-R1-sockets
GAMESS is a general ab initio quantum chemistry package.
8 Mar 2018 peakranger updated to version 1.18
A ChIP-Seq peak caller for narrow and broad peaks
7 Mar 2018 CSD updated to version 5.39
The Cambridge Structural Database is the world repository of small molecule crystal structures.
7 Mar 2018 cellranger updated to version 2.1.1
Cell Ranger is a set of analysis pipelines that processes Chromium single cell 3’ RNA-seq output to align reads, generate gene-cell matrices and perform clustering and gene expression analysis.
7 Mar 2018 vcf2maf updated to version 1.6.16
A smarter, more reproducible, and more configurable tool for converting a VCF to a MAF.
7 Mar 2018 BFC updated to version 1.0-7-g69ab176
BFC is a standalone tool for correcting sequencing errors from Illumina sequencing data. It is specifically designed for high-coverage whole-genome human data, though also performs well for small genomes.
7 Mar 2018 sratoolkit updated to version 2.9.0
The NCBI SRA Toolkit enables reading ("dumping") of sequencing files from the SRA database and writing ("loading") files into the .sra format.
7 Mar 2018 picard updated to version 2.17.11
Picard comprises Java-based command-line utilities that manipulate SAM files, and a Java API (SAM-JDK) for creating new programs that read and write SAM files. Both SAM text format and SAM binary (BAM) format are supported.
6 Mar 2018 RAxML updated to version 8.2.11
RAxML-VI-HPC (randomized axelerated maximum likelihood for high performance computing) is a sequential and parallel program for inference of large phylogenies with maximum likelihood (ML).
6 Mar 2018 snptest updated to version 2.5.4beta3
SNPTEST is a program for the analysis of single-SNP association in genome-wide studies. The tests implemented include: binary (case-control) phenotypes and single and multiple quantitative phenotypes; Bayesian and frequentist tests; the ability to condition on an arbitrary set of covariates; and various methods for dealing with imputed SNPs. The program is designed to work seamlessly with the output of the genotype calling program CHIAMO, the genotype imputation program IMPUTE, and the program GTOOL.
6 Mar 2018 hgvs updated to version 1.1.1
The hgvs package provides a Python library to facilitate the use of genome, transcript, and protein variants that are represented using the Human Genome Variation Society (varnomen) recommendations. To use, type module load hgvs prior to calling python.
6 Mar 2018 sambamba updated to version 0.6.7
Sambamba is a high-performance, robust, and fast tool (and library), written in the D programming language, for working with SAM and BAM files. Its current parallelized functionality covers an important subset of samtools functionality, including view, index, sort, markdup, and depth.
6 Mar 2018 NGSutils updated to version 0.5.9
NGSUtils is a suite of software tools for working with next-generation sequencing datasets.
4 Mar 2018 Fiji updated to version 1.51s
Fiji Is Just ImageJ. It is a distribution of ImageJ (and ImageJ2) together with Java, Java3D and a lot of plugins.
2 Mar 2018 EMAN2 updated to version 2.21a
EMAN2 is a broadly based greyscale scientific image processing suite with a primary focus on processing data from transmission electron microscopes.
2 Mar 2018 PyTorch updated to version 0.2.0
PyTorch implements Tensors, which are conceptually identical to NumPy multidimensional arrays but, unlike NumPy arrays, can be used on GPU nodes to accelerate numerical computations. It also implements automatic differentiation to automate the computation of backward passes in neural networks.
1 Mar 2018 GATK updated to version 4.0.2.0
GATK, from the Broad Institute, is both a structured software library that makes it easy to write efficient analysis tools for next-generation sequencing data and a suite of tools for working with human medical resequencing projects such as 1000 Genomes and The Cancer Genome Atlas. These tools include a depth-of-coverage analyzer, a quality score recalibrator, a SNP/indel caller, and a local realigner.
28 Feb 2018 Eigen updated to version 3.3.4
Eigen is a C++ template library for linear algebra: matrices, vectors, numerical solvers, and related algorithms.
28 Feb 2018 MUSCLE updated to version 3.8.31
Fast Multiple Sequence Alignment program.
28 Feb 2018 MUMmer updated to version 4.0.0beta2
MUMmer is a system for aligning entire genomes extremely rapidly.
27 Feb 2018 PartekFlow updated to version 7.0.18.0218
Web interface designed specifically for the analysis needs of next generation sequencing applications including RNA, small RNA, and DNA sequencing.
27 Feb 2018 interproscan updated to version 5.27-66.0
InterProScan is the software package that allows sequences (protein and nucleic) to be scanned against InterPro's signatures. Signatures are predictive models, provided by several different databases, that make up the InterPro consortium.
27 Feb 2018 libarchive updated to version 3.3.2
Multi-format archive and compression library
26 Feb 2018 Acemd updated to version 3212u1
ACEMD is a high-performance molecular dynamics code for biomolecular systems, designed specifically for NVIDIA GPUs. Simple and fast, ACEMD uses commands and input files very similar to those of NAMD, and writes output files in NAMD or Gromacs format.
26 Feb 2018 trinity updated to version 2.6.5
Trinity, developed at the Broad Institute and the Hebrew University of Jerusalem, represents a novel method for the efficient and robust de novo reconstruction of transcriptomes from RNA-seq data.
26 Feb 2018 htseq updated to version 0.9.1
HTSeq is a Python package that provides infrastructure to process data from high-throughput sequencing assays.
26 Feb 2018 jellyfish updated to version 2.2.7
Jellyfish is a tool for fast, memory-efficient counting of k-mers in DNA.
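To make concrete what Jellyfish counts, here is a small in-memory k-mer counter. It is only an illustration of the quantity being computed; Jellyfish itself is engineered for speed and memory efficiency on large datasets, which this sketch is not. The `canonical` flag mimics the idea behind Jellyfish's `-C` (canonical) option: a k-mer and its reverse complement are counted together under the lexicographically smaller of the two. Skipping k-mers that contain `N` is an assumption made for this example. The function names are hypothetical.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_kmers(reads, k, canonical=False):
    """Count every length-k substring of each read."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) - set("ACGT"):
                continue  # skip k-mers containing N or other ambiguity codes
            if canonical:
                # Merge a k-mer with its reverse complement (the other DNA strand).
                kmer = min(kmer, reverse_complement(kmer))
            counts[kmer] += 1
    return counts
```

For the read `ACGTACG` with k = 3, plain counting gives `ACG` twice and `CGT`, `GTA`, `TAC` once each. Canonical counting folds `CGT` into `ACG` and `TAC` into `GTA`, giving `ACG` three times and `GTA` twice.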
23 Feb 2018 LEMON updated to version 1.3.1
LEMON stands for Library for Efficient Modeling and Optimization in Networks. It is a C++ template library providing efficient implementations of common data structures and algorithms with focus on combinatorial optimization tasks connected mainly with graphs and networks.
23 Feb 2018 glpk updated to version 4.65
The GLPK (GNU Linear Programming Kit) package is intended for solving large-scale linear programming (LP), mixed integer programming (MIP), and other related problems. It is a set of routines written in ANSI C and organized in the form of a callable library.
23 Feb 2018 salmon updated to version 0.9.1
A tool for quantifying the expression of transcripts using RNA-seq data.
22 Feb 2018 gautomatch updated to version 0.56
Fully automatic, accurate, convenient, and extremely fast particle picking for EM.
21 Feb 2018 bamliquidator updated to version 1.3
bamliquidator is a set of tools for analyzing the density of short DNA sequence read alignments in the BAM file format.
21 Feb 2018 hap.py updated to version 0.3.7
A set of programs based on htslib to benchmark variant calls against gold standard truth datasets.
20 Feb 2018 Aspera updated to version 3.7.4
High-speed fasp-powered file transfers. Mostly used to download data from NCBI, which has an Aspera server. See the data transfer page for details.
20 Feb 2018 Tcl/Tk updated to version 8.6.3
Tcl (Tool Command Language) is a very powerful but easy to learn dynamic programming language. Tk is a graphical user interface toolkit that takes developing desktop applications to a higher level than conventional approaches.
20 Feb 2018 bedops updated to version 2.4.26
BEDOPS is a suite of tools to address common questions raised in genomic studies, mostly with regard to overlap and proximity relationships between data sets. BEDOPS aims to be scalable and flexible, facilitating the efficient and accurate analysis and management of large-scale genomic data.
20 Feb 2018 digits updated to version 5.0
DIGITS simplifies common deep learning tasks such as managing data, designing and training neural networks on multi-GPU systems, monitoring performance in real time with advanced visualizations, and selecting the best performing model from the results browser for deployment. DIGITS is completely interactive so that data scientists can focus on designing and training networks rather than programming and debugging.
20 Feb 2018 kraken updated to version 1.1
Kraken is a system for assigning taxonomic labels to short DNA sequences, usually obtained through metagenomic studies
Scientific Databases updated in the last 3 months
For a full list of scientific databases available on the NIH HPC systems, see this page

| Updated | Database | Format | Location |
|---|---|---|---|
| 19 May 2018 | NCBI Taxonomy | taxonomy | /fdb/taxonomy |
| 19 May 2018 | Mito | Blast | /fdb/blastdb/mito.aa |
| 19 May 2018 | Protein Data Bank | PDB | /pdb/pdb |
| 18 May 2018 | Mouse Genome (Mus musculus) mm8 | MySQL | NIH mirror of UCSC Genome Browser |
| 17 May 2018 | Simons Genome Diversity Project (SGDP) | VCF | /fdb/SGDP |
| 15 May 2018 | Mito | Fasta | /fdb/fastadb/mito.nt.fas |
| 15 May 2018 | Mito | Fasta | /fdb/fastadb/mito.aa.fas |
| 13 May 2018 | 16S Microbial | Blast | /fdb/blastdb/16SMicrobial |
| 11 May 2018 | Rat Genome (Rattus norvegicus) rn4 | MySQL | NIH mirror of UCSC Genome Browser |
| 08 May 2018 | RefSeq Other Genomic | Fasta | /fdb/fastadb/ref.other.genomic.fas |
| 08 May 2018 | Protein Data Bank | Fasta | /fdb/fastadb/pdb.nt.fas |
| 08 May 2018 | NCBI nt | Fasta | /fdb/fastadb/nt.fas |
| 08 May 2018 | SwissProt | Fasta | /fdb/fastadb/swissprot.aa.fas |
| 08 May 2018 | Protein Data Bank | Fasta | /fdb/fastadb/pdb.aa.fas |
| 08 May 2018 | NCBI nr | Fasta | /fdb/fastadb/nr.aa.fas |
| 04 May 2018 | Rhesus Genome rheMac2 | MySQL | NIH mirror of UCSC Genome Browser |
| 01 May 2018 | Protein Data Bank | Blast | /fdb/blastdb/pdbaa |
| 01 May 2018 | SwissProt | Blast | /fdb/blastdb/swissprot |
| 01 May 2018 | Protein Data Bank | Blast | /fdb/blastdb/pdbnt |
| 29 Apr 2018 | NCBI nt | Blast | /fdb/blastdb/nt |
| 22 Apr 2018 | HTGs | Blast | /fdb/blastdb/htgs |
| 20 Apr 2018 | Chicken Genome (Gallus gallus) | MySQL | NIH mirror of UCSC Genome Browser |
| 17 Apr 2018 | ANNOVAR | ANNOVAR | /fdb/annovar/current |
| 13 Apr 2018 | EST - others | Blast | /fdb/blastdb/est_others |
| 13 Apr 2018 | Drosophila Genome (Drosophila melanogaster) fb5 | MySQL | NIH mirror of UCSC Genome Browser |
| 03 Apr 2018 | IMPC | MySQL | /fdb/IMPC/IMPC_data_release_7.0.sql.gz |
| 21 Mar 2018 | Viral | Blast | /fdb/blastdb/viral |
| 20 Mar 2018 | NCBI nr | Blast | /fdb/blastdb/nr |
| 05 Mar 2018 | Viral | Blast | /fdb/blastdb/viral |