High-Performance Computing at the NIH
Application updates in the last 3 months
To see all versions available for any application, use module avail application_name
All centrally-installed applications are listed on the Applications page
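The version lookup described above can be sketched as a short shell session. This is a minimal illustration, assuming an Environment Modules or Lmod setup that provides the `module` command; the application name `gnuplot` is just an example taken from the listing, and the guard around the calls is an addition so the snippet also runs harmlessly on a machine without modules.

```shell
#!/bin/bash
# Sketch: list the installed versions of an application, then load the default.
# Assumption: the `module` command (Environment Modules or Lmod) is available.
app="gnuplot"   # example application name; substitute any name from the listing

if type module >/dev/null 2>&1; then
    module avail "$app"   # show every installed version of the application
    module load "$app"    # load the default version
    module list           # confirm which modules are now loaded
else
    # Fallback so the sketch does not fail where modules are not installed.
    echo "module command not found; run this on a system with modules" >&2
fi
```

To pick a specific version rather than the default, pass the full name reported by `module avail` to `module load`.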
Updated Application
16 Jul 2018 Connectome Workbench updated to version 1.3.1
Tools to browse, download, explore, and analyze data from the Human Connectome Project (HCP). Allows users to compare their own data to that of the HCP.
10 Jul 2018 gnuplot updated to version 5.2.2
Gnuplot is a portable command-line driven graphing utility to visualize mathematical functions and data interactively, and can support many non-interactive uses such as web scripting.
Type 'gnuplot' to run, or 'module avail gnuplot' to see other available versions.
10 Jul 2018 scanpy updated to version 1.2.2
Scanpy is a scalable toolkit for analyzing single-cell gene expression data. It includes preprocessing, visualization, clustering, pseudotime and trajectory inference and differential expression testing. The Python-based implementation efficiently deals with datasets of more than one million cells.
10 Jul 2018 boost updated to version 1.67
Boost provides free peer-reviewed portable C++ source libraries. Boost libraries are intended to be widely useful, and usable across a broad spectrum of applications.
10 Jul 2018 mothur updated to version 1.40.5
mothur is a tool for analyzing 16S rRNA gene sequences generated on multiple platforms as part of microbial ecology projects.
9 Jul 2018 Solar updated to version 8.4.0
SOLAR is a program for multipoint, oligogenic, variance component linkage analysis in pedigrees of arbitrary size and complexity (Almasy L; Blangero J, 1998).
6 Jul 2018 cmtk updated to version 3.3.1
CMTK is a software toolkit for computational morphometry of biomedical images. CMTK provides a set of command line tools for processing and I/O.
5 Jul 2018 BRASS updated to version 6.1.2
BRASS analyses one or more related BAM files of paired-end sequencing to determine potential rearrangement breakpoints.
5 Jul 2018 BEAST updated to versions 1.10.0 and 2.4.7
BEAST (Bayesian Evolutionary Analysis Sampling Trees) is a cross-platform program for Bayesian MCMC analysis of molecular sequences.
5 Jul 2018 jo updated to version 1.1
A small utility to create JSON objects from command line arguments.
3 Jul 2018 tailseeker updated to version 3.1.7-6-g34b5ba9
Tailseeker is the official pipeline for TAIL-seq, which measures poly(A) tail lengths and 3′-end modifications with Illumina SBS sequencers.
3 Jul 2018 singularity updated to version 2.5.2
Singularity is a container platform focused on supporting "Mobility of Compute". It allows users to emulate and share custom Linux environments, allowing for the creation of self-contained development stacks.
29 Jun 2018 seqoutbias updated to version 1.1.3
Correct aligned HTS read counts for enzyme bias and mappability.
27 Jun 2018 NGMLR updated to version 0.2.7
NGMLR is a long-read mapper designed to align PacBio or Oxford Nanopore reads (standard and ultra-long) to a reference genome, with a focus on reads that span structural variations.
27 Jun 2018 hichipper updated to version 0.7.3
hichipper is a preprocessing and QC pipeline for HiChIP data. This package takes output from a HiC-Pro run and a sample manifest file (.yaml) that coordinates optional high-quality peaks (identified through ChIP-Seq) and restriction fragment locations as input, and produces output that can be used to 1) determine library quality, 2) identify and characterize DNA loops and 3) interactively visualize loops.
27 Jun 2018 Mesa updated to version 17.0.0
Mesa is an open-source implementation of the OpenGL specification. OpenGL is a programming library for writing interactive 3D applications.
26 Jun 2018 Schrodinger updated to version 2018.1
A limited number of Schrödinger applications are available on the Biowulf cluster through the Molecular Modeling Interest Group. Most are available through the Maestro GUI.
25 Jun 2018 matio updated to version 1.5.12
Matio is an open-source C library for reading and writing binary MATLAB MAT files. This library is designed for use by programs/libraries that do not have access or do not want to rely on MATLAB's shared libraries.
25 Jun 2018 HDF5 updated to version 1.10.1
HDF5 is a data model, library, and file format for storing and managing data. It supports an unlimited variety of datatypes, and is designed for flexible and efficient I/O and for high volume and complex data. HDF5 is portable and is extensible, allowing applications to evolve in their use of HDF5. The HDF5 Technology suite includes tools and applications for managing, manipulating, viewing, and analyzing data in the HDF5 format.
25 Jun 2018 genometools updated to version 1.5.9
collection of bioinformatic tools
22 Jun 2018 situs updated to version 2.8
Situs is a package for the modeling of atomic resolution structures into low-resolution density maps e.g. from electron microscopy, tomography, or small angle X-ray scattering.
22 Jun 2018 minimap2 updated to version 2.11
Minimap2 is a fast sequence mapping and alignment program that can find overlaps between long noisy reads, or map long reads or their assemblies to a reference genome optionally with detailed alignment (i.e. CIGAR).
21 Jun 2018 raptorx updated to version 37223
RaptorX is a protein structure prediction server developed by the Xu group, excelling at predicting 3D structures for protein sequences without close homologs in the Protein Data Bank (PDB). Given an input sequence, RaptorX predicts its secondary and tertiary structures as well as solvent accessibility and disordered regions.
20 Jun 2018 ncbi-vdb updated to version 2.9.1
The SRA Toolkit and SDK from NCBI is a collection of tools and libraries for using data in the INSDC Sequence Read Archives.
20 Jun 2018 sratoolkit updated to version 2.9.1
The NCBI SRA Toolkit enables reading ("dumping") of sequencing files from the SRA database and writing ("loading") files into the .sra format.
20 Jun 2018 atactk updated to version 0.1.6
A toolkit for working with ATAC-seq data.
20 Jun 2018 Rosetta updated to version 2018.21
The Rosetta++ software suite can perform de novo protein structure predictions, identify low free energy sequences for target protein backbones, predict the structure of a protein-protein complex from the individual structures of the monomer components, incorporate NMR data into the basic Rosetta protocol to accelerate the process of NMR structure prediction, and more...
20 Jun 2018 hicpro updated to version 2.10.0
HiC-Pro: An optimized and flexible pipeline for Hi-C data processing
18 Jun 2018 mr-mega updated to version 0.1.5
Meta-Regression of Multi-Ethnic Genetic Association
18 Jun 2018 Huygens updated to version 18.04.0-p4
Huygens is software for image restoration, deconvolution, resolution and noise reduction. It can process images from all current optical microscopes, including wide-field, confocal, Nipkow (scanning disk confocal), multiple-photon, and 4Pi microscopes.
18 Jun 2018 svtyper updated to version 0.1.4
Svtyper is a Bayesian genotyper for structural variants.
15 Jun 2018 mantra updated to version 1
Transethnic meta-analysis of genomewide association studies
15 Jun 2018 mscentipede updated to version 1.0
msCentipede is an algorithm for accurately inferring transcription factor binding sites using chromatin accessibility data (DNase-seq, ATAC-seq).
14 Jun 2018 HLA-PRG-LA updated to version 0.79.b1a7531
Stands for HLA PRG, linear approximation. The basic idea is to seed graph alignments with linear alignments to the sequences that the graph consists of.
12 Jun 2018 purge_haplotigs updated to version 0~20180529.e6fffc9
purge_haplotigs is a pipeline to help with curating heterozygous diploid genome assemblies.
12 Jun 2018 edd updated to version 1.1.19
EDD is a ChIP-seq peak caller for detection of megabase domains of enrichment.
12 Jun 2018 sniffles updated to version 1.0.8
Sniffles is a structural variation caller using third generation sequencing (PacBio or Oxford Nanopore). It detects all types of SVs (10bp+) using evidence from split-read alignments, high-mismatch regions, and coverage analysis.
12 Jun 2018 BOLT-LMM updated to version 2.3.2
The BOLT-LMM algorithm computes statistics for testing association between phenotype and genotypes using a linear mixed model (LMM)
12 Jun 2018 tetoolkit updated to version 2.0.3
A package for including transposable elements in differential enrichment analysis of sequencing datasets.
8 Jun 2018 gem updated to version 3.0
High resolution peak calling and motif discovery for ChIP-seq and ChIP-exo data
7 Jun 2018 PePr updated to version 1.1.24
PePr is a ChIP-Seq Peak-calling and Prioritization pipeline that uses a sliding window approach and models read counts across replicates and between groups with a negative binomial distribution.
7 Jun 2018 SURVIVOR updated to version 1.0.3
SURVIVOR is a tool set for simulating/evaluating SVs, merging and comparing SVs within and among samples, and includes various methods to reformat or summarize SVs.
7 Jun 2018 umitools updated to version 0.5.3
tools for dealing with Unique Molecular Identifiers (UMIs)/Random Molecular Tags (RMTs) and single cell RNA-Seq cell barcodes
6 Jun 2018 mixcr updated to version 2.1.10
MiXCR is a universal software for fast and accurate analysis of T- and B- cell receptor repertoire sequencing data.
6 Jun 2018 interproscan updated to version 5.29-68.0
InterProScan is the software package that allows sequences (protein and nucleic) to be scanned against InterPro's signatures. Signatures are predictive models, provided by several different databases, that make up the InterPro consortium.
6 Jun 2018 zUMIs updated to version 0.0.6
zUMIs is a fast and flexible pipeline to process RNA-seq data with UMIs.
5 Jun 2018 mono updated to version 5.12.0
4 Jun 2018 htgts updated to version v2
High-Throughput Genome-Wide Translocation Sequencing pipeline
4 Jun 2018 CCP4 updated to version 7.0.057
CCP4 is a suite of programs for protein crystallography and structural biology.
2 Jun 2018 Mendel updated to version 16.0
Mendel is a comprehensive package for exact statistical genetic analysis of qualitative and quantitative traits.
1 Jun 2018 bbtools updated to version 38.06
An extensive set of bioinformatics tools including bbmap (short read aligner), bbnorm (kmer based normalization), dedupe (deduplication and clustering of unaligned reads), reformat (formatting and trimming reads) and many more.
1 Jun 2018 snakemake updated to version 5.1.3
Snakemake aims to reduce the complexity of creating workflows by providing a fast and comfortable execution environment, together with a clean and modern domain specific specification language (DSL) in python style. It is well suited for bioinformatic workflows.
1 Jun 2018 probabel updated to version 0.4.3
ProbABEL is a tool for genome-wide association analysis of imputed genetic data. It was designed to perform such regression in a fast, memory-efficient, and consequently genome-wide feasible manner. Currently, ProbABEL implements linear regression, logistic regression, and Cox proportional hazards models.
1 Jun 2018 patchelf updated to version 0.9
patchelf is a small utility to modify the dynamic linker and RPATH of ELF executables.
1 Jun 2018 scalpel updated to version 0.5.3
Bioinformatics pipeline for discovery of genetic variants from NGS reads.
1 Jun 2018 csvkit updated to version 1.0.3
csvkit is a suite of command-line tools for converting to and working with CSV, the king of tabular file formats.
31 May 2018 preseq updated to version 2.0.3
predicting library complexity and genome coverage in high-throughput sequencing
31 May 2018 sicer updated to version 1.1
A clustering approach for identification of enriched domains from histone modification ChIP-Seq data
31 May 2018 metal updated to version 2017-12-21
The METAL software is designed to facilitate meta-analysis of large datasets (such as several whole genome scans) in a convenient, rapid and memory efficient manner.
29 May 2018 circleseq updated to version 1.0
Circleseq takes sample-specific paired end FASTQ files as input and produces a list of CIRCLE-seq detected off-target cleavage sites as output.
29 May 2018 lammps updated to version 16Mar18
LAMMPS is a classical molecular dynamics code, and an acronym for Large-scale Atomic/Molecular Massively Parallel Simulator. It runs on a variety of different computer systems, including single processor systems, distributed-memory machines with MPI, and GPU and Xeon Phi systems. LAMMPS is open source software, released under the GNU General Public License.
24 May 2018 SomaticSeq updated to version 2.7.2
SomaticSeq is an ensemble approach to accurately detect somatic mutations. It incorporates multiple somatic mutation caller(s) to obtain a combined call set, and then uses machine learning to distinguish true mutations from false positives from that call set.
24 May 2018 KmerGenie updated to version 1.7048
KmerGenie estimates the best k-mer length for genome de novo assembly.
24 May 2018 gffcompare updated to version 0.10.5
gffcompare can be used to compare and evaluate the accuracy of RNA-Seq transcript assemblers (Cufflinks, Stringtie). It can collapse (merge) duplicate transcripts from multiple GTF/GFF3 files (e.g. resulting from assembly of different samples) and classify transcripts from one or multiple GTF/GFF3 files as they relate to reference transcripts provided in an annotation file (also in GTF/GFF3 format).
23 May 2018 virtualgl updated to version 2.5.2
VirtualGL is an open source toolkit that gives any Unix or Linux remote display software the ability to run OpenGL applications with full 3D hardware acceleration.
23 May 2018 freec updated to version 11.4
Control-FREEC is a tool for detection of copy-number changes and allelic imbalances (including LOH) using deep-sequencing data
23 May 2018 presto updated to version 0.5.7
pRESTO performs all stages of raw sequence processing prior to alignment against reference germline sequences.
22 May 2018 stampy updated to version 1.0.32
Short read aligner
21 May 2018 smc++ updated to version 1.13.1
SMC++ is a program for estimating the size history of populations from whole genome sequence data.
21 May 2018 smart updated to version 2.2.8
Specific Methylation Analysis and Report Tool (SMART) uses the signal from bisulfite sequencing experiments across multiple samples to identify genome segments with similar methylation specificities.
21 May 2018 ScanIndel updated to version 1.3
ScanIndel is a Python program to detect indels (insertions and deletions) from NGS data by realigning and de novo assembling soft-clipped reads.
21 May 2018 circos updated to version 0.69-6
Circos is a program for the generation of publication-quality, circularly composited renditions of genomic data and related annotations. Circos is particularly suited for visualizing alignments, conservation and intra and inter-chromosomal relationships. Also, Circos is useful to visualize any type of information that benefits from a circular layout. Thus, although it has been designed for the field of genomics, it is sufficiently flexible to be used in other data domains.
21 May 2018 khmer updated to version 2.1.2
Library and suite of command line tools for working with short-read, DNA sequences, taking a k-mer-centric approach to sequence analysis.
21 May 2018 scallop updated to version 0.10.2
Scallop is a reference-based transcript assembler.
21 May 2018 tantan updated to version 13
A tool to mask low complexity and short period tandem repeats
21 May 2018 peddy updated to version 0.3.1
peddy is used to compare sex and familial relationships given in a PED file with those inferred from a VCF file
21 May 2018 INRICH updated to version 1.1
INRICH is a pathway analysis tool for genome wide association studies, designed for detecting enriched association signals of LD-independent genomic regions within biologically relevant gene sets.
19 May 2018 paraview updated to version 5.4.1
ParaView is an open-source, multi-platform data analysis and visualization application.
19 May 2018 steme updated to version 1.9.1
An efficient accurate motif finder based on MEME and implemented using suffix arrays.
18 May 2018 treemix updated to version 1.12
TreeMix is a method for inferring the patterns of population splits and mixtures in the history of a set of populations.
17 May 2018 meka updated to version 1.9.2
A Multi-label Extension to WEKA
17 May 2018 mfold updated to version 3.6
MFOLD predicts DNA and RNA secondary structure.
17 May 2018 PyLOH updated to version 1.4.3
Deconvolving tumor purity and ploidy by integrating copy number alterations and loss of heterozygosity
17 May 2018 loki updated to version 2.4.7_4
Loki is a linkage analysis package, primarily for large and complex pedigrees, which uses Markov chain Monte Carlo (MCMC) techniques to avoid many of the computational problems that prevent exact computational methods being used for large pedigrees.
16 May 2018 OpenCV updated to version 3.4.1
OpenCV (Open Source Computer Vision Library) is an open source computer vision and machine learning software library. OpenCV was built to provide a common infrastructure for computer vision applications and to accelerate the use of machine perception in commercial products.
16 May 2018 lefse updated to version 1.0.7
LEfSe (Linear discriminant analysis Effect Size) determines the features (organisms, clades, operational taxonomic units, genes, or functions) most likely to explain differences between classes by coupling standard tests for statistical significance with additional tests encoding biological consistency and effect relevance.
16 May 2018 kneaddata updated to version 0.7.0
KneadData is a tool designed to perform quality control on metagenomic and metatranscriptomic sequencing data, especially data from microbiome experiments.
16 May 2018 kaiju updated to version 1.6.2
Kaiju is a program for the taxonomic classification of high-throughput sequencing reads, e.g., Illumina or Roche/454, from whole-genome sequencing of metagenomic DNA.
16 May 2018 fqtools updated to version 2.0
Tools for manipulating fastq files
16 May 2018 SimNIBS updated to version 2.0.1
SimNIBS 2.0 is a free software package for the Simulation of Non-invasive Brain Stimulation. It allows for realistic calculations of the electric field induced by transcranial magnetic stimulation (TMS) and transcranial direct current stimulation (tDCS).
15 May 2018 fusioncatcher updated to version 1.00
FusionCatcher searches for novel/known somatic fusion genes, translocations, and chimeras in RNA-seq data (paired-end or single-end reads from Illumina NGS platforms like Solexa/HiSeq/NextSeq/MiSeq) from diseased samples.
15 May 2018 bioawk updated to version 1.0
Regular awk with support for several common biological data formats, including optionally gzip'ed BED, GFF, SAM, VCF, FASTA/Q and TAB-delimited formats with column names.
15 May 2018 CAVIAR updated to version 2.2
CAVIAR (CAusal Variants Identification in Associated Regions) is a statistical framework that quantifies the probability of each variant being causal while allowing for an arbitrary number of causal variants.
15 May 2018 elastix updated to version 4.9
a toolbox for rigid and nonrigid registration of images.
15 May 2018 samtools updated to version 1.8
The samtools package now provides samtools, bcftools, tabix, and the underlying htslib library.
14 May 2018 weblogo updated to version 3.6
contains seqlogo utility to create sequence logo summarizing sequence alignments
14 May 2018 tRNAscan-SE updated to version 2.0.0
tRNAscan-SE 2.0 has advanced the state-of-the-art methodology in tRNA gene detection and functional prediction, captured by rich new content of the companion Genomic tRNA Database
12 May 2018 freebayes updated to version 1.2.0
Bayesian haplotype-based polymorphism discovery and genotyping
11 May 2018 IDL/ENVI updated to version 8.5/5.3
IDL and ENVI are a complete computing environment for the interactive analysis and visualization of data. IDL integrates an array-oriented language with mathematical analysis and graphical display techniques. ENVI is designed for extracting information from geospatial and medical imagery.
11 May 2018 mrtrix updated to version 3.0_RC2
MRtrix provides a large suite of tools for image processing, analysis and visualisation, with a focus on the analysis of white matter using diffusion-weighted MRI.
10 May 2018 kallisto updated to version 0.44.0
kallisto is a program for quantifying abundances of transcripts from RNA-Seq data, or more generally of target sequences using high-throughput sequencing reads. It is based on the novel idea of pseudoalignment for rapidly determining the compatibility of reads with targets, without the need for alignment.
10 May 2018 rmblast updated to version 2.6.0.2
RMBlast is a RepeatMasker-compatible version of the standard NCBI blastn program. RMBlast supports RepeatMasker searches by adding a few necessary features to the stock NCBI blastn program.
10 May 2018 CD-HIT updated to version 4.6.8
CD-HIT is a very widely used program for clustering and comparing protein or nucleotide sequences.
10 May 2018 FastQTL updated to version 2.184
In order to discover quantitative trait loci (QTLs), multi-dimensional genomic datasets combining DNA-seq and ChIP-/RNA-seq require methods that rapidly correlate tens of thousands of molecular phenotypes with millions of genetic variants while appropriately controlling for multiple testing. FastQTL implements a popular cis-QTL mapping strategy in a user- and cluster-friendly tool. FastQTL also proposes an efficient permutation procedure to control for multiple testing.
9 May 2018 Juicer updated to version 1.5.6
A One-Click System for Analyzing Loop-Resolution Hi-C Experiments
9 May 2018 Matlab updated to version 9.4.0.813654
MATLAB is a high-performance interactive software package for scientific and engineering numeric computation. MATLAB integrates numerical analysis, matrix computation, signal processing, and graphics in an environment where problems and solutions are expressed just as they are written mathematically.
8 May 2018 mosdepth updated to version 0.2.3
Fast BAM/CRAM depth calculation for WGS, exome, or targeted sequencing.
8 May 2018 cromwell updated to version 31
A Workflow Management System geared towards scientific workflows.
8 May 2018 Clairvoyante updated to version 0.1
The accurate identification of DNA sequence variants is particularly difficult for single molecule sequencing, which has a high per-nucleotide error rate (~5%-15%). Clairvoyante implements a multitask five-layer convolutional neural network model for predicting variant type (SNP or indel), zygosity, alternative allele and indel length from aligned reads. Using well-characterized testing data, Clairvoyante achieved 99.73%, 97.68% and 95.36% precision on known variants, and 98.65%, 92.57%, 77.89% F1-score for whole-genome analysis, using Illumina, PacBio, and Oxford Nanopore data, respectively.
8 May 2018 phase updated to version 2.1.1
infers haplotypes from population genotype data
8 May 2018 clark updated to version 1.2.5
A method based on a supervised sequence classification using discriminative k-mers
7 May 2018 casper updated to version 0.8.2
CASPER (Context-Aware Scheme for Paired-End Read) is a state-of-the-art merging tool in terms of accuracy and robustness. Its merging method produces elongated reads from the forward and reverse reads.
7 May 2018 PyCharm updated to version 2018.1.2
A Python IDE
7 May 2018 mageck-vispr updated to version 0.5.4
MAGeCK-VISPR is a comprehensive quality control, analysis and visualization workflow for CRISPR/Cas9 screens.
7 May 2018 pyclone updated to version 0.13.1
PyClone is a statistical model and software tool designed to infer the prevalence of point mutations in heterogeneous cancer samples.
7 May 2018 ChromHMM updated to version 1.15
ChromHMM is software for learning and characterizing chromatin states.
7 May 2018 sailfish updated to version 0.10.0
Sailfish is a tool for transcript quantification from RNA-seq data. It requires a set of target transcripts (either from a reference or de-novo assembly) to quantify. All that is needed to run sailfish is a fasta file containing your reference transcripts and a (set of) fasta/fastq file(s) containing your RNA-Seq reads.
6 May 2018 roary updated to version 3.12.0
Roary is a high speed stand alone pan genome pipeline, which takes annotated assemblies in GFF3 format (produced by Prokka) and calculates the pan genome.
4 May 2018 lofreq updated to version 2.1.3.1
LoFreq is a fast and sensitive variant-caller for inferring SNVs and indels from next-generation sequencing data.
4 May 2018 bali-phy updated to version 3.1
BAli-Phy is MCMC software developed by Ben Redelings with Marc Suchard for simultaneous Bayesian estimation of alignment and phylogeny (and other parameters). It handles generic Bayesian modeling via probabilistic programming.
4 May 2018 pyDNase updated to version 0.2.6
pyDNase is a suite of tools for analysing DNase-seq data. It comes with several analysis scripts covering common use cases of DNase-seq analysis, and also an implementation of the Wellington, Wellington 1D, and Wellington-bootstrap footprinting algorithms.
3 May 2018 fastqtools updated to version 0.8
fastq-tools is a collection of small and efficient programs for performing some common and uncommon tasks with FASTQ files.
3 May 2018 atac_dnase_pipelines updated to version 0.3.4-19-gcbd2a00
This pipeline is designed for automated end-to-end quality control and processing of ATAC-seq or DNase-seq data
3 May 2018 leveldb updated to version 1.20
LevelDB is a fast key-value storage library written at Google that provides an ordered mapping from string keys to string values.
2 May 2018 lancet updated to version 1.0.7
Lancet is a somatic variant caller (SNVs and indels) for short read data.
2 May 2018 snappy updated to version 1.1.7
Snappy is a compression/decompression library. It does not aim for maximum compression, or compatibility with any other compression library; instead, it aims for very high speeds and reasonable compression.
2 May 2018 bison updated to version 0.4.0
BISON is a bisulfite-converted short-read aligner that can natively utilize high-performance computing clusters to increase speed.
2 May 2018 locuszoom updated to version 1.3
LocusZoom is designed to facilitate viewing of local association results together with useful information about a locus, such as the location and orientation of the genes it includes, linkage disequilibrium coefficients and local estimates of recombination rates
2 May 2018 Octave updated to version 4.4.0
GNU Octave is a high-level language, primarily intended for numerical computations. It provides a convenient command line interface for solving linear and nonlinear problems numerically, and for performing other numerical experiments using a language that is mostly compatible with Matlab.
2 May 2018 poretools updated to version 0.6.1a1
Poretools is a toolkit for manipulating and exploring nanopore sequencing data sets. Poretools operates on individual FAST5 files, directory of FAST5 files, and tar archives of FAST5 files.
2 May 2018 jq updated to version 1.5
Command line json processor
2 May 2018 cisTEM updated to version 1.0.0-beta
cisTEM is user-friendly software to process cryo-EM images of macromolecular complexes and obtain high-resolution 3D reconstructions from them.
1 May 2018 iva updated to version 1.0.3
IVA is a de novo assembler designed to assemble virus genomes that have no repeat sequences, using Illumina read pairs sequenced from mixed populations at extremely high and variable depth.
1 May 2018 smalt updated to version 0.7.6
SMALT efficiently aligns DNA sequencing reads with a reference genome.
1 May 2018 pindel updated to version 0.2.5b8
Pindel can detect breakpoints of large deletions, medium sized insertions, inversions, tandem duplications and other structural variants at single-base resolution from next-gen sequence data. It uses a pattern growth approach to identify the breakpoints of these variants from paired-end short reads.
1 May 2018 KMC updated to version 3.0.0
KMC is a disk-based program for counting k-mers from (possibly gzipped) FASTQ/FASTA files.
1 May 2018 porechop updated to version 0.2.3
Trim/demultiplex Oxford Nanopore reads
30 Apr 2018 dcm2niix updated to version 1.0.20171215
DICOM to NIfTI converter
30 Apr 2018 Rstudio updated to version 1.1.447
RStudio is a set of integrated tools designed to help you be more productive with R. It includes a console, syntax-highlighting editor that supports direct code execution, as well as tools for plotting, history, debugging and workspace management.
30 Apr 2018 somaticsniper updated to version 1.0.5.0
The purpose of this program is to identify single nucleotide positions that are different between tumor and normal (or, in theory, any two bam files). It takes a tumor bam and a normal bam and compares the two to determine the differences. It outputs a file in a format very similar to Samtools consensus format.
28 Apr 2018 STAR-Fusion updated to version 1.3.2
Transcript fusion detection
27 Apr 2018 genesis updated to version 2.4
GENESIS (GEneral NEural SImulation System) is a software platform for the simulation of neural systems ranging from subcellular components and biochemical reactions to complex models of single neurons, large networks, and systems-level processes.
27 Apr 2018 fcgene updated to version 1.0.7
FCgene is a format converting tool for genotyped data (e.g. PLINK-MACH, MACH-PLINK).
27 Apr 2018 LMDB updated to version 0.9.14
LMDB is an ultra-fast, ultra-compact key-value embedded data store developed by Symas for the OpenLDAP Project. It uses memory-mapped files, so it has the read performance of a pure in-memory database while still offering the persistence of standard disk-based databases, and is limited only by the size of the virtual address space (not by the size of physical RAM).
27 Apr 2018 EDirect updated to version 8.60
Entrez Direct (EDirect) is an advanced method for accessing the NCBI's set of interconnected databases (publication, sequence, structure, gene, variation, expression, etc.) from a UNIX terminal window.
27 Apr 2018 asciinema updated to version 2.0.1
asciinema [as-kee-nuh-muh] is a free and open source solution for recording terminal sessions and sharing them.
Type 'module load asciinema' then 'asciinema' to run.
27 Apr 2018 Grace updated to version 5.1.25
Grace is a WYSIWYG 2D plotting tool for the X-Window system. It is a successor to Xmgr.
Type 'module load grace', then 'xmgrace' or 'gracebat' to run.
25 Apr 2018 bcbio-nextgen updated to version 1.0.9
Validated, scalable, community developed variant calling, RNA-seq and small RNA analysis
24 Apr 2018 VEP updated to version 92
VEP (Variant Effect Predictor) determines the effect of your variants (SNPs, insertions, deletions, CNVs or structural variants) on genes, transcripts, and protein sequence, as well as regulatory regions.
24 Apr 2018 GEMMA updated to version 0.96
GEMMA is the software implementing the Genome-wide Efficient Mixed Model Association algorithm for a standard linear mixed model and some of its close relatives for genome-wide association studies (GWAS).
24 Apr 2018 crystfel updated to version 0.6.3
CrystFEL is a suite of programs for processing diffraction data acquired serially in a snapshot manner, such as when using the technique of Serial Femtosecond Crystallography (SFX) with a free-electron laser source.
23 Apr 2018 glog updated to version 0.3.5
The glog library implements application-level logging. This library provides logging APIs based on C++-style streams and various helper macros.
23 Apr 2018 gflags updated to version 2.2.1
The gflags package contains a library that implements commandline flags processing. It includes built-in support for C++ types like string and the ability to define flags in the source file in which they're used.
18 Apr 2018 smrtanalysis updated to version 5.1.0.26412
SMRT® Analysis is a bioinformatics software suite available for analysis of DNA sequencing data from Pacific Biosciences’ SMRT technology. Users can choose from a variety of analysis protocols that utilize PacBio® and third-party tools. Analysis protocols include de novo genome assembly, cDNA mapping, DNA base-modification detection, and long-amplicon analysis to determine phased consensus sequences.
18 Apr 2018 clustalo updated to version 1.2.4
Clustal-Omega is a general purpose multiple sequence alignment (MSA) program for proteins and DNA/RNA. It produces high quality MSAs and is capable of handling data-sets of hundreds of thousands of sequences in reasonable time.
18 Apr 2018 FFmpeg updated to version 3.4.2
A complete, cross-platform solution to record, convert and stream audio and video.
Scientific Databases updated in last 3 months
For a full list of scientific databases available on the NIH HPC systems, see this page

Updated      Database                                         Format    Location
16 Jul 2018  NCBI Taxonomy                                    taxonomy  /fdb/taxonomy
16 Jul 2018  Mito                                             Blast     /fdb/blastdb/mito.aa
15 Jul 2018  16S Microbial                                    Blast     /fdb/blastdb/16SMicrobial
14 Jul 2018  Protein Data Bank                                PDB       /pdb/pdb
13 Jul 2018  Rat Genome (Rattus norvegicus) rn4               MySQL     NIH mirror of UCSC Genome Browser
12 Jul 2018  Protein Data Bank                                Blast     /fdb/blastdb/pdbaa
12 Jul 2018  SwissProt                                        Blast     /fdb/blastdb/swissprot
10 Jul 2018  Mito                                             Fasta     /fdb/fastadb/mito.nt.fas
10 Jul 2018  Mito                                             Fasta     /fdb/fastadb/mito.aa.fas
09 Jul 2018  Protein Data Bank                                Blast     /fdb/blastdb/pdbnt
09 Jul 2018  NCBI nt                                          Blast     /fdb/blastdb/nt
03 Jul 2018  ANNOVAR                                          ANNOVAR   /fdb/annovar/current
03 Jul 2018  EST - others                                     Blast     /fdb/blastdb/est_others
29 Jun 2018  Human Genome hg18                                MySQL     NIH mirror of UCSC Genome Browser
27 Jun 2018  Viral                                            Blast     /fdb/blastdb/viral
27 Jun 2018  HTGs                                             Blast     /fdb/blastdb/htgs
19 Jun 2018  NCBI nr                                          Blast     /fdb/blastdb/nr
15 Jun 2018  Mouse Genome (Mus musculus) mm8                  MySQL     NIH mirror of UCSC Genome Browser
12 Jun 2018  Refseq Other Genomic                             Fasta     /fdb/fastadb/ref.other.genomic.fas
12 Jun 2018  Protein Data Bank                                Fasta     /fdb/fastadb/pdb.nt.fas
12 Jun 2018  NCBI nt                                          Fasta     /fdb/fastadb/nt.fas
12 Jun 2018  SwissProt                                        Fasta     /fdb/fastadb/swissprot.aa.fas
12 Jun 2018  Protein Data Bank                                Fasta     /fdb/fastadb/pdb.aa.fas
12 Jun 2018  NCBI nr                                          Fasta     /fdb/fastadb/nr.aa.fas
01 Jun 2018  Rhesus genome rheMac2                            MySQL     NIH mirror of UCSC Genome Browser
01 Jun 2018  Drosophila genome (Drosophila melanogaster) fb5  MySQL     NIH mirror of UCSC Genome Browser
17 May 2018  Simons Genome Diversity Project (SGDP)           VCF       /fdb/SGDP