Scientific Applications on NIH HPC Systems

The NIH HPC staff maintains several hundred scientific programs, packages and databases for our users. Below is a list of system-installed software available on Biowulf and Helix. Click on the application name to get to site-specific instructions on how to run a given package on the cluster, including links to the original application documentation.

In almost all cases, applications are made available through the use of environment modules.

Users are welcome to install additional applications in their own /home or /data areas, and create their own personal modules for those applications. Some applications are difficult to install or require admin privileges; in those cases, you can email staff@hpc.nih.gov with a request to install the application.

Computational Chemistry

Acemd (3.5.1)

ACEMD is a high performance molecular dynamics code for biomolecular systems designed specifically for NVIDIA GPUs. Simple and fast, ACEMD uses commands and input files very similar to those of NAMD, and produces output files like those of NAMD or Gromacs.

AMBER (22)

AMBER (Assisted Model Building with Energy Refinement) is a package of molecular simulation programs.

AMPL (1.5.1)

The Accelerating Therapeutics for Opportunities in Medicine (ATOM) Consortium Modeling PipeLine for Drug Discovery. AMPL is an open-source, modular, extensible software pipeline for building and sharing models to advance in silico drug discovery.

APBS (3.4.1)

APBS (Adaptive Poisson-Boltzmann Solver) is a software package for the numerical solution of the Poisson-Boltzmann equation (PBE), one of the most popular continuum models for describing electrostatic interactions between molecular solutes in salty, aqueous media.

Autodock (4.2.6)

Autodock is a suite of automated docking tools. It is designed to predict how small molecules, such as substrates or drug candidates, bind to a receptor of known 3D structure.

AutodockVina (1_1_2)

AutoDock Vina is a program for drug discovery, molecular docking and virtual screening, offering multi-core capability, high performance and enhanced accuracy and ease of use. It is closely tied to Autodock.

CHARMM (c46b1)

CHARMM is a general and flexible software application for modeling the structure and behavior of molecular systems.

GAMESS (30Sep22-R2)

GAMESS is a general ab initio quantum chemistry package.

Gaussian (G16-C02)

Gaussian is a connected system of programs for performing semiempirical and ab initio molecular orbital (MO) calculations.

gromacs (2024.1)

Gromacs is a versatile package to perform molecular dynamics, i.e. simulate the Newtonian equations of motion for systems with hundreds to millions of particles. It is primarily designed for biochemical molecules like proteins and lipids that have a lot of complicated bonded interactions, but since GROMACS is extremely fast at calculating the nonbonded interactions (that usually dominate simulations) many groups are also using it for research on non-biological systems, e.g. polymers.

NAMD (3.0beta6)

NAMD is a parallel molecular dynamics program for UNIX platforms designed for high-performance simulations in structural biology. VMD, the associated molecular visualization program, is also available.

parmed (4.0.0)

Cross-program parameter and topology file editor and molecular mechanical simulator engine.

Psi4 (1.6.1)

Psi4 is an ab-initio electronic structure code that supports various methods for calculating energies and gradients of molecular systems.

Q-Chem (5.0.1)

Q-Chem is a comprehensive ab initio quantum chemistry package for accurate predictions of molecular structures, reactivities, and vibrational, electronic and NMR spectra.

Schrodinger (2023.1)

A limited number of Schrödinger applications are available on the Biowulf cluster through the Molecular Modeling Interest Group. Most are available through the Maestro GUI.

vmd (1.9.3)

VMD is a molecular visualization program for displaying, animating, and analyzing large biomolecular systems using 3-D graphics and built-in scripting. To use, type vmd at the prompt.

Cryo-Electron Microscopy

cisTEM (1.0.0-beta)

cisTEM is user-friendly software to process cryo-EM images of macromolecular complexes and obtain high-resolution 3D reconstructions from them.

cryoDRGN (1.1.0)

CryoDRGN is an algorithm that leverages the representation power of deep neural networks to directly reconstruct continuous distributions of 3D density maps and map per-particle heterogeneity of single-particle cryo-EM datasets. It contains interactive tools to visualize a dataset’s distribution of per-particle variability, generate density maps for exploratory analysis, extract particle subsets for use with other tools and generate trajectories to visualize molecular motions.

cryosparc (4.4.1)

CryoSPARC (Cryo-EM Single Particle Ab-Initio Reconstruction and Classification) is a state of the art HPC software solution for complete processing of single-particle cryo-electron microscopy (cryo-EM) data. CryoSPARC is useful for solving cryo-EM structures of membrane proteins, viruses, complexes, flexible molecules, small particles, phase plate data and negative stain data.

ctffind (4.1.14)

Programs for finding CTFs of electron micrographs

DeepEMhancer (0.15)

A deep learning solution for cryo-EM map post-processing.

EMAN2 (2.99)

EMAN2 is a broadly based greyscale scientific image processing suite with a primary focus on processing data from transmission electron microscopes.

emready (2.0)

An application improving the quality and interpretability of cryo-EM maps by local and non-local deep learning.

Frealign (9.11_151031)

Frealign is a program for high-resolution refinement of 3D reconstructions from cryo-EM images of single particles.

frealix (1.1.0)

Frealix is a program for the refinement of helical filament reconstructions from cryo electron micrographs. It is primarily used to process images of amyloid fibrils, though it has also been tested on TMV and actin filaments.

gautomatch (0.56_cuda-9.1)

Fully automatic, accurate, convenient and extremely fast particle picking for EM.

Gctf (1.06)

Gctf provides accurate estimation of the contrast transfer function (CTF) for near-atomic resolution cryo electron microscopy (cryoEM) reconstruction using GPUs.

model-angelo (1.0.12)

ModelAngelo is an automatic atomic model building program for cryo-EM maps.

MotionCor2 (1.5.0)

MotionCor2 is a multi-GPU accelerated program that provides iterative, patch-based motion detection combining spatial and temporal constraints and dose weighting for both single particle and tomographic cryo-electron microscopy images.

motioncor3 (1.0.1)

An improved implementation of MotionCor2 with the addition of CTF estimation. It is a multi-GPU accelerated software package that enables single-pixel-level correction of anisotropic beam-induced sample motion for cryo-EM and cryo-ET images.

pf_refinement (20Nov19)

Protofilament Refinement is a software package to further refine microtubule structures by aligning individual protofilaments rather than full microtubule segments.

pyem (240209)

UCSF pyem is a collection of Python modules and command-line utilities for electron microscopy of biological samples.

RELION (4.0.1)

RELION (for REgularised LIkelihood OptimisatioN) is a stand-alone computer program for Maximum A Posteriori refinement of (multiple) 3D reconstructions or 2D class averages in cryo-electron microscopy.

ResMap (1.9)

ResMap (Resolution Map) is a Python (NumPy/SciPy) application with a Tkinter GUI and a command-line interface. It is a software package for computing the local resolution of 3D density maps studied in structural biology, primarily electron cryo-microscopy (cryo-EM).

Scipion (3.0.12)

Scipion is an image processing framework to obtain 3D models of macromolecular complexes using Electron Microscopy (3DEM). It integrates several software packages and presents a unified interface for both biologists and developers. Scipion lets users execute workflows that combine different software tools, while taking care of formats and conversions. Additionally, all steps are tracked and can be reproduced later on.

topaz (0.2.5-ba91e19)

topaz is a pipeline for particle detection in cryo-electron microscopy images using convolutional neural networks trained from positive and unlabeled examples. Topaz also includes methods for micrograph and tomogram denoising using deep denoising models.

Deep Learning

B-SOID (1.3)

B-SOID (Behavioral Segmentation in Deeplabcut) is an unsupervised learning algorithm that serves to discover and classify behaviors that are not pre-defined by users. It segregates statistically different, sub-second rodent behaviors with a single bottom-up perspective video camera by performing a novel expectation maximization fitting of Gaussian mixture models on t-Distributed Stochastic Neighbor Embedding (t-SNE).

BioBERT (v20200409)

BioBERT is a biomedical language representation model designed for biomedical text mining tasks such as biomedical named entity recognition, relation extraction, question answering, etc.

BioGANs (20210916)

BioGANs is a novel application of Generative Adversarial Networks (GAN) to the synthesis of cells imaged by fluorescence microscopy. It makes it possible to infer the correlation between the spatial patterns of different fluorescent proteins, which reflects important biological functions. The synthesized images capture these relationships, which are relevant for biological applications.

cuDNN (8.0.3)

The NVIDIA CUDA Deep Neural Network library (cuDNN) is a GPU-accelerated library of primitives for deep neural networks. cuDNN provides highly tuned implementations for standard routines such as forward and backward convolution, pooling, normalization, and activation layers.

DanQ (20220825)

DanQ is a hybrid convolutional and recurrent deep neural network for quantifying the function of DNA sequences

deepcadrt (0.1.0)

Real-time denoising of fluorescence time-lapse imaging using deep self-supervised learning

DeepCell-tf (0.12.6)

The DeepCell-tf library allows users to apply pre-existing models to imaging data as well as to develop new deep learning models for single-cell analysis. The library specializes in models for cell segmentation (whole-cell and nuclear) in 2D and 3D images as well as cell tracking in 2D time-lapse datasets. The models are applicable to data ranging from multiplexed images of tissues to dynamic live-cell imaging movies.

DeepLabCut (2.3.9)

DeepLabCut is an open source toolbox that builds on a state-of-the-art human pose estimation algorithm. It allows a deep neural network to be trained on limited training data to track user-defined features with an accuracy that matches human labeling.

deepmedic (0.8.4)

This project aims to offer easy access to Deep Learning for segmentation of structures of interest in biomedical 3D scans. It is a system that allows the easy creation of a 3D Convolutional Neural Network, which can be trained to detect and segment structures if corresponding ground truth labels are provided for training. The system processes NIFTI images, making its use straightforward for many biomedical tasks.

DeepMM (20220830)

DeepMM implements a fully automated de novo structure modeling method, MAINMAST, which builds three-dimensional models of a protein from a near-atomic resolution EM map. The method directly traces the protein’s main-chain and identifies Cα positions as tree-graph structures in the EM map.

GCN_Cancer (20221105)

The GCN_Cancer application employs graph convolutional network (GCN) models to classify gene expression data samples from The Cancer Genome Atlas (TCGA) as one of 33 designated tumor types or as normal. It has been trained on 10,340 cancer samples and 731 normal tissue samples from the TCGA dataset.

HBD / heme_binder_diffusion (20240319)

RoseTTAFold All-Atom (RFAA) is a deep network capable of modeling full biological assemblies containing proteins, nucleic acids, small molecules, metals, and covalent modifications, given the sequences of the polymers and the atomic bonded geometry of the small molecules and covalent modifications. The heme_binder_diffusion pipeline employs RoseTTAFold All-Atom to perform de novo heme binding protein design.

IsoNet (0.2.1)

IsoNet is a deep learning-based software package that iteratively reconstructs the missing-wedge information and increases signal-to-noise ratio, using the knowledge learned from raw tomograms. Without the need for sub-tomogram averaging, IsoNet generates tomograms with significantly reduced resolution anisotropy.

keras (2.4.3; 2.9.1)

Keras is a deep learning API written in Python, running on top of the machine learning platform TensorFlow. It was developed with a focus on enabling fast experimentation. Being able to go from idea to result as fast as possible is key to doing good research.

NeST-VNN (20240321)

NeST-VNN is an interpretable neural network-based model that predicts cell response to a drug. This framework integrates information across multiple levels of cancer cell biology to understand drug response, and can serve to identify and explain biomarkers for clinical application.

PyTom (1.0)

PyTom is a software package for the analysis of volumetric data obtained by cryo electron tomography (cryo-ET). It covers a complete pipeline of processing steps for tomogram reconstruction, localization of macromolecular complexes in tomograms, fine alignment of subtomograms extracted at these locations, and their classification.

ReLeaSE (20220825)

ReLeaSE (Reinforcement Learning for Structural Evolution) is an application for de-novo Drug Design based on Reinforcement Learning. It integrates two deep neural networks: generative and predictive, that are trained separately but are used jointly to generate novel targeted chemical libraries. ReLeaSE uses simple representation of molecules by their simplified molecular input line entry specification (SMILES) strings only.

RFdiffusion (1.1.0)

RoseTTAFold (RF) diffusion is an open source method for structure generation, with or without conditional information (a motif, target, etc.). It can perform motif scaffolding, unconditional protein generation, and other tasks.

SpliceAI (1.3.1)

SpliceAI is a deep neural network that accurately predicts splice junctions from an arbitrary pre-mRNA transcript sequence, enabling precise prediction of noncoding genetic variants that cause cryptic splicing.

talos (1.0)

Talos is a hyperparameter optimization package for deep learning. It works with any Keras, TensorFlow (tf.keras) or PyTorch model, takes minutes to implement, involves no new syntax to learn and adds zero new overhead to your workflow.

TensorQTL (1.0.9)

TensorQTL leverages general-purpose libraries and graphics processing units (GPUs) to achieve high efficiency of computations at low cost. Using PyTorch or TensorFlow, it allows > 200-fold decreases in runtime and ~ 5–10-fold reductions in cost when running on GPUs relative to CPUs.

Tybalt (202208265)

Tybalt implements a Variational AutoEncoder (VAE), a deep neural network approach capable of generating meaningful latent spaces for image and text data. Tybalt has been trained on The Cancer Genome Atlas (TCGA) pan-cancer RNA-seq data and used to identify specific patterns in the VAE encoded features.

UNet (20220825)

U-Net is an image segmentation tool. It relies on the strong use of data augmentation to use the available annotated samples more efficiently. The architecture consists of a contracting path to capture context and a symmetric expanding path that enables precise localization.

Editors

Emacs (28.1)

Emacs is a text and source code editor for text terminals and X. It has a vast set of features and is well suited for doing everything from reading mail and simple text editing to managing and editing large programming projects. It has its own help and tutorial which can be accessed by typing Ctrl-h i and Ctrl-h t respectively. Type emacs [filename] to edit a file.

ESS (24.01.0)

Emacs Speaks Statistics is an Emacs mode for interactive statistical programming and data analysis. Languages supported: the S family (S, S-PLUS and R), SAS, BUGS/JAGS, Stata and XLispStat. First load an R module per our R applications page. Putting the line (load "/usr/local/share/emacs/site-lisp/ess-17.11/lisp/ess-site") in your .emacs file, or its one-time equivalent M-x load-library /usr/local/share/emacs/site-lisp/ess-17.11/lisp/ess-site, will make an *ESS* buffer available.

Nano (2.3.1)

Nano is a simple, user-friendly text editor derived from the editor in the Pine email client. Type nano [filename] to edit a file and use the key commands listed at the bottom of the screen to access various functions. It is equivalent to Pico.

nedit (5.7)

NEdit is a GUI-style editor for plain text and source code files. It provides mouse-based editing and a streamlined editing style, based on popular Macintosh and MS Windows editors, using the X Window System. NEdit requires an X-based workstation or X-Terminal. Type nedit [filename] to edit a file.

neovim (0.9.5)

Neovim is a refactor, and sometimes redactor, in the tradition of Vim (which itself derives from Stevie). It is not a rewrite but a continuation and extension of Vim.

SciTE (5.4.3)

SciTE or SCIntilla based Text Editor is a cross-platform text editor. Lightweight and built for speed, it is designed mainly for source editing, and performs syntax highlighting and inline function reference for many different languages.

Tex (2024)

TeX + LaTeX + associated packages for high-quality text formatting.

texinfo (6.8)

Texinfo is the official documentation format of the GNU project

vim (9.0)

Vim is a text editor that is upward compatible with Vi. It can be used to edit all kinds of plain text. It is especially useful for editing programs, with syntax coloring.

vscode (n/a)

Visual Studio Code is a lightweight but powerful source code editor which runs on your desktop and is available for Windows, macOS and Linux. It comes with built-in support for JavaScript, TypeScript and Node.js and has a rich ecosystem of extensions for other languages (such as C++, C#, Java, Python, PHP, Go) and runtimes (such as .NET and Unity).

High-Throughput Sequencing

abyss (2.3.5)

ABySS stands for Assembly By Short Sequences. It is a de novo, parallel, paired-end sequence assembler. The parallel version is implemented using MPI and is capable of assembling larger genomes.

AfterQC (0.9.7)

Automatic Filtering, Trimming, Error Removing and Quality Control for fastq data.

angsd (0.940)

ANGSD is software for analyzing next generation sequencing data. The software can handle a number of different input types, from mapped reads to imputed genotype probabilities. Most methods take genotype uncertainty into account instead of basing the analysis on called genotypes. This is especially useful for low and medium depth data. The software is written in C++ and has been used on large sample sizes.

AnnotSV (3.3.7)

AnnotSV is a program designed for annotating Structural Variations (SV). This tool compiles functional, regulatory and clinically relevant information and aims to provide annotations useful to i) interpret the potential pathogenicity of SVs and ii) filter out potential false-positive SVs.

ANNOVAR (2020-06-08)

ANNOVAR is an efficient software tool that uses up-to-date information to functionally annotate genetic variants detected from diverse genomes.

ascatNgs (4.5.0)

AscatNGS contains the Cancer Genome Projects workflow implementation of the ASCAT copy number algorithm for paired end sequencing.

augustus (3.4.0)

AUGUSTUS is a program that predicts genes in eukaryotic genomic sequences.

autoreject (0.3.1)

This is a library to automatically reject bad trials and repair bad sensors in magneto-/electroencephalography (M/EEG) data.

bam2fastq (1.1.0)

This tool is used to extract raw sequences (with qualities) from bam files.

bamliquidator (1.5.2)

bamliquidator is a set of tools for analyzing the density of short DNA sequence read alignments in the BAM file format.

bamreadcount (1.0.1)

Bam-readcount generates metrics at single nucleotide positions. A number of metrics are generated, which can be useful for filtering out false positive calls.

bamtools (2.5.2)

BamTools provides a fast, flexible C++ API & toolkit for reading, writing, and manipulating BAM files.

bamUtil (1.0.15)

bamUtil is a repository that contains several programs that perform operations on SAM/BAM files. All of these programs are built into a single executable, bam.

Bartender (1.1)

Bartender is an accurate clustering algorithm to detect barcodes and their abundances from raw next-generation sequencing data. In contrast with existing methods that cluster based on sequence similarity alone, Bartender uses a modified two-sample proportion test that also considers cluster size. This modification results in higher accuracy and lower rates of under- and over-clustering artifacts.

basespace_cli (1.5.2)

Command line interface for Illumina's BaseSpace

bazam (1.0.1)

A tool to extract paired reads in FASTQ format from coordinate sorted BAM files

bbtools (39.06)

An extensive set of bioinformatics tools including bbmap (short read aligner), bbnorm (kmer based normalization), dedupe (deduplication and clustering of unaligned reads), reformat (formatting and trimming reads) and many more.

bcl-convert (4.1.5)

The Illumina BCL Convert is a standalone local software app that converts the Binary Base Call (BCL) files produced by Illumina sequencing systems to FASTQ files. BCL Convert also provides adapter handling (through masking and trimming) and UMI trimming and produces metric outputs.

bcl2fastq (2.20)

a tool to handle bcl conversion and demultiplexing

bedops (2.4.41)

Bedops is a suite of tools to address common questions raised in genomic studies, mostly with regard to overlap and proximity relationships between data sets. BEDOPS aims to be scalable and flexible, facilitating the efficient and accurate analysis and management of large-scale genomic data.

bedtools (2.31.1)

The BEDTools utilities allow one to address common genomics tasks such as finding feature overlaps and computing coverage. In addition, one can develop sophisticated pipelines that answer complicated research questions by "streaming" several BEDTools together.

BETA (1.0.7)

Binding and expression target analysis (BETA) is a software package that integrates ChIP-seq of TFs or chromatin regulators with differential gene expression data to infer direct target genes. The combination of ChIP-seq and transcriptome analysis is a compelling approach to unravel the regulation of gene expression.

bigSCale2 (20191119)

bigSCale is a complete framework for the analysis and visualization of single cell data. It can cluster, phenotype, perform pseudotime analysis, infer gene regulatory networks and reduce large datasets into smaller datasets with higher quality.

bioawk (1.0)

Regular awk with support for several common biological data formats, including optionally gzip'ed BED, GFF, SAM, VCF, FASTA/Q and TAB-delimited formats with column names.

biobakery_workflows (3.1)

bioBakery is a meta’omic analysis environment and collection of individual software tools with the capacity to process raw shotgun sequencing data into actionable microbial community feature profiles, summary reports, and publication-ready figures. It includes a collection of preconfigured analysis modules also joined into workflows for reproducibility. Each individual module has been developed to perform a particular task, e.g. quantitative taxonomic profiling or statistical analysis.

biobambam2 (2.0.185-release-20221211202123)

Tools for early stage alignment file processing.

biom-format (2.1.15)

tool (and library) to manipulate Biological Observation Matrix (BIOM) Format files

bismark (0.23.1)

Bismark is a program to map bisulfite treated sequencing reads to a genome of interest and perform methylation calls in a single step. The output can be easily imported into a genome viewer, such as SeqMonk, and enables a researcher to analyse the methylation levels of their samples straight away.

bonito (0.7.3)

A PyTorch Basecaller for Oxford Nanopore Reads

bowtie (1.3.1)

bowtie is an ultrafast, memory-efficient short read aligner geared toward quickly aligning large sets of short DNA sequences (reads) to large genomes.

bowtie2 (2.5.3)

A version of bowtie that's particularly good at aligning reads of about 50 up to 100s or 1,000s of characters, and particularly good at aligning to relatively long (e.g. mammalian) genomes

BRASS (6.3.4)

BRASS analyses one or more related BAM files of paired-end sequencing to determine potential rearrangement breakpoints.

breakdancer (1.4.5)

provides genome-wide detection of structural variants from next generation paired-end sequencing reads.

breseq (0.37.1)

breseq is a computational pipeline for finding mutations relative to a reference sequence in short-read DNA re-sequencing data. It is intended for haploid microbial genomes (<20 Mb).

bsmap (2.90)

BSMAP is short-read mapping software for bisulfite sequencing reads. Bisulfite treatment converts unmethylated cytosines into uracils (sequenced as thymine) and leaves methylated cytosines unchanged, and hence provides a way to study DNA cytosine methylation at single-nucleotide resolution. BSMAP aligns the Ts in the reads to both Cs and Ts in the reference.
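
The conversion and matching rule described for BSMAP can be illustrated with a minimal Python sketch. The function names are hypothetical and this is not BSMAP's implementation; it only simulates bisulfite conversion on the forward strand and applies the asymmetric rule that a T in a read may pair with either C or T in the reference.

```python
def bisulfite_convert(seq, methylated_positions=frozenset()):
    """Simulate bisulfite treatment: unmethylated C is read out as T."""
    return "".join(
        "T" if base == "C" and i not in methylated_positions else base
        for i, base in enumerate(seq)
    )


def bs_match(read, ref_window):
    """True if the read matches the reference window under the bisulfite rule.

    A T in the read may pair with C or T in the reference; every other
    base must match exactly. The rule is deliberately asymmetric: a C in
    the read does not match a T in the reference.
    """
    if len(read) != len(ref_window):
        return False
    for read_base, ref_base in zip(read, ref_window):
        if read_base == ref_base:
            continue
        if read_base == "T" and ref_base == "C":
            continue
        return False
    return True


reference = "ACGTCCGA"
# Assume only the C at index 1 is methylated, so it survives conversion.
read = bisulfite_convert(reference, methylated_positions={1})
print(read)                        # ACGTTTGA
print(bs_match(read, reference))   # True
```

Positions where the read keeps a C against a reference C are the candidate methylated sites; positions where a read T sits over a reference C are candidate unmethylated sites.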

busco (5.4.7)

BUSCO completeness assessments employ sets of Benchmarking Universal Single-Copy Orthologs from OrthoDB (www.orthodb.org) to provide quantitative measures of the completeness of genome assemblies, annotated gene sets, and transcriptomes in terms of expected gene content.

bwa (0.7.17)

BWA is a fast, lightweight tool that aligns short sequences to a sequence database, such as the human reference genome.

bwa-mem2 (2.2.1)

The next version of the bwa-mem algorithm in bwa.

CADD (1.6.post1)

CADD (Combined Annotation Dependent Depletion) is a tool for scoring the deleteriousness of single nucleotide variants as well as insertion/deletions variants in the human genome. Currently, it supports the builds: GRCh37/hg19 and GRCh38/hg38.

Canu (2.1)

Canu is a fork of the Celera Assembler designed for high-noise single-molecule sequencing (such as the PacBio RSII or Oxford Nanopore MinION). Canu will correct the reads, then trim suspicious regions (such as remaining SMRTbell adapter), then assemble the corrected and cleaned reads into unitigs.

Canvas (1.40)

Canvas is a tool for calling copy number variants (CNVs) from human DNA sequencing data.

ccbrpipeliner (5)

CCBR Pipeliner provides access to a set of best-practices NGS pipelines developed, tested, and benchmarked by experts at CCBR and NCBR for Biowulf. Contact CCBR_Pipeliner@mail.nih.gov with questions

ceas (1.0.2)

Cis-regulatory Element Annotation System is a tool designed to characterize genome-wide protein-DNA interaction patterns from ChIP-chip and ChIP-Seq of both sharp and broad binding factors.

cellbender (0.3.1)

CellBender is a software package for eliminating technical artifacts from high-throughput single-cell omics data, including scRNA-seq, snRNA-seq, and CITE-seq.

cellprofiler (4.2.5)

An open-source application for biological image analysis

cellranger (8.0.0)

Cell Ranger is a set of analysis pipelines that processes Chromium single cell 3’ RNA-seq output to align reads, generate gene-cell matrices and perform clustering and gene expression analysis.

cellranger-arc (2.0.2)

Cell Ranger ARC is a set of analysis pipelines that process Chromium Single Cell Multiome ATAC + Gene Expression sequencing data to generate a variety of analyses pertaining to gene expression, chromatin accessibility and their linkage. Furthermore, since the ATAC and gene expression measurements are on the very same cell, we are able to perform analyses that link chromatin accessibility and gene expression.

cellranger-atac (2.1.0)

Cell Ranger ATAC is a set of analysis pipelines that process Chromium Single Cell ATAC data.

cellsnp-lite (1.2.2)

Efficient genotyping bi-allelic SNPs on single cells

cgpBattenberg (3.5.3)

Detect subclonality and copy number in matched NGS data

checkm2 (1.0.2)

Rapid assessment of genome bin quality using machine learning.

CHESS (0.3.7)

The CHESS (Comparison of Hi-C Experiments using Structural Similarity) application implements an algorithm for the comparison of chromatin contact maps and automatic differential feature extraction.

chopper (0.7.0)

Filtering and trimming for long-read sequencing data (PacBio/ONT).

ChromHMM (1.23)

ChromHMM is software for learning and characterizing chromatin states.

circexplorer2 (2.3.8)

A combined strategy to identify circular RNAs (circRNAs and ciRNAs)

circos (0.69-9)

Circos is a program for the generation of publication-quality, circularly composited renditions of genomic data and related annotations. Circos is particularly suited for visualizing alignments, conservation and intra and inter-chromosomal relationships. Also, Circos is useful to visualize any type of information that benefits from a circular layout. Thus, although it has been designed for the field of genomics, it is sufficiently flexible to be used in other data domains.

Clair3 (1.0.4)

Clair3 is a small variant caller for Illumina, PacBio and ONT long reads. Compared to PEPPER (r0.4), Clair3 (v0.1) shows a better SNP F1-score with ≤30-fold of ONT data (precisionFDA Truth Challenge V2) and a better Indel F1-score, while generally running four times faster.

clark (1.2.6.1)

A method based on a supervised sequence classification using discriminative k-mers

ClinSV (1.1)

Robust detection of clinically relevant structural and copy number variation from whole genome sequencing data

cnvkit (0.9.9)

Copy number variant detection from targeted DNA sequencing

cnvnator (0.4.1)

CNVnator is a tool for CNV discovery and genotyping from depth of read mapping.

cogentap (2.0.1)

Cogent NGS Analysis Pipeline (CogentAP) is bioinformatic software for analyzing RNA-seq NGS data generated using various Takara kits.

combp (0.50.6)

A library to combine, analyze, group and correct p-values in BED files. Unique tools involve correction for spatial autocorrelation. This is useful for ChIP-Seq probes and Tiling arrays, or any data with spatial correlation.
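
As an illustration of what combining p-values means, the sketch below implements plain Stouffer's Z-score method using only the Python standard library. This is a standard technique chosen here for illustration, not a description of combp's code; the function name is hypothetical, and the sketch does not attempt the spatial autocorrelation correction mentioned above.

```python
from math import sqrt
from statistics import NormalDist


def stouffer_combine(p_values, weights=None):
    """Combine one-sided p-values with Stouffer's Z-score method.

    Each p-value must lie strictly between 0 and 1; NormalDist.inv_cdf
    raises StatisticsError otherwise.
    """
    norm = NormalDist()  # standard normal: mean 0, standard deviation 1
    if weights is None:
        weights = [1.0] * len(p_values)
    # Convert each p-value to a Z-score (small p -> large positive Z).
    z_scores = [norm.inv_cdf(1.0 - p) for p in p_values]
    combined_z = sum(w * z for w, z in zip(weights, z_scores)) / sqrt(
        sum(w * w for w in weights)
    )
    # Convert the combined Z-score back to a one-sided p-value.
    return 1.0 - norm.cdf(combined_z)


# Hypothetical p-values from two neighboring probes.
print(stouffer_combine([0.05, 0.05]))
```

Two individually borderline p-values combine into a smaller one because this method assumes the tests are independent. That assumption is what breaks down for spatially correlated data such as neighboring probes.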

conpair (0.2)

Concordance and contamination estimator for tumor–normal pairs

crispresso (2.2.14)

Software pipeline for the analysis of CRISPR-Cas9 genome editing outcomes from sequencing data

crossmap (0.7.0)

CrossMap is a program for convenient conversion of genome coordinates between different assemblies (e.g. mm9->mm10). It can convert SAM, BAM, bed, GTF, GFF, wig/bigWig, and VCF files

csvkit (1.5.0)

csvkit is a suite of command-line tools for converting to and working with CSV, the king of tabular file formats.

cufflinks (2.2.1)

Cufflinks assembles transcripts and estimates their abundances in RNA-Seq samples. It accepts aligned RNA-Seq reads and assembles the alignments into a parsimonious set of transcripts. Cufflinks then estimates the relative abundances of these transcripts based on how many reads support each one.
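
To make the idea of relative abundance concrete, here is a minimal Python sketch of the FPKM normalization (fragments per kilobase of transcript per million mapped fragments). The function name, transcript IDs and counts are hypothetical, and the sketch assumes each fragment has already been assigned to exactly one transcript; it does not reproduce the statistical model Cufflinks uses to assign reads.

```python
def fpkm(fragment_counts, transcript_lengths):
    """Compute FPKM per transcript.

    fragment_counts: dict of transcript ID -> number of supporting fragments
    transcript_lengths: dict of transcript ID -> length in bases
    """
    total_fragments = sum(fragment_counts.values())
    if total_fragments == 0:
        raise ValueError("no mapped fragments")
    return {
        tid: count * 1e9 / (transcript_lengths[tid] * total_fragments)
        for tid, count in fragment_counts.items()
    }


# Hypothetical data: tx2 has three times the fragments but is three times longer.
counts = {"tx1": 500, "tx2": 1500}
lengths = {"tx1": 1000, "tx2": 3000}
print(fpkm(counts, lengths))  # {'tx1': 250000.0, 'tx2': 250000.0}
```

The two transcripts end up with the same FPKM, which shows why raw read counts alone are not comparable across transcripts of different lengths.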

cutadapt (4.7)

cutadapt removes adapter sequences from DNA high-throughput sequencing data. This is usually necessary when the read length of the machine is longer than the molecule that is sequenced, such as in microRNA data.
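
The following Python sketch shows the basic idea of 3' adapter removal. It is a simplified illustration, not cutadapt's algorithm: it uses exact string matching only and does not tolerate sequencing errors. The function name, the `min_overlap` parameter and the example sequences are assumptions made for this example.

```python
def trim_3prime_adapter(read, adapter, min_overlap=3):
    """Remove a 3' adapter from a read using exact matching only."""
    # Case 1: the full adapter occurs in the read; cut at its first occurrence.
    idx = read.find(adapter)
    if idx != -1:
        return read[:idx]
    # Case 2: the read ends partway through the adapter, so only a prefix
    # of the adapter is present at the read's 3' end. Try longest first.
    longest = min(len(adapter), len(read)) - 1
    shortest = max(min_overlap, 1)
    for k in range(longest, shortest - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    # No adapter found: return the read unchanged.
    return read


adapter = "TGGAATTCTCGG"   # example adapter sequence for illustration
insert = "ACGTACGTAC"      # short molecule, as in microRNA data

print(trim_3prime_adapter(insert + adapter, adapter))      # ACGTACGTAC
print(trim_3prime_adapter(insert + adapter[:5], adapter))  # ACGTACGTAC
print(trim_3prime_adapter(insert, adapter))                # ACGTACGTAC
```

The second call covers the case where the read runs out partway through the adapter. This is common when the sequenced molecule is only slightly shorter than the read length.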

cutruntools2 (2.0)

cutruntools2 is a major update of CutRunTools, including a set of new features specially designed for CUT&RUN and CUT&Tag experiments. Both bulk and single-cell data can be processed, analyzed and interpreted.

DANPOS (3.1.1)

A toolkit for Dynamic Analysis of Nucleosome and Protein Occupancy by Sequencing, version 2

DeepCell-tf (0.12.6)

deepconsensus (0.3.1)

DeepConsensus uses gap-aware sequence transformers to correct errors in Pacific Biosciences (PacBio) Circular Consensus Sequencing (CCS) data

deepsignal (2-0.1.3)

A deep-learning method for detecting DNA methylation state from Oxford Nanopore sequencing reads.

deeptools (3.5.4)

deepTools is a suite of user-friendly tools for the visualization, quality control and normalization of data from deep DNA sequencing experiments.

deepvariant (1.6.1)

DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data.

defuse (0.8.1)

deFuse is a software package for gene fusion discovery using RNA-Seq data. The software uses clusters of discordant paired end alignments to inform a split read alignment analysis for finding fusion boundaries. The software also employs a number of heuristic filters in an attempt to reduce the number of false positives and produces a fully annotated output for each predicted fusion.

delly (1.1.6)

DELLY is an integrated structural variant prediction method that can detect deletions, tandem duplications, inversions and translocations at single-nucleotide resolution in short-read massively parallel sequencing data. It uses paired-ends and split-reads to sensitively and accurately delineate genomic rearrangements throughout the genome.

DETONATE (1.11)

DETONATE is a tool for evaluation of de novo transcriptome assemblies from RNA-Seq data. It consists of two component packages, RSEM-EVAL and REF-EVAL. RSEM-EVAL is a reference-free evaluation method based on a novel probabilistic model that depends only on an assembly and the RNA-Seq reads used for its construction. REF-EVAL is a toolkit of reference-based measures.

difftf (1.9)

Genome-wide quantification of differential transcription factor activity

distiller-nf (0.3.3)

A modular Hi-C mapping pipeline for reproducible data analysis. It has also been used for Micro-C analysis.

dorado (0.6.0)

Dorado is a high-performance, easy-to-use, open source basecaller for Oxford Nanopore reads.

drop (1.3.3)

A pipeline to find aberrant events in RNA-Seq data.

ea-utils (1.04.807)

Command-line tools for processing biological sequencing data. Barcode demultiplexing, adapter trimming, etc.

edd (1.1.19)

EDD is a ChIP-seq peak caller for detection of megabase domains of enrichment.

encode-atac-seq-pipeline (2.1.0)

This pipeline is designed for automated end-to-end quality control and processing of ATAC-seq or DNase-seq data.

EPACTS (3.4.2)

EPACTS (Efficient and Parallelizable Association Container Toolbox) is a versatile software pipeline to perform various statistical tests for identifying genome-wide association from sequence data through a user-friendly interface, both to scientific analysts and to method developers.

eukdetect (1.2)

EukDetect: Detect eukaryotes from shotgun metagenomic data

exomiser (14.0.0)

The Exomiser is a Java program that functionally annotates variants from whole-exome sequencing data starting from a VCF file.

express (1.5.1)

eXpress is a streaming tool for quantifying the abundances of a set of target sequences from sampled subsequences.

fade (0.6.0)

Fragmentase Artifact Detection and Elimination

fanc (0.9.21)

FAN-C is a toolkit for the analysis and visualization of Hi-C data. Beyond objects generated within FAN-C, the toolkit is largely compatible with Hi-C files from Cooler and Juicer.

fastqc (0.12.1)

FastQC provides quality control functions for next-generation sequencing data.

fastqtools (0.8.3)

fastq-tools is a collection of small and efficient programs for performing some common and uncommon tasks with FASTQ files.

fastq_screen (0.15.3)

FastQ Screen allows you to screen a library of sequences in FastQ format against a set of sequence databases so you can see if the composition of the library matches with what you expect.

fastxtoolkit (0.0.14)

The FASTX-Toolkit is a collection of command line tools for Short-Reads FASTA/FASTQ files preprocessing.

fcs (0.5.0)

FCS is a toolset to remove contaminant sequences from a genome assembly.

fgbio (2.0.2)

The Fulcrum Genomics tools are a set of utilities for working with BAM files, VCF files, and Unique Molecular IDs. They are accessed as subprograms from a Java jar, like GATK or Picard.

fithic (2.0.8)

Fit-Hi-C is a tool for assigning statistical confidence estimates to intra-chromosomal contact maps produced by genome-wide genome architecture assays such as Hi-C.

flanker (0.1.5)

Gene-flank analysis tool

flashpca (2.0)

FlashPCA performs fast principal component analysis (PCA) of single nucleotide polymorphism (SNP) data, similar to smartpca from EIGENSOFT (http://www.hsph.harvard.edu/alkes-price/software/) and shellfish (https://github.com/dandavison/shellfish). FlashPCA is based on the https://github.com/yixuan/spectra/ library.

Flexbar (3.5.0)

Flexbar preprocesses high-throughput sequencing data efficiently. It demultiplexes barcoded runs and removes adapter sequences. Moreover, trimming and filtering features are provided. Flexbar increases read mapping rates and improves genome and transcriptome assemblies. It supports next-generation sequencing data in fasta and fastq format, e.g. from Illumina and the Roche 454 platform.

flye (2.9.1)

Fast and accurate de novo assembler for single molecule sequencing reads

fqtools (2.0)

Tools for manipulating fastq files

freebayes (1.3.5)

Bayesian haplotype-based polymorphism discovery and genotyping

freec (11.6)

Control-FREEC is a tool for detection of copy-number changes and allelic imbalances (including LOH) using deep-sequencing data

fuseq-wes (1.0.0)

Fusion Gene Detection Using Whole-Exome Sequencing Data in Cancer Patients

fusioncatcher (1.33)

FusionCatcher searches for novel/known somatic fusion genes, translocations, and chimeras in RNA-seq data (paired-end or single-end reads from Illumina NGS platforms like Solexa/HiSeq/NextSeq/MiSeq) from diseased samples.

GATK (4.5.0.0)

GATK, from the Broad Institute, is two things: first, a structured software library that makes it easy to write efficient analysis tools for next-generation sequencing data; and second, a suite of tools for working with human medical resequencing projects such as 1000 Genomes and The Cancer Genome Atlas. These tools include depth of coverage analyzers, a quality score recalibrator, a SNP/indel caller and a local realigner.

gem (3.4)

High resolution peak calling and motif discovery for ChIP-seq and ChIP-exo data

Gemini (0.30.2)

GEMINI (GEnome MINIng) is designed to be a flexible framework for exploring genetic variation in the context of the wealth of genome annotations available for the human genome. By placing genetic variants, sample genotypes, and useful genome annotations into an integrated database framework, GEMINI provides a simple, flexible, yet very powerful system for exploring genetic variation for disease and population genetics.

GEMMA (0.98.5)

GEMMA is the software implementing the Genome-wide Efficient Mixed Model Association algorithm for a standard linear mixed model and some of its close relatives for genome-wide association studies (GWAS).

genomad (1.7.6)

geNomad's primary goal is to identify viruses and plasmids in sequencing data (isolates, metagenomes, and metatranscriptomes).

Genome Browser (457)

The Genome Browser Mirror Fragments is a mirror of the UCSC Genome Browser. The URL is https://hpcnihapps.cit.nih.gov/genome. Users can also directly access the MySQL databases, the supporting files, and a large number of associated executables.

genrich (0.6)

Genrich is a peak-caller for genomic enrichment assays (e.g. ChIP-seq, ATAC-seq). It analyzes alignment files generated following the assay and produces a file detailing peaks of significant enrichment.

GeoMX NGS Pipeline (3.1.1.6)

The GeoMx NGS Pipeline processes RNA-sequencing files (FASTQ files) from Illumina sequencers according to parameters defined in the Configuration File (which is generated from the GeoMx DSP run). The Pipeline processes information from these files and outputs .dcc files, which can then be uploaded to the GeoMx DSP system for data analysis.

Gistic (2.0.23)

Facilitates sensitive and confident localization of the targets of focal somatic copy-number alteration in human cancers.

gmap-gsnap (2021-12-17)

Genomic mapping and alignment programs

gopeaks (1.0.0)

GoPeaks is a peak caller designed for CUT&TAG/CUT&RUN sequencing data. GoPeaks by default works best with narrow peaks such as H3K4me3 and transcription factors. However, broad epigenetic marks like H3K27Ac/H3K4me1 require different step, slide, and minwidth parameters.

graphaligner (1.0.18)

Seed-and-extend program for aligning long error-prone reads to genome graphs.

gridss (2.13.2)

GRIDSS is a modular software suite containing tools useful for the detection of genomic rearrangements. GRIDSS includes a genome-wide break-end assembler, as well as a structural variation caller for Illumina sequencing data. GRIDSS calls variants based on alignment-guided positional de Bruijn graph genome-wide break-end assembly, split read, and read pair evidence.

gtex_rnaseq (V10)

This module makes available the tools used in the GTEX RNA-Seq pipeline.

gtool (0.7.5)

GTOOL is a program for transforming sets of genotype data for use with the programs SNPTEST and IMPUTE. GTOOL can be used to (a) generate subsets of genotype data and (b) convert genotype data between the PED file format and the file format used by SNPTEST and IMPUTE.

gunc (1.0.5)

Genome UNClutterer (GUNC) is a tool for detection of chimerism and contamination in prokaryotic genomes resulting from mis-binning of genomic contigs from unrelated lineages

guppy (6.5.7)

Local accelerated basecalling for Nanopore data

Hail (0.2.99)

Hail is an open-source, scalable framework for exploring and analyzing genomic data.

hap.py (0.3.14)

A set of programs based on htslib to benchmark variant calls against gold standard truth datasets.

hicexplorer (3.7.2)

Tools to process, normalize and visualize Hi-C data

hichipper (0.7.7)

hichipper is a preprocessing and QC pipeline for HiChIP data. This package takes output from a HiC-Pro run and a sample manifest file (.yaml) that coordinates optional high-quality peaks (identified through ChIP-Seq) and restriction fragment locations as input and produces output that can be used to 1) determine library quality, 2) identify and characterize DNA loops and 3) interactively visualize loops.

hicpro (3.1.0)

HiC-Pro: An optimized and flexible pipeline for Hi-C data processing

hic_breakfinder (1.0)

A framework that integrates optical mapping, high-throughput chromosome conformation capture (Hi-C), and whole genome sequencing to systematically detect SVs in a variety of normal or cancer samples and cell lines.

hifiasm (0.19.8)

Hifiasm is a fast haplotype-resolved de novo assembler initially designed for PacBio HiFi reads. Its latest release supports telomere-to-telomere assembly by utilizing ultralong Oxford Nanopore reads. It can produce better haplotype-resolved assemblies when given parental short reads or Hi-C data.

hint (2.2.7)

a computational method to detect CNVs and Translocations from Hi-C data.

hipstr (0.7)

Tool for genotyping short tandem repeats from Illumina sequencing data

hisat (2.2.2.1-ngs3.0.1)

HISAT is a fast and sensitive spliced alignment program which uses Hierarchical Indexing for Spliced Alignment of Transcripts.

HLA-LA (1.0.3)

Fast HLA type inference from whole-genome data. Previously known as HLA-PRG-LA.

HMMRATAC (1.2.10)

HMMRATAC peak caller for ATAC-seq data

homer (4.11.1)

HOMER (Hypergeometric Optimization of Motif EnRichment) is a suite of tools for Motif Discovery and ChIP-Seq analysis.

htgts (2)

High-Throughput Genome-Wide Translocation Sequencing pipeline

htsbox (r346)

HTSbox is a fork of early HTSlib. It is a collection of small experimental tools manipulating HTS-related files. While some of these tools are already part of the official SAMtools package, others are for niche use cases.

htseq (2.0.4)

HTSeq is a Python package that provides infrastructure to process data from high-throughput sequencing assays.

humann (3.6.0)

HUMAnN is a pipeline for efficiently and accurately profiling the presence/absence and abundance of microbial pathways in a community from metagenomic or metatranscriptomic sequencing data (typically millions of short DNA/RNA reads).

IDBA (1.1.3)

IDBA is a practical iterative De Bruijn Graph De Novo Assembler for sequence assembly in bioinformatics. Most assemblers based on de Bruijn graphs build a graph with one specific k to perform the assembly, so choosing that value of k is crucial. If k is too large, there will be a lot of gap problems in the graph. If k is too small, there will be a lot of branch problems. IDBA uses not only one specific k but a range of k values to build the iterative de Bruijn graph. It can keep all the information in graphs with different k values.
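The effect of k on branching can be seen with a toy de Bruijn graph. The sketch below is not IDBA's implementation; the helper names and the short example sequence (which contains the repeat ACG) are invented for illustration. It builds a graph for several k values and counts nodes with more than one successor.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def de_bruijn(reads: Iterable[str], k: int) -> Dict[str, Set[str]]:
    """Build a de Bruijn graph: nodes are (k-1)-mers, each k-mer adds one edge."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i : i + k]
            graph[kmer[:-1]].add(kmer[1:])  # prefix -> suffix
    return graph


def count_branches(graph: Dict[str, Set[str]]) -> int:
    """Count nodes with more than one distinct successor."""
    return sum(1 for successors in graph.values() if len(successors) > 1)


# Toy input: a single "read" in which ACG occurs twice.
toy_reads = ["ACGTTACGGA"]
for k in (3, 4, 5):
    print(k, count_branches(de_bruijn(toy_reads, k)))
```

In this toy case the repeat creates a branch at k=3 and k=4 that disappears at k=5, once the k-mers are long enough to span the repeat together with its distinct flanking base. With real reads, a larger k also requires longer overlaps between reads, which is where gaps appear.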

iDiffIR (20220121)

iDiffIR is a tool for identifying differential intron retention (IR) from RNA-seq data. It accepts any sorted, indexed BAM file for single- or paired-end reads.

IDR (2.0.3)

The IDR (Irreproducible Discovery Rate) framework is a unified approach to measure the reproducibility of findings identified from replicate experiments and provide highly stable thresholds based on reproducibility. The IDR method compares a pair of ranked lists of identifications (such as ChIP-seq peaks).

IGV (2.12.3)

The Integrative Genomics Viewer is a high-performance visualization tool for interactive exploration of large, integrated genomic datasets.

IGVTools (2.12.3)

IGVTools provides utilities for working with ASCII file formats used by the Integrative Genomics Viewer. The files can be sorted, tiled, indexed, and counted.

IMPUTE (2.3.2)

Impute is a program for estimating ("imputing") unobserved genotypes in SNP association studies.

InterVar (2.1.2, 2.1.3)

In 2015, the American College of Medical Genetics and Genomics (ACMG) and the Association for Molecular Pathology (AMP) published updated standards and guidelines for the clinical interpretation of sequence variants with respect to human diseases on the basis of 28 criteria. However, variability between individual interpreters can be extensive because of reasons such as the different understandings of these guidelines and the lack of standard algorithms for implementing them, yet computational tools for semi-automated variant interpretation are not available. To address these problems, InterVar implements these criteria to help human reviewers interpret the clinical significance of variants. InterVar can take a pre-annotated or VCF file as input and generate automated interpretation on 18 criteria.

intervene (0.6.5)

a tool for intersection and visualization of multiple genomic region sets

iva (1.0.11)

IVA is a de novo assembler designed to assemble virus genomes that have no repeat sequences, using Illumina read pairs sequenced from mixed populations at extremely high and variable depth.

iVar (1.3.1)

iVar is a computational package that contains functions broadly useful for viral amplicon-based sequencing. Additional tools for metagenomic sequencing are actively being incorporated into iVar. While each of these functions can be accomplished using existing tools, iVar contains an intersection of functionality from multiple tools that are required to call iSNVs and consensus sequences from viral sequencing data across multiple replicates.

Juicer (1.6)

A One-Click System for Analyzing Loop-Resolution Hi-C Experiments

jvarkit (20211013)

Java tools for bioinformatics

KAT (2.4.2)

KAT (K-mer Analysis Toolkit) is a suite of tools that analyse Jellyfish hashes or sequence files (fasta or fastq) using kmer counts.

kb-python (0.27.3)

kb-python is a python package for processing single-cell RNA-sequencing. It wraps the kallisto | bustools single-cell RNA-seq command line tools in order to unify multiple processing workflows.

KMC (3.1.0)

KMC is a disk-based program for counting k-mers from (possibly gzipped) FASTQ/FASTA files.

KmerGO2 (2.0.1)

KmerGO is a user-friendly tool to identify the group-specific sequences on two groups of high throughput sequencing datasets.

kneaddata (0.11.0)

KneadData is a tool designed to perform quality control on metagenomic and metatranscriptomic sequencing data, especially data from microbiome experiments.

kraken (2.1.2)

Kraken is a system for assigning taxonomic labels to short DNA sequences, usually obtained through metagenomic studies

lefse (1.1.2)

LEfSe (Linear discriminant analysis Effect Size) determines the features (organisms, clades, operational taxonomic units, genes, or functions) most likely to explain differences between classes by coupling standard tests for statistical significance with additional tests encoding biological consistency and effect relevance.

LJA (0.1)

The La Jolla Assembler (LJA) is a tool for assemblies of long and accurate reads. It reduces the error rate in these reads by three orders of magnitude (making them nearly error-free) and constructs the de Bruijn graph for large genomes and large k-mer sizes. Since the de Bruijn graph constructed for a fixed k-mer size is typically either too tangled or too fragmented, LJA uses a new concept of a multiplex de Bruijn graph with varying k-mer sizes.

locuszoom (1.3)

LocusZoom is designed to facilitate viewing of local association results together with useful information about a locus, such as the location and orientation of the genes it includes, linkage disequilibrium coefficients and local estimates of recombination rates

lofreq (2.1.5)

LoFreq is a fast and sensitive variant-caller for inferring SNVs and indels from next-generation sequencing data.

lumpy (0.3.1)

A probabilistic framework for structural variant discovery.

mach2qtl (1.1.3)

mach2qtl uses dosages/posterior probabilities inferred with MACH as predictors in a linear regression to test association with a quantitative trait

macs (3)

Model-based Analysis of ChIP-Seq (MACS) on short reads sequencers such as Genome Analyzer (Illumina / Solexa). MACS empirically models the length of the sequenced ChIP fragments, which tends to be shorter than sonication or library construction size estimates, and uses it to improve the spatial resolution of predicted binding sites. MACS also uses a dynamic Poisson distribution to effectively capture local biases in the genome sequence, allowing for more sensitive and robust prediction.

mafft (7.475)

Multiple alignment program for amino acid or nucleotide sequences

MAGeCK (0.5.9.2)

MAGeCK (Model-based Analysis of Genome-wide CRISPR/Cas9 Knockout) is a method for prioritizing single-guide RNAs, genes and pathways in genome-scale CRISPR/Cas9 knockout screens. It demonstrates better performance compared with other methods, identifies both positively and negatively selected genes simultaneously, and reports robust results across different experimental conditions.

mageck-vispr (0.5.6)

MAGeCK-VISPR is a comprehensive quality control, analysis and visualization workflow for CRISPR/Cas9 screens.

magicblast (1.7.0)

Magic-BLAST is a tool for mapping large next-generation RNA or DNA sequencing runs against a whole genome or transcriptome. Each alignment optimizes a composite score, taking into account simultaneously the two reads of a pair, and in case of RNA-seq, locating the candidate introns and adding up the score of all exons. This is very different from other versions of BLAST, where each exon is scored as a separate hit and read-pairing is ignored.

MAJIQ (2.4)

Modeling Alternative Junction Inclusion Quantification. MAJIQ and Voila are two software packages that together define, quantify, and visualize local splicing variations (LSV) from RNA-Seq data.

manorm (1.3.0)

MAnorm is for quantitative comparison of ChIP-Seq data sets describing transcription factor binding sites and epigenetic modifications. The quantitative binding differences inferred by MAnorm showed strong correlation with both the changes in expression of target genes and the binding of cell type-specific regulators.

manta (1.6.0-fork-jykr)

Structural variant and indel caller for mapped sequencing data

maps (1.1.0)

a set of multiple scripts used to analyze PLAC-Seq and HiChIP data.

mash (2.3)

mash is a command line tool and library to provide fast genome and metagenome distance estimation using MinHash. Only the command line tool is installed.
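A minimal sketch of the MinHash idea follows, assuming a bottom-s sketch over k-mer hashes. It swaps in SHA-1 from the Python standard library purely for convenience and ignores reverse complements, so it is not Mash's hashing scheme or its distance formula; all function names and parameters are hypothetical.

```python
import hashlib
from typing import Set


def kmer_set(seq: str, k: int) -> Set[str]:
    """All distinct k-mers of a sequence."""
    return {seq[i : i + k] for i in range(len(seq) - k + 1)}


def sketch(seq: str, k: int = 5, size: int = 8) -> Set[int]:
    """Bottom-s sketch: keep only the `size` smallest k-mer hash values."""
    hashes = {
        int.from_bytes(hashlib.sha1(kmer.encode()).digest()[:8], "big")
        for kmer in kmer_set(seq, k)
    }
    return set(sorted(hashes)[:size])


def jaccard_estimate(a: Set[int], b: Set[int], size: int = 8) -> float:
    """Estimate Jaccard similarity from the smallest hashes of the merged sketches."""
    merged = sorted(a | b)[:size]
    if not merged:
        return 0.0
    shared = sum(1 for h in merged if h in a and h in b)
    return shared / len(merged)


seq_a = "ACGTACGTTGCAACGGTCA"
seq_b = "ACGTACGTTGCAACGGTTT"
print(jaccard_estimate(sketch(seq_a), sketch(seq_b)))
```

Because only a fixed number of hashes is kept per sequence, two genomes can be compared without touching their full k-mer sets, which is what makes the approach fast.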

mbg (1.0.16)

Minimizer based sparse de Bruijn Graph constructor.

medaka (1.10.0)

medaka is a tool to create a consensus sequence from nanopore sequencing data. This task is performed using neural networks applied from a pileup of individual sequencing reads against a draft assembly.

medusa (1.6)

A draft genome scaffolder that uses multiple reference genomes in a graph-based approach.

megadepth (1.2.0)

MEGAHIT (1.2.9)

MEGAHIT is a single node assembler for large and complex metagenomics NGS reads, such as soil. It makes use of succinct de Bruijn graph (SdBG) to achieve low memory assembly. MEGAHIT can optionally utilize a CUDA-enabled GPU to accelerate its SdBG construction.

megalodon (2.5.0)

Megalodon is a research command line tool to extract high accuracy modified base and sequence variant calls from raw nanopore reads by anchoring the information rich basecalling neural network output to a reference genome/transcriptome.

MEGAN (6.24.11)

MEtaGenome ANalyzer takes a file of reads and a BLAST output from comparison against a reference genome, and automatically calculates a taxonomic classification of the reads and, if desired, a functional classification.

merfin (1.1)

Improved variant filtering and polishing via k-mer validation

merqury (1.3)

Evaluate genome assemblies with k-mers and more

Meryl (0.0)

Meryl: a genomic k-mer counter (and sequence utility) with nice features. It is built into the Celera Assembler and is also available as a stand-alone application. Meryl uses a sorting-based approach that sorts the k-mers in lexicographical order.
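The sorting-based approach mentioned above can be illustrated in a few lines: list every k-mer, sort the list lexicographically, and count runs of identical neighbours. This is a toy in-memory sketch with a hypothetical function name, not Meryl's implementation.

```python
from itertools import groupby
from typing import List, Tuple


def count_kmers_by_sorting(seq: str, k: int) -> List[Tuple[str, int]]:
    """Count k-mers by sorting them and measuring runs of equal entries."""
    kmers = sorted(seq[i : i + k] for i in range(len(seq) - k + 1))
    # After sorting, identical k-mers are adjacent, so each group is one count.
    return [(kmer, sum(1 for _ in run)) for kmer, run in groupby(kmers)]


print(count_kmers_by_sorting("ACGACG", 3))
```

A side effect of this design is that the output is already in lexicographical order, which makes it straightforward to merge or compare two count tables in a single pass.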

metabat (2.15)

MetaBAT: A robust statistical framework for reconstructing genomes from metagenomic data

metal (2020-05-05)

The METAL software is designed to facilitate meta-analysis of large datasets (such as several whole genome scans) in a convenient, rapid and memory efficient manner.

metaphlan (4.0.3)

MetaPhlAn is a computational tool for profiling the composition of microbial communities (Bacteria, Archaea, Eukaryotes and Viruses) from metagenomic shotgun sequencing data (i.e. not 16S) with species-level resolution. With the newly added StrainPhlAn module, it is now possible to perform accurate strain-level microbial profiling.

miniasm (0.3.r179)

Ultrafast de novo assembly for long noisy reads (though having no consensus step)

minimac (4 (1.0.1))

minimac is a low memory, computationally efficient implementation of the MaCH algorithm for genotype imputation. It is designed to work on phased genotypes and can handle very large reference panels with hundreds or thousands of haplotypes. 'mini' refers to the low amount of computational resources it needs.

minimap2 (2.28)

Minimap2 is a fast sequence mapping and alignment program that can find overlaps between long noisy reads, or map long reads or their assemblies to a reference genome optionally with detailed alignment (i.e. CIGAR).

mirdeep2 (0.1.3)

miRDeep2 is a completely overhauled tool which discovers microRNA genes by analyzing sequenced RNAs.

misopy (0.5.4)

MISO (Mixture-of-Isoforms) is a probabilistic framework that quantitates the expression level of alternatively spliced genes from RNA-Seq data

mitosuite (1.0.9b)

mitosuite is a graphical tool for human mitochondrial genome profiling in massively parallel sequencing

mixcr (4.6.0)

MiXCR is a universal software for fast and accurate analysis of T- and B- cell receptor repertoire sequencing data.

mixer (1.3)

MiXeR is a Causal Mixture Model for GWAS summary statistics. Version 1.3 contains a Python port of MiXeR, wrapping the C/C++ core. The data preprocessing code sumstats.py is also included.

modbamtools (0.4.8)

A set of tools to manipulate and visualize DNA/RNA base modification data that are stored in bam format.

modkit (0.2.6)

A bioinformatics tool for working with modified bases from Oxford Nanopore. Specifically for converting modBAM to bedMethyl files using best practices, but also manipulating modBAM files and generating summary statistics.

modphred (1.0c)

modPhred is a pipeline for detection, annotation and visualisation of DNA/RNA modifications.

monopogen (Aug23.2023)

SNV calling from single cell sequencing

mosaicforecast (0.0.1)

A machine learning method that leverages read-based phasing and read-level features to accurately detect mosaic SNVs (SNPs, small indels) from NGS data. It builds on existing algorithms to achieve a multifold increase in specificity.

mosdepth (0.3.3)

Fast BAM/CRAM depth calculation for WGS, exome, or targeted sequencing.
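One common way to compute per-base depth quickly is to record only where each alignment starts and ends, then take a running sum. The sketch below illustrates that technique on plain intervals; it is an assumption-laden toy (hypothetical function name, no BAM/CRAM parsing, no CIGAR handling) rather than a description of mosdepth's internals.

```python
from itertools import accumulate
from typing import List, Tuple


def per_base_depth(alignments: List[Tuple[int, int]], length: int) -> List[int]:
    """Per-base depth for 0-based half-open (start, end) intervals.

    Assumes every interval satisfies 0 <= start <= end <= length.
    """
    delta = [0] * (length + 1)
    for start, end in alignments:
        delta[start] += 1  # coverage rises where an alignment begins
        delta[end] -= 1    # and falls just past where it ends
    return list(accumulate(delta))[:length]


print(per_base_depth([(0, 4), (2, 6)], 8))
```

Each alignment costs two array updates regardless of its length, so the work scales with the number of reads plus the chromosome length rather than with total aligned bases.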

mtoolbox (1.2.1)

A bioinformatics pipeline aimed at the analysis of mitochondrial DNA (mtDNA) in high throughput sequencing studies.

multiqc (1.20)

aggregates results for various frequently used bioinformatics tools across multiple samples into a nice visual report

MuSE (2.0.1)

MuSE is an approach to somatic variant calling based on the F81 Markov substitution model for molecular evolution, which models the evolution of the reference allele to the allelic composition of the matched tumor and normal tissue at each genomic locus.

muTect (1.1.7)

MuTect is a method developed at the Broad Institute for the reliable and accurate identification of somatic point mutations in next generation sequencing data of cancer genomes.

MutSig (1.41)

MutSig analyzes lists of mutations discovered in DNA sequencing, to identify genes that were mutated more often than expected by chance given background mutation processes.

nanopack (20231214)

Tools for analyzing and processing long reads and alignments

nanopolish (0.14.0)

nanopolish is a software package for signal-level analysis of Oxford Nanopore sequencing data. Nanopolish can calculate an improved consensus sequence for a draft genome assembly, detect base modifications, call SNPs and indels with respect to a reference genome and more.

neusomatic (0.2.1)

NeuSomatic is based on deep convolutional neural networks for accurate somatic mutation detection. With properly trained models, it can robustly perform across sequencing platforms, strategies, and conditions. NeuSomatic summarizes and augments sequence alignments in a novel way and incorporates multi-dimensional features to capture variant signals effectively. It is not only a universal but also accurate somatic mutation detection method.

NGMLR (0.2.7)

NGMLR is a long-read mapper designed to align PacBio or Oxford Nanopore (standard and ultra-long) reads to a reference genome with a focus on reads that span structural variations.

ngsplot (2.63)

ngsplot is an easy-to-use global visualization tool for next-generation sequencing data.

novocraft (4.03.08)

Package includes aligner for single-ended and paired-end reads from the Illumina Genome Analyser. Novoalign finds global optimum alignments using full Needleman-Wunsch algorithm with affine gap penalties.

NucleoATAC (0.3.4)

NucleoATAC is software for nucleosome calling using ATAC-seq. It can identify the rotational and translational positions of nucleosomes with up to base-pair resolution and provide quantitative measures of nucleosome occupancy.

Octopus (0.7.4)

Octopus is a mapping-based variant caller that implements several calling models within a unified haplotype-aware framework. Octopus takes inspiration from particle filtering by constructing a tree of haplotypes and dynamically pruning and extending the tree based on haplotype posterior probabilities in a sequential manner. This allows Octopus to implicitly consider all possible haplotypes at a given locus in reasonable time.

ont-fast5-api (4.1.0)

Tools to manipulate HDF5 files of the Oxford Nanopore .fast5 file format

pairtools (0.3.0; 1.0.2)

Pairtools is a simple and fast command-line framework to process sequencing data from a Hi-C experiment. Pairtools perform various operations on Hi-C pairs and occupy the middle position in a typical Hi-C data processing pipeline. Pairtools aim to be an all-in-one tool for processing Hi-C pairs.

panaroo (1.3.3)

An updated pipeline for pangenome investigation

parabricks (4.3.0)

The Clara Parabricks toolkit is a set of GPU-accelerated genome analysis tools for secondary analysis of next generation sequencing data.

PartekFlow (11.0.24.0328)

Web interface designed specifically for the analysis needs of next generation sequencing applications including RNA, small RNA, and DNA sequencing.

pb-cpg-tools (2.3.1)

Tools for analyzing CpG/5mC data from PacBio HiFi reads aligned to a reference genome

pbipa (1.8.0)

Improved Phased Assembler (IPA) is the official PacBio software for HiFi genome assembly. IPA was designed to utilize the accuracy of PacBio HiFi reads to produce high-quality phased genome assemblies.

pbsuite (15.8.24)

The PBSuite contains two projects created for analysis of Pacific Biosciences long-read sequencing data: PBHoney and PBJelly. PBHoney is an implementation of two variant-identification approaches designed to exploit the high mappability of long reads (i.e., greater than 10,000 bp). PBHoney considers both intra-read discordance and soft-clipped tails of long reads to identify structural variants. PBJelly is a highly automated pipeline that aligns long sequencing reads (such as PacBio RS reads or long 454 reads in fasta format) to high-confidence draft assemblies. PBJelly fills or reduces as many captured gaps as possible to produce upgraded draft genomes.

peakachu (2.2.post1)

A supervised learning framework for chromatin loop detection in genome-wide contact maps.

pear (0.9.11)

PEAR is an ultrafast, memory-efficient and highly accurate paired-end read merger. It is fully parallelized and can run with as little as a few kilobytes of memory.

peddy (0.4.8)

peddy is used to compare sex and familial relationships given in a PED file with those inferred from a VCF file

PennCNV (1.0.5)

PennCNV is a free software tool for Copy Number Variation (CNV) detection from SNP genotyping arrays. Currently it can handle signal intensity data from Illumina and Affymetrix arrays. With appropriate preparation of file format, it can also handle other types of SNP arrays and oligonucleotide arrays.

PEPATAC (0.10.3)

PEPATAC is a robust pipeline for Assay for Transposase-Accessible Chromatin using sequencing (ATAC-seq) built on a loosely coupled modular framework. It may be easily applied to ATAC-seq projects of any size, from one-off experiments to large-scale sequencing projects. It is optimized on unique features of ATAC-seq data to be fast and accurate and provides several unique analytical approaches.

picard (3.1.0)

Picard comprises Java-based command-line utilities that manipulate SAM files, and a Java API (SAM-JDK) for creating new programs that read and write SAM files. Both SAM text format and SAM binary (BAM) format are supported.

picrust (2.5.2)

PICRUSt is a bioinformatics software package designed to predict metagenome functional content from marker gene (e.g., 16S rRNA) surveys and full genomes.

pindel (0.2.5)

Pindel can detect breakpoints of large deletions, medium sized insertions, inversions, tandem duplications and other structural variants at single-base resolution from next-gen sequence data. It uses a pattern growth approach to identify the breakpoints of these variants from paired-end short reads.

Platypus (0.8.1)

tool for variant-detection in high-throughput sequencing data.

plink (3.6-alpha)

PLINK is a whole-genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.

pod5 (0.3.6)

Tool for manipulating the pod5 format of nanopore reads

pomoxis (0.3.15)

Pomoxis comprises a set of basic bioinformatic tools tailored to nanopore sequencing. Notably, tools are included for generating and analysing draft assemblies. Many of these tools are used by the research data analysis group at Oxford Nanopore Technologies.

porechop (0.2.4)

Trim/demultiplex Oxford Nanopore reads

poretools (0.6.1a1)

Poretools is a toolkit for manipulating and exploring nanopore sequencing data sets. Poretools operates on individual FAST5 files, directories of FAST5 files, and tar archives of FAST5 files.

preseq (3.1.2)

predicting library complexity and genome coverage in high-throughput sequencing

PRINSEQ (0.20.4)

PRINSEQ is a tool that generates summary statistics of sequence and quality data and that is used to filter, reformat and trim next-generation sequence data. It is particularly designed for 454/Roche data, but can also be used for other types of sequence data.

proseq (2.0)

proseq-2.0 is a pipeline for preprocessing and alignment of run-on sequencing (PRO/GRO/ChRO-seq) data from single-read or paired-end Illumina sequencing.

pvactools (4.0.1)

pVACtools is a cancer immunotherapy suite consisting of pVACseq, pVACfuse and pVACvector.

pychopper (2.7.1)

Pychopper v2 is a tool to identify, orient and trim full-length Nanopore cDNA reads. The tool is also able to rescue fused reads.

pyclone (0.13.1)

PyClone is a statistical model and software tool designed to infer the prevalence of point mutations in heterogeneous cancer samples.

pycoQC (2.5.2)

pycoQC is a new tool to generate interactive quality control metrics and plots from basecalled nanopore reads or summary files generated by the basecallers Albacore, Guppy or MinKNOW. pycoQC has several novel features, including: 1) python support for creation of dynamic D3.js visualizations and interactive data exploration in Jupyter Notebooks; 2) simple command line interface to generate customizable interactive HTML reports; and 3) multiprocessing FAST5 feature extraction program to generate a summary file directly from FAST5 files.

pyega3 (5.0.2)

A download client for the European Genome-phenome Archive (EGA). The EGA is designed to be a repository for all types of sequence and genotype experiments, including case-control, population, and family studies.

qctool (2.2)

QCTOOL is a command-line utility program for basic quality control of GWAS datasets.

qualimap (2.2.1)

A platform-independent application written in Java and R that provides both a GUI and a command-line interface to facilitate the quality control of alignment sequencing data.

quast (5.2.0)

QUAST stands for QUality ASsessment Tool. The tool evaluates genome assemblies by computing various metrics. The package includes the general QUAST tool for genome assemblies, MetaQUAST, the extension for metagenomic datasets, and Icarus, an interactive visualizer for these tools.

racon (1.4.3)

Ultrafast consensus module for raw de novo genome assembly of long uncorrected reads.

ragtag (2.1.0)

RagTag is a collection of software tools for scaffolding and improving modern genome assemblies.

raremetal (4.15.1)

RAREMETAL is a computationally efficient tool for meta-analysis of rare variants using sequencing or genotyping array data.

Rcorrector (1.0.5)

Rcorrector implements a k-mer based method to correct random sequencing errors in Illumina RNA-seq reads. Rcorrector uses a De Bruijn graph to compactly represent all trusted k-mers in the input reads. Unlike WGS read correctors, which use a global threshold to determine trusted k-mers, Rcorrector computes a local threshold at every position in a read.

regtools (1.0.0)

regtools: tools that integrate DNA-seq and RNA-seq data to help interpret mutations in a regulatory and splicing context.

RepeatMasker (4.1.6)

RepeatMasker is a program that screens DNA sequences for interspersed repeats and low complexity DNA sequences. The output of the program is a detailed annotation of the repeats that are present in the query sequence as well as a modified version of the query sequence in which all the annotated repeats have been masked (default: replaced by Ns). On average, almost 50% of a human genomic DNA sequence currently will be masked by the program.

repeatmodeler (2.0.1)

RepeatModeler is a de novo transposable element (TE) family identification and modeling package. RepeatModeler assists in automating the runs of the various algorithms given a genomic database, clustering redundant results, refining and classifying the families and producing a high quality library of TE families suitable for use with RepeatMasker and ultimately for submission to the Dfam database (http://dfam.org).

REViewer (0.2.7)

REViewer is a tool for visualizing alignments of reads in regions containing tandem repeats. REViewer requires a BAMlet with graph-realigned reads generated by ExpansionHunter and the corresponding variant catalog.

rgt (0.13.2)

Regulatory Genomics Toolbox: Python library and set of tools for the integrative analysis of high throughput regulatory genomics data. http://www.regulatory-genomics.org

rilseq (0.82)

RILseq computational protocol

rnaseqc (2.4.2)

RNA-SeQC is a Java program which computes a series of quality control metrics for RNA-seq data.

rockhopper (2.0.3)

Rockhopper is a comprehensive and user-friendly system for computational analysis of bacterial RNA-seq data. As input, Rockhopper takes RNA sequencing reads output by high-throughput sequencing technology (FASTQ, QSEQ, FASTA, SAM, or BAM files)

ROSE (1.3.1)

ROSE (Rank Ordering of Super-Enhancers) is a tool for (1) creating stitched enhancers, and (2) separating super-enhancers from typical enhancers, given sequencing data (.bam) and a file of previously identified constituent enhancers (.gff).

RSD (1.1.7)

Reciprocal Smallest Distance (RSD) is a pairwise orthology algorithm that uses global sequence alignment and maximum likelihood evolutionary distance between sequences to accurately detect orthologs between genomes.
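
The reciprocal step of this idea can be illustrated with a minimal Python sketch. It assumes the evolutionary distances have already been computed and are supplied as a dictionary; the function name `reciprocal_smallest_distance` and the example gene names are hypothetical, and the alignment and maximum likelihood distance estimation performed by the real RSD package are not reproduced here.

```python
def reciprocal_smallest_distance(dist):
    """Return (a, b) pairs that are each other's closest sequence.

    dist maps (gene_in_genome_A, gene_in_genome_B) -> distance.
    The distances are assumed to be precomputed (hypothetical input).
    """
    best_for_a = {}  # gene in A -> (closest gene in B, distance)
    best_for_b = {}  # gene in B -> (closest gene in A, distance)
    for (a, b), d in dist.items():
        if a not in best_for_a or d < best_for_a[a][1]:
            best_for_a[a] = (b, d)
        if b not in best_for_b or d < best_for_b[b][1]:
            best_for_b[b] = (a, d)
    # Keep a pair only when the smallest-distance relation holds both ways.
    return sorted(
        (a, b) for a, (b, _) in best_for_a.items() if best_for_b[b][0] == a
    )


if __name__ == "__main__":
    example = {
        ("a1", "b1"): 0.1,
        ("a1", "b2"): 0.5,
        ("a2", "b1"): 0.2,
        ("a2", "b2"): 0.9,
    }
    print(reciprocal_smallest_distance(example))
```

In this toy input, a2 is closest to b1, but b1 is closest to a1, so only the a1/b1 pair is reported.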

rsem (1.3.3)

RSEM is a software package for estimating gene and isoform expression levels from RNA-Seq data.

rseqc (5.0.3)

RSeQC comprehensively evaluates RNA-seq datasets generated from clinical tissues or other well annotated organisms such as mouse, fly and yeast.

rtg-tools (3.8.4)

variant detection for singletons, families, large pedigrees and populations, cancer, structural variant and CNV analysis, and microbial and metagenomic analysis

rvtests (2.1.0)

Rare Variant tests is a flexible software package for genetic association studies. It is designed to support unrelated individuals or related (family-based) individuals.

sailfish (0.10.0)

Sailfish is a tool for transcript quantification from RNA-seq data. It requires a set of target transcripts (either from a reference or de-novo assembly) to quantify. All that is needed to run sailfish is a fasta file containing your reference transcripts and a (set of) fasta/fastq file(s) containing your RNA-Seq reads.

salmon (1.10.1)

a tool for quantifying the expression of transcripts using RNA-seq data.

SalmonTE (0.4)

SalmonTE is an ultra-fast and scalable quantification pipeline of transposable element (TE) abundances.

sambamba (1.0.1)

Sambamba is a high-performance, modern, robust and fast tool (and library), written in the D programming language, for working with SAM and BAM files. Current parallelised functionality is an important subset of samtools functionality, including view, index, sort, markdup, and depth.

samblaster (0.1.26)

samblaster is a program for marking duplicates and finding discordant/split read pairs in read-id grouped paired-end SAM files. When marking duplicates, samblaster will use about 20MB per 1M read pairs. In a read-id grouped SAM file all alignments for a read-id (QNAME) are contiguous. Aligners naturally produce such files. They can also be created by sorting a SAM file by read-id.
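
As a rough sizing aid when requesting memory for a job, the figure of about 20MB per 1M read pairs can be turned into a back-of-the-envelope estimate. The helper name `estimate_samblaster_memory_mb` is hypothetical, and the strictly linear scaling is an assumption based only on the figure quoted above; actual usage may differ.

```python
def estimate_samblaster_memory_mb(read_pairs, mb_per_million_pairs=20.0):
    """Rough memory estimate in MB for duplicate marking.

    Assumes memory grows linearly with the number of read pairs,
    using the approximate 20MB per 1M read pairs quoted above.
    """
    if read_pairs < 0:
        raise ValueError("read_pairs must be non-negative")
    return read_pairs / 1_000_000 * mb_per_million_pairs


if __name__ == "__main__":
    # Example: a library with 400 million read pairs.
    print(estimate_samblaster_memory_mb(400_000_000), "MB")
```

It is sensible to add headroom on top of such an estimate rather than requesting exactly the computed amount.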

samtools (1.19)

The samtools package now provides samtools, bcftools, tabix, and the underlying htslib library.

scallop (0.10.5)

Scallop is a reference-based transcript assembler.

scalpel (0.5.4)

Bioinformatics pipeline for discovery of genetic variants from NGS reads.

scanpy (1.8.1)

Scanpy is a scalable toolkit for analyzing single-cell gene expression data. It includes preprocessing, visualization, clustering, pseudotime and trajectory inference and differential expression testing. The Python-based implementation efficiently deals with datasets of more than one million cells.

schicexplorer (7)

scHiCExplorer is a set of programs to process, normalize, analyse and visualize single-cell Hi-C data

Scramble (1.0.2)

Scramble is a mobile element insertion (MEI) detection tool. It identifies clusters of soft clipped reads in a BAM file, builds consensus sequences, aligns to representative L1Ta, AluYa5, and SVA-E sequences, and outputs MEI calls.
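
To make the notion of a soft-clipped read concrete, the following Python sketch reads the soft-clip lengths from a SAM CIGAR string, where the `S` operation marks clipped bases that remain in the read sequence. The function name `soft_clip_lengths` is hypothetical, and this is not Scramble's own code; clustering of clipped reads and consensus building are not shown.

```python
import re

# One CIGAR element: a length followed by an operation code from the SAM spec.
_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clip_lengths(cigar):
    """Return (left, right) soft-clip lengths for a CIGAR string.

    Hard clips (H) may flank the soft clips, so they are skipped first.
    An unavailable CIGAR ("*") yields (0, 0).
    """
    ops = [(int(n), op) for n, op in _CIGAR_OP.findall(cigar) if op != "H"]
    if not ops:
        return (0, 0)
    left = ops[0][0] if ops[0][1] == "S" else 0
    right = ops[-1][0] if len(ops) > 1 and ops[-1][1] == "S" else 0
    return (left, right)


if __name__ == "__main__":
    for cigar in ("5S95M", "90M10S", "3H5S80M12S", "100M"):
        print(cigar, soft_clip_lengths(cigar))
```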

scvitools (1.1.2)

scvi-tools (single-cell variational inference tools) is a package for end-to-end analysis of single-cell omics data primarily developed and maintained by the Yosef Lab at UC Berkeley.

seqan (2.4.0)

SeqAn is an open source C++ library of efficient algorithms and data structures for the analysis of sequences with the focus on biological data. It applies a unique generic design that guarantees high performance, generality, extensibility, and integration with other libraries. This package also contains a suite of apps, including Fiona, Gustaf, Mason, RazerS 3, Yara, SeqAn T-Coffee, Stellar, and searchjoin.

seqkit (2.7.0)

A cross-platform toolkit for FASTA/Q file manipulation

seqtk (1.4)

seqtk is a toolkit for processing sequences in FASTA/Q formats

sequenza-utils (3.0.0)

Sequenza-utils is the supporting Python library for the sequenza R package. Sequenza is a project to estimate purity/ploidy and copy number alterations from tumor sequencing experiments. Sequenza-utils provides command-line programs to transform common NGS file types, such as BAM, pileup and VCF, into input files for the R package.

shapeit (5.1.0)

SHAPEIT is a fast and accurate haplotype inference software

shasta (0.11.1)

De novo assembly from Oxford Nanopore reads

shmlast (1.6)

shmlast is a reimplementation of the Conditional Reciprocal Best Hits algorithm for finding potential orthologs between a transcriptome and a species-specific protein database. It uses the LAST aligner and the pydata stack to achieve much better performance while staying in the Python ecosystem.

shrimp (2_2_3)

SHRiMP is a software package for aligning genomic reads against a target genome. It was primarily developed with the multitudinous short reads of next generation sequencing machines in mind, as well as Applied Biosystems' colourspace genomic representation.

sicer (2-1.0.3)

A clustering approach for identification of enriched domains from histone modification ChIP-Seq data

sickle (1.33)

A windowed adaptive trimming tool for FASTQ files using quality
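
The general idea of window-based quality trimming can be sketched in a few lines of Python. This is a simplified illustration that only trims the 3' end and works on already-decoded integer quality scores; the function name `window_trim` and the default window size and threshold are assumptions for the example, not sickle's actual implementation or defaults.

```python
def window_trim(quals, window=4, threshold=20):
    """Return how many leading bases to keep.

    Slides a fixed-size window along the quality scores and cuts the
    read at the start of the first window whose mean quality drops
    below the threshold. If no window fails, the whole read is kept.
    """
    n = len(quals)
    if n == 0:
        return 0
    w = min(window, n)  # shrink the window for very short reads
    for start in range(n - w + 1):
        mean_q = sum(quals[start:start + w]) / w
        if mean_q < threshold:
            return start
    return n


if __name__ == "__main__":
    qualities = [30, 30, 30, 30, 30, 10, 10, 10]
    keep = window_trim(qualities)
    print("bases kept:", keep)
```

Because the cut is placed at the start of the failing window, a good base at the window's leading edge can be trimmed along with the poor ones; real trimmers refine this boundary in different ways.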

SIFT (6.2.1)

SIFT predicts whether an amino acid substitution affects protein function based on sequence homology and the physical properties of amino acids.

slamdunk (0.4.3)

SlamDunk is a novel, fully automated software tool for robust, scalable and reproducible SLAMseq data analysis.

slapnap (20210507)

The slapnap container is a tool for using the Compile, Analyze and Tally NAb Panels (CATNAP; Yoon et al. 2015) database to develop predictive models of HIV-1 neutralization sensitivity to one or several broadly neutralizing antibodies (bnAbs).

smrtanalysis (13.1)

SMRT® Analysis is a bioinformatics software suite available for analysis of DNA sequencing data from Pacific Biosciences’ SMRT technology. Users can choose from a variety of analysis protocols that utilize PacBio® and third-party tools. Analysis protocols include de novo genome assembly, cDNA mapping, DNA base-modification detection, and long-amplicon analysis to determine phased consensus sequences.

sniffles (2.3.2)

Sniffles is a structural variation caller using third generation sequencing (PacBio or Oxford Nanopore). It detects all types of SVs (10bp+) using evidence from split-read alignments, high-mismatch regions, and coverage analysis.

snippy (4.6.0)

snippy is a tool for rapid haploid variant calling and core genome alignment

snp2hla (1.0.3)

SNP2HLA is a tool to impute amino acid polymorphisms and single nucleotide polymorphisms in human leukocyte antigens (HLA) within the major histocompatibility complex (MHC) region on chromosome 6.

snpEff (5.2)

snpEff is a variant annotation and effect prediction tool. It annotates and predicts the effects of variants on genes (such as amino acid changes).

snptest (2.5.6)

SNPTEST is a program for the analysis of single SNP association in genome-wide studies. The tests implemented include binary (case-control) phenotypes and single and multiple quantitative phenotypes; Bayesian and frequentist tests; the ability to condition upon an arbitrary set of covariates; and various methods for dealing with imputed SNPs. The program is designed to work seamlessly with the output of the genotype calling program CHIAMO, the genotype imputation program IMPUTE and the program GTOOL.

somaticsniper (1.0.5.0)

The purpose of this program is to identify single nucleotide positions that are different between tumor and normal (or, in theory, any two bam files). It takes a tumor bam and a normal bam and compares the two to determine the differences. It outputs a file in a format very similar to Samtools consensus format.

sortmeRNA (4.3.6)

SortMeRNA is a local sequence alignment tool for filtering, mapping and clustering. The core algorithm is based on approximate seeds and allows for sensitive analysis of NGS reads. The main application of SortMeRNA is filtering rRNA from metatranscriptomic data. SortMeRNA takes as input a file of reads (fasta or fastq format) and one or multiple rRNA database file(s), and sorts apart aligned and rejected reads into two files specified by the user.

spaceranger (3.0.0)

10x pipeline for processing Visium spatial RNA-seq data

spades (3.15.5)

SPAdes – St. Petersburg genome assembler – is intended for both standard isolates and single-cell MDA bacteria assemblies.

speedseq (0.1.2-20180208-4e60002)

SpeedSeq is a genome analysis platform designed for rapid whole-genome variant detection and interpretation

spipe (1.20.0)

Split-pool combinatorial barcoding makes it possible to scale projects to hundreds of samples and millions of cells, overcoming limitations of previous droplet-based technologies. Spipe (split-pipe) implements a combinatorial barcoding method for single cell RNA sequencing (scRNA-seq) with dramatically improved sensitivity.

sratoolkit (3.0.10)

The NCBI SRA Toolkit enables reading ("dumping") of sequencing files from the SRA database and writing ("loading") files into the .sra format.

SRST2 (0.2.0)

SRST2 is a read mapping-based tool for rapid molecular typing of bacterial pathogens. It allows fast and accurate detection of genes, alleles and multi-locus sequence types (MLST) from WGS data. SRST2 is highly accurate and outperforms assembly-based methods in terms of both gene detection and allele assignment.

stampy (1.0.32)

Short read aligner

STAR (2.7.10b)

Spliced Transcripts Alignment to a Reference

STAR-Fusion (1.12.0)

Transcript fusion detection

straglr (1.4.1)

Tandem repeat expansion detection or genotyping from long-read alignments

STREAM (20180816)

STREAM stands for Single-cell Trajectories Reconstruction, Exploration And Mapping of omics data. It is an interactive pipeline capable of disentangling and visualizing complex branching trajectories from both single-cell transcriptomic and epigenomic data.

strelka (2.9.10)

Strelka is an analysis package designed to detect somatic SNVs and small indels from the aligned sequencing reads of matched tumor-normal samples.

stringtie (2.2.1)

StringTie is a fast and highly efficient assembler of RNA-Seq alignments into potential transcripts. It is primarily a genome-guided transcriptome assembler, although it can borrow algorithmic techniques from de novo genome assembly to help with transcript assembly.

stripy-pipeline (1.2)

STRipy-pipeline is a non-graphical command-line version of STRipy that can be integrated into pipelines and analyse multiple STR or VNTR loci in parallel.

subread (2.0.3)

High-performance read alignment, quantification and mutation discovery

suppa (2.3)

Fast, accurate, and uncertainty-aware differential splicing analysis across multiple conditions

SURVIVOR (1.0.7)

SURVIVOR is a tool set for simulating/evaluating SVs, merging and comparing SVs within and among samples, and includes various methods to reformat or summarize SVs.

svanna (1.0.3)

SvAnna is an efficient and accurate pathogenicity prediction tool for coding and regulatory structural variants in long-read genome sequencing.

svtyper (0.7.1)

Svtyper is a Bayesian genotyper for structural variants.

svviz (1.6.2)

svviz visualizes high-throughput sequencing data relevant to a structural variant. Only reads supporting the variant or the reference allele will be shown. svviz can operate either in an interactive web browser view, to closely inspect individual variants, or in batch mode, allowing multiple variants (annotated in a VCF file) to be analyzed simultaneously.

syri (1.6.3)

Syri compares alignments between two chromosome-level assemblies and identifies synteny and structural rearrangements.

talon (6.0)

TALON is a Python package for identifying and quantifying known and novel genes/isoforms in long-read transcriptome data sets.

taxonkit (0.12.0)

A Cross-platform and Efficient NCBI Taxonomy Toolkit

telescope (6cd5525)

Single locus resolution of Transposable ELEment expression. Telescope estimates transposable element expression (retrotranscriptome) resolved to specific genomic locations. It directly addresses uncertainty in fragment assignment by reassigning ambiguously mapped fragments to the most probable source transcript as determined within a Bayesian statistical model.

telseq (0.0.2)

TelSeq is software that estimates telomere length from whole genome sequencing data (BAMs).

tetoolkit (2.2.3)

A package for including transposable elements in differential enrichment analysis of sequencing datasets.

THetA (0.7-20-g94fd772)

Tumor Heterogeneity Analysis (THetA) is an algorithm used to estimate tumor purity and clonal/subclonal copy number aberrations simultaneously from high-throughput DNA sequencing data.

tombo (1.5.1)

a suite of tools primarily for the identification of modified nucleotides from nanopore sequencing data.

tophat (2.1.2)

TopHat is a fast splice junction mapper for RNA-Seq reads. It aligns RNA-Seq reads to mammalian-sized genomes using the ultra high-throughput short read aligner Bowtie, and then analyzes the mapping results to identify splice junctions between exons.

TPMCalculator (0.0.4)

TPMCalculator quantifies mRNA abundance directly from the alignments by parsing BAM files
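
For readers unfamiliar with the TPM (transcripts per million) unit, here is a minimal Python sketch of the standard definition: each count is divided by feature length, and the resulting rates are rescaled to sum to one million. The function name `tpm` and the example numbers are hypothetical; this is not TPMCalculator's own code, and the BAM parsing step it performs is omitted.

```python
def tpm(counts, lengths_bp):
    """Compute TPM values from read counts and feature lengths.

    counts:     feature name -> number of assigned reads
    lengths_bp: feature name -> feature length in base pairs
    """
    # Length-normalised rate for each feature.
    rates = {name: counts[name] / lengths_bp[name] for name in counts}
    total = sum(rates.values())
    if total == 0:
        return {name: 0.0 for name in counts}
    # Rescale so that all values sum to one million.
    return {name: rate / total * 1e6 for name, rate in rates.items()}


if __name__ == "__main__":
    example_counts = {"geneA": 100, "geneB": 300}
    example_lengths = {"geneA": 1000, "geneB": 3000}
    for name, value in tpm(example_counts, example_lengths).items():
        print(name, round(value, 1))
```

In the example, geneB has three times the reads but is three times as long, so both genes receive the same TPM value.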

transcript_clean (2.0.3)

TranscriptClean is a Python program that corrects mismatches, microindels, and noncanonical splice junctions in long reads that have been mapped to the genome. It is designed for use with SAM files from the PacBio Iso-seq and Oxford Nanopore transcriptome sequencing technologies.

TransDecoder (5.5.0)

TransDecoder identifies candidate coding regions within transcript sequences, such as those generated by de novo RNA-Seq transcript assembly using Trinity, or constructed based on RNA-Seq alignments to the genome using Tophat and Cufflinks.

transvar (2.5.9)

TransVar is a versatile annotator for 3-way conversion and annotation among genomic characterization(s) of mutations and transcript-dependent annotation(s).

trimAl (1.2rev59)

trimAl is a tool for the automated removal of spurious sequences or poorly aligned regions from a multiple sequence alignment. It can consider several parameters, alone or in multiple combinations, in order to select the most reliable positions in the alignment. These include the proportion of sequences with a gap, the level of residue similarity and, if several alignments for the same set of sequences are provided, the consistency level of columns among alignments. Moreover, trimAl lets the user manually select a set of columns to be removed from the alignment.
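
One of the criteria mentioned above, the proportion of sequences with a gap in a column, can be illustrated with a short Python sketch. The function name `drop_gappy_columns` and the 0.5 cutoff are assumptions for the example; trimAl's own thresholds, similarity scoring and consistency checks are not reproduced.

```python
def drop_gappy_columns(alignment, max_gap_fraction=0.5):
    """Remove alignment columns whose gap fraction exceeds the cutoff.

    alignment is a list of equal-length aligned sequences in which
    "-" marks a gap. Returns the trimmed sequences in the same order.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    keep = [
        i for i in range(length)
        if sum(seq[i] == "-" for seq in alignment) / len(alignment)
        <= max_gap_fraction
    ]
    return ["".join(seq[i] for i in keep) for seq in alignment]


if __name__ == "__main__":
    for row in drop_gappy_columns(["AC-T", "A--T", "ACGT"]):
        print(row)
```

Here the third column is dropped because two of the three sequences have a gap there, while the second column (one gap in three) is retained.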

trimgalore (0.6.7)

Consistent quality and adapter trimming for RRBS or standard FastQ files.

trimmomatic (0.39)

Trimmomatic performs a variety of useful trimming tasks for Illumina paired-end and single-end data.

trinity (2.15.1)

Trinity, developed at the Broad Institute and the Hebrew University of Jerusalem, represents a novel method for the efficient and robust de novo reconstruction of transcriptomes from RNA-seq data.

Trinotate (4.0.2)

Trinotate is a comprehensive annotation suite designed for automatic functional annotation of transcriptomes, particularly de novo assembled transcriptomes, from model or non-model organisms.

tRNAscan-SE (2.0.9)

tRNAscan-SE 2.0 has advanced the state-of-the-art methodology in tRNA gene detection and functional prediction, captured by rich new content of the companion Genomic tRNA Database

ultraplex (1.2.5)

Ultraplex is primarily designed for the demultiplexing of sequencing data generated using in-house library preparation protocols with custom adaptors

umitools (1.1.2)

tools for dealing with Unique Molecular Identifiers (UMIs)/Random Molecular Tags (RMTs) and single cell RNA-Seq cell barcodes

vatools (5.1.0)

VAtools is a python package that includes several tools to annotate VCF files with data from other tools.

vcf2maf (1.6.21)

A smarter, more reproducible, and more configurable tool for converting a VCF to a MAF.

vcfanno (0.3.3)

annotate a VCF with other VCFs/BEDs/tabixed files

vcflib (1.0.3)

a simple C++ library for parsing and manipulating VCF files, + many command-line utilities

vcftools (0.1.16)

VCFtools contains a Perl API (Vcf.pm) and a number of Perl scripts that can be used to perform common tasks with VCF files such as file validation, file merging, intersecting, complements, etc.

VEP (111)

VEP (Variant Effect Predictor) determines the effect of your variants (SNPs, insertions, deletions, CNVs or structural variants) on genes, transcripts, and protein sequence, as well as regulatory regions.

verifybamid (2.0.1)

verifyBamID is software that verifies whether the reads in a particular file match previously known genotypes for an individual (or group of individuals), and checks whether the reads are contaminated as a mixture of two samples. verifyBamID can detect sample contamination and swaps when external genotypes are available. When external genotypes are not available, verifyBamID still robustly detects sample swaps.

verkko (1.4.1)

Verkko is a hybrid genome assembly pipeline developed for telomere-to-telomere assembly of PacBio HiFi and Oxford Nanopore reads.

viper (0-20231003-1525270)

VIPER combines the use of several dozen RNA-seq tools, suites, and packages to create a complete pipeline that takes RNA-seq analysis from raw sequencing data all the way through alignment, quality control, unsupervised analyses, differential expression, and downstream pathway analysis

VirSorter2 (2.2.3)

VirSorter2 is a DNA and RNA virus identification tool. It leverages genome-informed database advances across a collection of customized automatic classifiers to improve the accuracy and range of virus sequence detection.

VIRTUS (2.0.1)

Bioinformatics pipeline for viral transcriptome detection.

vsearch (2.22.1)

VSEARCH supports de novo and reference based chimera detection, clustering, full-length and prefix dereplication, rereplication, reverse complementation, masking, all-vs-all pairwise global alignment, exact and global alignment searching, shuffling, subsampling and sorting. It also supports FASTQ file analysis, filtering, conversion and merging of paired-end reads.

vt (0.57721)

vt is a variant tool set that discovers short variants from Next Generation Sequencing data.

winnowmap (2.03)

winnowmap is used for mapping ONT and PacBio reads to repetitive reference sequences.

xeniumranger (1.7.0.2)

Pipeline to process Xenium In Situ Gene Expression data

xHLA (2018-04-04)

The HLA gene complex on human chromosome 6 is one of the most polymorphic regions in the human genome and contributes in large part to the diversity of the immune system. Accurate typing of HLA genes with short-read sequencing data has historically been difficult due to the sequence similarity between the polymorphic alleles. xHLA iteratively refines the mapping results at the amino acid level to achieve high typing accuracy for both class I and II HLA genes.

XHMM (2016-01-04)

XHMM uses principal component analysis (PCA) normalization and a hidden Markov model (HMM) to detect and genotype copy number variation (CNV) from normalized read-depth data from targeted sequencing experiments.

yak (r69)

Yet another K-mer analyzer. Yak was initially developed for two specific use cases: 1) to robustly estimate the base accuracy of CCS reads and assembly contigs, and 2) to investigate the systematic error rate of CCS reads.

Image Analysis

3DSlicer (5.2.2)

A software platform for the analysis (including registration and interactive segmentation) and visualization (including volume rendering) of medical images and for research in image guided therapy.

AFNI (current-py3)

AFNI (Analysis of Functional NeuroImages) is a set of C programs for processing, analyzing, and displaying functional MRI (FMRI) data - a technique for mapping human brain activity.

ANTs (2.4.2)

Advanced Normalization Tools (ANTs) extracts information from complex datasets that include imaging. Paired with ANTsR (answer), ANTs is useful for managing, interpreting and visualizing multidimensional data.

AreTomo (1.3.4)

Alignment and Reconstruction for Electron Tomography

B-SOID (1.3)

baracus (1.1.4)

Baracus predicts brain age, based on data from Freesurfer. It combines data from cortical thickness, cortical surface area, and subcortical information

brkraw (0.3.7)

BrkRaw is a Python module designed to provide a comprehensive tool to access raw data acquired from Bruker Biospin preclinical MRI scanners. The module is also compatible with zip-compressed data, so archived data can be used directly. It comprises four components: a graphical user interface (GUI), command-line tools, and high-level and low-level Python APIs.

Bsoft (1.9.0)

Bsoft is a collection of programs and a platform for development of software for image and molecular processing in structural biology. Problems in structural biology are approached with a highly modular design, allowing fast development of new algorithms without the burden of issues such as file I/O. It provides an easily accessible interface, a resource that can be and has been used in other packages.

c3d (1.1.0)

C3D is a command-line tool for converting 3D images between common file formats. The tool also includes a growing list of commands for image manipulation, such as thresholding and resampling.

cellpose (2.1.1)

A generalist algorithm for cellular segmentation with human-in-the-loop capabilities.

civet (2.1.1)

civet is a brain-imaging pipeline for analysis of large MR data sets. civet extracts and analyses cortical surfaces from MR images, and provides many other volumetric and corticometric functions.

cmtk (3.3.2)

CMTK is a software toolkit for computational morphometry of biomedical images. CMTK provides a set of command line tools for processing and I/O.

CompuCell3D (4.5.0)

CompuCell3D is a multiscale multicellular virtual tissue modeling and simulation environment. CompuCell3D is written in C++ and provides Python bindings for model and simulation development in Python. CompuCell3D is supported on Windows, Mac and Linux.

connectome-workbench (1.5.0)

Tools to browse, download, explore, and analyze data from the Human Connectome Project (HCP). Allows users to compare their own data to that of the HCP.

coolbox (0.3.8)

CoolBox is an open-source, user-friendly toolkit for visual analysis of genomics data. It is highly compatible with the Python ecosystem and customizable with a well-designed user interface. It can be used, for example, to produce high-quality genome track plots or fetch commonly used genomic data files with a Python script or command line, or to explore genomic data interactively within a Jupyter environment or web browser.

cpac (1.8.6)

A configurable, open-source, Nipype-based, automated processing pipeline for resting state fMRI data.

CTF (6.1)

The CTF MEG software has two main roles: (1) to provide a human-machine interface to the CTF MEG electronics to collect MEG and/or EEG data, and (2) to provide a tool for reviewing and (to a limited extent) analyzing the MEG and/or EEG data acquired by the CTF MEG system.

dcm2bids (3.1.0)

Tool to convert dcm images to the neuroimaging BIDS format

dcm2niix (1.0.20211006)

DICOM to NIfTI converter

DeepCAD (20210826)

DeepCAD is a self-supervised deep-learning method for spatiotemporal enhancement of calcium imaging data that does not require any high signal-to-noise ratio (SNR) observations. DeepCAD suppresses detection noise and improves the SNR more than tenfold, which reinforces the accuracy of neuron extraction and spike inference and facilitates the functional analysis of neural circuits.

DeepLabCut (2.3.9)

deepmedic (0.8.4)

dmriprep (0.5.0)

Pipeline for preprocessing diffusion MRI datasets.

dynamo (1.1.532)

Dynamo is a software environment for subtomogram averaging of cryo-EM data.

elastix (4.9; 5.1.0)

a toolbox for rigid and nonrigid registration of images.

fastsurfer (1.1.1)

Fastsurfer is a neuroimaging pipeline based on deep learning.

Fiji (1.52f)

Fiji is a distribution of the popular open-source software ImageJ focused on biological-image analysis. Fiji uses modern software engineering practices to combine powerful software libraries with a broad range of scripting languages to enable rapid prototyping of image-processing algorithms. Fiji facilitates the transformation of new algorithms into ImageJ plugins that can be shared with end users through an integrated update system.

fitlins (0.9.1)

Fitlins fits linear models to BIDS neuroimaging datasets.

fmriprep (23.1.4)

A Robust Preprocessing Pipeline for fMRI Data

Freesurfer (7.4.1)

Freesurfer is a set of automated tools for reconstruction of the brain's cortical surface from structural MRI data, and overlay of functional MRI data onto the reconstructed surface.

FSL (6.0.6)

FSL is a comprehensive library of image analysis and statistical tools for FMRI, MRI and DTI brain imaging data.

hitips (1.0.4)

HiTIPS: High-Throughput Image Processing Software for the Study of Nuclear Architecture and Gene Expression. Documentation: https://hitips.readthedocs.io/en/latest/

Huygens (23.04.0-p6)

Huygens is a software package for image restoration, deconvolution, resolution enhancement and noise reduction. It can process images from all current optical microscopes, including wide-field, confocal, Nipkow (scanning disk confocal), multiple-photon, and 4Pi microscopes.

Imaris (9.7)

Imaris provides scientists with solutions for processing, visualizing and analyzing multi-dimensional microscopic images. It reads images in many of the most commonly used proprietary formats.

IMOD (4.12.25)

IMOD is a set of image processing, modeling and display programs used for tomographic reconstruction and for 3D reconstruction of EM serial sections and optical sections.

IsoNet (0.2.1)

ITK-SNAP (3.8.0)

ITK-SNAP is a tool for segmentation of 3D biomedical images. It requires a graphical connection to run on the cluster.

ITK-SNAP-BS (20131007)

ITK-SNAP brings active contour segmentation to the fingertips of clinical researchers. This application fulfills a specific and pressing need of biomedical imaging research by providing a combination of manual and semiautomatic tools for extracting structures in 3D image data of different modalities and from different anatomical regions.

laynii (2.4.0)

Tools to analyze layer fMRI datasets

magetbrain (1.0)

Given a set of labelled MR images (atlases) and unlabelled images (subjects), MAGeT produces a segmentation for each subject using a multi-atlas voting procedure based on a template library made up of images from the subject set.

mango (4.1)

Mango (Multi-image Analysis GUI) is a viewer for medical research images. It provides analysis tools and a user interface to navigate image volumes.

membrain-seg (0.0.1)

3D membrane segmentation for cryo-electron tomography.

minc-toolkit (1.9.18)

This metaproject bundles multiple MINC-based packages that historically have been developed somewhat independently

MIPAV (11.0.3)

The MIPAV (Medical Image Processing, Analysis, and Visualization) application enables quantitative analysis and visualization of medical images of numerous modalities such as PET, MRI, CT, or microscopy.

MONAILabel (0.3.2; 0.4.2)

MONAI Label is a free and open-source platform that facilitates the development of AI-based applications that aim at reducing the time required to annotate 3D medical image datasets. It allows researchers to readily deploy their apps as services, which can be made available to clinicians via their preferred user-interface. Currently, MONAI Label readily supports locally installed (3DSlicer) and web-based (OHIF) frontends, and offers two Active learning strategies to facilitate and speed up the training of segmentation algorithms. MONAI Label allows researchers to make incremental improvements to their labeling apps by making them available to other researchers and clinicians alike.

mricron (1.0.20190902)

Viewer for several types of brain scan formats

mriqc (23.1.0)

MRIQC is an MRI quality control tool

mrtrix (3.0.4)

MRtrix provides a large suite of tools for image processing, analysis and visualisation, with a focus on the analysis of white matter using diffusion-weighted MRI.

mrtrix3tissue (5.2.9)

MRtrix3Tissue is a fork of the MRtrix3 project. It aims to add capabilities for 3-Tissue CSD modelling and analysis to a complete version of the MRtrix3 software.

napari (0.4.18)

napari is an image viewer. It can also be used to annotate images.

nibabies (22.0.2)

Preprocessing pipeline for neonate and infant MRI.

OpenSlide (3.4.1)

OpenSlide is a C library for reading and manipulating digital slides of diverse vendor formats. It provides a simple interface to read whole-slide images (also known as virtual slides). OpenSlide has been used in digital pathology projects.

PEET (1.15.0)

PEET (Particle Estimation for Electron Tomography) is an open-source package for aligning and averaging particles in 3-D subvolumes extracted from tomograms. It seeks the optimal alignment of each particle against a reference volume through several iterations. If PEET and IMOD are both installed, most PEET operations are available from the eTomo graphical user interface in IMOD.

petprep_hmc (0.06)

Positron Emission Tomography (PET) is a state-of-the-art neuroimaging tool for quantification of the in vivo spatial distribution of specific molecules in the brain. It is affected by various kinds of patient movement during a scan. The PETprep_HMC application allows for correction of the PET results in the presence of head motions.

plastimatch (1.9.3)

Application for registration of medical images such as X-rays, CT, MRI and PET

PyTom (1.0)

qsiprep (0.19.1)

qsiprep configures pipelines for processing diffusion-weighted MRI (dMRI) data.

QuPath (0.5.0)

QuPath is open source software for bioimage analysis. It is often used for digital pathology applications because it offers a powerful set of tools for working with whole slide images - but it can be applied to lots of other kinds of image as well.

rapidtide (2.8.2)

Rapidtide is a suite of Python programs used to model, characterize, visualize, and remove time varying, physiological blood signals from fMRI and fNIRS datasets. The primary workhorses of the package are the rapidtide program, which characterizes bulk blood flow, and happy, which focusses on the cardiac band.

SAMsrcV3 (20180713-c5e1042)

Synthetic Aperture Magnetometry - The SAMsrcV3 suite implements the latest advances in MEG source localization.

SimNIBS (4.0)

SimNIBS is a free software package for the Simulation of Non-invasive Brain Stimulation. It allows for realistic calculations of the electric field induced by transcranial magnetic stimulation (TMS) and transcranial direct current stimulation (tDCS).

smriprep (0.8.3)

Structural MRI PREProcessing (sMRIPrep) workflows for NIPreps (NeuroImaging PREProcessing tools).

SPHIRE (1.4)

SPHIRE (SPARX for High-Resolution Electron Microscopy) is an open-source, user-friendly software suite for the semi-automated processing of single particle electron cryo-microscopy (cryo-EM) data. It allows fast and reproducible structure determination from cryo-EM images.

spm12 (7870)

The (S)tatistical (P)ara(M)etric application analyzes brain imaging data.

tedana (24.0.0)

Tedana is an application to denoise multi-echo fMRI datasets

tomotwin (0.8.0)

TomoTwin - a deep metric learning based particle picking procedure for cryo-ET

TORTOISE (3.2.0)

(Tolerably Obsessive Registration and Tensor Optimization Indolent Software Ensemble) The TORTOISE software package is for processing diffusion MRI data.

tortoisev4 (current)

TORTOISE (Tolerably Obsessive Registration and Tensor Optimization Indolent Software Ensemble) is a suite of programs for pre-processing, post-processing and analyzing diffusion MRI data.

tractseg (1.7.1)

Tool for white matter bundle segmentation from Diffusion MRI.

xcpengine (1.2.4)

xcpEngine performs denoising and estimation of functional connectivity on fMRI datasets.

xcp_d (0.5.0)

xcp_d is a postprocessing and noise regression pipeline for fMRI datasets (can use output from fmriprep and nibabies).

Linkage/Phylogenetics

AdmixTools (7.0.2)

ADMIXTOOLS is a software package that supports formal tests of whether admixture occurred, and makes it possible to infer admixture proportions and dates.

admixture (1.3.0)

ADMIXTURE is a software tool for maximum likelihood estimation of individual ancestries from multilocus SNP genotype datasets. It uses the same statistical model as STRUCTURE but calculates estimates much more rapidly using a fast numerical optimization algorithm.

arcashla (0.5.0)

high resolution HLA typing from RNA seq

Beagle (5.4_01Mar24)

Beagle is a package for imputing genotypes, inferring haplotype phase, and performing genetic association analysis. BEAGLE is designed to analyze large-scale data sets with hundreds of thousands of markers genotyped on thousands of samples.

beagle-lib (4.0)

BEAGLE is a high-performance library that can perform the core calculations at the heart of most Bayesian and Maximum Likelihood phylogenetics packages. It can make use of highly-parallel processors such as those in graphics cards (GPUs) found in many PCs.
module name: beagle-lib

BEAST (1.10.5; 2.6.2)

BEAST (Bayesian Evolutionary Analysis Sampling Trees) is a cross-platform program for Bayesian MCMC analysis of molecular sequences.

biobakery_workflows (3.1)

CD-HIT (4.6.8)

CD-HIT is a very widely used program for clustering and comparing protein or nucleotide sequences.

eigensoft (8.0.0)

The EIGENSOFT package combines functionality from population genetics methods and the EIGENSTRAT stratification correction method.

famsa (2.2.2)

Progressive algorithm for large-scale multiple sequence alignments

FastQTL (2.184)

In order to discover quantitative trait loci (QTLs), multi-dimensional genomic datasets combining DNA-seq and ChIP-/RNA-seq require methods that rapidly correlate tens of thousands of molecular phenotypes with millions of genetic variants while appropriately controlling for multiple testing. FastQTL implements a popular cis-QTL mapping strategy in a user- and cluster-friendly tool. FastQTL also proposes an efficient permutation procedure to control for multiple testing.

FastTree (2.1.11)

FastTree infers approximately-maximum-likelihood phylogenetic trees from alignments of nucleotide or protein sequences. FastTree can handle alignments with up to a million sequences in a reasonable amount of time and memory. For large alignments, FastTree is 100-1,000 times faster than PhyML 3.0 or RAxML 7.

GCTA (1.94.1)

GCTA (Genome-wide Complex Trait Analysis) is designed to estimate the proportion of phenotypic variance explained by genome- or chromosome-wide SNPs for complex traits.

gtdb-tk (2.3.2)

GTDB-Tk is a software toolkit for assigning objective taxonomic classifications to bacterial and archaeal genomes based on the Genome Taxonomy Database (GTDB).

gubbins (2.3.4)

Gubbins is an algorithm that iteratively identifies loci containing elevated densities of base substitutions while concurrently constructing a phylogeny based on the putative point mutations outside of these regions.

hyphy (2.5.29)

HyPhy (Hypothesis Testing using Phylogenies) is an open-source software package for the analysis of genetic sequences (in particular the inference of natural selection) using techniques in phylogenetics, molecular evolution, and machine learning.

iqtree (2.2.0.5)

Efficient phylogenomic software by maximum likelihood

king (2.2.7)

KING is a toolset to explore genotype data from a genome-wide association study (GWAS) or a sequencing project. KING can be used to check family relationship and flag pedigree errors by estimating kinship coefficients and inferring IBD segments for all pairwise relationships.

LTSOFT (4.0)

The LTSOFT application implements a new approach to using information from known associated variants when conducting disease association studies. The approach is based in part on the classical technique of liability threshold modeling and performs estimation of model parameters for each known variant while accounting for the published disease prevalence from the epidemiological literature.

Madeline2 (2.0)

The Madeline 2.0 Pedigree Drawing Engine (PDE) is a pedigree drawing program for use in linkage and family-based association studies. The program is designed to handle large and complex pedigrees with an emphasis on readability and aesthetics.

MEGA (11.0.10)

MEGA, Molecular Evolutionary Genetics Analysis, is a software suite for analyzing DNA and protein sequence data from species and populations

mega2 (6.0.0)

Mega2 is a data-handling program for facilitating genetic linkage and association analyses.

merlin (1.1.2)

MERLIN uses sparse trees to represent gene flow in pedigrees and is one of the fastest pedigree analysis packages around.

metaWRAP (1.3.2)

MetaWRAP is a modular pipeline for shotgun metagenomic data analysis. It deploys state-of-the-art software to handle metagenomic data processing starting from raw sequencing reads and ending in metagenomic bins and their analysis. It includes hybrid algorithms that leverage the strengths of a variety of software to extract and refine high-quality bins from metagenomic data through bin consolidation and reassembly.

mothur (1.48.0)

mothur is a tool for analyzing 16S rRNA gene sequences generated on multiple platforms as part of microbial ecology projects.

MrBayes (3.2.7)

MrBayes is a program for Bayesian inference and model choice across a wide range of phylogenetic and evolutionary models. MrBayes uses Markov chain Monte Carlo (MCMC) methods to estimate the posterior distribution of model parameters.

ohana (current)

Ohana is a suite of software for analyzing population structure and admixture history using unsupervised learning methods.

oma (2.6.0)

OMA standalone is a standalone package that can infer orthologs using the OMA algorithm on custom genomes.

PAML (4.10.7)

A package of programs for phylogenetic analyses of DNA and protein sequences using maximum likelihood.

phase (2.1.1)

infers haplotypes from population genotype data

Phylip (3.698)

Phylip is a package of programs for inferring phylogenies (evolutionary trees). It includes parsimony, distance matrix and likelihood methods.

pplacer (1.1)

Pplacer places query sequences on a fixed reference phylogenetic tree to maximize phylogenetic likelihood or posterior probability according to a reference alignment. Pplacer is designed to be fast, to give useful information about uncertainty, and to offer advanced visualization and downstream analysis.

PRIMUS (1.9.0)

PRIMUS (Pedigree Reconstruction and Identification of a Maximum Unrelated Set) reads genome-wide IBD estimates to identify a maximum unrelated set and to reconstruct pedigrees.

PRSice (2.3.3)

PRSice is a Polygenic Risk Score software for calculating, applying, evaluating and plotting the results of polygenic risk scores (PRS) analyses.

QIIME (2023.5)

QIIME is an open source software package for comparison and analysis of microbial communities, primarily based on high-throughput amplicon sequencing data (such as SSU rRNA) generated on a variety of platforms, but also supporting analysis of other types of data (such as shotgun metagenomic data).

RAxML (8.2.12)

RAxML (randomized axelerated maximum likelihood) is a sequential and parallel program for inference of large phylogenies with maximum likelihood (ML).

raxml-ng (1.2.0)

RAxML-NG is a phylogenetic tree inference tool which uses the maximum-likelihood (ML) optimality criterion. Its search heuristic is based on iteratively performing a series of Subtree Pruning and Regrafting (SPR) moves, which allows it to quickly navigate to the best-known ML tree. It is the successor to RAxML.

RevBayes (1.2.1)

Bayesian phylogenetic inference using probabilistic graphical models and an interpreted language

scDRS (1.02)

The scDRS application implements an approach that links scRNA-seq with polygenic disease risk at single-cell resolution, independent of annotated cell types. scDRS identifies cells exhibiting excess expression across disease-associated genes implicated by genome-wide association studies (GWASs). Genes whose expression was correlated with the scDRS score across cells (reflecting coexpression with GWAS disease-associated genes) were strongly enriched for gold-standard drug target and Mendelian disease genes.

shapeit (5.1.0)

SHAPEIT is a fast and accurate haplotype inference software

SMR (1.3.1)

SMR integrates summary-level data from GWAS with data from expression quantitative trait locus (eQTL) studies to identify genes whose expression levels are associated with a complex trait because of pleiotropy. It implements methods to test for pleiotropic association between the expression level of a gene and a complex trait of interest using summary-level data from GWAS and expression quantitative trait loci (eQTL) studies (Zhu et al. 2016 Nat Genet).

solar (9.0.1)

SOLAR-Eclipse is an extensive, flexible software package for genetic variance components analysis, including linkage analysis, quantitative genetic analysis, SNP association analysis (QTN and QTLD), and covariate screening.

sumtrees (4.5.2)

Summarize non-parametric bootstrap or Bayesian posterior probability support for splits or clades on phylogenetic trees.

TensorQTL (1.0.9)

treemix (1.13)

TreeMix is a method for inferring the patterns of population splits and mixtures in the history of a set of populations.

treetime (0.11.1)

TreeTime provides routines for ancestral sequence reconstruction and inference of molecular-clock phylogenies.

Mass Spectrometry

diann (1.8.1)

DIA-NN - a universal software for data-independent acquisition (DIA) proteomics data processing

fragpipe (20.0)

FragPipe is a Java Graphical User Interface (GUI) and CLI workflow tool for a suite of computational tools enabling comprehensive analysis of mass spectrometry-based proteomics data. It is powered by MSFragger.

ionquant (1.9.8)

IonQuant is a fast and comprehensive tool for MS1 precursor intensity-based quantification for timsTOF PASEF DDA and non-timsTOF (e.g., Orbitrap) data. It enables label-free quantification with false discovery rate (FDR) controlled match-between-runs (MBR). It can also be used for quantification in labelling-based experiments such as those involving SILAC, dimethyl, or similar labelling strategies. IonQuant is available as part of FragPipe (recommended option), but can also be run as a command-line tool.

Mascot (2.8)

The Mascot search engine uses mass spectrometry data to identify proteins from primary sequence databases. Mascot searches can be run directly on the NIH Mascot server at https://biospec.nih.gov, or by using the Mascot daemon on your own desktop PC.

maxquant (2.4.9.0)

MaxQuant is a quantitative proteomics software package designed for analyzing large mass-spectrometric data sets. It is specifically aimed at high-resolution MS data. Several labeling techniques as well as label-free quantification are supported.

msfragger (3.8)

An ultrafast database search tool for peptide identification in mass spectrometry-based proteomics.

percolator (rel-3-06-05)

Software for postprocessing of shotgun proteomics data

philosopher (5.1.0)

Philosopher is fast, easy-to-use, scalable, and versatile data analysis software for mass spectrometry-based proteomics. Philosopher is dependency-free and can analyze both traditional database searches and open searches for post-translational modification (PTM) discovery.

Mathematical/Statistics

ANTs (2.4.2)

cmdstan (2.30.1)

Command line interface to stan

dentist (1.1.0.0)

DENTIST (Detecting Errors iN analyses of summary staTISTics) is a quality control (QC) tool for summary-level data from genome-wide association studies (GWASs). It leverages the difference between the observed GWAS test-statistic of a variant and its predicted value (using the neighbouring variants and linkage disequilibrium (LD) data from a reference panel) to remove problematic variants.

FEBio (4.4)

The FEBio software suite implements a nonlinear implicit finite element (FE) framework, designed specifically for analysis in computational solid biomechanics. FEBio offers modeling scenarios, constitutive models, and boundary conditions, which are relevant to numerous applications in biomechanics. The open-source FEBio software is written in C++, with particular attention to scalar and parallel performance on modern computer architectures.

GAUSS (10)

The GAUSS Mathematical and Statistical System is an easy-to-use data analysis environment based on the fast and powerful GAUSS Matrix Programming Language designed for computationally intensive tasks.

gurobi (9.0.0)

Gurobi is a mathematical optimization solver. It is a commercial product developed by gurobi.com. On Biowulf, Gurobi is licensed for use by the members of the CDSL_Gurobi_users group only. It is installed in /data/CDSL_Gurobi_users and is not accessible by any other users. A token license server, running on Biowulf, manages the Gurobi license.

IDL/ENVI (9.0/6.0)

IDL and ENVI are a complete computing environment for the interactive analysis and visualization of data. IDL integrates an array-oriented language with mathematical analysis and graphical display techniques. ENVI is designed for extracting information from geospatial and medical imagery.

lifelines (0.27.4)

Lifelines is a complete survival analysis library, written in pure Python. Its benefits include easy installation, internal plotting methods, and a simple and intuitive API. It handles right-, left- and interval-censored data, and contains the most popular parametric, semi-parametric and non-parametric models.

Mathematica (13.3.0)

Mathematica is an interactive system for doing mathematical computation. It performs numerical, symbolic and graphical computations, and incorporates a high-level programming language.

Matlab (2023a)

MATLAB is an interactive software package for scientific and engineering numeric computation. MATLAB integrates numerical analysis, matrix computation, signal processing, and graphics in an environment where problems and solutions are expressed just as they are written mathematically.

MCL (14-137)

MCL implements the Markov cluster algorithm. Among its applications is the assignment of proteins into families based on precomputed sequence similarity information. This approach does not suffer from the problems that normally hinder other protein sequence clustering algorithms, such as the presence of multi-domain proteins, promiscuous domains and fragmented proteins.

Meep (1.17.1)

Meep (or MEEP) is a free finite-difference time-domain (FDTD) simulation software package developed at MIT to model electromagnetic systems, along with the MPB eigenmode package.

Octave (5.1.0)

GNU Octave is a high-level language, primarily intended for numerical computations. It provides a convenient command line interface for solving linear and nonlinear problems numerically, and for performing other numerical experiments using a language that is mostly compatible with Matlab.

PEER (1.3)

PEER stands for probabilistic estimation of expression residuals. It is a collection of Bayesian approaches to infer hidden determinants and their effects from gene expression profiles using factor analysis methods.

R (4.3.2)

R (the R Project) is a language and environment for statistical computing and graphics. R is similar to S, and provides a wide variety of statistical and graphical techniques (linear and nonlinear modelling, statistical tests, time series analysis, classification, clustering, ...).

Rstudio (2023.03.1-446)

RStudio is a set of integrated tools designed to help you be more productive with R. It includes a console, syntax-highlighting editor that supports direct code execution, as well as tools for plotting, history, debugging and workspace management.

SAS (9.4M7)

Base SAS provides a scalable, integrated software environment specially designed for data access, transformation and reporting.

Molecular Modeling/Graphics

AutodockCrankprep (1.0)

"AutoDock CrankPep or ADCP is an AutoDock docking engine specialized for docking peptides. It combines technology from the protein folding field with an efficient representation of a rigid receptor as affinity grids to fold the peptide in the context of the energy landscape created by the receptor."

blender (2.82)

Blender is the free and open source 3D creation suite. Blender on Biowulf is meant for command-line rendering.

chap (0.9.1)

CHAP is a tool for the functional annotation of ion channel structures

Chimera (1.16.0)

Chimera is a highly extensible program for interactive visualization and analysis of molecular structures and related data, including density maps, supramolecular assemblies, sequence alignments, docking results, trajectories, and conformational ensembles.

chromeister (1.5.a)

Chromeister is an ultra fast, heuristic approach to detect conserved signals in extremely large pairwise genome comparisons.

Coot (0.9.8.92)

Coot is for macromolecular model building, model completion and validation, particularly suitable for protein modelling using X-ray data.

Cytoscape (3.9.1)

Cytoscape is an open source software platform for visualizing molecular interaction networks and biological pathways and integrating these networks with annotations, gene expression profiles and other state data.

iqmol (2.14.0)

IQmol is a free open-source molecular editor and visualization package. It offers a range of features including a molecular editor, surface generation (orbitals and densities) and animations (vibrational modes and reaction pathways).

lammps (29Oct20)

LAMMPS is a classical molecular dynamics code, and an acronym for Large-scale Atomic/Molecular Massively Parallel Simulator. It runs on a variety of different computer systems, including single processor systems, distributed-memory machines with MPI, and GPU and Xeon Phi systems. LAMMPS is open source software, released under the GNU General Public License.

mosaics (1.0.0)

A collection of tools for characterizing membrane structure and dynamics within simulated trajectories of molecular systems.

OpenBabel (3.1.1)

Open Babel is a chemical toolbox designed to speak the many languages of chemical data.

pdb2pqr (3.6.1)

Automates many of the common tasks of preparing structures for continuum solvation calculations as well as many other types of biomolecular structure modeling

posefilter (619eca51)

PoseFilter is a PyMOL plugin and assists in the analysis of docked ligands through identification of unique oligomeric poses by utilizing RMSD and interaction fingerprint analysis methods.

Psi4 (1.6.1)

Psi4 is an ab-initio electronic structure code that supports various methods for calculating energies and gradients of molecular systems.

Rosetta (2023.45)

The Rosetta++ software suite can perform de novo protein structure predictions, identify low free energy sequences for target protein backbones, predict the structure of a protein-protein complex from the individual structures of the monomer components, incorporate NMR data into the basic Rosetta protocol to accelerate the process of NMR structure prediction, and more...

Schrodinger (2023.1)

A limited number of Schrödinger applications are available on the Biowulf cluster through the Molecular Modeling Interest Group. Most are available through the Maestro GUI.

vmd (1.9.3)

VMD is a molecular visualization program for displaying, animating, and analyzing large biomolecular systems using 3-D graphics and built-in scripting. To use, type vmd at the prompt.

Sequence Analysis

advntr (1.4.1)

a tool for genotyping Variable Number Tandem Repeats (VNTR) from sequence data

agat (1.2.0)

Another Gtf/Gff Analysis Toolkit

alleleCount (4.2.1)

Calculates genotype frequencies of a SNPMatrix. This component tests each SNP for its Hardy-Weinberg equilibrium. If there are NA values, the frequencies of missing value per sample in the input file are calculated.

amr (3.12.8)

AMRFinderPlus - Identify AMR genes and point mutations, and virulence and stress resistance genes in assembled bacterial nucleotide and protein sequence.

ANNOgesic (1.0.22; 1.1.14)

Processing and integrating RNA-Seq data in order to generate high-resolution annotations is challenging, time consuming and requires numerous different steps. ANNOgesic is a powerful and modular pipeline that provides the required analyses and simplifies RNA-Seq-based bacterial and archaeal genome annotation. It predicts and annotates numerous features, including small non-coding RNAs, with high precision.

antiSMASH (7.1.0)

antiSMASH allows the rapid genome-wide identification, annotation and analysis of secondary metabolite biosynthesis gene clusters in bacterial and fungal genomes.

anvio (7.1)

Anvi’o is an open-source, community-driven analysis and visualization platform for microbial ‘omics. It brings together many aspects of today’s cutting-edge strategies including genomics, metagenomics, metatranscriptomics, pangenomics, metapangenomics, phylogenomics, and microbial population genetics in an integrated and easy-to-use fashion through extensive interactive visualization capabilities.

arcashla (0.5.0)

high resolution HLA typing from RNA seq

arriba (2.3.0)

Arriba identifies gene fusions in RNA-Seq data. It also can detect other structural variants in genomic data, such as intron duplications and gene truncations.

augustus (3.4.0)

AUGUSTUS is a program that predicts genes in eukaryotic genomic sequences.

bakta (1.9.1)

Rapid & standardized annotation of bacterial genomes, MAGs & plasmids

Bartender (1.1)

biobakery_workflows (3.1)

birdsuite (1.5.5)

Birdsuite is a four-stage analytical framework instantiated in software for deriving integrated and mutually consistent copy number and SNP genotypes. The method sequentially assigns copy number across regions of common copy number polymorphisms (CNPs), calls genotypes of SNPs, identifies rare CNVs via a hidden Markov model (HMM), and generates an integrated sequence and copy number genotype at every locus.

Blast (2.15.0+)

NCBI's well-known sequence database searching program which compares a nucleotide or protein query sequence against all sequences in a database.

blat (3.5)

BLAT is a DNA/Protein Sequence Analysis program that is designed to quickly find sequences of 95% and greater similarity of length 40 bases or more.

BOLT-LMM (2.4.1)

The BOLT-LMM algorithm computes statistics for testing association between phenotype and genotypes using a linear mixed model (LMM)

bracken (2.8)

Bracken is a companion program to Kraken 1, KrakenUniq, or Kraken 2. While Kraken classifies reads to multiple levels in the taxonomic tree, Bracken allows estimation of abundance at a single level using those classifications (e.g. Bracken can estimate abundance of species within a sample).

bwa-mem2 (2.2.1)

The next version of the bwa-mem algorithm in bwa.

cactus (2.6.4)

Cactus is a reference-free whole-genome multiple alignment program.

CADD (1.6.post1)

CAVIAR (2.2)

CAVIAR (CAusal Variants Identification in Associated Regions) is a statistical framework that quantifies the probability of each variant being causal while allowing for an arbitrary number of causal variants.

cell2location (0.1.3)

Cell2location: Comprehensive mapping of tissue cell architecture via integrated single cell and spatial transcriptomics (cell2location model)

cellsnp-lite (1.2.2)

Efficient genotyping of bi-allelic SNPs on single cells

cgpbigwig (1.6.0)

BigWig manipulation tools using libBigWig and htslib

checkm2 (1.0.2)

Rapid assessment of genome bin quality using machine learning.

CHESS (0.3.7)

The CHESS (Comparison of Hi-C Experiments using Structural Similarity) application implements an algorithm for the comparison of chromatin contact maps and automatic differential feature extraction.

chipseq_pipeline (2.1.6)

AQUAS Transcription Factor and Histone ChIP-Seq processing pipeline. The AQUAS pipeline is based off the ENCODE (phase-3) transcription factor and histone ChIP-seq pipeline specifications (by Anshul Kundaje)

CIRCLE-seq (1.1)

CIRCLE-seq is a tool for analysis of circularization for in vitro reporting of cleavage effects by sequencing. It is a highly sensitive, sequencing-efficient in vitro screening strategy that outperforms existing cell-based or biochemical approaches for identifying CRISPR–Cas9 genome-wide off-target mutations. It can be practiced using widely accessible next-generation sequencing technology and does not require reference genome sequences. Importantly, CIRCLE-seq can be used to identify off-target mutations associated with cell-type-specific single-nucleotide polymorphisms, demonstrating the feasibility and importance of generating personalized specificity profiles.

circos (0.69-9)

Clair3 (1.0.4)

cloops2 (0.0.4)

cLoops2 is an enhanced, flexible peak/loop/domain-calling and analysis tool for 1D/3D genomic data.

clustalo (1.2.4)

Clustal-Omega is a general purpose multiple sequence alignment (MSA) program for proteins and DNA/RNA. It produces high quality MSAs and is capable of handling data-sets of hundreds of thousands of sequences in reasonable time.

ClustalW (2.1)

ClustalW is a general-purpose multiple alignment program for DNA or protein sequences.

conifer (0.2.2)

CoNIFER (copy number inference from exome reads) uses exome sequencing data to find copy number variants (CNVs) and genotype the copy-number of duplicated genes. It can reliably be used to discover disruptive genic CNVs missed by standard approaches and should have broad application in human genetic studies of disease.

cooltools (0.5.4)

Cooltools is a suite of computational tools that enables flexible, scalable, and reproducible analysis of high-resolution contact frequency data. Cooltools leverages the widely-adopted cooler format which handles storage and access for high-resolution datasets. Cooltools provides a paired command line interface and Python application programming interface, which respectively facilitate workflows on high-performance computing clusters and in interactive analysis environments.

coverageMaster (20220706)

CoverageMaster (CoM) is a copy number variation (CNV) calling algorithm based on depth-of-coverage maps designed to detect CNVs of any size in exome [whole exome sequencing (WES)] and genome [whole genome sequencing (WGS)] data. The core of the algorithm is the compression of sequencing coverage data in a multiscale Wavelet space and the analysis through an iterative Hidden Markov Model.

CutRunTools (20200629)

CutRunTools is a flexible, general pipeline for facilitating the identification of chromatin-associated protein binding and genomic footprinting analysis from antibody-targeted CutRun primary cleavage data. CutRunTools extracts endonuclease cut site information from sequences of short read fragments and produces single-locus binding estimates, aggregate motif footprints, and informative visualizations to support the high-resolution mapping capability of CutRun.

CytoSig (0.1)

CytoSig is a data-driven infrastructure hosted by the National Cancer Institute. CytoSig includes both a database of target genes modulated by cytokines and a predictive model of cytokine signaling activity and regulatory cascade from transcriptomic profiles.

CytoSPACE (1.0.6)

CytoSPACE implements an optimization method for mapping individual cells from a single-cell RNA sequencing atlas to spatial expression profiles. Across diverse platforms and tissue types, it outperforms previous methods with respect to noise tolerance and accuracy, enabling tissue cartography at single-cell resolution.

CytoTRACE (0.3.3)

CytoTRACE (Cellular (Cyto) Trajectory Reconstruction Analysis using gene Counts and Expression) is a computational method that predicts the differentiation state of cells from single-cell RNA-sequencing data. CytoTRACE leverages a simple, yet robust, determinant of developmental potential: the number of detectably expressed genes per cell, or gene counts. Its authors validated CytoTRACE on ~150K single-cell transcriptomes spanning 315 cell phenotypes, 52 lineages, 14 tissue types, 9 scRNA-seq platforms, and 5 species.

decifer (2.1.3)

DeCiFer is an algorithm that simultaneously selects mutation multiplicities and clusters somatic single-nucleotide variants (SNVs) by their corresponding descendant cell fractions (DCF), a statistic that quantifies the proportion of cells which acquired the SNV or whose ancestors acquired the SNV. DCF is related to the commonly used cancer cell fraction (CCF) but further accounts for SNVs which are lost due to deleterious somatic copy-number aberrations (CNAs), identifying clusters of SNVs which occur in the same phylogenetic branch of tumour evolution.

deeploc (2.0)

DeepLoc 2.0 predicts the subcellular localization(s) of eukaryotic proteins. It is able to predict one or more localizations for any given protein.

deepsea (0.94c)

DeepSEA is a deep learning-based algorithmic framework for predicting the chromatin effects of sequence alterations with single nucleotide sensitivity. DeepSEA can accurately predict the epigenetic state of a sequence, including transcription factors binding, DNase I sensitivities and histone marks in multiple cell types, and further utilize this capability to predict the chromatin effects of sequence variants and prioritize regulatory variants.

denovogear (1.1.1)

DeNovoGear is software for analyzing de novo mutations from familial and somatic tissue sequencing data. It uses likelihood-based error modeling to reduce the false positive rate of mutation discovery in exome analysis and fragment information to identify the parental origin of germ-line mutations. DeNovoGear has been used on human whole-genome sequencing data to produce a set of predicted de novo insertion and/or deletion (indel) mutations.

diamond (2.0.15)

DIAMOND is a new high-throughput program for aligning DNA reads or protein sequences against a protein reference database such as NR, at up to 20,000 times the speed of BLAST, with high sensitivity.

DNAWorks (3.2.4)

DNAWorks is a computer program that automates the design of oligonucleotides for gene synthesis by PCR-based gene assembly. The program requires simple input information: an amino acid sequence of the target protein or a DNA sequence, and a desired annealing temperature. It is a web-based tool available at https://hpcwebapps.cit.nih.gov/dnaworks/.

drep (3.2.2)

dRep is a python program for rapidly comparing large numbers of genomes. dRep can also "de-replicate" a genome set by identifying groups of highly similar genomes and choosing the best representative genome for each genome set.

duphold (0.2.3)

Duphold: scalable, depth-based annotation and curation of high-confidence structural variant calls

dyno (20220906)

dyno is a meta package that installs several other packages from the dynverse (https://github.com/dynverse). It comprises a set of R packages to construct and interpret single-cell trajectories.

Eagle (2.4)

Eagle performs reference-based haplotype phasing. It attains high accuracy across a broad range of cohort sizes by efficiently leveraging information from large external reference panels (such as the Haplotype Reference Consortium; HRC) using a new data structure based on the positional Burrows-Wheeler transform.

epic2 (0.0.52)

epic2 is an ultraperformant reimplementation of SICER. It focuses on speed, low memory overhead and ease of use.

exonerate (2.4.0)

Exonerate is a generic tool for pairwise sequence comparison. It allows you to align sequences using many alignment models, with either exhaustive dynamic programming or a variety of heuristics.

ExpansionHunter (5.0.0)

Expansion Hunter: a tool for estimating repeat sizes. There are a number of regions in the human genome consisting of repetitions of a short unit sequence (commonly a trimer). Such repeat regions can expand to a size much larger than the read length and thereby cause disease. Expansion Hunter aims to estimate sizes of such repeats by performing a targeted search through a BAM/CRAM file for reads that span, flank, or are fully contained in each repeat.

ExpansionHunterDenovo (0.9.0)

ExpansionHunter Denovo (EHdn) is a suite of tools for detecting novel expansions of short tandem repeats (STRs). EHdn is intended for analysis of a collection of BAM/CRAM files containing alignments of short (100-200bp) reads.

fastp (0.23.2)

A tool designed to provide fast all-in-one preprocessing for FastQ files. This tool is developed in C++ with multithreading supported to afford high performance.

fastq_demux (20230713)

fastq_demux is a simple program to demultiplex a FASTQ file or a pair of FASTQ files based on the barcodes present in the FASTQ headers.
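As a rough illustration of header-based demultiplexing (not the fastq_demux implementation or its command line), the following Python sketch groups FASTQ records by barcode. It assumes, purely for illustration, that the barcode is the text after the last colon of each header line; the function name `demultiplex` and the sample reads are hypothetical.

```python
import io
from collections import defaultdict


def demultiplex(fastq_handle):
    """Group 4-line FASTQ records by the barcode at the end of each header."""
    by_barcode = defaultdict(list)
    while True:
        header = fastq_handle.readline().rstrip("\n")
        if not header:
            break  # end of input
        seq = fastq_handle.readline().rstrip("\n")
        plus = fastq_handle.readline().rstrip("\n")
        qual = fastq_handle.readline().rstrip("\n")
        # Assumption: the barcode is everything after the last colon.
        barcode = header.rsplit(":", 1)[-1]
        by_barcode[barcode].append((header, seq, plus, qual))
    return dict(by_barcode)


# Hypothetical in-memory input standing in for a FASTQ file.
example = io.StringIO(
    "@read1 1:N:0:ACGT\nTTGACC\n+\nIIIIII\n"
    "@read2 1:N:0:GGCA\nCCATGA\n+\nIIIIII\n"
    "@read3 1:N:0:ACGT\nGGGTTT\n+\nIIIIII\n"
)
groups = demultiplex(example)
for barcode, records in sorted(groups.items()):
    print(barcode, len(records))
```

A real run would write each group to its own output file and would need to handle barcode mismatches and paired files, which this sketch leaves out.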

FitHiChIP (9.1; 10.0; 11.0)

FitHiChIP is a computational method for identifying chromatin contacts among regulatory regions such as enhancers and promoters from HiChIP/PLAC-seq data. FitHiChIP jointly models the non-uniform coverage and genomic distance scaling of HiChIP data, captures previously validated enhancer interactions for several genes including MYC and TP53, and recovers contacts genome-wide that are supported by ChIA-PET, promoter capture Hi-C and Hi-C data.

FLAIR (1.6.1)

FLAIR (Full-Length Alternative Isoform analysis of RNA) is a workflow leveraging the full-length transcript sequencing data that nanopore affords. It uses multiple alignment steps and splice site filters to increase confidence in the set of isoforms defined from noisy data.

focus (0.802)

FOCUS (Fine-mapping Of CaUsal gene Sets) is software to fine-map transcriptome-wide association study statistics at genomic risk regions. The software takes as input summary GWAS data along with eQTL weights and outputs a credible set of genes to explain observed genomic risk.

freepsi (0.3)

FreePSI is a new method for genome-wide percent spliced in (PSI) estimation that requires neither a reference transcriptome (hence, transcriptome-free) nor the mapping of RNA-seq reads (hence, alignment-free). The first attribute allows FreePSI to work effectively when a high quality reference transcriptome is unavailable and the second not only helps make FreePSI more efficient, it also eliminates the necessity of dealing with multi-reads.

funannotate (1.8.15)

Funannotate is a genome prediction, annotation, and comparison software package. It was originally written to annotate fungal genomes (small eukaryotes ~ 30 Mb genomes), but has evolved over time to accommodate larger genomes.

FuSeq (1.1.4)

FuSeq is software for discovering fusion genes from paired-end RNA sequencing data. It implements a fast and accurate method to discover fusion genes based on quasi-mapping to quickly map the reads, extract initial candidates from split reads and fusion equivalence classes of mapped reads, and finally apply multiple filters and statistical tests to get the final candidates.

fusioninspector (2.8.0)

In silico Validation of Fusion Transcript Predictions

GEM1 (1.4.3)

GEM (Gene-Environment interaction analysis for Millions of samples) is a software program for large-scale gene-environment interaction testing in samples from unrelated individuals. It enables genome-wide association studies in up to millions of samples while allowing for multiple exposures, control for genotype-covariate interactions, and robust inference.

genometools (1.6.5)

A collection of bioinformatics tools.

gfatools (0.5)

gfatools is a set of tools for manipulating sequence graphs in the GFA or the rGFA format. It has implemented parsing, subgraph and conversion to FASTA/BED.

gffcompare (0.12.6)

gffcompare can be used to compare and evaluate the accuracy of RNA-Seq transcript assemblers (Cufflinks, Stringtie). It can collapse (merge) duplicate transcripts from multiple GTF/GFF3 files (e.g. resulting from assembly of different samples) and classify transcripts from one or multiple GTF/GFF3 files as they relate to reference transcripts provided in an annotation file (also in GTF/GFF3 format).

GimmeMotifs (0.17.2)

GimmeMotifs is a pipeline for transcription factor motif analysis written in Python. It incorporates an ensemble of computational tools to predict motifs de novo from ChIP-sequencing data. Similar redundant motifs are compared using the weighted information content similarity score and clustered using an iterative procedure. A comprehensive output report is generated with several different evaluation metrics to compare and evaluate the results.

GLIMPSE (1.1.1)

GLIMPSE is a phasing and imputation method for large-scale low-coverage sequencing studies. It produces accurate imputed genotype calls and outperforms SNP arrays.

glnexus (1.4.1)

Joint variant calling for large cohort sequencing

graphmap (0.5.2)

GraphMap is a highly sensitive and accurate mapper for long, error-prone reads. It offers a number of valuable features, such as mapping position agnostic to alignment parameters, high sensitivity and precision, handling circular genomes, meaningful mapping quality, various alignment strategies, and more.

gtdb-tk (2.3.2)

GTDB-Tk is a software toolkit for assigning objective taxonomic classifications to bacterial and archaeal genomes based on the Genome Taxonomy Database (GTDB).

gutSMASH (0-20210610)

gutSMASH is a tool that has been developed to systematically evaluate the metabolic potential of anaerobic bacteria in the gut by predicting both known and novel anaerobic metabolic gene clusters from the gut microbiome.

Hap-IBD (20221201)

The hap-ibd program detects identity-by-descent (IBD) segments and homozygosity-by-descent (HBD) segments in phased genotype data. The hap-ibd program can analyze data sets with hundreds of thousands of samples.

HemTools (20230512)

HemTools is a collection of NGS pipelines and bioinformatic analysis tools. It includes tools for data visualization, motif analysis, integrative analysis, bioinformatics analysis, differential analysis, CRISPR analysis, and more.

hhsuite (3.3.0)

The HH-suite is an open-source software package for sensitive protein sequence searching based on the pairwise alignment of hidden Markov models (HMMs).

hicmaptools (20230303)

hicmaptools is a command line tool to access HiC maps. The complete program provides multi-query modes and analysis tools.

hipstr (0.7)

Tool for genotyping short tandem repeats from Illumina sequencing data

HMMER (3.4)

HMMER is used for searching sequence databases for homologs of protein sequences, and for making protein sequence alignments. It implements methods using probabilistic models called "profile hidden Markov models" (profile HMMs). Compared to BLAST, FASTA, and other sequence alignment and database search tools based on older scoring methodology, HMMER aims to be significantly more accurate and more able to detect remote homologs because of the strength of its underlying mathematical models.

HTGTSrep (9fe74ff)

A pipeline for comprehensive analysis of HTGTS-Rep-seq.

iDiffIR (20220121)

iDiffIR is a tool for identifying differential intron retention (IR) from RNA-seq data. It accepts any sorted, indexed BAM file for single- or paired-end reads.

igblast (1.21.0)

IgBlast is a sequence analysis tool for immunoglobulin variable domains.

ImReP (0.8)

ImReP is a novel computational method for rapid and accurate profiling of the adaptive immune repertoire from regular RNA-Seq data. It is able to efficiently extract TCR- and BCR-derived reads from RNA-Seq data. ImReP can also accurately assemble the complementarity determining regions 3 (CDR3s), the most variable regions of B and T cell receptors, and determine their antigen specificity.

infernal (1.1.5)

Package for searching DNA sequence databases for RNA structure and sequence similarities

interproscan (5.63-95.0)

InterProScan is the software package that allows sequences (protein and nucleic) to be scanned against InterPro's signatures. Signatures are predictive models, provided by several different databases, that make up the InterPro consortium.

InterVar (2.1.2, 2.1.3)

intogen (2023.1)

Identifies cancer genes and pinpoints their putative mechanism of action across tumor types

iVar (1.3.1)

JAX-CNV (20240208)

JAX-CNV implements an algorithm for copy number variant (CNV) calling from whole-genome sequencing (WGS) data. On testing data, it demonstrated a ~7-fold increase in the number of detected CNVs as compared to the chromosomal microarray assay (CMA) for clinical diagnosis.

jellyfish (2.3.0)

Jellyfish is a tool for fast, memory-efficient counting of k-mers in DNA.
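To make the idea of k-mer counting concrete, here is a minimal Python sketch. It is not how Jellyfish works internally and does not use its interface; the function name `count_kmers` is hypothetical, and the sketch counts every overlapping k-mer on one strand without merging reverse complements.

```python
from collections import Counter


def count_kmers(sequence, k):
    """Count every overlapping k-mer in a DNA string (single strand only)."""
    sequence = sequence.upper()
    # A sequence shorter than k yields an empty range and an empty Counter.
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


counts = count_kmers("ACGTACGTAC", 3)
for kmer, n in sorted(counts.items()):
    print(kmer, n)
```

An in-memory dictionary like this is only practical for small inputs; genome-scale counting is where a dedicated tool becomes necessary.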

kallisto (0.50.1)

kallisto is a program for quantifying abundances of transcripts from RNA-Seq data, or more generally of target sequences using high-throughput sequencing reads. It is based on the novel idea of pseudoalignment for rapidly determining the compatibility of reads with targets, without the need for alignment.

LAST (1519)

LAST is designed for moderately large data (e.g. genomes, DNA reads, proteomes). It's especially geared toward:

Finding rearrangements and recombinations (last-split)
Finding DNA-versus-protein related regions, especially protein fossils.
Unusual data, e.g. AT-rich DNA, because it can fit parameters to the data and calculate significance.
Sensitive DNA-DNA search, due to fitting, sensitive seeding, and calculating significance.

LASTZ (1.04.03)

LASTZ is a tool for (1) aligning two DNA sequences, and (2) inferring appropriate scoring parameters automatically. LASTZ is a drop-in replacement for BLASTZ, and is backward compatible with BLASTZ's command-line syntax.

ldsc (1.0.1-20200724)

ldsc is a command line tool for estimating heritability and genetic correlation from GWAS summary statistics. ldsc also computes LD Scores.

liger2liger (20230901)

Liger2LiGer is a Nanopore chimera splitting/detection tool. Automated end-to-end chimera detection and dataset evaluation starting from fastq.

LJA (0.1)

m-tools (20210208)

A selection of software developed at the Australian Centre for Ecogenomics to aid in the analysis of metagenomic datasets: unitem, refinem, checkm, graftm, groopm, bamm, finishm, singlem, orfm, and coverm

ma-focus (0.802)

magma (1.10)

MAGMA is a tool for gene analysis and generalized gene-set analysis of GWAS data. It can be used to analyse both raw genotype data as well as summary SNP p-values from a previous GWAS or meta-analysis.

mantis (1.0.5)

Microsatellite Analysis for Normal-Tumor InStability is a program developed for detecting microsatellite instability from paired-end BAM files.

maps (1.1.0)

a set of multiple scripts used to analyze PLAC-Seq and HiChIP data.

MashMap (2.0)

MashMap is an approximate algorithm for computing local alignment boundaries between long DNA sequences. Given a minimum alignment length and an identity threshold, it computes the desired alignment boundaries and identity estimates using kmer-based statistics, and maintains sufficient probabilistic guarantees on the output sensitivity.

MELT (2.2.2)

MELT is an application for identifying mobile elements in genomic data

meme (5.5.5)

MEME is used to discover motifs in groups of DNA/protein sequences or databases.

metaWRAP (1.3.2)

MINTIE (0.4.2)

MINTIE is a tool for identifying novel, rare transcriptional variants in cancer RNA-seq data. MINTIE detects gene fusions, transcribed structural variants, novel splice variants and complex variants, and annotates all novel transcriptional variants.

mmseqs (2-13-45111-219-gaabc78c)

MMseqs2 (Many-against-Many sequence searching) is a software suite to search and cluster huge protein and nucleotide sequence sets.

MoChA (1.10.2; 1.11; 1.16)

MoChA is a bcftools extension to call mosaic chromosomal alterations starting from phased VCF files with either B Allele Frequency (BAF) and Log R Ratio (LRR) or allelic depth (AD)

msisensor-pro (1.2.0)

evaluates Microsatellite Instability (MSI) for cancer patients with next generation sequencing data.

MUMmer (4.0.0rc1)

Mummer is a system for aligning entire genomes extremely rapidly.

MUSCLE (5.1)

Fast Multiple Sequence Alignment program.

mutsig2cv (3.11)

mutsig2cv analyzes somatic point mutations discovered in DNA sequencing, identifying genes mutated more often than expected by chance.

napu (R9)

Napu (Nanopore Analysis Pipeline) is a collection of WDL workflows for variant calling and de novo assembly of ONT data.

ncbi-toolkit (25.2.0)

The NCBI C++ Toolkit is a set of executables and libraries for a multitude of sequence analysis functions.

ncbi-vdb (3.0.1)

The SRA Toolkit and SDK from NCBI is a collection of tools and libraries for using data in the INSDC Sequence Read Archives.

NeST-VNN (20240321)

nested (2.0.0)

nested (now also called TE-greedy) is software to analyze nested LTR transposable elements in DNA sequences, such as reference genomes.

netmhcpan (4.1)

netMHCpan predicts binding of peptides to any MHC molecule of known sequence using artificial neural networks

netOglyc (4.0)

NetOglyc produces neural network predictions of mucin type GalNAc O-glycosylation sites in mammalian proteins.

Nirvana (3.18.1)

Nirvana provides clinical-grade annotation of genomic variants (SNVs, MNVs, insertions, deletions, indels, and SVs (including CNVs)). It can be run as a stand-alone package or integrated into larger software tools that require variant annotation.

NucleoATAC (0.3.4)

Octopus (0.7.4)

oncodriveCLUSTL (1.1.1)

OncodriveCLUSTL is a sequence-based clustering algorithm to detect significant clustering signals across genomic regions. It is based on a local background model derived from the simulation of mutations accounting for the composition of tri- or penta-nucleotide context substitutions observed in the cohort under study.

oncodriveFML (2.2.0)

OncodriveFML is a method designed to analyze the pattern of somatic mutations across tumors in both coding and non-coding genomic regions to identify signals of positive selection, and therefore, their involvement in tumorigenesis.

OpenCRAVAT (1.7.0; 2.2.5)

OpenCRAVAT is a new open source, scalable decision support system for variant and gene prioritization. It includes a modular resource catalog to maximize community and developer involvement, and as a result the catalog is being actively developed and growing every month. Resources made available via the store are well-suited for analysis of cancer, as well as Mendelian and complex diseases.

OptiType (1.3.5)

OptiType is an HLA genotyping algorithm based on integer linear programming, capable of producing accurate 4-digit HLA genotyping predictions from NGS data by simultaneously selecting all major and minor HLA Class I alleles.

ORFfinder (0.4.3)

ORF finder searches for open reading frames (ORFs) in the DNA sequence you enter. The program returns the range of each ORF, along with its protein translation. Use ORF finder to search newly sequenced DNA for potential protein encoding segments, verify predicted protein using newly developed SMART BLAST or regular BLASTP.
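The kind of search described above can be sketched in a few lines of Python. This is an illustration only, not the ORFfinder algorithm: it assumes an ORF runs from the first ATG in a frame to the next in-frame stop codon, scans the forward strand only, and omits the protein translation. The name `find_orfs` and the `min_codons` parameter are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=2):
    """Return sorted (start, end) half-open intervals of forward-strand ORFs.

    Simplifications: only ATG is treated as a start codon, only the three
    forward frames are scanned, and the stop codon is included in the range.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a candidate ORF at the first ATG
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # look for the next ORF in this frame
    return sorted(orfs)


sequence = "CCATGAAATAGGATGCCCTGA"
orfs = find_orfs(sequence)
for start, end in orfs:
    print(start, end, sequence[start:end])
```

Scanning the reverse complement with the same function would cover the other three frames.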

OrthoFinder (2.5.4)

OrthoFinder is an accurate and comprehensive platform for comparative genomics. It finds orthogroups and orthologs, infers rooted gene trees for all orthogroups and identifies all of the gene duplication events in those gene trees.

PAINTOR (3.0)

PAINTOR (Probabilistic Annotation INtegraTOR) is a probabilistic framework that integrates association strength with genomic functional annotation data to improve accuracy in selecting plausible causal variants for functional validation.

pangenome (1.1.2)

nf-core/pangenome is a bioinformatics best-practise analysis pipeline for the rendering of a collection of sequences into a pangenome graph.

pangolin (2.3.6)

Phylogenetic Assignment of Named Global Outbreak LINeages. PANGOLIN is a system for identifying phylogenetic COVID lineages that contribute most to active spread.

parsnp (1.7.4)

Parsnp is a command-line-tool for efficient microbial core genome alignment and SNP detection. Parsnp was designed to work in tandem with Gingr, a flexible platform for visualizing genome alignments and phylogenetic trees

patric (1.035)

PATRIC is an integration of different types of data and software tools that support research on bacterial pathogens. For users who want command-line access to PATRIC, there are the P3-scripts. They are intended to run locally, going over the network to access the services provided by PATRIC.

PEPPER_deepvariant (0.7)

PEPPER-Margin-DeepVariant is a haplotype-aware variant calling pipeline for processing third-generation nanopore sequence data. This pipeline is also applicable to PacBio HiFi data, providing an efficient solution with superior performance over the current WhatsHap-DeepVariant standard.

phasebook (1.0.0)

phasebook is a novel approach for reconstructing the haplotypes of diploid genomes from long reads de novo, that is without the need for a reference genome.

phaser (1.1.1)

phasing and Allele Specific Expression from RNA-seq. Performs haplotype phasing using read alignments in BAM format from both DNA and RNA based assays, and provides measures of haplotypic expression for RNA based assays.

phylowgs (20181105)

This Python/C++ code is the accompanying software for the paper PhyloWGS: Reconstructing subclonal composition and evolution from whole-genome sequencing of tumors, with authors Amit G. Deshwar, Shankar Vembu, Christina K. Yung, Gun Ho Jang, Lincoln Stein, and Quaid Morris.

plotsr (1.1.0)

Plotsr generates high-quality visualisation of synteny and structural rearrangements between multiple genomes. For this, it uses the genomic structural annotations between multiple chromosome-level assemblies.

popscle (0.1)

A suite of population scale analysis tools for single-cell genomics data including implementation of Demuxlet/Freemuxlet methods and auxiliary tools

PRANK (150803)

PRANK is a probabilistic multiple alignment program for DNA, codon and amino-acid sequences. PRANK is based on a novel algorithm that treats insertions correctly and avoids over-estimation of the number of deletion events.

preseq (3.1.2)

predicting library complexity and genome coverage in high-throughput sequencing

primer3 (2.6.1)

Primer3 is a program for designing PCR primers, hybridization probes, and sequencing primers.

prokka (1.14.6)

Prokka is a software tool for the rapid annotation of prokaryotic genomes.

PSIPRED (4.0)

PSIPRED is a simple and accurate secondary structure prediction method, incorporating two feed-forward neural networks which perform an analysis on output obtained from PSI-BLAST (Position Specific Iterated - BLAST).

pysamstats (1.1.2)

Pysamstats is a fast Python and command-line utility for extracting simple statistics against genome positions based on sequence alignments from a SAM or BAM file.

pySCENIC (0.12.1)

pySCENIC is a lightning-fast python implementation of the SCENIC pipeline (Single-Cell rEgulatory Network Inference and Clustering). It enables biologists to infer transcription factors, gene regulatory networks and cell types from single-cell RNA-seq data.

QTLtools (1.3.1)

A tool set for molecular QTL discovery and analysis. It allows users to go from raw sequence data to a collection of molecular Quantitative Trait Loci (QTLs) in a few easy-to-perform steps.

randfold (2.0.1)

RandFold computes the probability that, for a given RNA sequence, the Minimum Free Energy (MFE) of the secondary structure is different from a distribution of MFE computed with random sequences.

regenie (3.0.3)

regenie is a C++ program for whole genome regression modelling of large genome-wide association studies. It is developed and supported by a team of scientists at the Regeneron Genetics Center. regenie employs the BGEN library.

remora (2.1.1)

Modified base model training and application.

RGI (6.0.2)

RGI (Resistance Gene Identifier) is a robust antimicrobial resistance (AMR) gene predicting tool. It is based on the newly curated Comprehensive Antibiotic Resistance Database (CARD) and allows detection of AMR genes from thirteen genomes of Pseudomonas strains.

rMATS (4.1.2; 4.0.2)

MATS is a computational tool to detect differential alternative splicing events from RNA-Seq data.

rmblast (2.14.1)

RMBlast is a RepeatMasker-compatible version of the standard NCBI blastn program. RMBlast supports RepeatMasker searches by adding a few necessary features to the stock NCBI blastn program.

RNAmmer (1.2)

RNAmmer predicts ribosomal RNA genes in full genome sequences by utilising two levels of Hidden Markov Models: An initial spotter model searches both strands. The spotter model is constructed from highly conserved loci within a structural alignment of known rRNA sequences. Once the spotter model detects an approximate position of a gene, flanking regions are extracted and parsed to the full model which matches the entire gene.

roary (3.13.0)

Roary is a high speed stand alone pan genome pipeline, which takes annotated assemblies in GFF3 format (produced by Prokka) and calculates the pan genome.

ROSE (1.3.1)

SAIGE (1.1.9)

R package for large-scale genetic association studies.

scanpy (1.8.1)

scChromHMM (20220610)

scChromHMM provides a suite of tools for rapid processing of single-cell histone modification data to perform chromatin states analysis of the genome within each single cell. It is an extension of the bulk ChromHMM framework, which consumes the HMM model learned from ChromHMM and performs chromatin state analysis by running the forward-backward algorithm for each single cell.

scDRS (1.02)

scomatic (current)

a tool that provides functionalities to detect somatic single-nucleotide mutations in high-throughput single-cell genomics and transcriptomics data sets, such as single-cell RNA-seq and single-cell ATAC-seq

scvelo (0.2.3)

scVelo is a method to describe the rate of gene expression change for an individual gene at a given time point based on the ratio of its spliced and unspliced messenger RNA (mRNA). It avoids errors in the velocity estimates by solving the full transcriptional dynamics of splicing kinetics using a likelihood-based dynamical model. This generalizes RNA velocity to systems with transient cell states, which are common in development and in response to perturbations.

SEACR (1.3)

SEACR is intended to call peaks and enriched regions from sparse Cleavage Under Targets and Release Using Nuclease (CUT&RUN) or chromatin profiling data in which background is dominated by "zeroes" (i.e. regions with no read coverage).

selscan (1.3.0; 2.0.0)

selscan is a tool for haplotype-based scans to detect natural selection, which are useful to identify recent or ongoing positive selection in genomes. It is an efficient multithreaded application that implements Extended Haplotype Homozygosity (EHH), Integrated Haplotype Score (iHS), and Cross-population EHH (XPEHH). selscan accepts phased genotypes in multiple formats, including TPED.

seq2hla (2.3)

seq2HLA computationally determines human leukocyte antigen (HLA) genotypes of a sample using RNA-Seq sequencing reads.

sequencetubemap (2023.8)

A JavaScript module for the visualization of genomic sequence graphs. It automatically generates a "tube map"-like visualization of sequence graphs which have been created with vg.

shimmer (0.2)

Shimmer is a software package for the characterization of genetic differences between two very similar samples, e.g., a tumor sample and its matched normal tissue sample.

signalp (6.0g)

SignalP predicts the presence and location of signal peptide cleavage sites in amino acid sequences from different organisms: Gram-positive bacteria, Gram-negative bacteria, and eukaryotes. The method incorporates a prediction of cleavage sites and a signal peptide/non-signal peptide prediction based on a combination of several artificial neural networks.

SignatureAnalyzer (20240208)

SignatureAnalyzer is a tool for the identification of somatic mutational signatures. It employs Bayesian non-negative matrix factorization (NMF).

slivar (0.3.0)

slivar is a set of command-line tools that enables rapid querying and filtering of VCF files. It facilitates operations on trios and groups and allows arbitrary expressions using simple javascript.

smcpp (1.15.4)

SMC++ is a program for estimating the size history of populations from whole genome sequence data.

smoove (0.2.7)

smoove simplifies and speeds calling and genotyping SVs for short reads. It also improves specificity by removing many spurious alignment signals that are indicative of low-level noise and often contribute to spurious calls.

SnapATAC2 (2.1.2)

SnapATAC is a software package for analyzing scATAC-seq datasets. Conventional assays to map regulatory elements via open chromatin analysis of primary tissues are hindered by sample heterogeneity. SnapATAC dissects cellular heterogeneity in an unbiased manner and maps the trajectories of cellular states.

splam (1.0.9)

Splice junction recognition model based on a deep residual convolutional neural network to assess splice junctions.

straw (1.3.1)

Straw is a library which allows rapid streaming of contact data from .hic files.

stripenn (1.1.50; 1.1.65.15)

Stripenn is a command line interface python package developed for detection of architectural stripes from chromatin conformation capture (3C) data. It implements an algorithm rooted in computer vision for demarcation and quantification of the architectural stripes. Stripenn was demonstrated to outperform existing methods, be applicable in the context of analysis of B and T lymphocytes, and to allow examination of the role of sequence variation on the architectural stripes by studying the conservation of these features in inbred strains of mice.

SynthDNM (0.1.3)

SynthDNM is a random-forest based classifier that can be readily adapted to new sequencing or variant-calling pipelines by applying a flexible approach to constructing simulated training examples from real data. The optimized SynthDNM classifiers predict de novo SNPs and indels with robust accuracy across multiple methods of variant calling.

tandem-genotypes (1.9.1)

tandem-genotypes finds changes in length of tandem repeats, from "long" DNA reads aligned to a genome.

tantan (40)

A tool to mask low complexity and short period tandem repeats

TElocal (1.1.1)

TElocal: a tool that utilizes both uniquely and ambiguously mapped reads to quantify transposable element expression at the locus level.

TelomereHunter (1.1.0)

TelomereHunter is software for the detailed characterization of telomere maintenance mechanism footprints in the genome. The tool is implemented for the analysis of large cancer genome cohorts and provides a variety of diagnostic diagrams as well as machine-readable output for subsequent analysis.

TMHMM (2.0c)

TMHMM predicts transmembrane helices in proteins.

toga (1.1.2)

TOGA is a new method that integrates gene annotation, inferring orthologs and classifying genes as intact or lost. TOGA implements a novel machine learning based paradigm to infer orthologous genes between related species and to accurately distinguish orthologs from paralogs or processed pseudogenes.

tombo (1.5.1)

A suite of tools primarily for the identification of modified nucleotides from nanopore sequencing data.

TRF (4.09)

A tandem repeat in DNA is two or more adjacent, approximate copies of a pattern of nucleotides. Tandem Repeats Finder is a program to locate and display tandem repeats in DNA sequences.
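
To make the definition above concrete, here is a minimal Python sketch that scans a sequence for exact tandem repeats. It is only an illustration: Tandem Repeats Finder also detects approximate copies, which this sketch does not attempt, and the function name and parameters (`find_exact_tandem_repeats`, `max_period`, `min_copies`) are hypothetical and not part of TRF.

```python
def find_exact_tandem_repeats(seq, max_period=6, min_copies=2):
    """Return (start, period, copies, unit) tuples for exact tandem repeats.

    Illustrative only: real tools such as TRF also find approximate copies.
    """
    hits = []
    i = 0
    n = len(seq)
    while i < n:
        best = None  # (span, period, copies, unit) of the longest repeat at i
        for period in range(1, max_period + 1):
            unit = seq[i:i + period]
            if len(unit) < period:
                break  # not enough sequence left for a full unit
            copies = 1
            # Count how many adjacent, identical copies of the unit follow.
            while seq[i + copies * period:i + (copies + 1) * period] == unit:
                copies += 1
            if copies >= min_copies:
                span = copies * period
                if best is None or span > best[0]:
                    best = (span, period, copies, unit)
        if best:
            span, period, copies, unit = best
            hits.append((i, period, copies, unit))
            i += span  # skip past the repeat just reported
        else:
            i += 1
    return hits
```

For example, calling `find_exact_tandem_repeats("GGACACACACTT", min_copies=3)` reports the four adjacent copies of `AC` that start at position 2.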

trgt (0.8.0)

TRGT is a tool for targeted genotyping of tandem repeats from PacBio HiFi data. In addition to the basic size genotyping, TRGT profiles sequence composition, mosaicism, and CpG methylation of each analyzed repeat. TRGT comes with a companion tool TRVZ for visualization of reads overlapping the repeats.

trimAl (1.2rev59)

triodenovo (0.06)

triodenovo is an application for calling de novo mutations in trios for NGS data.

trust4 (1.0.8)

Tcr Receptor Utilities for Solid Tissue (TRUST) is a computational tool to analyze TCR and BCR sequences using unselected RNA sequencing data, profiled from solid tissues, including tumors. TRUST4 performs de novo assembly on V, J, C genes including the hypervariable complementarity-determining region 3 (CDR3) and reports consensus of BCR/TCR sequences. TRUST4 then realigns the contigs to IMGT reference gene sequences to report the corresponding information. TRUST4 supports both single-end and paired-end sequencing data with any read length.

ultra (0.1)

uLTRA implements an alignment method for long RNA sequencing reads based on a novel two-pass collinear chaining algorithm. uLTRA is guided by a database of exon annotations, but it can also be used as a wrapper around minimap2 to align reads outside annotated regions.

unicycler (0.5.0)

Unicycler is an assembly pipeline for bacterial genomes. It can assemble Illumina-only read sets, where it functions as a SPAdes-optimiser. It can also assemble long-read-only sets (PacBio or Nanopore), where it runs a miniasm+Racon pipeline.

uropa (4.0.2)

UROPA is a command line based tool for genomic region annotation

usearch (11.0.667)

USEARCH is a unique sequence analysis tool with thousands of users world-wide. USEARCH offers search and clustering algorithms that are often orders of magnitude faster than BLAST.

VADR (1.3)

VADR stands for Viral Annotation DefineR. It is a suite of tools for classifying and analyzing sequences homologous to a set of reference models of viral genomes or gene families. It has been mainly tested for analysis of Norovirus, Dengue, and SARS-CoV-2 virus sequences in preparation for submission to the GenBank database.

VarScan (2.4.6)

A platform-independent, technology-independent software tool for identifying SNPs and indels in massively parallel sequencing of individual and pooled samples.

VCF-kit (0.2.9)

VCF-kit is a collection of utility tools for processing and analyzing VCF (Variant Call Format) files, including primer generation for variant validation, dendrogram production, genotype imputation from sequence data in linkage studies, and additional tools to be used by statistical and population geneticists.

vcontact2 (0.11.3)

vConTACT2 is a tool to perform guilt-by-contig-association classification of viral genomic sequence data. It's designed to cluster and provide taxonomic context of metagenomic sequencing data.

velocyto (0.17)

Velocyto is a library for the analysis of RNA velocity. It includes a command line tool and an analysis pipeline.

velvet (1.2.10)

Velvet is a de novo genomic assembler specially designed for short read sequencing technologies, such as Solexa or 454

victor (1.2beta)

VICTOR (Variant Interpretation for Clinical Testing Or Research) is a pipeline that can be used for disease gene discovery research or clinical genetic testing

viennarna (2.5.1)

RNA Secondary Structure Prediction and Comparison

viper (0-20231003-1525270)

VirSorter2 (2.2.3)

weblogo (3.6)

Contains the seqlogo utility for creating sequence logos that summarize sequence alignments.

xengsort (1.1.0)

A fast xenograft read sorter based on space-efficient k-mer hashing.

xenome (1.0.1)

xenome is a tool for classifying reads from xenograft source.

xHLA (2018-04-04)

xtail (1.1.5)

Xtail is an analysis pipeline tailored for ribosome profiling data that comprehensively and accurately identifies differentially translated genes in pairwise comparisons. Applied on simulated and real datasets, Xtail exhibits high sensitivity with minimal false-positive rates, outperforming existing methods in the accuracy of quantifying differential translations.

xTea (0.1.9; 1.0.0)

xTea (x-Transposable element analyzer) is a tool for identifying TE insertions in whole-genome sequencing data. Whereas existing methods are mostly designed for short-read data, xTea can be applied to both short-read and long-read data. xTea outperforms other short read-based methods for both germline and somatic TE insertion discovery.

Structural Biology

alphafold2 (2.3.2)

This package provides an implementation of the protein structure inference pipeline of AlphaFold v2.0.

alphalink (1.0)

AlphaLink predicts protein structures using deep learning given a sequence and a set of experimental contacts. It extends OpenFold with crosslinking MS data or other experimental distance restraints by explicitly incorporating them in the OpenFold architecture.

AlphaPulldown (1.0.4)

AlphaPulldown is a Python package that streamlines protein-protein interaction screens and high-throughput modelling of higher-order oligomers using AlphaFold-Multimer. It provides a convenient command-line interface, a variety of confidence scores and a graphical analysis tool.

Autodock-GPU (1.5.3)

Autodock-GPU performs docking calculations, and processes ligand-receptor poses in parallel over multiple compute units on GPUs.

braker (3)

A pipeline for fully automated prediction of protein coding gene structures with GeneMark-ES/ET and AUGUSTUS in novel eukaryotic genomes.

CCP4 (8.0.016)

CCP4 is a suite of programs for protein crystallography and structural biology.

colabfold (1.5.5)

ColabFold batch scripts

cryolo (1.9.2)

Automated particle picker for cryo-EM

CSD (2022)

The Cambridge Structural Database is the world repository of small molecule crystal structures.

Dali (5.1)

The three-dimensional co-ordinates of each protein are used to calculate residue - residue distance matrices.

deepmedic (0.8.4)

DeepMM (20220830)

DFC (20240124)

This application is intended to evaluate coevolutionary structural predictions of fold-switching proteins. State-of-the-art algorithms predict that these fold-switching proteins assume only one stable structure. We hypothesize that coevolutionary signatures are being missed. Fold-switching proteins have the ability to transition between two sets of stable secondary and tertiary structure. The approach successfully revealed coevolution of amino acid pairs uniquely corresponding to both conformations of 56 fold-switching proteins from distinct families.

DSSP (2.3.0)

The DSSP program was designed by Wolfgang Kabsch and Chris Sander to standardize secondary structure assignment. DSSP is a database of secondary structure assignments (and much more) for all protein entries in the Protein Data Bank (PDB). DSSP is also the program that calculates DSSP entries from PDB entries. DSSP does not predict secondary structure.

esm (2.0.0-74-g2b36991)

Code and pre-trained weights for Transformer protein language models from the Meta Fundamental AI Research Protein Team (FAIR)

foldseek (5-53465f0)

Fast structural similarity search

HBD / heme_binder_diffusion (20240319)

lammps (29Oct20)

mdtraj (1.9.7)

MDTraj is a python library that allows users to manipulate molecular dynamics (MD) trajectories and perform a variety of analyses, including fast RMSD, solvent accessible surface area, hydrogen bonding, etc.

mustache (1.0.1)

Mustache is a tool for identifying chromatin loops from HiC and MicroC contact maps

naccess (2.1.1)

The naccess program calculates the atomic accessible surface defined by rolling a probe of given size around a van der Waals surface.

OmegaFold (1.1.0)

OmegaFold is the first computational method to successfully predict high-resolution protein structure from a single primary sequence alone. Using a new combination of a protein language model that allows us to make predictions from single sequences and a geometry-inspired transformer model trained on protein structures, OmegaFold outperforms RoseTTAFold and achieves similar prediction accuracy to AlphaFold2 on recently released structures.

Phenix (1.20.1-4487)

PHENIX is a software suite for the automated determination of macromolecular structures using X-ray crystallography and other methods.

PyMOL (2.6.0)

A comprehensive molecular visualization product for rendering and animating 3D molecular structures.

PyRosetta (360.py3.10)

PyRosetta is an interactive Python-based interface to the powerful Rosetta molecular modeling suite. It enables users to design their own custom molecular modeling algorithms using Rosetta sampling methods and energy functions.

ReLeaSE (20220825)

RFdiffusion (1.1.0)

RoseTTAFold (allatom)

Accurate prediction of protein structures and interactions using a 3-track network, in which information at the 1D sequence level, the 2D distance map level, and the 3D coordinate level is successively transformed and integrated.

rosettafoldna (0.2)

Accurate prediction of nucleic acid and protein-nucleic acid complexes using RoseTTAFoldNA

Schrodinger (2023.1)

A limited number of Schrödinger applications are available on the Biowulf cluster through the Molecular Modeling Interest Group. Most are available through the Maestro GUI.

sidesplitter (1.2)

Sidesplitter reduces over-fitting in both idealised and experimental settings, while maintaining independence between the two sides of a split refinement. It can improve the final resolution in refinements of structures prone to severe over-fitting, such as membrane proteins in detergent micelles.

subtom (1.1.6-32f731b)

Subtom is a pipeline for subvolume alignment and averaging of electron cryo-tomography data.

vmd (1.9.3)

VMD is a molecular visualization program for displaying, animating, and analyzing large biomolecular systems using 3-D graphics and built-in scripting. To use, type vmd at the prompt.

Xplor-NIH (3.8)

Xplor-NIH is a structure determination program which builds on the X-PLOR v3.851 program, including additional tools developed at the NIH.

ZDOCK (3.0.2)

ZDOCK predicts protein-docking models, and uses a fast Fourier transform to search all possible binding modes for proteins, evaluating based on shape complementarity, desolvation energy, and electrostatics.

Systems Biology

bakta (1.9.1)

Rapid & standardized annotation of bacterial genomes, MAGs & plasmids

bionetgen (0.7.9)

BioNetGen is software for the specification and simulation of rule-based models of biochemical systems, including signal transduction, metabolic, and genetic regulatory networks. BioNetGen is presently a mixture of Perl, C++, and Python.

cellphonedb (3.1.0)

A publicly available repository of curated receptors, ligands and their interactions. Subunit architecture is included for both ligands and receptors, representing heteromeric complexes accurately.

CytoSig (0.1)

dyno (20220906)

dyno is a meta package that installs several other packages from the dynverse (https://github.com/dynverse). It comprises a set of R packages to construct and interpret single-cell trajectories.

eggNOGmapper (2.1.6)

eggNOGmapper is a tool for functional annotation of large sets of sequences based on fast orthology assignments using precomputed clusters and phylogenies from the eggNOG database. Orthology assignment is ideally suited for functional inference. However, predicting orthology is computationally intensive at large scale, and most other pipelines are relatively inaccessible (e.g., new assignments only available through database updates), so less precise homology-based functional transfer was previously the default for (meta-)genome annotation.

GCN_Cancer (20221105)

gsea (4.3.2)

Gene Set Enrichment Analysis (GSEA) is a computational method that determines whether an a priori defined set of genes shows statistically significant, concordant differences between two biological states (e.g. phenotypes).

MAKER (3.01.03)

MAKER is a portable and easily configurable genome annotation pipeline. Its purpose is to allow smaller eukaryotic and prokaryotic genome projects to independently annotate their genomes and to create genome databases. MAKER identifies repeats, aligns ESTs and proteins to a genome, produces ab-initio gene predictions and automatically synthesizes these data into gene annotations having evidence-based quality values.

Neuron (7.7)

NEURON is a simulation environment for modeling individual neurons and networks of neurons. It provides tools for conveniently building, managing, and using models in a way that is numerically sound and computationally efficient. It is particularly well-suited to problems that are closely linked to experimental data, especially those that involve cells with complex anatomical and biophysical properties.

pySCENIC (0.12.1)

ROBOT (1.9.4)

ROBOT is a command-line tool and library for automating ontology development tasks, with a focus on Open Biological and Biomedical Ontologies (OBO). It can be used as a command-line tool or as a library for any language on the Java Virtual Machine.

scvelo (0.2.3)

vdjtools (1.2.1)

A comprehensive analysis framework for T-cell and B-cell repertoire sequencing data.

Utilities

apptainer (1.1.6)

Apptainer allows you to build and run Linux containers with emphasis on use in HPC. Apptainer is the Linux Foundation variant of and successor to the widely popular Singularity.

aria2 (1.36.0)

multiprotocol download utility
Type 'module load aria2' then 'aria2c --help' for more info.

asciinema (2.4.0)

asciinema [as-kee-nuh-muh] is a free and open source solution for recording terminal sessions and sharing them.
Type 'module load asciinema' then 'asciinema' to run.

Aspera (3.7..4)

High-speed fasp-powered file transfers. Mostly used to download data from NCBI, which has an Aspera server. See the data transfer page for details.

autoconf (2.72)

Autoconf is an extensible package of M4 macros that produce shell scripts to automatically configure software source code packages.

AWS (1.25.49)

Command-line tools for Amazon Web Services. Use 'module load python; aws help' to see the command-line help, or see http://aws.amazon.com/cli/.

azcopy (10.11.0)

a command-line utility to copy blobs or files to or from Azure storage

bakta (1.9.1)

Rapid & standardized annotation of bacterial genomes, MAGs & plasmids

bbcp (15.02.03.01.1)

Secure and fast copy utility

BGEN (20230420)

The BGEN library contains a reference implementation of the BGEN format, written in C++. The library can be used as the basis for BGEN support in other software.

circos (0.69-9)

coreutils (9.1)

The GNU Core Utilities are the basic file, shell and text manipulation utilities of the GNU operating system.

cpdf (2.5.1)

Coherent PDF tools

crypt4gh (0.4.1)

Rust implementation for the Crypt4GH encryption format.

curl (8.5.0)

command line tool and library for transferring data with URLs

datalad (0.13.0rc2)

Datalad is a tool for uploading and downloading public, up-to-date neuroimaging datasets.

datamash (1.7)

GNU datamash is a command-line program which performs basic numeric, textual and statistical operations on input textual data files.

DNAnexus (0.352.1)

DNAnexus is a cloud-based commercial solution for next-generation sequence analysis and visualization. It has a command-line interface (CLI) which can be used to log in to the DNAnexus platform, upload and navigate data, and launch analyses.

dxda (0.6.0)

CLI tool to manage the download of large quantities of files from DNAnexus

EDirect (21.3.20240124)

Entrez Direct (EDirect) is an advanced method for accessing the NCBI's set of interconnected databases (publication, sequence, structure, gene, variation, expression, etc.) from a UNIX terminal window.

gdc-client (1.6.1)

The GDC Data Transfer Tool provides an optimized method of transferring data to and from the GDC, and enables resumption of interrupted transfers.

gdrive (2.1.0)

gdrive is a command line utility for interacting with Google Drive.

Ghostscript (9.22)

Ghostscript is an interpreter for the PostScript language and for PDF.

globus-timer-cli (0.2.9)

Command-line client for setting up and monitoring recurring or scheduled Globus transfers

gnuplot (6.0.0)

Gnuplot is a portable command-line driven graphing utility to visualize mathematical functions and data interactively, and can support many non-interactive uses such as web scripting.
Type 'gnuplot' to run, or 'module avail gnuplot' to see other available versions.

google-cloud-sdk (397.0.0)

Google Cloud SDK is a set of tools that you can use to manage resources and applications hosted on Google Cloud Platform. These include the gcloud, gsutil, and bq command line tools. See docs at https://cloud.google.com/sdk/docs/how-to.
Type 'module load google-cloud-sdk' to use on Biowulf.

Grace (5.1.25)

Grace is a WYSIWYG 2D plotting tool for the X-Window system. It is a successor to Xmgr.
Type 'module load grace', then 'xmgrace' or 'gracebat' to run.

graphviz (2.40)

Graphviz is open source graph visualization software. Graph visualization is a way of representing structural information as diagrams of abstract graphs and networks. It has important applications in networking, bioinformatics, software engineering, database and web design, machine learning, and in visual interfaces for other technical domains.

groff (1.22.3)

Groff (GNU troff) is a typesetting system that reads plain text mixed with formatting commands and produces formatted output. Output may be PostScript or PDF, html, or ASCII/UTF8 for display at the terminal. Formatting commands may be either low-level typesetting requests (“primitives”) or macros from a supplied set. Users may also write their own macros. All three may be combined.

h5utils (1.13.1)

h5utils is a set of utilities for visualization and conversion of scientific data in the free, portable HDF5 format. Type 'module load h5utils' to access the executables (e.g. h5topng)

ImageMagick (7.1.0)

ImageMagick is a software suite to create, edit, compose, or convert bitmap images. It can read and write images in a variety of formats.
The ImageMagick tools are available by default (e.g. type 'convert') or use 'module load ImageMagick' to load the latest versions.

inkscape (1.3)

Inkscape is a free and open source vector graphics editor.

JAX (0.3.7)

JAX is a Python library that brings Autograd and XLA (Accelerated Linear Algebra) together for high-performance machine learning research. JAX uses XLA to compile and run your NumPy programs on GPUs. Compilation happens under the hood by default, with library calls getting just-in-time compiled and executed. JAX also lets you just-in-time compile your own Python functions into XLA-optimized kernels using a one-function API, jit.

jo (1.6)

A small utility to create JSON objects from command line arguments.

jq (1.6)

Command-line JSON processor

jupyter (5.0.0)

Jupyter is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and explanatory text. Uses include: data cleaning and transformation, numerical simulation, statistical modeling, machine learning and much more.

k8 (0.2.5)

JavaScript shell based on V8, but with a FASTA/FASTQ parser and the ability to open files on the filesystem

kronatools (2.8)

Krona allows hierarchical data to be explored with zooming, multi-layered pie charts. Krona charts can be created using an Excel template or KronaTools, which includes support for several bioinformatics tools and raw data formats. The interactive charts are self-contained and can be viewed with any modern web browser.

lbzip2 (2.5)

lbzip2 is a multi-threaded compression utility with support for bzip2 compressed file format.

longshot (0.4.5)

Longshot is a variant calling tool for diploid genomes using long, error-prone reads such as those from Pacific Biosciences (PacBio) SMRT and Oxford Nanopore Technologies (ONT). It takes as input an aligned BAM file and outputs a phased VCF file with variants and haplotype information. It can also output haplotype-separated BAM files that can be used for downstream analysis. Currently, it only calls single nucleotide variants (SNVs).

mariadb (10.4)

MariaDB Server is one of the most popular database servers in the world. It’s made by the original developers of MySQL and guaranteed to stay open source.

mc (4.8.31)

GNU Midnight Commander is a visual file manager, with a feature rich full-screen text mode application that allows you to copy, move and delete files and whole directory trees, search for files and run commands in the subshell. Type module load mc and then the command mc to get started.

megit (0.4.0)

Minimal Eclipse GIT. A graphical front end for git based on the Eclipse IDE.

MySQL (8.0.34)

MySQL is an open-source relational database management system.

nda-tools (0.2.12; 0.2.16; 0.2.26)

In order to submit data to the National Institute of Mental Health Data Archives (NDA), users must validate their data to ensure it complies with the required format. This is done using the NDA validation tool, vtcmd. Additionally, users can package and download data from NDA as well, using the downloadcmd tool.

netpbm (10.86.33)

Netpbm is a toolkit for manipulation of graphic images, including conversion of images between a variety of different formats. There are over 300 separate tools in the package including converters for about 100 graphics formats. Examples of the sort of image manipulation we're talking about are: Shrinking an image by 10%; Cutting the top half off of an image; Making a mirror image; Creating a sequence of images that fade from one image to another.

nvchecker (2.13.1)

nvchecker (short for new version checker) is for checking if a new version of some software has been released.

OpenNeuro_cli (4.14.1)

This command-line tool allows you to upload and download OpenNeuro.org datasets without a browser.

parallel (20240322)

GNU parallel is a shell tool for executing jobs in parallel using one or more computers.

paraview (5.11.1)

ParaView is an open-source, multi-platform data analysis and visualization application.

patchelf (0.17)

patchelf is a small utility to modify the dynamic linker and RPATH of ELF executables.

patric (1.035)

pdf2svg (0.2.3)

A simple PDF to SVG converter using the Poppler and Cairo libraries.

petsc (3.19.1)

PETSc (Portable, Extensible Toolkit for Scientific Computation) is a suite of data structures and routines for the scalable (parallel) solution of scientific applications modeled by partial differential equations. It employs the MPI standard for all message-passing communication.

pigz (2.7)

pigz (parallel implementation of gzip) is a fully functional replacement for gzip that exploits multiple processors and multiple cores to the hilt when compressing data.
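
As a rough illustration of how gzip-compatible compression can be spread over several workers, the Python sketch below compresses fixed-size chunks concurrently and concatenates the resulting gzip members, which the gzip format permits. This is an assumption-laden sketch and not a description of pigz's internals; the function name `parallel_gzip` and its parameters are hypothetical.

```python
import gzip
from concurrent.futures import ThreadPoolExecutor


def parallel_gzip(data: bytes, chunk_size: int = 1 << 20, workers: int = 4) -> bytes:
    """Compress data chunk by chunk in a thread pool and join the gzip members."""
    # Split the input into independent chunks (one empty chunk for empty input).
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)] or [b""]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so members are joined in sequence.
        members = list(pool.map(gzip.compress, chunks))
    # Concatenated gzip members form a valid multi-member gzip stream.
    return b"".join(members)


payload = b"ACGT" * 500_000
compressed = parallel_gzip(payload, chunk_size=256 * 1024)
restored = gzip.decompress(compressed)
```

Compressing chunks independently usually costs a little compression ratio compared with a single stream, which is the trade-off for being able to work on the chunks concurrently.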

pilon (1.24)

Pilon is a software tool which can be used to: 1) Automatically improve draft assemblies 2) Find variation among strains, including large event detection

postgresql (16.1)

PostgreSQL is a powerful, open source object-relational database system with over 35 years of active development that has earned it a strong reputation for reliability, feature robustness, and performance.

POVRay (3.7)

POVRAY (Persistence of Vision RAYtracer) is a high-quality tool for creating three-dimensional graphics. Raytraced images are publication-quality and 'photo-realistic', but are computationally expensive so that large images can take many hours to create.

pre-commit (3.5.0)

A framework for managing and maintaining multi-language pre-commit hooks

pyega3 (5.0.2)

qpdf (1.11.1)

QPDF is a command-line tool and C++ library that performs content-preserving transformations on PDF files

quarto (1.4.550)

Quarto is an open-source scientific and technical publishing system built on Pandoc

rclone (1.62.2)

Rclone is a utility for synchronizing directories on a file-based storage system (e.g. /home or /data) with an object store such as Amazon S3. It uses the S3 protocol, and it can be used with the HPC object storage system.

rdfind (1.6.0)

rdfind is a program that finds duplicate files. It is useful for compressing backup directories or just finding duplicate files. It compares files based on their content, NOT on their file names. After typing module load rdfind, type man rdfind for more information.
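
The idea of comparing files by content rather than by name can be sketched in a few lines of Python. This is a simplified illustration, not rdfind's actual algorithm: it groups files by size first and then by SHA-256 digest, and the function name `find_duplicates` is hypothetical.

```python
import hashlib
import os
from collections import defaultdict


def find_duplicates(root):
    """Return lists of paths under root whose contents are identical."""
    # Files of different sizes cannot be duplicates, so bucket by size first.
    by_size = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    # Hash only the files that share a size with at least one other file.
    by_digest = defaultdict(list)
    for size, paths in by_size.items():
        if len(paths) < 2:
            continue
        for path in paths:
            digest = hashlib.sha256()
            with open(path, "rb") as fh:
                for block in iter(lambda: fh.read(1 << 16), b""):
                    digest.update(block)
            by_digest[(size, digest.hexdigest())].append(path)

    return [sorted(paths) for paths in by_digest.values() if len(paths) > 1]
```

Checking sizes before hashing avoids reading files that cannot possibly match.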

rfmix (2.0)

Local Ancestry and Admixture Inference

ripgrep (14.1.0)

ripgrep, a modern line-oriented search tool providing rg

rstudio-server (2023.06.0-421)

RStudio Server is a web-based R IDE similar to RStudio Desktop.

sb_cli (0.25.0)

Use the Seven Bridges Command Line Interface (SB CLI) to programmatically access and automate your interaction with the Platform via the API. The CLI is called by a simple command: sb.

screen (4.9.1)

Screen is a full-screen window manager that multiplexes a physical terminal between several processes, typically interactive shells.

singularity (4.0.1)

Singularity is a container platform focused on supporting "Mobility of Compute". It allows users to emulate and share custom Linux environments, allowing for the creation of self-contained development stacks.

snp-sites (2.4.1)

Rapidly extracts SNPs from a multi-FASTA alignment

spark (3.2.2)

Apache Spark is a fast and general engine for large-scale data processing. It is commonly used as an in-memory alternative to Hadoop MapReduce.

SQLite (3.38.5)

SQLite is a software library that implements a self-contained, serverless, zero-configuration, transactional SQL database engine.
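
The "self-contained, serverless, zero-configuration, transactional" properties can be seen from Python's standard `sqlite3` module, which embeds the SQLite library. The sketch below uses an in-memory database, so no server or configuration file is involved. The table and column names are made up for illustration.

```python
import sqlite3

# An in-memory database: no server process, no configuration, no files.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE samples (id INTEGER PRIMARY KEY, name TEXT NOT NULL, reads INTEGER)"
)

# Using the connection as a context manager commits on success...
with conn:
    conn.executemany(
        "INSERT INTO samples (name, reads) VALUES (?, ?)",
        [("s1", 1200), ("s2", 800)],
    )

# ...and rolls the whole transaction back if an exception is raised.
try:
    with conn:
        conn.execute("INSERT INTO samples (name, reads) VALUES (?, ?)", ("s3", 500))
        # This row violates the NOT NULL constraint, so "s3" is rolled back too.
        conn.execute("INSERT INTO samples (name, reads) VALUES (?, ?)", (None, 1))
except sqlite3.IntegrityError:
    pass

rows = conn.execute("SELECT name, reads FROM samples ORDER BY id").fetchall()
total = conn.execute("SELECT SUM(reads) FROM samples").fetchone()[0]
```

After the failed transaction only the two committed rows remain, which is the transactional behavior the description refers to.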

stow (2.3.1)

GNU Stow is a symlink farm manager which takes distinct packages of software and/or data located in separate directories on the filesystem, and makes them appear to be installed in the same place.
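
The symlink-farm idea can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not GNU Stow's implementation: it links individual files only, whereas Stow can also link whole directories, and the function name `stow` and the directory layout in the usage note are hypothetical.

```python
import os


def stow(package_dir, target_dir):
    """Symlink every file under package_dir to the same relative path in target_dir."""
    linked = []
    for dirpath, _dirs, files in os.walk(package_dir):
        rel = os.path.relpath(dirpath, package_dir)
        dest_dir = os.path.normpath(os.path.join(target_dir, rel))
        os.makedirs(dest_dir, exist_ok=True)
        for name in files:
            link = os.path.join(dest_dir, name)
            if os.path.lexists(link):
                # Refuse to overwrite anything already present in the target.
                raise FileExistsError(link)
            os.symlink(os.path.join(dirpath, name), link)
            linked.append(link)
    return sorted(linked)
```

With two packages kept in separate directories, say `pkgs/toolA/bin/toolA` and `pkgs/toolB/bin/toolB`, calling `stow` on each with the same target makes both programs appear under a single `bin` directory while the real files stay where they are.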

swarm (23.2.1)

Swarm is a script designed to simplify submitting a group of commands to the Biowulf cluster.

swig (4.1.1)

SWIG is a software development tool that connects programs written in C and C++ with a variety of high-level programming languages.

synapseclient (4.0.0)

The synapseclient package provides an interface to Synapse, a collaborative workspace for reproducible, data intensive research projects

tmux (3.3a)

tmux is a terminal multiplexer.
Type 'module load tmux' to load the module, then 'tmux --help'

UKBB (0.1)

UK Biobank tools for downloading and processing UKBB data. Type 'module load ukbb' to access them.

vartrix (1.1.22)

VarTrix is a software tool for extracting single cell variant information from 10x Genomics single cell data.

vcf2db (2020.09.14)

vcf2db creates a gemini-compatible database from a VCF.

visidata (2.11.1)

VisiData is an interactive multitool for tabular data

whatshap (1.1)

WhatsHap is software for phasing genomic variants using DNA sequencing reads, also called read-based phasing or haplotype assembly. It is especially suitable for long reads, but also works well with short reads.

wuzz (0.5.0)

Interactive cli tool for HTTP inspection

XAR (1.8.0.417.1)

eXtensible ARchiver

xclip (0.13)

xclip is a command line utility that is designed to run on any system with an X11 implementation. It provides an interface to X selections ("the clipboard") from the command line. It can read data from standard in or a file and place it in an X selection for pasting into other X applications. xclip can also print an X selection to standard out, which can then be redirected to a file or another program.

xpdf (4.04)

Xpdf is a free PDF viewer and toolkit, including a text extractor, image converter, HTML converter, and more. Most of the tools are available as open source.

zstd (1.5.6)

Zstandard, or zstd for short, is a fast lossless compression algorithm, targeting real-time compression scenarios at zlib-level and better compression ratios.

Workflow Managers

caper (2.1.3)

A Python-based wrapper for the Cromwell pipeline system used by ENCODE pipelines. Includes the croo and qc2tsv commands for parsing Cromwell and quality control output.

cromwell (84)

A Workflow Management System geared towards scientific workflows.

nextflow (23.10.0)

Data-driven computational pipelines

snakemake (8.4.12)

Snakemake aims to reduce the complexity of creating workflows by providing a fast and comfortable execution environment, together with a clean and modern domain-specific language (DSL) in Python style. It is well suited for bioinformatic workflows.