Biowulf High Performance Computing at the NIH
Scientific Reference Data

We provide a set of centrally maintained scientific reference databases for Biowulf users. You can search through this data here. To request a new database or an update to an existing one, please contact us at staff@hpc.nih.gov.


Search by keyword: searches through metadata using keywords
Search by filename: searches through filenames where available


Browse Common Databases

Recently Updated:

2022-11-26  Betacoronavirus: Blast database of Betacoronavirus nucleotide sequences. (Blast database full path and name: /fdb/blastdb/Betacoronavirus)
2022-11-26  taxonomy: The Taxonomy Database is a curated classification and nomenclature for all of the organisms in the public sequence databases.
2022-11-25  I-TASSER ITLIB: I-TASSER Template Library for Protein Structure and Function Prediction.
2022-11-23  VEP: VEP determines the effect of your variants (SNPs, insertions, deletions, CNVs or structural variants) on genes, transcripts, and protein sequence, as well as regulatory regions.
2022-11-22  NCBI nt Blast database: NCBI nonredundant comprehensive nucleotide database, compiled from GenBank, RefSeq, TPA and PDB. (Blast database full path and name: /fdb/blastdb/nt)
2022-11-22  PDB nucleotide sequences Blast db: Protein Data Bank nucleotide sequences. (Blast database full path and name: /fdb/blastdb/pdbnt)
2022-11-22  PDB protein sequences Blast db: Protein Data Bank protein sequences. (Blast database full path and name: /fdb/blastdb/pdbaa)
2022-11-22  Swissprot Blast database: Curated, highly annotated protein sequence database. (Blast database full path and name: /fdb/blastdb/swissprot)
2022-11-21  NCBI nr Blast database: NCBI nonredundant comprehensive protein database, compiled from GenBank CDS translations, PDB, Swiss-Prot, PIR, and PRF. (Blast database full path and name: /fdb/blastdb/nr)
2022-11-20  UCSC goldenPath: The UCSC Genomics Institute maintains a broad collection of vertebrate and model organism assemblies and annotations, along with a large suite of tools for viewing, analyzing and downloading data.
2022-11-17  UCSC gbdb: The UCSC Genomics Institute maintains a broad collection of vertebrate and model organism assemblies and annotations, along with a large suite of tools for viewing, analyzing and downloading data.
2022-11-05  Patent nucleotide sequences Blast db: Patent nucleotide sequences. (Blast database full path and name: /fdb/blastdb/patnt)
2022-10-17  Standard databases for foldseek: foldseek provides prebuilt databases for AlphafoldDB (Swiss-Prot, Proteome, and UniProt50) as well as PDB.
2022-10-16  dfam: The Dfam database is an open collection of Transposable Element DNA sequence alignments, hidden Markov Models (HMMs), consensus sequences, and genome annotations.
2022-10-10  ensembl: Ensembl is a genome browser for vertebrate genomes that supports research in comparative genomics, evolution, sequence variation and transcriptional regulation.
2022-10-04  CogentAP pipeline reference data: Reference data for the Cogent NGS Analysis Pipeline for sequencing data generated with TaKaRa Bio platforms.
2022-09-27  colabfold sequence databases: Databases used by colabfold for MSA generation with mmseqs2.
2022-09-27  Interproscan databases: Databases for our local interproscan implementation. Database versions are tied to interproscan releases.
2022-09-20  Rfam: Rfam is part of the RNA database for rosettafold2na.
2022-09-20  rnacentral: rnacentral is part of the RNA database for rosettafold2na.
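
Several of the entries above give a "full path and name" for a Blast database. As a rough illustration of how such a path might be used, the Python sketch below builds a command line for the NCBI BLAST+ tools that points `-db` at one of the listed paths. The names `BLAST_DATABASES`, `PROTEIN_DATABASES`, `build_blast_command`, and `run_blast`, and the file names `query.fasta` and `hits.tsv`, are hypothetical and chosen only for this example. The sketch assumes the BLAST+ executables (`blastn`, `blastp`) are already on your PATH and that the query type matches the database type (nucleotide query for nucleotide databases, protein query for protein databases).

```python
import shutil
import subprocess

# Paths copied from the listing above; this dictionary is illustrative only.
BLAST_DATABASES = {
    "Betacoronavirus": "/fdb/blastdb/Betacoronavirus",
    "nt": "/fdb/blastdb/nt",
    "pdbnt": "/fdb/blastdb/pdbnt",
    "pdbaa": "/fdb/blastdb/pdbaa",
    "swissprot": "/fdb/blastdb/swissprot",
    "nr": "/fdb/blastdb/nr",
    "patnt": "/fdb/blastdb/patnt",
}

# Databases described above as protein databases; the rest are nucleotide.
PROTEIN_DATABASES = {"pdbaa", "swissprot", "nr"}


def build_blast_command(db_name, query_path, out_path):
    """Return a BLAST+ argument list for one of the listed databases.

    Assumption: a protein query is searched against protein databases
    (blastp) and a nucleotide query against nucleotide databases (blastn).
    """
    db_path = BLAST_DATABASES[db_name]
    program = "blastp" if db_name in PROTEIN_DATABASES else "blastn"
    return [
        program,
        "-db", db_path,        # full path and name from the listing
        "-query", query_path,  # FASTA file with the query sequences
        "-outfmt", "6",        # tabular output
        "-out", out_path,
    ]


def run_blast(command):
    """Run the command, but only if the BLAST+ executable can be found."""
    if shutil.which(command[0]) is None:
        raise RuntimeError(f"{command[0]} not found on PATH")
    subprocess.run(command, check=True)
```

For example, `build_blast_command("nt", "query.fasta", "hits.tsv")` returns the argument list for a `blastn` search against `/fdb/blastdb/nt`, which can then be passed to `run_blast` on a system where BLAST+ is installed and the database path is readable.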