Scientific Reference Data

We provide a set of centrally maintained scientific reference databases for Biowulf users. You can search through this data here. To request a new database or an update to an existing one, please contact us at staff@hpc.nih.gov.



Search by keyword: searches through metadata using keywords
Search by filename: searches through filenames where available


Browse Common Databases

Recently Updated:

2024-07-23 NCBI nt Blast database: NCBI nonredundant comprehensive nucleotide database, compiled from GenBank, RefSeq, TPA and PDB. (Blast database full path and name - /fdb/blastdb/nt)
2024-07-23 Patent nucleotide sequences Blast db: Patent nucleotide sequences. (Blast database full path and name - /fdb/blastdb/patnt)
2024-07-23 PDB protein sequences Blast db: Protein Data Bank sequences. (Blast database full path and name - /fdb/blastdb/pdbaa)
2024-07-23 Swissprot Blast database: Curated, highly annotated protein sequence database. (Blast database full path and name - /fdb/blastdb/swissprot)
2024-07-23 taxonomy: The Taxonomy Database is a curated classification and nomenclature for all of the organisms in the public sequence databases.
2024-07-22 Betacoronavirus: Blast database of Betacoronavirus nucleotide sequences. (Blast database full path and name - /fdb/blastdb/Betacoronavirus)
2024-07-22 NCBI nr Blast database: NCBI nonredundant comprehensive protein database, compiled from GenBank CDS translations, PDB, Swiss-Prot, PIR, and PRF. (Blast database full path and name - /fdb/blastdb/nr)
2024-07-20 PDB nucleotide sequences Blast db: Protein Data Bank nucleotide sequences. (Blast database full path and name - /fdb/blastdb/pdbnt)
2024-07-16 VEP: VEP determines the effect of your variants (SNPs, insertions, deletions, CNVs or structural variants) on genes, transcripts, and protein sequence, as well as regulatory regions.
2024-07-09 I-TASSER ITLIB: I-TASSER Template Library for Protein Structure and Function Prediction
2024-07-03 LD Scores: A minimal set of files necessary to run LDSC, as well as a low-frequency variants model applied to UK Biobank SNPs. See /fdb//ldsc/readme_baseline_versions for details. Additional data are available on a requester-pays basis from https://console.cloud.google.com/storage/browser/broad-alkesgroup-public-requester-pays
2024-05-21 ensembl: Ensembl is a genome browser for vertebrate genomes that supports research in comparative genomics, evolution, sequence variation and transcriptional regulation.
2024-05-20 diamond: Select reference databases for the diamond application
2024-05-05 UCSC goldenPath: The UCSC Genomics Institute maintains a broad collection of vertebrate and model organism assemblies and annotations, along with a large suite of tools for viewing, analyzing and downloading data.
2024-05-03 GATK resource bundles: [Not available]
2024-05-03 Metaphlan Databases: Metaphlan database files
2024-05-01 HUMAnN Reference Data: Reference data for the HUMAnN pipeline
2024-04-02 TomoTwin Models: Models for the particle picking software TomoTwin
2024-03-21 dorado models: Models for the dorado basecaller by ONT
2024-03-15 Reference data for the cellranger pipeline: References for the 10x Genomics cellranger pipeline
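
Example: pointing BLAST at a central database

The Blast entries above give a "full path and name" that can be passed directly to the BLAST+ `-db` option. The following is a minimal sketch, not an official recipe: it only builds and prints a `blastn` command line and does not run it. It assumes the NCBI BLAST+ `blastn` program is available in your environment. The helper name `blastn_command`, the query file `my_sequences.fasta` and the output file `hits.tsv` are hypothetical and chosen for illustration.

```python
import shlex
from pathlib import PurePosixPath

# Directory taken from the "full path and name" entries in the list above.
BLAST_DB_DIR = PurePosixPath("/fdb/blastdb")


def blastn_command(query_fasta, db_name="nt", out_file="hits.tsv", threads=4):
    """Build (but do not run) a blastn argument list for a central database.

    query_fasta and out_file are hypothetical file names supplied by the caller.
    """
    return [
        "blastn",
        "-db", str(BLAST_DB_DIR / db_name),   # e.g. /fdb/blastdb/nt
        "-query", query_fasta,
        "-out", out_file,
        "-outfmt", "6",                       # tabular output
        "-num_threads", str(threads),
    ]


if __name__ == "__main__":
    # Print the command so it can be inspected before running it in a job.
    print(shlex.join(blastn_command("my_sequences.fasta")))
```

The nucleotide databases (nt, patnt, pdbnt, Betacoronavirus) are the ones suited to `blastn`; the protein databases (nr, pdbaa, swissprot) would need a protein search program such as `blastp` or `blastx` instead.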