Scientific Reference Data

We provide a set of centrally maintained scientific reference databases for Biowulf users. You can search through this data here. To request a new database or an update to an existing one, please contact us at staff@hpc.nih.gov.



Search by keyword: searches through metadata using keywords
Search by filename: searches through filenames where available


Browse Common Databases

Recently Updated:

2025-11-05 | ensembl | Ensembl is a genome browser for vertebrate genomes that supports research in comparative genomics, evolution, sequence variation, and transcriptional regulation.
2025-03-11 | Betacoronavirus | Blast database of Betacoronavirus nucleotide sequences. (Blast database full path and name - /fdb/blastdb/Betacoronavirus)
2025-03-11 | NCBI nt Blast database | NCBI nonredundant comprehensive nucleotide database, compiled from GenBank, RefSeq, TPA, and PDB. (Blast database full path and name - /fdb/blastdb/nt)
2025-03-11 | Patent nucleotide sequences Blast db | Patent nucleotide sequences. (Blast database full path and name - /fdb/blastdb/patnt)
2025-03-11 | PDB nucleotide sequences Blast db | Protein Data Bank nucleotide sequences. (Blast database full path and name - /fdb/blastdb/pdbnt)
2025-03-11 | PDB protein sequences Blast db | Protein Data Bank protein sequences. (Blast database full path and name - /fdb/blastdb/pdbaa)
2025-03-11 | Swissprot Blast database | Curated, highly annotated protein sequence database. (Blast database full path and name - /fdb/blastdb/swissprot)
2025-03-11 | taxonomy | The Taxonomy Database is a curated classification and nomenclature for all of the organisms in the public sequence databases.
2025-03-10 | NCBI nr Blast database | NCBI nonredundant comprehensive protein database, compiled from GenBank CDS translations, PDB, Swiss-Prot, PIR, and PRF. (Blast database full path and name - /fdb/blastdb/nr)
2025-03-06 | VEP | VEP determines the effect of your variants (SNPs, insertions, deletions, CNVs, or structural variants) on genes, transcripts, and protein sequence, as well as regulatory regions.
2025-03-03 | NCBI SRA Refseq data | NCBI SRA Refseq data
2025-02-24 | Reference data for the cellranger-arc pipeline | References for the 10x Genomics cellranger-arc pipeline
2025-02-20 | nf-core singularity images | Singularity images for several nf-core pipelines, e.g. rnaseq and sarek. Other pipelines will be added.
2025-02-10 | Network parameters for ESM | [Not available]
2025-01-22 | dorado models | Models for the dorado basecaller by ONT
2024-12-20 | Interproscan databases | Databases for our local interproscan implementation. Database versions are tied to interproscan releases.
2024-12-12 | Standard databases for foldseek | Foldseek-provided databases for AlphafoldDB (Swiss-Prot, Proteome, and others)
2024-11-19 | CADD | CADD is a tool for scoring the deleteriousness of single nucleotide variants as well as insertion/deletion variants in the human genome.
2024-11-18 | biogans | BioGANs is a novel application of Generative Adversarial Networks (GAN) to the synthesis of cells imaged by fluorescence microscopy.
2024-11-18 | metawrap | MetaWRAP is a modular pipeline for shotgun metagenomic data analysis.
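
Example: using a Blast database path from the listing

The Blast entries above give a "full path and name" that can be passed directly to the BLAST+ command-line programs through their -db option. The following is a minimal sketch, not an official Biowulf recipe. It assumes the BLAST+ executables (blastn, blastp) are already on your PATH, that nucleotide databases are searched with a nucleotide query and protein databases with a protein query, and the helper names (build_blast_command, run_blast) and file names are hypothetical.

```python
import shutil
import subprocess

# Database paths copied from the listing above.
BLAST_DBS = {
    "Betacoronavirus": "/fdb/blastdb/Betacoronavirus",
    "nt": "/fdb/blastdb/nt",
    "patnt": "/fdb/blastdb/patnt",
    "pdbnt": "/fdb/blastdb/pdbnt",
    "pdbaa": "/fdb/blastdb/pdbaa",
    "swissprot": "/fdb/blastdb/swissprot",
    "nr": "/fdb/blastdb/nr",
}

# Databases described in the listing as protein sequence databases.
PROTEIN_DBS = {"pdbaa", "swissprot", "nr"}


def build_blast_command(db_name, query_fasta, out_path):
    """Build a BLAST+ command line for one of the listed databases.

    Assumption: protein databases get a protein query (blastp) and
    nucleotide databases get a nucleotide query (blastn).
    """
    db_path = BLAST_DBS[db_name]
    program = "blastp" if db_name in PROTEIN_DBS else "blastn"
    # -outfmt 6 requests tabular output.
    return [program, "-db", db_path, "-query", query_fasta,
            "-out", out_path, "-outfmt", "6"]


def run_blast(db_name, query_fasta, out_path):
    """Run the search, failing early if the BLAST+ program is not available."""
    cmd = build_blast_command(db_name, query_fasta, out_path)
    if shutil.which(cmd[0]) is None:
        raise RuntimeError(f"{cmd[0]} not found on PATH")
    subprocess.run(cmd, check=True)
```

For instance, build_blast_command("nt", "query.fa", "hits.tsv") yields the arguments for a blastn search against /fdb/blastdb/nt. Keeping command construction separate from execution lets you inspect the command before submitting it as a job.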