Scientific Reference Data

We provide a set of centrally maintained scientific reference databases for Biowulf users. You can search through this data here. To request a new database or an update to an existing one, please contact us at staff@hpc.nih.gov.


Search by keyword: Searches through metadata using keywords
Search by filename: Searches through filenames where available


Browse Common Databases

Recently Updated:

2023-05-30 taxonomy The Taxonomy Database is a curated classification and nomenclature for all of the organisms in the public sequence databases.
2023-05-28 UCSC goldenPath The UCSC Genomics Institute maintains a broad collection of vertebrate and model organism assemblies and annotations, along with a large suite of tools for viewing, analyzing and downloading data.
2023-05-26 I-TASSER ITLIB I-TASSER Template Library for Protein Structure and Function Prediction
2023-05-25 UCSC gbdb The UCSC Genomics Institute maintains a broad collection of vertebrate and model organism assemblies and annotations, along with a large suite of tools for viewing, analyzing and downloading data.
2023-05-09 Betacoronavirus Blast database of Betacoronavirus nucleotide sequences. (Blast database full path and name - /fdb/blastdb/Betacoronavirus)
2023-05-09 colabfold sequence databases Databases used by colabfold for MSA generation with mmseqs2.
2023-05-09 dbNSFP dbNSFP is a database developed for functional prediction and annotation of all potential non-synonymous single-nucleotide variants (nsSNVs) in the human genome.
2023-05-09 NCBI nt Blast database NCBI nonredundant comprehensive nucleotide database, compiled from GenBank, RefSeq, TPA and PDB. (Blast database full path and name - /fdb/blastdb/nt)
2023-05-09 Patent nucleotide sequences Blast db Patent nucleotide sequences. (Blast database full path and name - /fdb/blastdb/patnt)
2023-05-09 PDB nucleotide sequences Blast db Protein Data Bank nucleotide sequences. (Blast database full path and name - /fdb/blastdb/pdbnt)
2023-05-09 PDB protein sequences Blast db Protein Data Bank sequences. (Blast database full path and name - /fdb/blastdb/pdbaa)
2023-05-09 Swissprot Blast database Curated, highly annotated protein sequence database. (Blast database full path and name - /fdb/blastdb/swissprot)
2023-05-08 NCBI nr Blast database NCBI nonredundant comprehensive protein database, compiled from GenBank CDS translations, PDB, Swiss-Prot, PIR, and PRF. (Blast database full path and name - /fdb/blastdb/nr)
2023-05-01 diamond Select reference databases for the diamond application
2023-04-24 annovar ANNOVAR is an efficient software tool that uses up-to-date information to functionally annotate genetic variants detected from diverse genomes.
2023-04-20 Standard databases for foldseek Databases provided by foldseek for AlphaFold DB (Swiss-Prot, Proteome, and others)
2023-04-17 funseq2 A flexible framework to prioritize regulatory mutations from cancer genome sequencing
2023-04-08 graphsite [Not available]
2023-02-22 intogen intogen collects data from TCGA, PCAWG, cBioPortal, Hartwig Medical Foundation, ICGC, St. Jude, PedcBioPortal, TARGET, Beat AML, and the literature.
2023-02-22 refdb The data is collected from TCGA, PCAWG, cBioPortal, Hartwig Medical Foundation, ICGC, St. Jude, PedcBioPortal, TARGET, Beat AML, and the literature.
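
Several entries above give a full Blast database path (for example /fdb/blastdb/nt). As a rough illustration of how such a path might be passed to a search, here is a minimal Python sketch that assembles a BLAST+ command line. It makes several assumptions: that the BLAST+ programs (blastn, blastp) are available in your environment, that the split into nucleotide and protein databases below matches the descriptions above, and that the file names query.fa and hits.tsv are hypothetical placeholders. The helper build_blast_command is our own illustrative name, not part of any Biowulf tool.

```python
import shlex

# BLAST database paths, copied from the listing above.
BLAST_DB_PATHS = {
    "Betacoronavirus": "/fdb/blastdb/Betacoronavirus",
    "nt": "/fdb/blastdb/nt",
    "patnt": "/fdb/blastdb/patnt",
    "pdbnt": "/fdb/blastdb/pdbnt",
    "pdbaa": "/fdb/blastdb/pdbaa",
    "swissprot": "/fdb/blastdb/swissprot",
    "nr": "/fdb/blastdb/nr",
}

# Assumption: these three are protein databases (searched with blastp);
# the remaining entries are nucleotide databases (searched with blastn).
PROTEIN_DBS = {"pdbaa", "swissprot", "nr"}


def build_blast_command(db_key, query_fasta, out_path):
    """Return the argument list for a BLAST+ search against a listed database.

    Hypothetical helper for illustration; raises KeyError for unknown keys.
    """
    db_path = BLAST_DB_PATHS[db_key]
    program = "blastp" if db_key in PROTEIN_DBS else "blastn"
    # -outfmt 6 requests tab-separated tabular output.
    return [program, "-db", db_path, "-query", query_fasta,
            "-out", out_path, "-outfmt", "6"]


if __name__ == "__main__":
    # query.fa and hits.tsv are placeholder file names.
    cmd = build_blast_command("nt", "query.fa", "hits.tsv")
    print(shlex.join(cmd))
```

The function only builds the command; it does not run it. To execute the search you could pass the returned list to subprocess.run, preferably inside a batch or interactive job, since searches against large databases such as nt and nr can be resource-intensive.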