FamDB is a modular HDF5-based export format and query tool developed for offline access to the Dfam database of transposable element and repetitive DNA families. FamDB stores family sequence models (profile HMMs and consensus sequences) and metadata, including:
- Family names, aliases, description
- Classification
- Taxa
- Citations and attribution.
In addition, FamDB stores the subset of the NCBI Taxonomy relevant to the family taxa represented in the file, which allows quick extraction of species- or clade-specific family libraries. The query tool can export search results in several common formats, including EMBL, FASTA, and HMMER HMM. At this time, FamDB is intended to serve as a read-only data store for tools such as RepeatMasker, as an alternative to unindexed EMBL or HMM files.
Allocate an interactive session and run the program.
Sample session (user input follows the $ prompt):
[user@biowulf]$ sinteractive
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144 ~]$ module load famdb
[user@cn3144 ~]$ famdb.py -i /fdb/dfam/current --format fasta_name --include-class-in-name -ad "danio rerio"
[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
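As a rough sketch, the export step from the sample session could be wrapped in a small shell function that saves the FASTA output to a file and reports how many records were written. The function names count_fasta_records and export_species_library and the output filename are hypothetical and are not part of FamDB; the famdb.py arguments are copied unchanged from the session above, and the sketch assumes the famdb module is already loaded.

```shell
#!/bin/bash

# Count FASTA records in a file by counting header lines (those starting with '>').
count_fasta_records() {
    grep -c '^>' "$1"
}

# Hypothetical helper: export the family library for one species to a file.
# The famdb.py options are taken as-is from the sample session.
export_species_library() {
    local species="$1"
    local outfile="$2"

    famdb.py -i /fdb/dfam/current --format fasta_name --include-class-in-name \
        -ad "$species" > "$outfile" || return 1

    echo "$(count_fasta_records "$outfile") records written to $outfile"
}
```

Inside an interactive session, after module load famdb, this might be called as export_species_library "danio rerio" danio_rerio.fa. Redirecting to a file rather than printing to the terminal keeps the library available for later use.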