OpenCRAVAT: a platform for the annotation of human genetic variation
OpenCRAVAT is a new open source, scalable decision support system for variant and gene prioritization. It includes a modular resource catalog to maximize community and developer involvement; as a result, the catalog is actively developed and grows every month. Resources made available via the store are well-suited for analysis of cancer, as well as Mendelian and complex diseases.
References:
- Kymberleigh A Pagel, Rick Kim, Kyle Moad, Ben Busby, Lily Zheng, Matthew Hynes-Grace, Collin Tokheim, Michael Ryan, Rachel Karchin. OpenCRAVAT, an open source collaborative platform for the annotation of human genetic variation. bioRxiv, 2019. doi: https://doi.org/10.1101/794297
Documentation
- OpenCRAVAT quick start guide
- OpenCRAVAT Github wiki page
- OpenCRAVAT Home page
- Instructions for the GUI usage
Important Notes
- Module Name: OpenCRAVAT (see the modules page for more information)
- Unusual environment variables set
- OC_HOME OpenCRAVAT installation directory
- OC_BIN OpenCRAVAT executable directory
- OC_SRC OpenCRAVAT source code directory
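After loading the module, it can be useful to confirm that the three variables listed above are actually defined in the current shell. The following is a minimal sketch only; the function name check_oc_env is hypothetical and is not part of OpenCRAVAT or of the Biowulf module.

```shell
# Hypothetical helper (not provided by OpenCRAVAT): report whether each
# variable set by the OpenCRAVAT module is defined in the current shell.
# Returns non-zero if at least one of them is missing or empty.
check_oc_env() {
    status=0
    for name in OC_HOME OC_BIN OC_SRC; do
        # Look up the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -n "$value" ]; then
            echo "$name=$value"
        else
            echo "$name is not set" >&2
            status=1
        fi
    done
    return "$status"
}
```

Calling check_oc_env right after "module load OpenCRAVAT" prints one NAME=value line per variable, or a message on standard error for any variable that is missing.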
Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.
Allocate an interactive session and run the program. Sample session:
[user@biowulf]$ sinteractive --mem=16g --gres=gpu:p100:1,lscratch:10 -c4
[user@cn2389 ~]$ module load OpenCRAVAT
[+] Loading annovar 2019-10-24 on cn2389
[+] Loading OpenCRAVAT 2.2.5 ...
In order to annotate and interpret variants, OpenCRAVAT (OC) makes use of a database comprising chunks of data called "modules" (not to be confused with the Biowulf modules). Each user installs the OC modules in a private folder pointed to by the user-defined environment variable OC_MODULES. For example:
[user@cn2389 ~]$ mkdir my_modules
[user@cn2389 ~]$ export OC_MODULES=$PWD/my_modules
[user@cn2389 ~]$ export OC_LOGS=$PWD
Initially, the modules folder will be empty. In order to install base modules into this folder, run the command:
[user@cn2389 ~]$ oc module install-base
Alternatively, you can copy a set of preinstalled modules, including the base modules, from the default modules folder /fdb/OpenCRAVAT/modules_2.2.5 to your private folder:
[user@cn2389 ~]$ cp -r $OC_DEFAULT_MODULES/* $OC_MODULES
A user can add other desired module(s) to the OC_MODULES folder by using the commands listed below:
- to search for potentially available modules by type, e.g. annotator, converter etc.:
[user@cn2389 ~]$ oc module ls -a -t annotator
[user@cn2389 ~]$ oc module ls -a -t converter
[user@cn2389 ~]$ oc module ls -a -t mapper
[user@cn2389 ~]$ oc module ls -a -t reporter
[user@cn2389 ~]$ oc module ls -a -t webviewerwidget
- to install a new module with a given name (provided by the first column of the output from a module search command):
[user@cn2389 ~]$ oc module install <module_name>
For example:
[user@cn2389 ~]$ oc module install trinity
[user@cn2389 ~]$ oc module install thousandgenomes
[user@cn2389 ~]$ oc module install uniprot
[user@cn2389 ~]$ oc module install textreporter
[user@cn2389 ~]$ oc module install hg38
Now you are ready to run OpenCRAVAT on test data. First, create an input file in your current working directory by using the command:
[user@cn2389 ~]$ oc new example-input .
This command will create a file "example_input" in your current directory:
[user@cn2389 ~]$ cat example_input | wc -l
373
[user@cn2389 ~]$ head -n 20 example_input
chr10 121593817 - A T s0
chr10 2987654 + T A s1
chr10 43077259 + A T s2
chr10 8055656 + A T s3
chr10 87864470 + A T s4
chr10 87864486 + A - s0
chr10 87864486 + AA - s1
chr10 87894027 + - CG s2
chr10 87894027 + - CT s3
chr1 100719861 + A T s4
chr1 10100 + C T s0
chr1 110340653 + CGGCTTT - s1
chr11 108227625 + A T s2
chr11 113789394 + G A s3
chr1 111762684 + G A s4
chr11 119206418 + A T s0
chr1 114713881 + TGGTC - s1
chr1 114713881 + TGGTCTC - s2
chr1 114716160 - A T s3
Run OpenCRAVAT on the test input file:
[user@cn2389 ~]$ oc run ./example_input -l hg38
Input file(s): ./example_input finished in 1.504s
Genome assembly: hg38
Running converter...
Converter (converter) finished in 0.658s
Running gene mapper... finished in 75.670s
Running annotators...
go: started at Mon Jun 8 20:43:12 2020
biogrid: started at Mon Jun 8 20:43:12 2020
cgc: started at Mon Jun 8 20:43:12 2020
segway: started at Mon Jun 8 20:43:12 2020
brca1_func_assay: started at Mon Jun 8 20:43:12 2020
cosmic_gene: started at Mon Jun 8 20:43:12 2020
gnomad: started at Mon Jun 8 20:43:12 2020
target: started at Mon Jun 8 20:43:12 2020
revel: started at Mon Jun 8 20:43:12 2020
brca1_func_assay: finished at Mon Jun 8 20:43:12 2020
brca1_func_assay: runtime 0.044s
mutpred1: started at Mon Jun 8 20:43:12 2020
chasmplus_BLCA: started at Mon Jun 8 20:43:12 2020
cgc: finished at Mon Jun 8 20:43:12 2020
cgc: runtime 0.162s
denovo: started at Mon Jun 8 20:43:12 2020
vest: started at Mon Jun 8 20:43:12 2020
dbsnp: started at Mon Jun 8 20:43:12 2020
phylop: started at Mon Jun 8 20:43:12 2020
chasmplus_BLCA_mski: started at Mon Jun 8 20:43:12 2020
clinvar: started at Mon Jun 8 20:43:12 2020
chasmplus_ACC_mski: started at Mon Jun 8 20:43:12 2020
intact: started at Mon Jun 8 20:43:12 2020
cgl: started at Mon Jun 8 20:43:12 2020
polyphen2: started at Mon Jun 8 20:43:12 2020
mupit: started at Mon Jun 8 20:43:12 2020
cgl: finished at Mon Jun 8 20:43:12 2020
cgl: runtime 0.049s
pubmed: started at Mon Jun 8 20:43:13 2020
cadd_exome: started at Mon Jun 8 20:43:13 2020
clingen: started at Mon Jun 8 20:43:13 2020
sift: started at Mon Jun 8 20:43:13 2020
clingen: finished at Mon Jun 8 20:43:13 2020
clingen: runtime 0.031s
thousandgenomes: started at Mon Jun 8 20:43:13 2020
chasmplus: started at Mon Jun 8 20:43:13 2020
uniprot: started at Mon Jun 8 20:43:13 2020
chasmplus_ACC: started at Mon Jun 8 20:43:13 2020
pharmgkb: started at Mon Jun 8 20:43:13 2020
gnomad_gene: started at Mon Jun 8 20:43:13 2020
pharmgkb: finished at Mon Jun 8 20:43:13 2020
pharmgkb: runtime 0.141s
target: finished at Mon Jun 8 20:43:19 2020
target: runtime 6.878s
cosmic_gene: finished at Mon Jun 8 20:43:20 2020
cosmic_gene: runtime 7.607s
biogrid: finished at Mon Jun 8 20:43:20 2020
biogrid: runtime 8.093s
intact: finished at Mon Jun 8 20:43:20 2020
intact: runtime 8.101s
mupit: finished at Mon Jun 8 20:43:21 2020
mupit: runtime 8.347s
pubmed: finished at Mon Jun 8 20:43:21 2020
pubmed: runtime 8.278s
uniprot: finished at Mon Jun 8 20:43:21 2020
uniprot: runtime 8.180s
gnomad_gene: finished at Mon Jun 8 20:43:21 2020
gnomad_gene: runtime 8.355s
vest: finished at Mon Jun 8 20:43:22 2020
vest: runtime 10.262s
revel: finished at Mon Jun 8 20:43:23 2020
revel: runtime 10.540s
go: finished at Mon Jun 8 20:43:23 2020
go: runtime 10.802s
denovo: finished at Mon Jun 8 20:43:23 2020
denovo: runtime 11.068s
cadd_exome: finished at Mon Jun 8 20:43:23 2020
cadd_exome: runtime 10.767s
clinvar: finished at Mon Jun 8 20:43:24 2020
clinvar: runtime 11.310s
chasmplus_BLCA_mski: finished at Mon Jun 8 20:43:24 2020
chasmplus_BLCA_mski: runtime 12.043s
chasmplus_BLCA: finished at Mon Jun 8 20:43:24 2020
chasmplus_BLCA: runtime 12.113s
thousandgenomes: finished at Mon Jun 8 20:43:25 2020
thousandgenomes: runtime 11.952s
gnomad: finished at Mon Jun 8 20:43:25 2020
gnomad: runtime 12.577s
chasmplus: finished at Mon Jun 8 20:43:25 2020
chasmplus: runtime 12.167s
polyphen2: finished at Mon Jun 8 20:43:25 2020
polyphen2: runtime 12.559s
chasmplus_ACC_mski: finished at Mon Jun 8 20:43:25 2020
chasmplus_ACC_mski: runtime 12.740s
chasmplus_ACC: finished at Mon Jun 8 20:43:25 2020
chasmplus_ACC: runtime 12.273s
mutpred1: finished at Mon Jun 8 20:43:26 2020
mutpred1: runtime 13.414s
dbsnp: finished at Mon Jun 8 20:43:26 2020
dbsnp: runtime 13.889s
phylop: finished at Mon Jun 8 20:43:27 2020
phylop: runtime 14.582s
sift: finished at Mon Jun 8 20:43:27 2020
sift: runtime 14.721s
segway: finished at Mon Jun 8 20:43:33 2020
segway: runtime 20.510s
annotator(s) finished in 22.017s
Running aggregator...
Variants finished in 0.326s
Genes finished in 0.379s
Samples finished in 0.076s
Tags finished in 0.295s
Running postaggregators...
Tag Sampler (tagsampler) finished in 0.137s
Variant Metadata (varmeta) finished in 0.001s
VCF Info (vcfinfo) finished in 0.000s
Running reporter...
Excel Reporter (excelreporter)
chasmplus: getting gene summary data
chasmplus: finished getting gene summary data in 0.003s
chasmplus_ACC: getting gene summary data
chasmplus_ACC: finished getting gene summary data in 0.003s
chasmplus_ACC_mski: getting gene summary data
chasmplus_ACC_mski: finished getting gene summary data in 0.003s
chasmplus_BLCA: getting gene summary data
chasmplus_BLCA: finished getting gene summary data in 0.003s
chasmplus_BLCA_mski: getting gene summary data
chasmplus_BLCA_mski: finished getting gene summary data in 0.003s
mupit: getting gene summary data
mupit: finished getting gene summary data in 0.001s
vest: getting gene summary data
vest: finished getting gene summary data in 0.004s
finished in 2.650s
Finished normally. Runtime: 103.555s
Once the job is finished, the following files will be created:
example_input.log
example_input.xlsx
example_input.sqlite
example_input.err
In particular, the file example_input.sqlite is the SQLite database with the results. This database can be opened in the OpenCRAVAT web viewer as follows:
[user@cn2389 ~]$ oc gui example_input.sqlite
OpenCRAVAT is served at 0.0.0.0:8060
(To quit: Press Ctrl-C or Ctrl-Break if run on a Terminal or Windows, or click "Cancel" and then "Quit" if run through OpenCRAVAT app on Mac OS)
On your local system, open a new window and type:
ssh -t -L 8060:localhost:8060 biowulf.nih.gov "ssh -L 8060:localhost:8060 cn2389"
Navigate a browser on your local system to the URL: localhost:8060.
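The port and the compute node name in the tunnel command will likely differ between sessions: the port is the one reported by "oc gui" (8060 in the session above), and the node is the one shown in the interactive prompt (cn2389 above). As a rough sketch, the command can be assembled from these two values; the function name oc_tunnel_cmd is hypothetical and only prints the command, it does not open a connection.

```shell
# Hypothetical helper: print the two-hop ssh tunnel command for the given
# port and compute node, following the pattern used in this guide.
oc_tunnel_cmd() {
    port=${1:-}
    node=${2:-}
    if [ -z "$port" ] || [ -z "$node" ]; then
        echo "usage: oc_tunnel_cmd PORT NODE" >&2
        return 1
    fi
    # First hop: local machine to the login node; second hop: login node
    # to the compute node where the web viewer is running.
    printf 'ssh -t -L %s:localhost:%s biowulf.nih.gov "ssh -L %s:localhost:%s %s"\n' \
        "$port" "$port" "$port" "$port" "$node"
}
```

For instance, "oc_tunnel_cmd 8060 cn2389" prints the same command as the one shown above, ready to be pasted into a terminal on the local system.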

Exit an interactive session:
[user@cn2389 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
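The main steps of the session above (creating the private modules folder, installing the base modules, generating the test input, and running the annotation) can be collected into a small script. The following is only a sketch under stated assumptions: the names run, oc_walkthrough and DRY_RUN are hypothetical, the OpenCRAVAT module is assumed to be loaded already, and by default the commands are printed rather than executed so that the sequence can be reviewed first.

```shell
# Sketch: replay the walkthrough commands in order.
# With DRY_RUN=1 (the default here) each command is only printed.
DRY_RUN=${DRY_RUN:-1}

# Hypothetical wrapper: print the command in dry-run mode, else execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical function: the working directory is the first argument
# (defaults to the current directory).
oc_walkthrough() {
    workdir=${1:-$PWD}
    export OC_MODULES="$workdir/my_modules"   # private OC modules folder
    export OC_LOGS="$workdir"                 # log location, as in the session above
    run mkdir -p "$OC_MODULES"
    run oc module install-base                # install the base modules
    run oc new example-input "$workdir"       # create the test input file
    run oc run "$workdir/example_input" -l hg38
}
```

Calling "oc_walkthrough" with DRY_RUN=1 lists the four commands prefixed with "+"; setting DRY_RUN=0 before the call executes them for real inside an interactive session.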