Biowulf High Performance Computing at the NIH
OpenCRAVAT: a platform for the annotation of human genetic variation

OpenCRAVAT is a new open-source, scalable decision support system for variant and gene prioritization. It includes a modular resource catalog designed to maximize community and developer involvement; as a result, the catalog is actively developed and grows every month. Resources made available via the store are well suited for the analysis of cancer, as well as Mendelian and complex diseases.


Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program. Sample session:

[user@biowulf]$ sinteractive  --mem=16g  --gres=gpu:p100:1,lscratch:10 -c4
[user@cn2389 ~]$ module load OpenCRAVAT 
[+] Loading annovar 2019-10-24 on cn2389
[+] Loading OpenCRAVAT 2.2.5  ...
In order to annotate and interpret variants, OpenCRAVAT (OC) makes use of a database comprising chunks of data called "modules" (not to be confused with the Biowulf modules). Each user installs the OC modules into a private folder pointed to by the user-defined environment variable OC_MODULES. For example:
[user@cn2389 ~]$ mkdir my_modules 
[user@cn2389 ~]$ export OC_MODULES=$PWD/my_modules
[user@cn2389 ~]$ export OC_LOGS=$PWD
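The three commands above can be wrapped in a small helper so that the same setup is repeatable in later sessions. This is only a minimal sketch: the function name oc_setup_dirs and its optional argument are illustrative and are not part of OpenCRAVAT or Biowulf.

```shell
# Hypothetical helper (not an OpenCRAVAT command): create a private OC
# modules folder and export the variables used in the session above.
# $1 is the parent directory; it defaults to the current directory.
oc_setup_dirs() {
    local base="${1:-$PWD}"
    mkdir -p "$base/my_modules" || return 1
    export OC_MODULES="$base/my_modules"
    export OC_LOGS="$base"
}
```

Calling oc_setup_dirs with no argument reproduces the session above, since OC_MODULES then points to $PWD/my_modules and OC_LOGS to $PWD.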
Initially, the modules folder will be empty. In order to install base modules into this folder, run the command:
[user@cn2389 ~]$ oc module install-base
Alternatively, you can copy a set of preinstalled modules, including the base modules, from the default modules folder /fdb/OpenCRAVAT/modules_2.2.5 to your private folder:
[user@cn2389 ~]$ cp -r $OC_DEFAULT_MODULES/* $OC_MODULES
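The choice between the two options above (installing the base modules or copying the preinstalled set) could be scripted roughly as follows. This is a sketch under stated assumptions: the function name oc_seed_modules is hypothetical, and it assumes OC_DEFAULT_MODULES is set by the Biowulf OpenCRAVAT module, as the cp command above suggests.

```shell
# Hypothetical helper: populate $OC_MODULES once, preferring a copy of the
# preinstalled modules and falling back to `oc module install-base`.
oc_seed_modules() {
    [ -n "${OC_MODULES:-}" ] || { echo "OC_MODULES is not set" >&2; return 1; }
    mkdir -p "$OC_MODULES" || return 1
    if [ -n "$(ls -A "$OC_MODULES")" ]; then
        # Leave a folder that already has content untouched.
        echo "already populated: $OC_MODULES"
    elif [ -d "${OC_DEFAULT_MODULES:-}" ]; then
        # Copy the directory contents (the trailing /. includes hidden files).
        cp -r "$OC_DEFAULT_MODULES"/. "$OC_MODULES"/ \
            && echo "copied from $OC_DEFAULT_MODULES"
    else
        oc module install-base
    fi
}
```

Running the function a second time is harmless, because a non-empty modules folder is left as it is.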
A user can add other desired module(s) to the OC_MODULES folder by using the commands listed below:
- to search for potentially available modules by type, e.g. annotator, converter etc.:
[user@cn2389 ~]$ oc module ls -a -t annotator 
[user@cn2389 ~]$ oc module ls -a -t converter 
[user@cn2389 ~]$ oc module ls -a -t mapper    
[user@cn2389 ~]$ oc module ls -a -t reporter  
[user@cn2389 ~]$ oc module ls -a -t webviewerwidget 
- to install a new module with a given name (shown in the first column of the output of a module search command):
[user@cn2389 ~]$ oc module install <module_name> 
For example:
[user@cn2389 ~]$ oc module install trinity 
[user@cn2389 ~]$ oc module install thousandgenomes  
[user@cn2389 ~]$ oc module install uniprot 
[user@cn2389 ~]$ oc module install textreporter  
[user@cn2389 ~]$ oc module install hg38          
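When several modules are needed, the individual install commands above can be driven from a loop. The sketch below makes two assumptions that should be checked before use: that oc module install accepts a -y flag to skip the confirmation prompt (see oc module install -h), and that a dry-run switch is useful. The function name oc_install_modules and the variable OC_DRY_RUN are hypothetical and exist only in this example.

```shell
# Hypothetical helper: install OC modules one by one and report failures.
# With OC_DRY_RUN set (a variable invented for this sketch), the commands
# are printed instead of executed.
oc_install_modules() {
    local name failed=""
    for name in "$@"; do
        if [ -n "${OC_DRY_RUN:-}" ]; then
            echo "oc module install -y $name"
        elif ! oc module install -y "$name"; then
            # -y is assumed to answer the confirmation prompt.
            failed="$failed $name"
        fi
    done
    [ -z "$failed" ] || { echo "failed:$failed" >&2; return 1; }
}
```

For example, oc_install_modules thousandgenomes uniprot textreporter hg38 would attempt the installs listed above in order and return a non-zero status if any of them fails.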
Now you are ready to run OpenCRAVAT on test data. First, create an example input file in your current working directory by using the command:
[user@cn2389 ~]$ oc new example-input . 
This command creates a file "example_input" in your current directory:
[user@cn2389 ~]$ cat example_input | wc -l 
373
[user@cn2389 ~]$ head -n 20 example_input
chr10   121593817       -       A       T       s0
chr10   2987654 +       T       A       s1
chr10   43077259        +       A       T       s2
chr10   8055656 +       A       T       s3
chr10   87864470        +       A       T       s4
chr10   87864486        +       A       -       s0
chr10   87864486        +       AA      -       s1
chr10   87894027        +       -       CG      s2
chr10   87894027        +       -       CT      s3
chr1    100719861       +       A       T       s4
chr1    10100   +       C       T       s0
chr1    110340653       +       CGGCTTT -       s1
chr11   108227625       +       A       T       s2
chr11   113789394       +       G       A       s3
chr1    111762684       +       G       A       s4
chr11   119206418       +       A       T       s0
chr1    114713881       +       TGGTC   -       s1
chr1    114713881       +       TGGTCTC -       s2
chr1    114716160       -       A       T       s3
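To get a quick overview of such an input file, the variants can be tallied by type with awk. This is a sketch, assuming the six whitespace-separated columns shown above are chromosome, position, strand, reference allele, alternate allele and sample, and that "-" marks an empty allele. The function name oc_variant_types is illustrative.

```shell
# Hypothetical helper: count variant types in an OC-style input file.
# Assumed columns: chrom, position, strand, ref, alt, sample.
oc_variant_types() {
    awk 'NF >= 5 {
        if ($4 == "-")      type = "insertion"   # empty reference allele
        else if ($5 == "-") type = "deletion"    # empty alternate allele
        else if (length($4) == 1 && length($5) == 1) type = "snv"
        else type = "other"
        n[type]++
    }
    END {
        # Print the counts in a fixed order, including zero counts.
        split("snv insertion deletion other", order, " ")
        for (i = 1; i <= 4; i++) printf "%s\t%d\n", order[i], n[order[i]]
    }' "$1"
}
```

Usage: oc_variant_types example_input prints one line per variant type with its count.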
Run OpenCRAVAT on the test input file:
[user@cn2389 ~]$ oc run ./example_input -l hg38
Input file(s): ./example_input
finished in 1.504s
Genome assembly: hg38
Running converter...
        Converter (converter)           finished in 0.658s
Running gene mapper...                  finished in 75.670s
Running annotators...
        go: started at Mon Jun  8 20:43:12 2020
        biogrid: started at Mon Jun  8 20:43:12 2020
        cgc: started at Mon Jun  8 20:43:12 2020
        segway: started at Mon Jun  8 20:43:12 2020
        brca1_func_assay: started at Mon Jun  8 20:43:12 2020
        cosmic_gene: started at Mon Jun  8 20:43:12 2020
        gnomad: started at Mon Jun  8 20:43:12 2020
        target: started at Mon Jun  8 20:43:12 2020
        revel: started at Mon Jun  8 20:43:12 2020
        brca1_func_assay: finished at Mon Jun  8 20:43:12 2020
        brca1_func_assay: runtime 0.044s
        mutpred1: started at Mon Jun  8 20:43:12 2020
        chasmplus_BLCA: started at Mon Jun  8 20:43:12 2020
        cgc: finished at Mon Jun  8 20:43:12 2020
        cgc: runtime 0.162s
        denovo: started at Mon Jun  8 20:43:12 2020
        vest: started at Mon Jun  8 20:43:12 2020
        dbsnp: started at Mon Jun  8 20:43:12 2020
        phylop: started at Mon Jun  8 20:43:12 2020
        chasmplus_BLCA_mski: started at Mon Jun  8 20:43:12 2020
        clinvar: started at Mon Jun  8 20:43:12 2020
        chasmplus_ACC_mski: started at Mon Jun  8 20:43:12 2020
        intact: started at Mon Jun  8 20:43:12 2020
        cgl: started at Mon Jun  8 20:43:12 2020
        polyphen2: started at Mon Jun  8 20:43:12 2020
        mupit: started at Mon Jun  8 20:43:12 2020
        cgl: finished at Mon Jun  8 20:43:12 2020
        cgl: runtime 0.049s
        pubmed: started at Mon Jun  8 20:43:13 2020
        cadd_exome: started at Mon Jun  8 20:43:13 2020
        clingen: started at Mon Jun  8 20:43:13 2020
        sift: started at Mon Jun  8 20:43:13 2020
        clingen: finished at Mon Jun  8 20:43:13 2020
        clingen: runtime 0.031s
        thousandgenomes: started at Mon Jun  8 20:43:13 2020
        chasmplus: started at Mon Jun  8 20:43:13 2020
        uniprot: started at Mon Jun  8 20:43:13 2020
        chasmplus_ACC: started at Mon Jun  8 20:43:13 2020
        pharmgkb: started at Mon Jun  8 20:43:13 2020
        gnomad_gene: started at Mon Jun  8 20:43:13 2020
        pharmgkb: finished at Mon Jun  8 20:43:13 2020
        pharmgkb: runtime 0.141s
        target: finished at Mon Jun  8 20:43:19 2020
        target: runtime 6.878s
        cosmic_gene: finished at Mon Jun  8 20:43:20 2020
        cosmic_gene: runtime 7.607s
        biogrid: finished at Mon Jun  8 20:43:20 2020
        biogrid: runtime 8.093s
        intact: finished at Mon Jun  8 20:43:20 2020
        intact: runtime 8.101s
        mupit: finished at Mon Jun  8 20:43:21 2020
        mupit: runtime 8.347s
        pubmed: finished at Mon Jun  8 20:43:21 2020
        pubmed: runtime 8.278s
        uniprot: finished at Mon Jun  8 20:43:21 2020
        uniprot: runtime 8.180s
        gnomad_gene: finished at Mon Jun  8 20:43:21 2020
        gnomad_gene: runtime 8.355s
        vest: finished at Mon Jun  8 20:43:22 2020
        vest: runtime 10.262s
        revel: finished at Mon Jun  8 20:43:23 2020
        revel: runtime 10.540s
        go: finished at Mon Jun  8 20:43:23 2020
        go: runtime 10.802s
        denovo: finished at Mon Jun  8 20:43:23 2020
        denovo: runtime 11.068s
        cadd_exome: finished at Mon Jun  8 20:43:23 2020
        cadd_exome: runtime 10.767s
        clinvar: finished at Mon Jun  8 20:43:24 2020
        clinvar: runtime 11.310s
        chasmplus_BLCA_mski: finished at Mon Jun  8 20:43:24 2020
        chasmplus_BLCA_mski: runtime 12.043s
        chasmplus_BLCA: finished at Mon Jun  8 20:43:24 2020
        chasmplus_BLCA: runtime 12.113s
        thousandgenomes: finished at Mon Jun  8 20:43:25 2020
        thousandgenomes: runtime 11.952s
        gnomad: finished at Mon Jun  8 20:43:25 2020
        gnomad: runtime 12.577s
        chasmplus: finished at Mon Jun  8 20:43:25 2020
        chasmplus: runtime 12.167s
        polyphen2: finished at Mon Jun  8 20:43:25 2020
        polyphen2: runtime 12.559s
        chasmplus_ACC_mski: finished at Mon Jun  8 20:43:25 2020
        chasmplus_ACC_mski: runtime 12.740s
        chasmplus_ACC: finished at Mon Jun  8 20:43:25 2020
        chasmplus_ACC: runtime 12.273s
        mutpred1: finished at Mon Jun  8 20:43:26 2020
        mutpred1: runtime 13.414s
        dbsnp: finished at Mon Jun  8 20:43:26 2020
        dbsnp: runtime 13.889s
        phylop: finished at Mon Jun  8 20:43:27 2020
        phylop: runtime 14.582s
        sift: finished at Mon Jun  8 20:43:27 2020
        sift: runtime 14.721s
        segway: finished at Mon Jun  8 20:43:33 2020
        segway: runtime 20.510s
        annotator(s) finished in 22.017s
Running aggregator...
        Variants                        finished in 0.326s
        Genes                           finished in 0.379s
        Samples                         finished in 0.076s
        Tags                            finished in 0.295s
Running postaggregators...
        Tag Sampler (tagsampler)        finished in 0.137s
        Variant Metadata (varmeta)      finished in 0.001s
        VCF Info (vcfinfo)              finished in 0.000s
Running reporter...
        Excel Reporter (excelreporter)              chasmplus: getting gene summary data
            chasmplus: finished getting gene summary data in 0.003s
            chasmplus_ACC: getting gene summary data
            chasmplus_ACC: finished getting gene summary data in 0.003s
            chasmplus_ACC_mski: getting gene summary data
            chasmplus_ACC_mski: finished getting gene summary data in 0.003s
            chasmplus_BLCA: getting gene summary data
            chasmplus_BLCA: finished getting gene summary data in 0.003s
            chasmplus_BLCA_mski: getting gene summary data
            chasmplus_BLCA_mski: finished getting gene summary data in 0.003s
            mupit: getting gene summary data
            mupit: finished getting gene summary data in 0.001s
            vest: getting gene summary data
            vest: finished getting gene summary data in 0.004s
finished in 2.650s
Finished normally. Runtime: 103.555s
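The per-annotator "runtime" lines in this output show where the time is spent. If the console output has been saved to a file (for example by piping the run through tee, which is an assumption of this sketch), the slowest annotators can be listed as follows. The function name oc_slowest_annotators is hypothetical.

```shell
# Hypothetical helper: list the slowest annotators from saved `oc run` output.
# $1 is the saved output file, $2 the number of entries to show (default 5).
oc_slowest_annotators() {
    awk '$2 == "runtime" {
        name = $1; sub(/:$/, "", name)    # strip the trailing colon
        secs = $3; sub(/s$/, "", secs)    # strip the trailing "s"
        printf "%s\t%s\n", secs, name
    }' "$1" | sort -rn | head -n "${2:-5}"
}
```

For the run shown above, the three slowest annotators are segway (20.510s), sift (14.721s) and phylop (14.582s).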
Once the job is finished, the following files will be created:

example_input.log
example_input.xlsx
example_input.sqlite
example_input.err
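A quick way to confirm that a run produced all four files is to test for each of them by extension. This is a minimal sketch; the function name oc_check_outputs is illustrative, and the list of extensions is taken from the files named above.

```shell
# Hypothetical helper: verify that the four output files of a run exist.
# $1 is the input file name used as prefix, e.g. example_input.
oc_check_outputs() {
    local prefix="$1" ext missing=0
    for ext in log xlsx sqlite err; do
        if [ ! -e "$prefix.$ext" ]; then
            echo "missing: $prefix.$ext"
            missing=1
        fi
    done
    # Return 0 when all files are present, 1 otherwise.
    return "$missing"
}
```

Usage: oc_check_outputs example_input prints nothing and returns 0 when all four files are present.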
In particular, example_input.sqlite is the SQLite database containing the results. This database can be opened in the OpenCRAVAT web viewer as follows:
[user@cn2389 ~]$ oc gui example_input.sqlite
OpenCRAVAT is served at 0.0.0.0:8060
(To quit: Press Ctrl-C or Ctrl-Break if run on a Terminal or Windows, or click "Cancel" and then "Quit" if run through OpenCRAVAT app on Mac OS)
On your local system, open a new terminal window and create an SSH tunnel to the compute node by typing:
ssh -t -L 8060:localhost:8060 biowulf.nih.gov "ssh -L 8060:localhost:8060 cn2389"
Then point a browser on your local system to the URL localhost:8060.
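The compute node name (cn2389 here) changes with every interactive session, so the tunnel command has to be adapted each time. The sketch below only builds the command string from the node name and port, following the pattern of the ssh command above; the function name oc_tunnel_cmd is hypothetical.

```shell
# Hypothetical helper: print the two-hop SSH tunnel command for a given
# compute node ($1) and port ($2, default 8060). It only prints the command.
oc_tunnel_cmd() {
    local node="$1" port="${2:-8060}"
    printf 'ssh -t -L %s:localhost:%s biowulf.nih.gov "ssh -L %s:localhost:%s %s"\n' \
        "$port" "$port" "$port" "$port" "$node"
}
```

Usage: oc_tunnel_cmd cn2389 prints the command shown above, which can then be pasted into a terminal on the local system.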



Exit the interactive session:
[user@cn2389 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$