OpenCRAVAT: a platform for the annotation of human genetic variation

OpenCRAVAT is a new open source, scalable decision support system for variant and gene prioritization. It includes a modular resource catalog designed to maximize community and developer involvement. As a result, the catalog is actively developed and grows every month. Resources available through the OpenCRAVAT store are well suited to the analysis of cancer as well as Mendelian and complex diseases.

Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program. Sample session:

[user@biowulf]$ sinteractive  --mem=16g  --gres=gpu:p100:1,lscratch:10 -c4 --tunnel
[user@cn2389 ~]$ module load OpenCRAVAT 
[+] Loading annovar 2020-06-08 on cn2389
[+] Loading OpenCRAVAT 2.4.2  ...
To annotate and interpret variants, OpenCRAVAT (OC) uses a database made up of chunks of data called "modules" (not to be confused with Biowulf environment modules). The currently installed OC modules are located in the folder $OC_MODULES.
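
You can inspect that folder directly to see which OC modules are installed. The function below, count_oc_modules, is a hypothetical helper and is not part of OpenCRAVAT. It assumes that $OC_MODULES contains one subdirectory per module type (for example annotators or converters), and that each of those holds one folder per installed module. OpenCRAVAT's own "oc module ls" command also lists installed modules.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of OpenCRAVAT: count installed OC modules per
# module type. Assumes the modules folder holds one subdirectory per type
# (e.g. annotators/, converters/), each with one folder per installed module.
count_oc_modules() {
    local root="${1:-${OC_MODULES:-}}"
    if [ -z "$root" ] || [ ! -d "$root" ]; then
        echo "modules folder not found: '$root'" >&2
        return 1
    fi
    local type_dir n
    for type_dir in "$root"/*/; do
        # An unmatched glob stays literal, so skip anything that is not a directory.
        [ -d "$type_dir" ] || continue
        # Each immediate subdirectory is counted as one installed module.
        n=$(find "$type_dir" -mindepth 1 -maxdepth 1 -type d | wc -l)
        printf '%s\t%d\n' "$(basename "$type_dir")" "$n"
    done
}
```

After "module load OpenCRAVAT", running count_oc_modules with no argument reads $OC_MODULES. Passing a path as the first argument inspects a different folder.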

To get started with OpenCRAVAT, create an example input file in your current working directory by using the commands:
[user@cn2389 ~]$ cd /data/$USER
[user@cn2389 ~]$ mkdir OpenCRAVAT && cd OpenCRAVAT
[user@cn2389 ~]$ oc new example-input . 
The last command creates a file named "example_input" in the current directory:
[user@cn2389 ~]$ wc -l < example_input
373
[user@cn2389 ~]$ head -n 20 example_input
chr10   121593817       -       A       T       s0
chr10   2987654 +       T       A       s1
chr10   43077259        +       A       T       s2
chr10   8055656 +       A       T       s3
chr10   87864470        +       A       T       s4
chr10   87864486        +       A       -       s0
chr10   87864486        +       AA      -       s1
chr10   87894027        +       -       CG      s2
chr10   87894027        +       -       CT      s3
chr1    100719861       +       A       T       s4
chr1    10100   +       C       T       s0
chr1    110340653       +       CGGCTTT -       s1
chr11   108227625       +       A       T       s2
chr11   113789394       +       G       A       s3
chr1    111762684       +       G       A       s4
chr11   119206418       +       A       T       s0
chr1    114713881       +       TGGTC   -       s1
chr1    114713881       +       TGGTCTC -       s2
chr1    114716160       -       A       T       s3
chr11   1584916 +       -       GCC     s4
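
Each line of the example input describes one variant. The listing suggests that the whitespace-separated columns are chromosome, position, strand, reference allele, alternate allele and sample ID, with "-" standing for an empty allele in insertions and deletions. The function summarize_oc_input below is a hypothetical helper, not an OpenCRAVAT command, and it relies on that assumed layout. It counts variants per sample and flags lines that do not fit the layout. This can be a useful sanity check before you submit a larger input file of your own.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of OpenCRAVAT. Assumed column layout:
#   chrom  pos  strand  ref  alt  [sample]
# Prints "sample<TAB>count" sorted by sample; lines that do not match the
# layout are reported on stderr and make the function return 1.
summarize_oc_input() {
    awk '
        NF == 0 { next }                                  # ignore blank lines
        NF < 5 || $2 !~ /^[0-9]+$/ || ($3 != "+" && $3 != "-") {
            printf "line %d looks malformed: %s\n", NR, $0 > "/dev/stderr"
            bad++
            next
        }
        { count[NF >= 6 ? $6 : "(no sample)"]++ }
        END {
            for (s in count) printf "%s\t%d\n", s, count[s]
            exit (bad > 0 ? 1 : 0)
        }
    ' "$1" | sort
    # Report awk's exit status rather than sort's.
    return "${PIPESTATUS[0]}"
}
```

For example, "summarize_oc_input example_input" prints one line per sample ID (s0 to s4 in the example file).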
Run OpenCRAVAT on the test input file:
[user@cn2389 ~]$ oc run ./example_input -l hg38 --mp 1
Input file(s): /vf/users/denisovga/OpenCRAVAT/example_input
Genome assembly: hg38
Running converter...
        Converter (converter)           finished in 1.329s
Running gene mapper...                  finished in 2.192s
Running annotators...
        annotator(s) finished in 1.028s
Running aggregator...
        Variants                        finished in 0.197s
        Genes                           finished in 0.150s
        Samples                         finished in 0.145s
        Tags                            finished in 0.276s
Indexing
        variant base__chrom     finished in 0.061s
        variant base__coding    finished in 0.011s
        variant base__so        finished in 0.011s
Running postaggregators...
        Tag Sampler (tagsampler)        finished in 0.138s
Finished normally. Runtime: 5.968s
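
The --mp option used above appears to set the number of worker processes OpenCRAVAT uses. Run "oc run -h" to confirm this on the installed version. The session was allocated with -c4, yet the example runs with --mp 1. The sketch below is a hypothetical wrapper, not part of OpenCRAVAT. It takes the process count from SLURM_CPUS_PER_TASK, the variable Slurm sets when -c/--cpus-per-task is given, and falls back to 1 outside an allocation. The names oc_mp and run_oc_example are illustrative. DRY_RUN is a hypothetical variable that makes the wrapper print the command instead of running it.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper, not part of OpenCRAVAT.
# Number of worker processes: the CPUs Slurm allocated (-c N sets
# SLURM_CPUS_PER_TASK=N), or 1 when not inside a Slurm allocation.
oc_mp() {
    echo "${SLURM_CPUS_PER_TASK:-1}"
}

# Run the example command with --mp matched to the allocation.
# Setting the (hypothetical) DRY_RUN variable prints the command instead.
run_oc_example() {
    local input="${1:-./example_input}"
    local cmd=(oc run "$input" -l hg38 --mp "$(oc_mp)")
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "${cmd[*]}"
    else
        "${cmd[@]}"
    fi
}
```

In the session above, "DRY_RUN=1 run_oc_example" would print the oc run command with --mp 4. Run it without DRY_RUN to start the job.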
When the run finishes, the following files are created in the current directory:

example_input.log
example_input.sqlite
example_input.err
The file example_input.sqlite is an SQLite database containing the annotation results.
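
Before you open the GUI, it can be worth checking that the run left its outputs in place. The function check_oc_outputs below is a hypothetical helper, not part of OpenCRAVAT. It assumes the naming pattern shown above. It also treats a non-empty .err file as a sign of problems, which is an assumption rather than documented OpenCRAVAT behavior. A second hypothetical helper, variants_per_chrom, queries the database with the sqlite3 command-line tool, if that tool is available. Its table and column names (variant, base__chrom) come from the indexing step in the run log above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of OpenCRAVAT.

# Check that <base>.sqlite, <base>.log and <base>.err exist and that the
# .err file is empty (an empty .err is assumed to mean no errors).
check_oc_outputs() {
    local base="$1" status=0 ext
    for ext in sqlite log err; do
        if [ ! -e "$base.$ext" ]; then
            echo "missing: $base.$ext" >&2
            status=1
        fi
    done
    if [ -s "$base.err" ]; then
        echo "non-empty $base.err, first lines:" >&2
        head -n 5 "$base.err" >&2
        status=1
    fi
    return "$status"
}

# Count variants per chromosome in a result database (needs the sqlite3 CLI).
# Table "variant" and column "base__chrom" are the names from the run log.
variants_per_chrom() {
    sqlite3 -separator $'\t' "$1" \
        'SELECT base__chrom, COUNT(*) FROM variant GROUP BY base__chrom ORDER BY base__chrom;'
}
```

For the example run, use "check_oc_outputs example_input" and "variants_per_chrom example_input.sqlite".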

The example_input.sqlite database can be browsed in the OpenCRAVAT GUI as follows:

[user@cn2389 ~]$ wget https://github.com/KarchinLab/open-cravat/archive/refs/tags/2.4.2.tar.gz
[user@cn2389 ~]$ tar -xzf 2.4.2.tar.gz
[user@cn2389 ~]$ export PYTHONPATH=$PWD/open-cravat-2.4.2:$PYTHONPATH
[user@cn2389 ~]$ python-oc open-cravat-2.4.2/cravat/oc.py gui --port $PORT1 example_input.sqlite

   ____                   __________  ___ _    _____  ______
  / __ \____  ___  ____  / ____/ __ \/   | |  / /   |/_  __/
 / / / / __ \/ _ \/ __ \/ /   / /_/ / /| | | / / /| | / /
/ /_/ / /_/ /  __/ / / / /___/ _, _/ ___ | |/ / ___ |/ /
\____/ .___/\___/_/ /_/\____/_/ |_/_/  |_|___/_/  |_/_/
    /_/

...
where $PORT1 is the tunnel port number assigned when the interactive session was allocated with --tunnel. Note this port number and the name of the compute node you are using; in this example, node_id=cn2389.

On your local system, open a new terminal window and type:
ssh -t -L $PORT1:localhost:$PORT1 biowulf.nih.gov "ssh -L $PORT1:localhost:$PORT1 $node_id"
where $PORT1 and $node_id should be replaced by the actual values you stored.
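
You must type the tunnel command on your local machine, but its values come from the compute node. It can therefore help to have the node print the command for you. The function print_tunnel_cmd below is a hypothetical convenience, not a Biowulf or OpenCRAVAT tool. It assumes that $PORT1 was set by "sinteractive --tunnel". It also assumes that the compute node's short host name (as printed by "hostname -s") is the name the second ssh hop expects.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not a Biowulf or OpenCRAVAT tool. Run on the compute
# node; prints the command to paste into a terminal on your local system.
# Defaults: port from $PORT1, node name from "hostname -s".
print_tunnel_cmd() {
    local port="${1:-${PORT1:-}}" node="${2:-$(hostname -s)}"
    if [ -z "$port" ]; then
        echo "no port given and \$PORT1 is not set" >&2
        return 1
    fi
    printf 'ssh -t -L %s:localhost:%s biowulf.nih.gov "ssh -L %s:localhost:%s %s"\n' \
        "$port" "$port" "$port" "$port" "$node"
}
```

On the compute node, run print_tunnel_cmd with no arguments and copy the printed line into a terminal on your local system.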

On your local system, point a web browser to http://localhost:$PORT1.

Exit an interactive session:
[user@cn2389 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$