Biowulf High Performance Computing at the NIH
SpliceAI: predicting splicing from primary sequence with deep learning

SpliceAI is a deep neural network that accurately predicts splice junctions from an arbitrary pre-mRNA transcript sequence, enabling precise prediction of noncoding genetic variants that cause cryptic splicing.


Important Notes

Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program. Sample session:

[user@biowulf]$ sinteractive --mem=8g --gres=gpu:v100:1
[user@cn4466 ~]$ module load SpliceAI
[+] Loading python 3.6  ... 
[+] Loading cuDNN 7.0  libraries... 
[+] Loading CUDA Toolkit  9.0.176  ... 
[+] Loading SpliceAI  20190507 
[user@cn4466 ~]$ spliceai -h
Using TensorFlow backend.
usage: spliceai [-h] [-I [I]] [-O [O]] -R R -A A

optional arguments:
  -h, --help  show this help message and exit
  -I [I]      path to the input VCF file, defaults to standard in
  -O [O]      path to the output VCF file, defaults to standard out
  -R R        path to the genome fasta file
  -A A        "grch37" (uses GENCODE canonical annotation file in package),
              "grch38" (uses GENCODE canonical annotation file in package), or
              path to a similarly-constructed custom gene annotation file

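The input VCF needs only the standard eight columns. If you want to try the tool on a variant of your own rather than the sample data below, a minimal file can be written by hand; a sketch (the coordinate here is illustrative, not a known splice-altering variant):

```shell
# Write a minimal grch37-style test VCF; columns are tab-separated.
# The variant below is illustrative only.
cat > test.vcf <<'EOF'
##fileformat=VCFv4.2
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO
2	179642185	.	G	A	.	.	.
EOF
```

Pass it to spliceai with -I test.vcf in place of the sample input.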
Download sample data:
[user@cn4466 ~]$ cp $SPLICEAI_DATA/* . 
Specify a reference sequence:
[user@cn4466 ~]$ ln -s /fdb/igenomes/Homo_sapiens/UCSC/hg19/hg19.fa
Run the spliceai executable on the sample data:
[user@cn4466 ~]$ spliceai -I input.vcf -O output.vcf -R hg19.fa  -A grch37
Using TensorFlow backend.
2019-05-07 10:13:04.308515: I tensorflow/core/platform/] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2019-05-07 10:13:04.539804: I tensorflow/core/common_runtime/gpu/] Found device 0 with properties: 
name: Tesla V100-PCIE-16GB major: 7 minor: 0 memoryClockRate(GHz): 1.38
pciBusID: 0000:13:00.0
totalMemory: 15.75GiB freeMemory: 15.44GiB
2019-05-07 10:13:04.539869: I tensorflow/core/common_runtime/gpu/] Adding visible gpu devices: 0
2019-05-07 10:13:05.066481: I tensorflow/core/common_runtime/gpu/] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-05-07 10:13:05.066541: I tensorflow/core/common_runtime/gpu/]      0 
2019-05-07 10:13:05.066555: I tensorflow/core/common_runtime/gpu/] 0:   N 
2019-05-07 10:13:05.067525: I tensorflow/core/common_runtime/gpu/] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14939 MB memory) -> physical GPU (device: 0, name: Tesla V100-PCIE-16GB, pci bus id: 0000:13:00.0, compute capability: 7.0)
An output file output.vcf will be produced.
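Recent SpliceAI releases annotate each scored variant's INFO field as SpliceAI=ALLELE|SYMBOL|DS_AG|DS_AL|DS_DG|DS_DL|DP_AG|DP_AL|DP_DG|DP_DL, where the four DS fields are delta scores for acceptor gain/loss and donor gain/loss. Assuming that format, a sketch of a post-filtering step that keeps likely splice-altering variants (0.5 is a commonly used starting cutoff, not an official threshold):

```shell
# filter_spliceai FILE: print CHROM, POS, REF, ALT, gene symbol, and the
# maximum of the four delta scores for variants with max score >= 0.5.
filter_spliceai() {
  awk -F'\t' '/^#/ {next}
    match($8, /SpliceAI=[^;]+/) {
      split(substr($8, RSTART + 9, RLENGTH - 9), a, "|")
      max = 0
      for (i = 3; i <= 6; i++) if (a[i] + 0 > max) max = a[i] + 0
      if (max >= 0.5) print $1, $2, $4, $5, a[2], max
    }' "$1"
}
# Usage: filter_spliceai output.vcf
```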
[user@cn4466 ~]$ exit
[user@biowulf ~]$

Batch job
Most jobs should be run as batch jobs.

Create a batch input file (e.g. spliceai.sh). For example:

#!/bin/bash
module load SpliceAI
spliceai -I input.vcf -O output.vcf -R /fdb/igenomes/Homo_sapiens/UCSC/hg19/hg19.fa -A grch37

Submit this job using the Slurm sbatch command.

sbatch [--partition=gpu --gres=gpu:v100:1] [--cpus-per-task=#] [--mem=#] spliceai.sh
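Slurm options can also be embedded in the batch script itself as #SBATCH directives. A sketch, assuming you want the same v100 GPU used in the interactive session above (the filename spliceai.sh and the resource values are illustrative):

```bash
#!/bin/bash
# spliceai.sh -- sketch of a GPU batch script; resource values are examples.
#SBATCH --partition=gpu
#SBATCH --gres=gpu:v100:1
#SBATCH --mem=8g

module load SpliceAI
spliceai -I input.vcf -O output.vcf \
         -R /fdb/igenomes/Homo_sapiens/UCSC/hg19/hg19.fa -A grch37
```

Submit it with sbatch spliceai.sh; command-line options override the embedded directives.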