spliceai-wrapper is a wrapper for Illumina SpliceAI that caches results.
Allocate an interactive session and run the program. Sample session:
[user@biowulf]$ sinteractive --gres=gpu:p100:1 --mem=8g
[user@cn4466 ~]$ module load spliceai-wrapper
[+] Loading spliceai-wrapper 0.1.0
[user@cn4466 ~]$ spliceai-wrapper -h
usage: spliceai-wrapper [-h] [--version] {prepare,annotate} ...
Caching wrapper for Illumina SpliceAI
positional arguments:
{prepare,annotate}
prepare Construct SQLite database from precomputed data
annotate Annotate VCF file with SpliceAI using cache for the
scores
optional arguments:
-h, --help show this help message and exit
--version show program's version number and exit
Copy the sample data:
[user@cn4466 ~]$ cp $SAIWR_DATA/whole_genome_filtered_spliceai_scores.vcf.gz .
Import the precomputed scores into a SQLite3 database:
[user@cn4466 ~]$ spliceai-wrapper prepare -h
usage: spliceai-wrapper prepare [-h] [--release RELEASE]
[--precomputed-db-path PRECOMPUTED_DB_PATH]
--precomputed-vcf-path PRECOMPUTED_VCF_PATH
optional arguments:
-h, --help show this help message and exit
--release RELEASE Release to use.
--precomputed-db-path PRECOMPUTED_DB_PATH
--precomputed-vcf-path PRECOMPUTED_VCF_PATH
Path to VCF file for loading precomputed data from
[user@cn4466 ~]$ spliceai-wrapper prepare --release GRCh37 \
--precomputed-db-path precomputed.sqlite3 \
--precomputed-vcf-path whole_genome_filtered_spliceai_scores.vcf.gz
[I 191011 12:49:37 __main__:148] Running 'prepare' with args = {'action': 'prepare', 'release': 'GRCh37', 'precomputed_db_path': './precomputed.sqlite3', 'precomputed_vcf_path': 'spliceai_wrapper/whole_genome_filtered_spliceai_scores.vcf.gz'}
[I 191011 12:49:37 __main__:153] Opening database file ./precomputed.sqlite3
[I 191011 12:49:37 __main__:158] Executing
CREATE TABLE IF NOT EXISTS GRCh37_spliceai_scores
(
var_desc TEXT PRIMARY KEY,
chromosome VARCHAR(64),
position INTEGER,
reference TEXT,
alternative TEXT,
symbol TEXT,
strand CHARACTER,
var_type CHARACTER,
distance INTEGER,
delta_score_acceptor_gain FLOAT,
delta_score_acceptor_loss FLOAT,
delta_score_donor_gain FLOAT,
delta_score_donor_loss FLOAT,
delta_position_acceptor_gain INTEGER,
delta_position_acceptor_loss INTEGER,
delta_position_donor_gain INTEGER,
delta_position_donor_loss INTEGER
);
to create table...
[I 191011 12:49:37 __main__:161] Opening VCF for import: spliceai_wrapper/whole_genome_filtered_spliceai_scores.vcf.gz...
...
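Once `spliceai-wrapper prepare` has finished, the resulting database can be sanity-checked directly. The following is a minimal sketch, assuming the `sqlite3` command-line client is available on the node (an assumption; it is not mentioned above). The table and column names are taken from the CREATE TABLE statement in the log.

```shell
# Table name follows the <release>_spliceai_scores pattern seen in the log above.
RELEASE=GRCh37
DB=./precomputed.sqlite3
TABLE="${RELEASE}_spliceai_scores"

# Only query when the sqlite3 client (assumed) and the database file are present.
if command -v sqlite3 >/dev/null 2>&1 && [ -f "$DB" ]; then
    # Number of imported variants
    sqlite3 "$DB" "SELECT COUNT(*) FROM ${TABLE};"
    # A few rows to eyeball the imported columns
    sqlite3 -header "$DB" \
        "SELECT var_desc, symbol, delta_score_acceptor_gain FROM ${TABLE} LIMIT 3;"
else
    echo "sqlite3 or $DB not available; skipping check" >&2
fi
```

A row count of zero would suggest that the import did not complete.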
The prepare command takes over 20 minutes to complete and produces the database file ./precomputed.sqlite3. Alternatively, copy the precomputed database file:
[user@cn4466 ~]$ cp $SAIWR_DATA/precomputed.sqlite3 .
Now annotate the variants from the database:
[user@cn4466 ~]$ spliceai-wrapper annotate -h
usage: spliceai-wrapper annotate [-h] --genes-tsv GENES_TSV
[--release RELEASE]
[--precomputed-db-path PRECOMPUTED_DB_PATH]
[--cache-db-path CACHE_DB_PATH] --input-vcf
INPUT_VCF --output-vcf OUTPUT_VCF
[--min-score MIN_SCORE] [--head HEAD]
--path-reference PATH_REFERENCE
optional arguments:
-h, --help show this help message and exit
--genes-tsv GENES_TSV
Path to grch3[78].txt from SpliceAI
--release RELEASE Release to use.
--precomputed-db-path PRECOMPUTED_DB_PATH
--cache-db-path CACHE_DB_PATH
Path to SQLite3 file for the cache (to be updated)
--input-vcf INPUT_VCF
Path to VCF file to annotate
--output-vcf OUTPUT_VCF
Path to write annotated VCF to
--min-score MIN_SCORE
Minimal score to consider (report as 0 if smaller).
--head HEAD Optional; only consider top N records.
--path-reference PATH_REFERENCE
Path to reference FASTA file.
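The `--head` and `--min-score` options listed above can be combined for a quick trial run before annotating a full file. The sketch below wraps such a run in a shell function. The function name `annotate_preview`, the output name `preview.vcf.gz`, and the threshold 0.2 are illustrative assumptions, and the input files are assumed to be in the current directory as set up in the steps that follow.

```shell
# Hypothetical helper: annotate only the first N records of the input VCF.
# Defining the function runs nothing; call it once the input files are in place.
annotate_preview() {
    n="${1:-100}"   # number of records to consider (default: 100)
    spliceai-wrapper annotate \
        --input-vcf ./20190804.freebayes.filtered.vcf.gz \
        --output-vcf preview.vcf.gz \
        --precomputed-db-path ./precomputed.sqlite3 \
        --release GRCh37 \
        --path-reference genome.fa \
        --genes-tsv ./grch37.txt \
        --head "$n" \
        --min-score 0.2   # example threshold; smaller scores are reported as 0
}
```

For example, `annotate_preview 500` would restrict the run to the top 500 records.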
[user@cn4466 ~]$ cp $SAIWR_DATA/20190804.freebayes.filtered.vcf.gz .
[user@cn4466 ~]$ cp $SAIWR_DATA/grch37.txt .
[user@cn4466 ~]$ ln -s /fdb/igenomes/Homo_sapiens/Ensembl/GRCh37/Sequence/WholeGenomeFasta/genome.fa
[user@cn4466 ~]$ ml spliceai-wrapper
[user@cn4466 ~]$ spliceai-wrapper annotate \
--input-vcf ./20190804.freebayes.filtered.vcf.gz \
--output-vcf OUTPUT.vcf.gz \
--precomputed-db-path ./precomputed.sqlite3 \
--release GRCh37 \
--path-reference genome.fa \
--genes-tsv ./grch37.txt
...
2th': PosixPath('/home/staff/.cache/spliceai-wrapper/cache.sqlite3'), 'input_vcf': './20190804.freebayes.filtered.vcf.gz', 'output_vcf': 'OUTPUT.vcf.gz', 'min_score': 0.1, 'head': None, 'path_reference': '/fdb/igenomes/Homo_sapiens/Ensembl/GRCh37/Sequence/WholeGenomeFasta/genome.fa'}
[I 191015 09:22:29 __main__:282] Opening ./precomputed.sqlite3 (read-only)
[I 191015 09:22:29 __main__:283] URL = file:./precomputed.sqlite3?mode=ro
[I 191015 09:22:29 __main__:292] Opening /home/staff/.cache/spliceai-wrapper/cache.sqlite3 (cache; writeable)
[I 191015 09:22:29 __main__:297] Executing
CREATE TABLE IF NOT EXISTS GRCh37_spliceai_scores
(
var_desc TEXT PRIMARY KEY,
chromosome VARCHAR(64),
position INTEGER,
reference TEXT,
alternative TEXT,
symbol TEXT,
strand CHARACTER,
var_type CHARACTER,
distance INTEGER,
delta_score_acceptor_gain FLOAT,
delta_score_acceptor_loss FLOAT,
delta_score_donor_gain FLOAT,
delta_score_donor_loss FLOAT,
delta_position_acceptor_gain INTEGER,
delta_position_acceptor_loss INTEGER,
delta_position_donor_gain INTEGER,
delta_position_donor_loss INTEGER
);
...
[I 191015 09:22:29 __main__:300] Creating temporary directory...
[I 191015 09:22:29 __main__:303] => /tmp/tmpcfre5h6w
[I 191015 09:22:29 __main__:307] Splitting ./20190804.freebayes.filtered.vcf.gz
[I 191015 09:22:29 __main__:308] into: /tmp/tmpcfre5h6w/cache_hit.vcf
[I 191015 09:22:29 __main__:309] and: /tmp/tmpcfre5h6w/cache_nohit.vcf
18857records [00:22, 831.14records/s]
[I 191015 09:22:52 __main__:249] Hits: 2448/15024 (16.3%), pre hits 12576/15024 (83.7%), pre low-score 0/12576 (0.0%), cache hits 2448/2448 (100.0%), no gene: 3833, cache misses: 0
[I 191015 09:22:52 __main__:347] No cache misses, no need to run spliceai
[I 191015 09:22:52 __main__:359] Converting result with bcftools view -O z -o OUTPUT.vcf.gz /tmp/tmpcfre5h6w/cache_hit.vcf
[I 191015 09:22:53 __main__:412] Done running 'annotate'.
The annotated variants are written to the output file OUTPUT.vcf.gz.
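As a quick check that the output file is non-empty, its data records can be counted with standard tools. This is a minimal sketch; the helper name `count_vcf_records` is hypothetical, and it relies only on `gzip` and `grep`, which can read the compressed output that `bcftools view -O z` writes.

```shell
# Count data records (lines not starting with '#') in a gzip-compressed VCF.
count_vcf_records() {
    gzip -cd "$1" | grep -vc '^#'
}

# Report the count only if the annotated file exists in the current directory.
if [ -f OUTPUT.vcf.gz ]; then
    echo "records in OUTPUT.vcf.gz: $(count_vcf_records OUTPUT.vcf.gz)"
fi
```

The same function can be applied to the input VCF to compare record counts before and after annotation.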
[user@cn4466 ~]$ exit
[user@biowulf ~]$
Create a batch input file (e.g. spliceai-wrapper.sh). For example:
#!/bin/bash
module load spliceai-wrapper
cp $SAIWR_DATA/* .
spliceai-wrapper prepare --release GRCh37 --precomputed-db-path ./precomputed.sqlite3 \
--precomputed-vcf-path whole_genome_filtered_spliceai_scores.vcf.gz
spliceai-wrapper annotate \
--input-vcf ./20190804.freebayes.filtered.vcf.gz \
--output-vcf OUTPUT.vcf.gz \
--precomputed-db-path ./precomputed.sqlite3 \
--release GRCh37 \
--path-reference /fdb/igenomes/Homo_sapiens/Ensembl/GRCh37/Sequence/WholeGenomeFasta/genome.fa \
--genes-tsv ./grch37.txt
Submit this job using the Slurm sbatch command.
sbatch [--cpus-per-task=#] [--mem=#] spliceai-wrapper.sh
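The bracketed options in the sbatch line above are placeholders. As an illustration only, the sketch below fills them with example values: 8g mirrors the memory requested in the interactive session, while the CPU count of 2 is an arbitrary assumption, not a tested recommendation. The command is printed for review rather than submitted.

```shell
# Illustrative values only; adjust them to the size of your input.
CPUS=2      # assumed value, not a tested recommendation
MEM=8g      # same memory as requested in the interactive session
SUBMIT_CMD="sbatch --cpus-per-task=${CPUS} --mem=${MEM} spliceai-wrapper.sh"

# Print the command for review; run it directly to submit the job.
echo "$SUBMIT_CMD"
```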