spliceai-wrapper is a wrapper for Illumina SpliceAI that caches results.
Allocate an interactive session and run the program. Sample session:
[user@biowulf]$ sinteractive --gres=gpu:p100:1 --mem=8g
[user@cn4466 ~]$ module load spliceai-wrapper
[+] Loading spliceai-wrapper 0.1.0
[user@cn4466 ~]$ spliceai-wrapper -h
usage: spliceai-wrapper [-h] [--version] {prepare,annotate} ...

Caching wrapper for Illumina SpliceAI

positional arguments:
  {prepare,annotate}
    prepare           Construct SQLite database from precomputed data
    annotate          Annotate VCF file with SpliceAI using cache for the scores

optional arguments:
  -h, --help          show this help message and exit
  --version           show program's version number and exit

Download sample data:
[user@cn4466 ~]$ cp $SAIWR_DATA/whole_genome_filtered_spliceai_scores.vcf.gz .

Import the precomputed scores into a SQLite3 database:
[user@cn4466 ~]$ spliceai-wrapper prepare -h
usage: spliceai-wrapper prepare [-h] [--release RELEASE]
                                [--precomputed-db-path PRECOMPUTED_DB_PATH]
                                --precomputed-vcf-path PRECOMPUTED_VCF_PATH

optional arguments:
  -h, --help            show this help message and exit
  --release RELEASE     Release to use.
  --precomputed-db-path PRECOMPUTED_DB_PATH
  --precomputed-vcf-path PRECOMPUTED_VCF_PATH
                        Path to VCF file for loading precomputed data from

[user@cn4466 ~]$ spliceai-wrapper prepare --release GRCh37 \
    --precomputed-db-path precomputed.sqlite3 \
    --precomputed-vcf-path whole_genome_filtered_spliceai_scores.vcf.gz
[I 191011 12:49:37 __main__:148] Running 'prepare' with args = {'action': 'prepare', 'release': 'GRCh37', 'precomputed_db_path': './precomputed.sqlite3', 'precomputed_vcf_path': 'spliceai_wrapper/whole_genome_filtered_spliceai_scores.vcf.gz'}
[I 191011 12:49:37 __main__:153] Opening database file ./precomputed.sqlite3
[I 191011 12:49:37 __main__:158] Executing
CREATE TABLE IF NOT EXISTS GRCh37_spliceai_scores (
    var_desc TEXT PRIMARY KEY,
    chromosome VARCHAR(64),
    position INTEGER,
    reference TEXT,
    alternative TEXT,
    symbol TEXT,
    strand CHARACTER,
    var_type CHARACTER,
    distance INTEGER,
    delta_score_acceptor_gain FLOAT,
    delta_score_acceptor_loss FLOAT,
    delta_score_donor_gain FLOAT,
    delta_score_donor_loss FLOAT,
    delta_position_acceptor_gain INTEGER,
    delta_position_acceptor_loss INTEGER,
    delta_position_donor_gain INTEGER,
    delta_position_donor_loss INTEGER
);
to create table...
[I 191011 12:49:37 __main__:161] Opening VCF for import: spliceai_wrapper/whole_genome_filtered_spliceai_scores.vcf.gz...
...

The prepare command takes more than 20 minutes to complete and produces the database file ./precomputed.sqlite3. Alternatively, a ready-made copy of the database file can be used:
[user@cn4466 ~]$ cp $SAIWR_DATA/precomputed.sqlite3 .

Now annotate the variants from the database:
[user@cn4466 ~]$ spliceai-wrapper annotate -h
usage: spliceai-wrapper annotate [-h] --genes-tsv GENES_TSV [--release RELEASE]
                                 [--precomputed-db-path PRECOMPUTED_DB_PATH]
                                 [--cache-db-path CACHE_DB_PATH]
                                 --input-vcf INPUT_VCF --output-vcf OUTPUT_VCF
                                 [--min-score MIN_SCORE] [--head HEAD]
                                 --path-reference PATH_REFERENCE

optional arguments:
  -h, --help            show this help message and exit
  --genes-tsv GENES_TSV
                        Path to grch3[78].txt from SpliceAI
  --release RELEASE     Release to use.
  --precomputed-db-path PRECOMPUTED_DB_PATH
  --cache-db-path CACHE_DB_PATH
                        Path to SQLite3 file for the cache (to be updated)
  --input-vcf INPUT_VCF
                        Path to VCF file to annotate
  --output-vcf OUTPUT_VCF
                        Path to write annotated VCF to
  --min-score MIN_SCORE
                        Minimal score to consider (report as 0 if smaller).
  --head HEAD           Optional; only consider top N records.
  --path-reference PATH_REFERENCE
                        Path to reference FASTA file.

[user@cn4466 ~]$ cp $SAIWR_DATA/20190804.freebayes.filtered.vcf.gz .
[user@cn4466 ~]$ cp $SAIWR_DATA/grch37.txt .
[user@cn4466 ~]$ ln -s /fdb/igenomes/Homo_sapiens/Ensembl/GRCh37/Sequence/WholeGenomeFasta/genome.fa
[user@cn4466 ~]$ spliceai-wrapper annotate \
    --input-vcf ./20190804.freebayes.filtered.vcf.gz \
    --output-vcf OUTPUT.vcf.gz \
    --precomputed-db-path ./precomputed.sqlite3 \
    --release GRCh37 \
    --path-reference genome.fa \
    --genes-tsv ./grch37.txt
...
2th': PosixPath('/home/staff/.cache/spliceai-wrapper/cache.sqlite3'), 'input_vcf': './20190804.freebayes.filtered.vcf.gz', 'output_vcf': 'OUTPUT.vcf.gz', 'min_score': 0.1, 'head': None, 'path_reference': '/fdb/igenomes/Homo_sapiens/Ensembl/GRCh37/Sequence/WholeGenomeFasta/genome.fa'}
[I 191015 09:22:29 __main__:282] Opening ./precomputed.sqlite3 (read-only)
[I 191015 09:22:29 __main__:283] URL = file:./precomputed.sqlite3?mode=ro
[I 191015 09:22:29 __main__:292] Opening /home/staff/.cache/spliceai-wrapper/cache.sqlite3 (cache; writeable)
[I 191015 09:22:29 __main__:297] Executing
CREATE TABLE IF NOT EXISTS GRCh37_spliceai_scores (
    var_desc TEXT PRIMARY KEY,
    chromosome VARCHAR(64),
    position INTEGER,
    reference TEXT,
    alternative TEXT,
    symbol TEXT,
    strand CHARACTER,
    var_type CHARACTER,
    distance INTEGER,
    delta_score_acceptor_gain FLOAT,
    delta_score_acceptor_loss FLOAT,
    delta_score_donor_gain FLOAT,
    delta_score_donor_loss FLOAT,
    delta_position_acceptor_gain INTEGER,
    delta_position_acceptor_loss INTEGER,
    delta_position_donor_gain INTEGER,
    delta_position_donor_loss INTEGER
);
...
[I 191015 09:22:29 __main__:300] Creating temporary directory...
[I 191015 09:22:29 __main__:303] => /tmp/tmpcfre5h6w
[I 191015 09:22:29 __main__:307] Splitting ./20190804.freebayes.filtered.vcf.gz
[I 191015 09:22:29 __main__:308] into: /tmp/tmpcfre5h6w/cache_hit.vcf
[I 191015 09:22:29 __main__:309] and: /tmp/tmpcfre5h6w/cache_nohit.vcf
18857records [00:22, 831.14records/s]
[I 191015 09:22:52 __main__:249] Hits: 2448/15024 (16.3%), pre hits 12576/15024 (83.7%), pre low-score 0/12576 (0.0%), cache hits 2448/2448 (100.0%), no gene: 3833, cache misses: 0
[I 191015 09:22:52 __main__:347] No cache misses, no need to run spliceai
[I 191015 09:22:52 __main__:359] Converting result with bcftools view -O z -o OUTPUT.vcf.gz /tmp/tmpcfre5h6w/cache_hit.vcf
[I 191015 09:22:53 __main__:412] Done running 'annotate'.

The output file OUTPUT.vcf.gz will be produced.
[user@cn4466 ~]$ exit
[user@biowulf ~]$
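
The log output in the session above shows the table layout that both the precomputed database and the cache use. As a rough illustration of how such a table can be queried, the following Python sketch builds a throwaway in-memory table with the same columns and looks up one variant. The inserted row, the var_desc key format, the gene symbol, and the lookup helper are all made up for this example and are not taken from spliceai-wrapper itself; the 0.1 threshold mirrors the min_score value printed in the log.

```python
import sqlite3

# Column layout copied from the CREATE TABLE statement printed in the log.
SCHEMA = """
CREATE TABLE IF NOT EXISTS GRCh37_spliceai_scores (
    var_desc TEXT PRIMARY KEY,
    chromosome VARCHAR(64),
    position INTEGER,
    reference TEXT,
    alternative TEXT,
    symbol TEXT,
    strand CHARACTER,
    var_type CHARACTER,
    distance INTEGER,
    delta_score_acceptor_gain FLOAT,
    delta_score_acceptor_loss FLOAT,
    delta_score_donor_gain FLOAT,
    delta_score_donor_loss FLOAT,
    delta_position_acceptor_gain INTEGER,
    delta_position_acceptor_loss INTEGER,
    delta_position_donor_gain INTEGER,
    delta_position_donor_loss INTEGER
);
"""

SCORE_COLUMNS = (
    "delta_score_acceptor_gain",
    "delta_score_acceptor_loss",
    "delta_score_donor_gain",
    "delta_score_donor_loss",
)


def lookup(conn, chromosome, position, reference, alternative, min_score=0.1):
    """Hypothetical helper: return the four delta scores for one variant,
    or None if the variant is not in the table. Scores below min_score
    are reported as 0, as the --min-score help text describes."""
    query = (
        "SELECT " + ", ".join(SCORE_COLUMNS) + " FROM GRCh37_spliceai_scores "
        "WHERE chromosome = ? AND position = ? "
        "AND reference = ? AND alternative = ?"
    )
    row = conn.execute(
        query, (chromosome, position, reference, alternative)
    ).fetchone()
    if row is None:
        return None
    return {
        name: (value if value is not None and value >= min_score else 0.0)
        for name, value in zip(SCORE_COLUMNS, row)
    }


# In-memory stand-in for precomputed.sqlite3, so the sketch runs anywhere.
conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)

# Illustrative row only: these are NOT real SpliceAI scores, and the
# var_desc format "chrom-pos-ref-alt" is an assumption.
conn.execute(
    "INSERT INTO GRCh37_spliceai_scores VALUES "
    "(?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
    ("1-12345-A-G", "1", 12345, "A", "G", "GENE1", "+", "E", 10,
     0.02, 0.85, 0.0, 0.3, -5, 10, 2, -20),
)

hit = lookup(conn, "1", 12345, "A", "G")   # variant present in the table
miss = lookup(conn, "1", 99999, "C", "T")  # variant absent from the table
print(hit)
print(miss)
```

To inspect the real file instead of the in-memory stand-in, the same query could be run on a read-only connection such as sqlite3.connect("file:./precomputed.sqlite3?mode=ro", uri=True), which matches the URL printed in the log.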
Create a batch input file (e.g. spliceai-wrapper.sh). For example:
#!/bin/bash
module load spliceai-wrapper
cp $SAIWR_DATA/* .
spliceai-wrapper prepare --release GRCh37 --precomputed-db-path ./precomputed.sqlite3 \
    --precomputed-vcf-path whole_genome_filtered_spliceai_scores.vcf.gz
spliceai-wrapper annotate \
    --input-vcf ./20190804.freebayes.filtered.vcf.gz \
    --output-vcf OUTPUT.vcf.gz \
    --precomputed-db-path ./precomputed.sqlite3 \
    --release GRCh37 \
    --path-reference /fdb/igenomes/Homo_sapiens/Ensembl/GRCh37/Sequence/WholeGenomeFasta/genome.fa \
    --genes-tsv ./grch37.txt
Submit this job using the Slurm sbatch command.
sbatch [--cpus-per-task=#] [--mem=#] spliceai-wrapper.sh