Pangolin is a deep-learning-based method for predicting splice site strengths. It is available as a command-line tool that can be run on a VCF or CSV file containing variants of interest; Pangolin predicts the change in splice site strength due to each variant and returns a file in the same format. Pangolin's models can also be used with custom sequences.
Allocate an interactive session and run the program.
Sample session (user input follows the $ prompt):
[user@biowulf ~]$ sinteractive --cpus-per-task=4 --mem=16G --gres=lscratch:10,gpu:1
salloc.exe: Pending job allocation 41538656
salloc.exe: job 41538656 queued and waiting for resources
salloc.exe: job 41538656 has been allocated resources
salloc.exe: Granted job allocation 41538656
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3114 are ready for job
srun: error: x11: no local DISPLAY defined, skipping
[user@cn3114 ~]$ cd /data/$USER
[user@cn3114 user]$ git clone https://github.com/tkzeng/Pangolin.git
Cloning into 'Pangolin'...
remote: Enumerating objects: 198, done.
remote: Counting objects: 100% (26/26), done.
remote: Compressing objects: 100% (8/8), done.
remote: Total 198 (delta 20), reused 18 (delta 18), pack-reused 172
Receiving objects: 100% (198/198), 190.04 MiB | 19.93 MiB/s, done.
Resolving deltas: 100% (54/54), done.
[user@cn3114 user]$ cd Pangolin/examples
[user@cn3114 examples]$ module load pangolin-splice
[+] Loading pangolin-splice 1.0.1 on cn4273
[+] Loading singularity 3.10.5 on cn4273
[user@cn3114 examples]$ cp /fdb/GENCODE/Gencode_human/release_38/GRCh37_mapping/gencode.v38lift37.annotation.gtf.gz .
[user@cn3114 examples]$ create_db.py gencode.v38lift37.annotation.gtf.gz
Database created: gencode.v38lift37.annotation.db
[user@cn3114 examples]$ pangolin brca.vcf /fdb/GENCODE/Gencode_human/release_38/GRCh37_mapping/GRCh37.primary_assembly.genome.fa.gz gencode.v38lift37.annotation.db brca_pangolin
Using GPU
[user@cn3114 examples]$ tail brca_pangolin.vcf
chr17 41276127 . A C . . Pangolin=ENSG00000012048.23_1|-8:0.0|5:-0.06|Warnings:
chr17 41276126 . C T . . Pangolin=ENSG00000012048.23_1|-2:0.13|6:-0.07|Warnings:
chr17 41276126 . C G . . Pangolin=ENSG00000012048.23_1|-7:0.23|6:-0.05|Warnings:
chr17 41276126 . C A . . Pangolin=ENSG00000012048.23_1|-7:0.48|6:-0.16|Warnings:
chr17 41276125 . C T . . Pangolin=ENSG00000012048.23_1|-6:0.04|7:-0.05|Warnings:
chr17 41276125 . C G . . Pangolin=ENSG00000012048.23_1|-6:0.08|7:-0.03|Warnings:
chr17 41276125 . C A . . Pangolin=ENSG00000012048.23_1|-6:0.24|7:-0.26|Warnings:
chr17 41276124 . T G . . Pangolin=ENSG00000012048.23_1|-5:0.02|8:-0.05|Warnings:
chr17 41276124 . T C . . Pangolin=ENSG00000012048.23_1|4:0.0|8:-0.04|Warnings:
chr17 41276124 . T A . . Pangolin=ENSG00000012048.23_1|-5:0.12|8:-0.12|Warnings:
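Each output record carries a Pangolin= annotation in the INFO column. The sketch below assumes the layout gene|pos:largest_increase|pos:largest_decrease|Warnings: with one gene per annotation, which matches the records shown above but should be checked against the Pangolin README for your version. The function name filter_pangolin and the cutoff value are illustrative and not part of Pangolin.

```shell
# filter_pangolin CUTOFF < annotated.vcf
# Prints: chrom pos ref alt gene increase_score decrease_score
# for records where either score has magnitude >= CUTOFF.
# Assumption: INFO (column 8) holds Pangolin=gene|pos:inc|pos:dec|Warnings:
filter_pangolin() {
    local cutoff="$1"
    awk -v cutoff="$cutoff" '
        /^#/ { next }                       # skip VCF header lines
        {
            info = $8
            if (info !~ /Pangolin=/) next   # record has no Pangolin annotation
            sub(/.*Pangolin=/, "", info)    # drop everything up to the gene ID
            n = split(info, part, "|")
            if (n < 3) next
            split(part[2], inc, ":")        # pos:largest_increase
            split(part[3], dec, ":")        # pos:largest_decrease
            gain = inc[2] + 0
            loss = dec[2] + 0
            absloss = (loss < 0) ? -loss : loss
            if (gain >= cutoff || absloss >= cutoff)
                print $1, $2, $4, $5, part[1], inc[2], dec[2]
        }
    '
}
```

For example, `filter_pangolin 0.2 < brca_pangolin.vcf` would list only the variants whose predicted gain or loss is at least 0.2 in magnitude. The 0.2 value is an arbitrary example, not a recommended threshold.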
Create a batch input file (e.g. pangolin.sh). For example:
#!/bin/bash
set -e
module load pangolin-splice
cd /data/$USER/Pangolin/examples/
pangolin brca.vcf /fdb/GENCODE/Gencode_human/release_38/GRCh37_mapping/GRCh37.primary_assembly.genome.fa.gz gencode.v38lift37.annotation.db brca_pangolin
Submit this job using the Slurm sbatch command.
sbatch --gres=gpu:1[,lscratch:#] [--cpus-per-task=#] [--mem=#] pangolin.sh
Create a swarmfile (e.g. pangolin.swarm). For example:
pangolin brca.vcf /fdb/GENCODE/Gencode_human/release_38/GRCh37_mapping/GRCh37.primary_assembly.genome.fa.gz gencode.v38lift37.annotation.db brca_pangolin
pangolin p53.vcf /fdb/GENCODE/Gencode_human/release_38/GRCh37_mapping/GRCh37.primary_assembly.genome.fa.gz gencode.v38lift37.annotation.db p53_pangolin
pangolin kras.vcf /fdb/GENCODE/Gencode_human/release_38/GRCh37_mapping/GRCh37.primary_assembly.genome.fa.gz gencode.v38lift37.annotation.db kras_pangolin
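With many input VCFs, the swarmfile can be generated instead of typed by hand. The following is a minimal sketch, assuming each input is named <name>.vcf and the output prefix should be <name>_pangolin, as in the lines above. The function name make_pangolin_swarm is hypothetical.

```shell
# Reference genome and annotation database, as used in the swarmfile above.
GENOME=/fdb/GENCODE/Gencode_human/release_38/GRCh37_mapping/GRCh37.primary_assembly.genome.fa.gz
ANNOT_DB=gencode.v38lift37.annotation.db

# make_pangolin_swarm FILE.vcf [FILE.vcf ...]
# Writes one pangolin command per input VCF to stdout.
make_pangolin_swarm() {
    local vcf prefix
    for vcf in "$@"; do
        # brca.vcf -> brca_pangolin (any directory part is dropped,
        # so outputs land in the working directory)
        prefix=$(basename "$vcf" .vcf)_pangolin
        printf 'pangolin %s %s %s %s\n' "$vcf" "$GENOME" "$ANNOT_DB" "$prefix"
    done
}
```

Running `make_pangolin_swarm brca.vcf p53.vcf kras.vcf > pangolin.swarm` would write the three commands shown above to the swarmfile.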
Submit this job using the swarm command.
swarm -f pangolin.swarm [-g #] [-t #] --gres=gpu:1 --module pangolin-splice
where
| Option | Description |
| --- | --- |
| -g # | Number of gigabytes of memory required for each process (1 line in the swarm command file). |
| -t # | Number of threads/CPUs required for each process (1 line in the swarm command file). |
| --gres=gpu:1 | Allocates a GPU for each subjob. |
| --module pangolin-splice | Loads the pangolin-splice module for each subjob in the swarm. |
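After the swarm finishes, it can be useful to confirm that every subjob wrote its output. In the sample session the prefix brca_pangolin produced brca_pangolin.vcf, so this sketch assumes each output is <prefix>.vcf in the current directory. check_pangolin_outputs is a hypothetical helper, not part of Pangolin or swarm.

```shell
# check_pangolin_outputs PREFIX [PREFIX ...]
# Reports every PREFIX.vcf that is missing or empty on stderr;
# returns non-zero if at least one is.
check_pangolin_outputs() {
    local prefix rc=0
    for prefix in "$@"; do
        if [ ! -s "${prefix}.vcf" ]; then   # -s: file exists and is non-empty
            echo "missing or empty: ${prefix}.vcf" >&2
            rc=1
        fi
    done
    return "$rc"
}
```

For the swarmfile above, this would be called as `check_pangolin_outputs brca_pangolin p53_pangolin kras_pangolin` from the directory in which the swarm ran.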