DeepSEA is a deep learning-based algorithmic framework for predicting the chromatin effects of sequence alterations with single-nucleotide sensitivity. DeepSEA can accurately predict the epigenetic state of a sequence, including transcription factor binding, DNase I sensitivity, and histone marks in multiple cell types, and uses this capability to predict the chromatin effects of sequence variants and to prioritize regulatory variants.
Allocate an interactive session and run the program.
Sample session (user input follows each shell prompt):
[user@biowulf ~]$ sinteractive --constraint=x2650 --cpus-per-task=32 --mem=10g --ntasks=1 --exclusive --gres=lscratch:10
salloc.exe: Pending job allocation 47562568
salloc.exe: job 47562568 queued and waiting for resources
salloc.exe: job 47562568 has been allocated resources
salloc.exe: Granted job allocation 47562568
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn0236 are ready for job
srun: error: x11: no local DISPLAY defined, skipping
[user@cn0236 ~]$ export TMPDIR=/lscratch/${SLURM_JOB_ID}
[user@cn0236 ~]$ mkdir /lscratch/${SLURM_JOB_ID}/test
[user@cn0236 ~]$ cd /lscratch/${SLURM_JOB_ID}/test
[user@cn0236 test]$ cp /fdb/deepsea/0.94c/example* .
[user@cn0236 test]$ module load deepsea
[+] Loading deepsea 0.94c on cn0236
[+] Loading singularity 3.5.2 on cn0236
[user@cn0236 test]$ rundeepsea.py example.vcf out1
Successfully copied input to working directory /tmp/tmpIPQUab
Loading required package: BSgenome.Hsapiens.UCSC.hg19
Loading required package: BSgenome
Loading required package: methods
Loading required package: BiocGenerics
Loading required package: parallel
Attaching package: 'BiocGenerics'
The following objects are masked from 'package:parallel':
clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
clusterExport, clusterMap, parApply, parCapply, parLapply,
parLapplyLB, parRapply, parSapply, parSapplyLB
The following objects are masked from 'package:stats':
IQR, mad, xtabs
The following objects are masked from 'package:base':
Filter, Find, Map, Position, Reduce, anyDuplicated, append,
as.data.frame, as.vector, cbind, colnames, do.call, duplicated,
eval, evalq, get, grep, grepl, intersect, is.unsorted, lapply,
lengths, mapply, match, mget, order, paste, pmax, pmax.int, pmin,
pmin.int, rank, rbind, rownames, sapply, setdiff, sort, table,
tapply, union, unique, unlist, unsplit
Loading required package: S4Vectors
Loading required package: stats4
Loading required package: IRanges
Loading required package: GenomeInfoDb
Loading required package: GenomicRanges
Loading required package: Biostrings
Loading required package: XVector
Loading required package: rtracklayer
RangedData with 4 rows and 3 value columns across 3 spaces
space ranges | ori mut
<factor> <IRanges> | <character> <character>
1 chr1 [109817041, 109818140] | G T
2 chr10 [ 23507814, 23508913] | A G
3 chr16 [ 209160, 210259] | T C
4 chr16 [ 52598639, 52599738] | C T
name
<character>
1 [known_CEBP_binding_increase]
2 [known_FOXA2_binding_decrease]
3 [known_GATA1_binding_increase]
4 [known_FOXA1_binding_increase]
Number of valid variants:
4
Number of input variants:
4
Successfully converted to input format
Processing options
Switch to float
1
Processing options
Switch to float
1
Finished running DeepSEA. Now prepare output files...
[W::hts_idx_load2] The index file is older than the data file: ./resources/phastCons/primates_nohuman.tsv.gz.tbi
(4, 4)
Finished creating output file. Now clean up...
Everything done.
[user@cn0236 test]$ ls out1/
infile.vcf.out.alt infile.vcf.out.logfoldchange infile.vcf.wt1100.fasta.ref.vcf.evoall
infile.vcf.out.diff infile.vcf.out.ref infile.vcf.wt1100.fasta.ref.vcf.evo.evalues
infile.vcf.out.evalue infile.vcf.out.snpclass
infile.vcf.out.funsig infile.vcf.out.summary
[user@cn0236 test]$ rundeepsea-insilicomut.py example.vcf out2 469 #this will try to use all of the CPUs on a node
Successfully copied input to working directory /tmp/tmpCLZ7CN
Loading required package: BSgenome.Hsapiens.UCSC.hg19
Loading required package: BSgenome
Loading required package: methods
Loading required package: BiocGenerics
Loading required package: parallel
Attaching package: 'BiocGenerics'
The following objects are masked from 'package:parallel':
clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
clusterExport, clusterMap, parApply, parCapply, parLapply,
parLapplyLB, parRapply, parSapply, parSapplyLB
The following objects are masked from 'package:stats':
IQR, mad, xtabs
The following objects are masked from 'package:base':
Filter, Find, Map, Position, Reduce, anyDuplicated, append,
as.data.frame, as.vector, cbind, colnames, do.call, duplicated,
eval, evalq, get, grep, grepl, intersect, is.unsorted, lapply,
lengths, mapply, match, mget, order, paste, pmax, pmax.int, pmin,
pmin.int, rank, rbind, rownames, sapply, setdiff, sort, table,
tapply, union, unique, unlist, unsplit
Loading required package: S4Vectors
Loading required package: stats4
Loading required package: IRanges
Loading required package: GenomeInfoDb
Loading required package: GenomicRanges
Loading required package: Biostrings
Loading required package: XVector
Loading required package: rtracklayer
RangedData with 4 rows and 3 value columns across 3 spaces
space ranges | ori mut
<factor> <IRanges> | <character> <character>
1 chr1 [109817041, 109818140] | G T
2 chr10 [ 23507814, 23508913] | A G
3 chr16 [ 209160, 210259] | T C
4 chr16 [ 52598639, 52599738] | C T
name
<character>
1 [known_CEBP_binding_increase]
2 [known_FOXA2_binding_decrease]
3 [known_GATA1_binding_increase]
4 [known_FOXA1_binding_increase]
Number of valid variants:
4
Number of input variants:
4
Successfully converted to input format
Processing options
Switch to float
1
Processing options
Switch to float
1
1025
2049
3073
4097
5121
6145
7169
8193
9217
10241
11265
12289
13313
14337
15361
16385
17409
18433
19457
20481
21505
22529
23553
Finished running DeepSEA. Now prepare output files...
/opt/conda/lib/python2.7/site-packages/matplotlib/collections.py:590: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
if self._edgecolors == str('face'):
Finished creating output file. Now clean up...
Everything done.
[user@cn0236 test]$ ls out2/
colorbar.png log2foldchange_profile.csv preview.png vis.png
[user@cn0236 test]$ cp -r out* /data/${USER}
[user@cn0236 test]$ exit
exit
salloc.exe: Relinquishing job allocation 47562568
[user@biowulf ~]$
Create a batch input file (e.g. deepsea.sh). For example:
#!/bin/bash
set -e
module load deepsea
export TMPDIR=/lscratch/$SLURM_JOB_ID
rundeepsea-insilicomut.py example.vcf out
Submit this job using the Slurm sbatch command. As in the interactive example above, be sure to allocate a node exclusively. For instance, you could submit this job using the following command:
sbatch --constraint=x2650 --cpus-per-task=32 --mem=10g --ntasks=1 --exclusive --gres=lscratch:10 deepsea.sh
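The batch script sets TMPDIR to /lscratch/$SLURM_JOB_ID, which only exists when the job was submitted with a --gres=lscratch request. The following is a minimal sketch of a guard that could be placed in the batch script before the DeepSEA command. The function name use_lscratch_tmpdir and its optional scratch-root argument are hypothetical and not part of DeepSEA or Slurm; they are introduced here only for illustration.

```shell
#!/bin/bash
# Hypothetical guard (not part of DeepSEA or Slurm): point TMPDIR at the
# job's local scratch directory, or fail early if none is available.
# The optional first argument (scratch root, default /lscratch) exists only
# so the function can be tried outside a Slurm job.
use_lscratch_tmpdir() {
    local root="${1:-/lscratch}"
    if [ -z "${SLURM_JOB_ID:-}" ]; then
        echo "SLURM_JOB_ID is not set; is this running inside a Slurm job?" >&2
        return 1
    fi
    if [ ! -d "$root/$SLURM_JOB_ID" ]; then
        echo "$root/$SLURM_JOB_ID not found; was --gres=lscratch:N requested?" >&2
        return 1
    fi
    export TMPDIR="$root/$SLURM_JOB_ID"
}
```

In a batch script that uses set -e, calling use_lscratch_tmpdir in place of the plain export line would stop the job before DeepSEA starts if local scratch is missing.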
Create a swarmfile (e.g. deepsea.swarm). For example:
export TMPDIR=/lscratch/$SLURM_JOB_ID; rundeepsea-insilicomut.py example1.vcf out1
export TMPDIR=/lscratch/$SLURM_JOB_ID; rundeepsea-insilicomut.py example2.vcf out2
export TMPDIR=/lscratch/$SLURM_JOB_ID; rundeepsea-insilicomut.py example3.vcf out3
export TMPDIR=/lscratch/$SLURM_JOB_ID; rundeepsea-insilicomut.py example4.vcf out4
Submit this job using the swarm command. If you are using the rundeepsea-insilicomut.py script, be sure to allocate the nodes exclusively. For example:
swarm -f deepsea.swarm -g 10 -t 32 --module deepsea --sbatch "--constraint=x2650 --ntasks=1 --exclusive --gres=lscratch:10"
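For more than a handful of input files, a swarmfile like the one above can be generated rather than typed by hand. The sketch below writes one command line per .vcf file in a directory, using the same command layout as the swarmfile example. The function name make_deepsea_swarm and the out_<name> output directory naming are assumptions made for illustration; file names containing spaces are not handled.

```shell
#!/bin/bash
# Hypothetical helper: write one swarm command line per VCF file in a directory.
# The out_<basename> output naming is an assumption, not a DeepSEA convention.
make_deepsea_swarm() {
    local vcf_dir="$1" swarmfile="$2" vcf name
    : > "$swarmfile"                  # start with an empty swarmfile
    for vcf in "$vcf_dir"/*.vcf; do
        [ -e "$vcf" ] || continue     # skip the unexpanded pattern if nothing matches
        name="$(basename "$vcf" .vcf)"
        # The single-quoted format keeps $SLURM_JOB_ID literal, so it is
        # expanded later inside each swarm subjob rather than now.
        printf 'export TMPDIR=/lscratch/$SLURM_JOB_ID; rundeepsea-insilicomut.py %s out_%s\n' \
            "$vcf" "$name" >> "$swarmfile"
    done
}
```

For example, make_deepsea_swarm /data/$USER/vcfs deepsea.swarm (the input directory here is a placeholder) would produce a file that can be passed to the swarm command shown above.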