Biowulf High Performance Computing at the NIH
deepsignal on Biowulf

From the deepsignal home page

A deep-learning method for detecting DNA methylation state from Oxford Nanopore sequencing reads.

References:

  • P. Ni, N. Huang, Z. Zhang, D. Wang, F. Liang, Y. Miao, C. Xiao, F. Luo, and J. Wang. DeepSignal: detecting DNA methylation state from Nanopore sequencing reads using deep-learning., Bioinformatics 2019, 35:4586-4595. PubMed |  Journal
Documentation
Important Notes

Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program. Sample session:

[user@biowulf]$ sinteractive --mem=10g -c6 --gres=lscratch:50,gpu:p100:1
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144]$ cd /lscratch/$SLURM_JOB_ID
[user@cn3144]$ module load guppy deepsignal
[+] Loading singularity  3.7.1  on cn2384
[+] Loading guppy 4.2.2  ...
[+] Loading deepsignal  0.1.8

[user@cn3144]$ tar -xzf ${DEEPSIGNAL_TEST_DATA:-none}/fast5s.sample.tar.gz
[user@cn3144]$ ll
total 13M
drwxr-xr-x 3 user group 548K Dec 17  2018 fast5s.al
-rw-r--r-- 1 user group  12M Dec 17  2018 GCF_000146045.2_R64_genomic.fna

To call modifications, the raw fast5 files should be basecalled (Guppy or Albacore) and then be re-squiggled by tombo. At last, modifications of specified motifs can be called by deepsignal. The following are commands to call 5mC in CG contexts from the example data:

[user@cn3144]$ guppy_basecaller -x cuda:all \
    -i fast5s.al -r -s fast5s.al.guppy --config dna_r9.4.1_450bps_hac_prom.cfg
ONT Guppy basecalling software version 4.2.2+effbaf8
config file:        /opt/ont/guppy/data/dna_r9.4.1_450bps_hac_prom.cfg
model file:         /opt/ont/guppy/data/template_r9.4.1_450bps_hac_prom.jsn
input path:         fast5s.al
save path:          fast5s.al.guppy
chunk size:         2000
chunks per runner:  1248
records per file:   4000
num basecallers:    4
gpu device:         cuda:all
kernel path:
runners per device: 12

Found 3621 fast5 files to process.
Init time: 6689 ms

0%   10   20   30   40   50   60   70   80   90   100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************
Caller time: 23257 ms, Samples called: 53658471, samples/s: 2.3072e+06
Finishing up any open output files.
Basecalling completed successfully.

[user@cn3144]$ tombo preprocess annotate_raw_with_fastqs \
    --fast5-basedir fast5s.al \
    --fastq-filenames fast5s.al.guppy/*.fastq \
    --overwrite \
    --processes 6 \
    --sequencing-summary-filenames fast5s.al.guppy/sequencing_summary.txt
Using available GPU
[14:45:09] Getting read filenames.
[14:45:09] Parsing sequencing summary files.
[14:45:09] Annotating FAST5s with sequence from FASTQs.
100%|███████████████████████████████████████████████████████████████| 3621/3621 
[14:45:11] Added sequences to a total of 3621 reads.

[user@cn3144]$ tombo resquiggle fast5s.al \
    GCF_000146045.2_R64_genomic.fna \
    --processes 6 \
    --corrected-group RawGenomeCorrected_001  \
    --overwrite
Using available GPU
[14:47:15] Loading minimap2 reference.
[14:47:16] Getting file list.
[14:47:16] Loading default canonical ***** DNA ***** model.
[14:47:16] Re-squiggling reads (raw signal to genomic sequence alignment).
100%|█████████████████████████████████████████████████████████████████| 3621/3621
[14:49:08] Final unsuccessful reads summary (4.7% reads unsuccessfully processed; 169 total reads):
     3.2% (115 reads) : Alignment not produced
     1.2% ( 44 reads) : Poor raw to expected signal matching (revert with `tombo filter clear_filters`)
     0.3% ( 10 reads) : Read event to sequence alignment extends beyond bandwidth

[14:49:08] Saving Tombo reads index to file.

Notes:

The available models can be checked with ls /usr/local/apps/deepsignal/VER/model. Because deepsignal is installed as a container the path used for deepsignal is /model/NAME.

For deepsignal

[user@cn3144]$ deepsignal call_mods \
    --input_path fast5s.al/ \
    --model_path /model/model.CpG.R9.4_1D.human_hx1.bn17.sn360.v0.1.7+/bn_17.sn_360.epoch_9.ckpt \
    --result_file fast5s.al.CpG.call_mods.tsv \
    --reference_path GCF_000146045.2_R64_genomic.fna \
    --corrected_group RawGenomeCorrected_001  \
    --is_gpu yes
Using available GPU
[... tensorflow warnings omitted ...]
# ===============================================
## parameters:
input_path:
        fast5s.al/
model_path:
        /model/model.CpG.R9.4_1D.human_hx1.bn17.sn360.v0.1.7+/bn_17.sn_360.epoch_9.ckpt
is_cnn:
        yes
is_rnn:
        yes
is_base:
        yes
kmer_len:
        17
cent_signals_len:
        360
batch_size:
        512
learning_rate:
        0.001
class_num:
        2
result_file:
        fast5s.al.CpG.call_mods.tsv
recursively:
        yes
corrected_group:
        RawGenomeCorrected_001
basecall_subgroup:
        BaseCalled_template
reference_path:
        GCF_000146045.2_R64_genomic.fna
is_dna:
        yes
normalize_method:
        mad
methy_label:
        1
motifs:
        CG
mod_loc:
        0
f5_batch_num:
        100
positions:
        None
nproc:
        1
is_gpu:
        yes
# ===============================================
3621 fast5 files in total..
parse the motifs string..
read genome reference file..
read position file if it is not None..
write_process started..
[... tensorflow warnings omitted ...]

2021-01-27 14:55:09.347670: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2021-01-27 14:55:09.373708: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2394530000 Hz
2021-01-27 14:55:09.374427: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x55555de4f8f0 executing computations on platform Host. Devices:
2021-01-27 14:55:09.374491: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): , 
2021-01-27 14:55:09.534751: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: Tesla P100-PCIE-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID: 0000:8e:00.0
totalMemory: 15.90GiB freeMemory: 15.65GiB
2021-01-27 14:55:09.534865: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2021-01-27 14:55:09.536485: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-01-27 14:55:09.536518: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      0
2021-01-27 14:55:09.536542: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0:   N
2021-01-27 14:55:09.536685: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 15224 MB memory) -> physical GPU (device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:8e:00.0, compute capability: 6.0)
2021-01-27 14:55:09.538689: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x55555de0db80 executing computations on platform CUDA. Devices:
2021-01-27 14:55:09.538728: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): Tesla P100-PCIE-16GB, Compute Capability 6.0
WARNING:tensorflow:From /opt/conda/envs/app/lib/python3.7/site-packages/tensorflow/python/training/saver.py:1266: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
2021-01-27 14:55:31.560982: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.9.0 locally
call_mods process 7927 ending, proceed 339 batches
finishing the write_process..
125 of 3621 fast5 files failed..
call_mods costs 199.17 seconds..

[user@cn3144]$ call_modification_frequency.py \
   --input_path fast5s.al.CpG.call_mods.tsv \
   --result_file fast5s.al.CpG.call_mods.frequency.tsv
Using available GPU
get 1 input file(s)..
reading the input files..
162916 of 162916 samples used..
writing the result..

[user@cn3144]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf]$

For deepsignal2:

[user@cn3144]$ deepsignal2 call_mods \
    --input_path fast5s.al/ \
    --model_path /model/model.dp2.CG.R9.4_1D.human_hx1.bn17_sn16.both_bilstm.b17_s16_epoch4.ckpt \
    --result_file fast5s.al.CpG.call_mods.tsv \
    --reference_path GCF_000146045.2_R64_genomic.fna \
    --corrected_group RawGenomeCorrected_001
[user@cn3144]$ call_modification_frequency.py \
   --input_path fast5s.al.CpG.call_mods.tsv \
   --result_file fast5s.al.CpG.call_mods.frequency.tsv

Batch job
Most jobs should be run as batch jobs.

Create a batch input file (e.g. deepsignal.sh). In this example we assume that the base calling and resquiggle has already been completed and we just run the call_mods step

#!/bin/bash
module load deepsignal/0.1.8
deepsignal call_mods \
    --input_path fast5s.al/ \
    --model_path /model/model.CpG.R9.4_1D.human_hx1.bn17.sn360.v0.1.7+/bn_17.sn_360.epoch_9.ckpt \
    --result_file fast5s.al.CpG.call_mods.tsv \
    --reference_path GCF_000146045.2_R64_genomic.fna \
    --corrected_group RawGenomeCorrected_001  \
    --is_gpu yes
deepsignal < deepsignal.in > deepsignal.out

Submit this job using the Slurm sbatch command.

sbatch --cpus-per-task=6 --mem=20g --gres=gpu:p100:1 deepsignal.sh