Remora performs methylation/modified base calling as a step separate from basecalling. It primarily provides an API to call modified bases for basecaller programs such as Bonito. Remora also provides tools to prepare datasets, train modified base models, and run simple inference.
Allocate an interactive session and run the program.
Sample session (user input in bold):
[user@biowulf]$ sinteractive
[user@cn4338 ~]$ module load remora
[+] Loading remora 2.1.1 on cn4338
[+] Loading singularity 3.10.5 on cn4338
Running help command:
[user@cn4338 ~]$ cp -a /usr/local/apps/remora/2.1.1/tests/data .
[user@cn4338 ~]$ cd data
[user@cn4338 data]$ remora --help
usage: remora [-h] [-v] {dataset,model,infer,validate,analyze} ...

********** Remora *********

Modified base model training and application.

optional arguments:
  -h, --help     show this help message and exit
  -v, --version  Show Remora version and exit.

sub-commands:
  dataset        Create or perform operations on a Remora dataset
  model          Train or perform operations on Remora models
  infer          Perform Remora model inference
  validate       Validate modified base predictions
  analyze        Analyze nanopore data including raw signal
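Before preparing training data, it can help to confirm that the copied test directory actually contains the input files the training-data command on this page reads (can_reads.pod5, can_mappings.bam, levels.txt). The following is a minimal sketch in shell; `check_inputs` is a hypothetical helper written for illustration and is not part of Remora.

```shell
#!/usr/bin/env bash
# check_inputs is a hypothetical helper, not a Remora command.
# Usage: check_inputs <directory> <file> [<file> ...]
# Returns 0 if every listed file exists and is non-empty, 1 otherwise.
check_inputs() {
    local dir="$1"
    shift
    local missing=0
    local f
    for f in "$@"; do
        if [ ! -s "$dir/$f" ]; then
            # Report each problem file on stderr, but keep checking the rest.
            echo "missing or empty: $dir/$f" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

From inside the data directory, it could be called as `check_inputs . can_reads.pod5 can_mappings.bam levels.txt && echo "inputs OK"`.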
Generate Training Data:
[user@cn4338 data]$ remora \
    dataset prepare \
    can_reads.pod5 \
    can_mappings.bam \
    --output-remora-training-file can_chunks.npz \
    --log-filename prep_can.log \
    --refine-kmer-level-table levels.txt \
    --refine-rough-rescale \
    --motif CG 0 \
    --mod-base-control
Indexing BAM by read id: 14 Reads [00:00, 10146.93 Reads/s]
[14:30:14] Extracting read IDs from POD5
[14:30:14] Found 14 BAM records, 14 POD5 reads, and 14 in common
[14:30:14] Making reference-anchored training data
[14:30:14] Allocating memory for output tensors
[14:30:14] Processing reads
Extracting chunks: 100%|█████████████████████████| 14/14 [00:00<00:00, 163.20 Reads/s]
[14:30:14] Extracted 205 chunks from 14 reads.
[14:30:14] Label distribution: Counter({0: 205})
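The "Extracted 205 chunks from 14 reads." line is the quickest check that the run produced training data. A minimal sketch in shell for pulling those two numbers out of a log, assuming the file given to --log-filename (prep_can.log) contains the same message text as the terminal output; `summarize_prep_log` is a hypothetical helper, not part of Remora.

```shell
#!/usr/bin/env bash
# summarize_prep_log is a hypothetical helper, not a Remora command.
# Assumption: the log file contains a line shaped like
#   "Extracted 205 chunks from 14 reads."
# Prints "<chunks> <reads>" for the last such line, or nothing if none is found.
summarize_prep_log() {
    local log_file="$1"
    sed -n 's/.*Extracted \([0-9][0-9]*\) chunks from \([0-9][0-9]*\) reads.*/\1 \2/p' \
        "$log_file" | tail -n 1
}
```

For example, `read -r chunks reads < <(summarize_prep_log prep_can.log)` would place the two counts in shell variables for a later sanity check, such as refusing to continue when no chunks were extracted.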
For more examples, please visit the Remora GitHub page.