From the documentation:
Winnowmap is a long-read mapping algorithm optimized for mapping ONT and PacBio reads to repetitive reference sequences. Winnowmap development began on top of minimap2 codebase, and since then we have incorporated the following two ideas to improve mapping accuracy within repeats. ...
Example files in $WINNOWMAP_TEST_DATA
Allocate an interactive session and run the program. Sample session:
[user@biowulf]$ sinteractive --cpus-per-task=6 --mem=8g --gres=lscratch:10
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job
[user@cn3144]$ cd /lscratch/$SLURM_JOB_ID
[user@cn3144]$ module load winnowmap
[user@cn3144]$ cp -rL ${WINNOWMAP_TEST_DATA:-none}/* .
[user@cn3144]$ ## note that the memory given to meryl is slightly less than the allocated memory
[user@cn3144]$ meryl count k=15 threads=$SLURM_CPUS_PER_TASK memory=7 \
output merylDB Zymo-Isolates-SPAdes-Illumina.fasta
Found 1 command tree.
Counting 68 (estimated) million canonical 15-mers from 1 input file:
sequence-file: Zymo-Isolates-SPAdes-Illumina.fasta
SIMPLE MODE
-----------
15-mers
-> 1073741824 entries for counts up to 65535.
-> 16 Gbits memory used
71433493 input bases
-> expected max count of 285733, needing 4 extra bits.
-> 4096 Mbits memory used
2560 MB memory needed
COMPLEX MODE
------------
prefix # of struct kmers/ segs/ min data total
bits prefix memory prefix prefix memory memory memory
------ ------- ------- ------- ------- ------- ------- -------
1 2 P 37 kB 34 MM 1976 S 128 kB 247 MB 247 MB
2 4 P 42 kB 17 MM 954 S 256 kB 238 MB 238 MB
3 8 P 54 kB 8719 kM 460 S 512 kB 230 MB 230 MB
4 16 P 78 kB 4359 kM 222 S 1024 kB 222 MB 222 MB
5 32 P 127 kB 2179 kM 107 S 2048 kB 214 MB 214 MB
6 64 P 228 kB 1089 kM 52 S 4096 kB 208 MB 208 MB
7 128 P 429 kB 544 kM 25 S 8192 kB 200 MB 200 MB
8 256 P 832 kB 272 kM 12 S 16 MB 192 MB 192 MB
9 512 P 1640 kB 136 kM 6 S 32 MB 192 MB 193 MB
10 1024 P 3256 kB 68 kM 3 S 64 MB 192 MB 195 MB Best Value!
11 2048 P 6496 kB 34 kM 2 S 128 MB 256 MB 262 MB
12 4096 P 12 MB 17 kM 1 S 256 MB 256 MB 268 MB
13 8192 P 25 MB 8720 M 1 S 512 MB 512 MB 537 MB
14 16 kP 50 MB 4360 M 1 S 1024 MB 1024 MB 1074 MB
15 32 kP 101 MB 2180 M 1 S 2048 MB 2048 MB 2149 MB
16 64 kP 202 MB 1090 M 1 S 4096 MB 4096 MB 4298 MB
FINAL CONFIGURATION
-------------------
Configured complex mode for 0.191 GB memory per batch, and up to 1 batch.
Start counting with THREADED method.
Used 0.285 GB out of 125.715 GB to store 1048320 kmers.
Used 0.410 GB out of 125.715 GB to store 54504240 kmers.
Writing results to 'merylDB', using 4 threads.
finishIteration()--
Finished counting.
Cleaning up.
Bye.
[user@cn3144]$ meryl print greater-than distinct=0.9998 merylDB > repetitive_k15.txt
Found 1 command tree.
PROCESSING TREE #1 using 1 thread.
opGreaterThan
threshold=12
fraction-distinct=0.999800
merylDB
print to (stdout)
Cleaning up.
Bye.
[user@cn3144]$ ls -lh
total 715M
drwxr-xr-x 2 user group 12K Apr 2 08:30 merylDB
-rw-r--r-- 1 user group 647M Apr 2 08:27 ont_zymo.fastq.gz
-rw-r--r-- 1 user group 198K Apr 2 08:31 repetitive_k15.txt
-rw-r--r-- 1 user group 69M Apr 2 08:27 Zymo-Isolates-SPAdes-Illumina.fasta
[user@cn3144]$ head repetitive_k15.txt
AAAAAAAAAAAAAAA 8426
AAAAAAAAAAAAAAC 241
AAAAAAAAAAAAAAT 307
AAAAAAAAAAAAAAG 365
AAAAAAAAAAAAACA 143
AAAAAAAAAAAAACC 69
AAAAAAAAAAAAACT 72
AAAAAAAAAAAAACG 44
AAAAAAAAAAAAATA 154
AAAAAAAAAAAAATC 44
[user@cn3144]$ module load samtools
[user@cn3144]$ winnowmap -W repetitive_k15.txt -t $SLURM_CPUS_PER_TASK -ax map-ont \
Zymo-Isolates-SPAdes-Illumina.fasta ont_zymo.fastq.gz \
| samtools sort -T /lscratch/$SLURM_JOB_ID/part -@2 -m1G -O BAM -o ont_zymo.bam
[M::mm_idx_gen::0.000*32.22] reading downweighted kmers
[M::mm_idx_gen::0.004*1.69] collected downweighted kmers, no. of kmers read=10651
[M::mm_idx_gen::0.004*1.69] saved the kmers in a bloom filter: hash functions=2 and size=153136
[M::mm_idx_gen::4.113*1.06] collected minimizers
[M::mm_idx_gen::4.251*1.14] sorted minimizers
[M::main::4.251*1.14] loaded/built the index for 2428 target sequence(s)
[M::main::4.251*1.14] running winnowmap in SV-aware mode
[M::main::4.251*1.14] stage1-specific parameters minP:1000, incP:4.00, maxP:16000, sample:1000, min-qlen:10000, min-qcov:0.5, min-mapq:5, mid-occ:5000
[M::main::4.251*1.14] stage2-specific parameters s2_bw:2000, s2_zdropinv:25
[M::mm_idx_stat] kmer size: 15; skip: 50; is_hpc: 0; #seq: 2428
[M::mm_idx_stat::4.287*1.14] distinct minimizers: 2443036 (89.73% are singletons); average occurrences: 1.125; average spacing: 25.530
[M::worker_pipeline::135.789*4.94] mapped 160000 sequences
[M::main] Version: 2.0, pthreads=6, omp_threads=3
[M::main] CMD: winnowmap -W repetitive_k15.txt -t 6 -ax map-ont Zymo-Isolates-SPAdes-Illumina.fasta ont_zymo.fastq.gz
[M::main] Real time: 135.845 sec; CPU: 670.800 sec; Peak RSS: 3.423 GB
[user@cn3144]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf]$
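The session above passes memory=7 to meryl inside an 8 GB allocation. A minimal bash sketch of deriving that value from the allocation instead of hard-coding it is shown below. The function name meryl_mem_gb and the 1 GB headroom are assumptions made for illustration. The sketch also assumes that Slurm exports SLURM_MEM_PER_NODE in megabytes when the job is allocated with --mem.

```shell
# Hypothetical helper, not part of meryl or Slurm.
# Prints a value for meryl's memory= option in GB:
# the allocation minus 1 GB of headroom (an assumed margin), with a minimum of 1.
meryl_mem_gb() {
    # $1: allocated memory in MB; falls back to SLURM_MEM_PER_NODE, then to 8192
    local mem_mb=${1:-${SLURM_MEM_PER_NODE:-8192}}
    local gb=$(( mem_mb / 1024 - 1 ))
    if [ "$gb" -lt 1 ]; then
        gb=1
    fi
    echo "$gb"
}

# Possible use inside the session (same meryl command as above):
# meryl count k=15 threads=$SLURM_CPUS_PER_TASK memory=$(meryl_mem_gb) \
#     output merylDB Zymo-Isolates-SPAdes-Illumina.fasta
```

With an 8 GB allocation this yields 7, the value used in the sample session.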
Create a batch input file (e.g. winnowmap.sh). For example, for aligning reads to a reference with known repetitive k-mers:
#!/bin/bash
module load winnowmap/2.0 samtools/1.11
winnowmap -W repetitive_k15.txt -t $SLURM_CPUS_PER_TASK -ax map-ont \
${WINNOWMAP_TEST_DATA:-none}/Zymo-Isolates-SPAdes-Illumina.fasta \
${WINNOWMAP_TEST_DATA:-none}/ont_zymo.fastq.gz \
| samtools sort -T /lscratch/$SLURM_JOB_ID/part -@2 -m1G -O BAM -o ont_zymo.bam
Submit this job using the Slurm sbatch command.
sbatch --cpus-per-task=6 --mem=8g --gres=lscratch:10 winnowmap.sh
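The batch script reads repetitive_k15.txt from the directory the job runs in, so it can help to confirm the file is in place before submitting. The bash sketch below checks that the file is non-empty and that every line looks like the `head` output shown in the interactive session (a k-mer followed by a count). The function name check_kmer_file is hypothetical, and the check reflects only the format seen in that output, not a documented winnowmap requirement.

```shell
# Hypothetical pre-flight check, not part of winnowmap or meryl.
# Usage: check_kmer_file FILE [K]   (K defaults to 15)
# Returns 0 if FILE is non-empty and every line is "<k-mer of length K> <count>".
check_kmer_file() {
    local f=$1 k=${2:-15}
    if [ ! -s "$f" ]; then
        echo "missing or empty: $f" >&2
        return 1
    fi
    # Fail on the first line whose k-mer or count field looks wrong.
    awk -v k="$k" '
        $1 !~ /^[ACGT]+$/ || length($1) != k || $2 !~ /^[0-9]+$/ { bad = 1; exit }
        END { exit bad }
    ' "$f"
}

# Possible use before submitting:
# check_kmer_file repetitive_k15.txt 15 && \
#     sbatch --cpus-per-task=6 --mem=8g --gres=lscratch:10 winnowmap.sh
```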
Create a swarmfile (e.g. winnowmap.swarm). For example:
winnowmap -W repetitive_k15.txt -t $SLURM_CPUS_PER_TASK -ax map-ont \
genome.fasta sample1.fastq.gz \
| samtools sort -T /lscratch/$SLURM_JOB_ID/part -@2 -m1G -O BAM -o sample1.bam
winnowmap -W repetitive_k15.txt -t $SLURM_CPUS_PER_TASK -ax map-ont \
genome.fasta sample2.fastq.gz \
| samtools sort -T /lscratch/$SLURM_JOB_ID/part -@2 -m1G -O BAM -o sample2.bam
Submit this job using the swarm command.
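swarm -f winnowmap.swarm -g 8 -t 6 --gres=lscratch:10 --module winnowmap/2.0,samtools/1.11
where
| -g # | Number of gigabytes of memory required for each process (1 line in the swarm command file) |
| -t # | Number of threads/CPUs required for each process (1 line in the swarm command file) |
| --gres=lscratch:# | Gigabytes of local scratch space allocated to each subjob; needed for the /lscratch/$SLURM_JOB_ID temporary directory used by samtools sort |
| --module winnowmap/2.0,samtools/1.11 | Loads the winnowmap and samtools modules for each subjob in the swarm |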
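For more than a handful of samples, the swarmfile can be generated rather than written by hand. The bash sketch below prints one command per FASTQ file, following the pattern of the swarmfile example. The function name make_swarm_lines is hypothetical, the sketch assumes input files end in .fastq.gz, and each command is written on a single line with the sorted BAM file passed to samtools sort via -o.

```shell
# Hypothetical helper, not part of winnowmap or swarm.
# Usage: make_swarm_lines REFERENCE.fasta READS1.fastq.gz [READS2.fastq.gz ...]
# Prints one swarm command line per FASTQ file.
make_swarm_lines() {
    local ref=$1
    shift
    local fq sample
    for fq in "$@"; do
        # Sample name = file name without directory and .fastq.gz suffix
        sample=$(basename "$fq" .fastq.gz)
        # Single-quoted formats keep $SLURM_* literal so they expand in the subjob.
        printf 'winnowmap -W repetitive_k15.txt -t $SLURM_CPUS_PER_TASK -ax map-ont %s %s' "$ref" "$fq"
        printf ' | samtools sort -T /lscratch/$SLURM_JOB_ID/part -@2 -m1G -O BAM -o %s.bam\n' "$sample"
    done
}

# Possible use:
# make_swarm_lines genome.fasta sample*.fastq.gz > winnowmap.swarm
```

Because each command refers to /lscratch/$SLURM_JOB_ID, the swarm needs local scratch space allocated when it is submitted.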