From the documentation:
Winnowmap is a long-read mapping algorithm optimized for mapping ONT and PacBio reads to repetitive reference sequences. Winnowmap development began on top of minimap2 codebase, and since then we have incorporated the following two ideas to improve mapping accuracy within repeats. ...
$WINNOWMAP_TEST_DATA
Allocate an interactive session and run the program. Sample session:
[user@biowulf]$ sinteractive --cpus-per-task=6 --mem=8g --gres=lscratch:10 salloc.exe: Pending job allocation 46116226 salloc.exe: job 46116226 queued and waiting for resources salloc.exe: job 46116226 has been allocated resources salloc.exe: Granted job allocation 46116226 salloc.exe: Waiting for resource configuration salloc.exe: Nodes cn3144 are ready for job [user@cn3144]$ cd /lscratch/$SLURM_JOB_ID [user@cn3144]$ module load winnowmap [user@cn3144]$ cp -rL ${WINNOWMAP_TEST_DATA:-none}/* . [user@cn3144]$ ## note that memory is slighly less the amount of allocated memory [user@cn3144]$ meryl count k=15 threads=$SLURM_CPUS_PER_TASK memory=7 \ output merylDB Zymo-Isolates-SPAdes-Illumina.fasta Found 1 command tree. Counting 68 (estimated) million canonical 15-mers from 1 input file: sequence-file: Zymo-Isolates-SPAdes-Illumina.fasta SIMPLE MODE ----------- 15-mers -> 1073741824 entries for counts up to 65535. -> 16 Gbits memory used 71433493 input bases -> expected max count of 285733, needing 4 extra bits. -> 4096 Mbits memory used 2560 MB memory needed COMPLEX MODE ------------ prefix # of struct kmers/ segs/ min data total bits prefix memory prefix prefix memory memory memory ------ ------- ------- ------- ------- ------- ------- ------- 1 2 P 37 kB 34 MM 1976 S 128 kB 247 MB 247 MB 2 4 P 42 kB 17 MM 954 S 256 kB 238 MB 238 MB 3 8 P 54 kB 8719 kM 460 S 512 kB 230 MB 230 MB 4 16 P 78 kB 4359 kM 222 S 1024 kB 222 MB 222 MB 5 32 P 127 kB 2179 kM 107 S 2048 kB 214 MB 214 MB 6 64 P 228 kB 1089 kM 52 S 4096 kB 208 MB 208 MB 7 128 P 429 kB 544 kM 25 S 8192 kB 200 MB 200 MB 8 256 P 832 kB 272 kM 12 S 16 MB 192 MB 192 MB 9 512 P 1640 kB 136 kM 6 S 32 MB 192 MB 193 MB 10 1024 P 3256 kB 68 kM 3 S 64 MB 192 MB 195 MB Best Value! 11 2048 P 6496 kB 34 kM 2 S 128 MB 256 MB 262 MB 12 4096 P 12 MB 17 kM 1 S 256 MB 256 MB 268 MB 13 8192 P 25 MB 8720 M 1 S 512 MB 512 MB 537 MB 14 16 kP 50 MB 4360 M 1 S 1024 MB 1024 MB 1074 MB 15 32 kP 101 MB 2180 M 1 S 2048 MB 2048 MB 2149 MB 16 64 kP 202 MB 1090 M 1 S 4096 MB 4096 MB 4298 MB FINAL CONFIGURATION ------------------- Configured complex mode for 0.191 GB memory per batch, and up to 1 batch. Start counting with THREADED method. Used 0.285 GB out of 125.715 GB to store 1048320 kmers. Used 0.410 GB out of 125.715 GB to store 54504240 kmers. Writing results to 'merylDB', using 4 threads. finishIteration()-- Finished counting. Cleaning up. Bye. [user@cn3144]$ meryl print greater-than distinct=0.9998 merylDB > repetitive_k15.txt Found 1 command tree. PROCESSING TREE #1 using 1 thread. opGreaterThan threshold=12 fraction-distinct=0.999800 merylDB print to (stdout) Cleaning up. Bye. [user@cn3144]$ ls -lh total 715M drwxr-xr-x 2 user group 12K Apr 2 08:30 merylDB -rw-r--r-- 1 user group 647M Apr 2 08:27 ont_zymo.fastq.gz -rw-r--r-- 1 user group 198K Apr 2 08:31 repetitive_k15.txt -rw-r--r-- 1 user group 69M Apr 2 08:27 Zymo-Isolates-SPAdes-Illumina.fasta [user@cn3144]$ head repetitive_k15.txt AAAAAAAAAAAAAAA 8426 AAAAAAAAAAAAAAC 241 AAAAAAAAAAAAAAT 307 AAAAAAAAAAAAAAG 365 AAAAAAAAAAAAACA 143 AAAAAAAAAAAAACC 69 AAAAAAAAAAAAACT 72 AAAAAAAAAAAAACG 44 AAAAAAAAAAAAATA 154 AAAAAAAAAAAAATC 44 [user@cn3144]$ module load samtools [user@cn3144]$ winnowmap -W repetitive_k15.txt -t $SLURM_CPUS_PER_TASK -ax map-ont \ Zymo-Isolates-SPAdes-Illumina.fasta ont_zymo.fastq.gz \ | samtools sort -T /lscratch/$SLURM_JOB_ID/part -@2 -m1G -O BAM -o ont_zymo.bam [M::mm_idx_gen::0.000*32.22] reading downweighted kmers [M::mm_idx_gen::0.004*1.69] collected downweighted kmers, no. of kmers read=10651 [M::mm_idx_gen::0.004*1.69] saved the kmers in a bloom filter: hash functions=2 and size=153136 [M::mm_idx_gen::4.113*1.06] collected minimizers [M::mm_idx_gen::4.251*1.14] sorted minimizers [M::main::4.251*1.14] loaded/built the index for 2428 target sequence(s) [M::main::4.251*1.14] running winnowmap in SV-aware mode [M::main::4.251*1.14] stage1-specific parameters minP:1000, incP:4.00, maxP:16000, sample:1000, min-qlen:10000, min-qcov:0.5, min-mapq:5, mid-occ:5000 [M::main::4.251*1.14] stage2-specific parameters s2_bw:2000, s2_zdropinv:25 [M::mm_idx_stat] kmer size: 15; skip: 50; is_hpc: 0; #seq: 2428 [M::mm_idx_stat::4.287*1.14] distinct minimizers: 2443036 (89.73% are singletons); average occurrences: 1.125; average spacing: 25.530 [M::worker_pipeline::135.789*4.94] mapped 160000 sequences [M::main] Version: 2.0, pthreads=6, omp_threads=3 [M::main] CMD: winnowmap -W repetitive_k15.txt -t 6 -ax map-ont Zymo-Isolates-SPAdes-Illumina.fasta ont_zymo.fastq.gz [M::main] Real time: 135.845 sec; CPU: 670.800 sec; Peak RSS: 3.423 GB [user@cn3144]$ exit salloc.exe: Relinquishing job allocation 46116226 [user@biowulf]$
Create a batch input file (e.g. winnowmap.sh). For example, for aligning reads to a reference with known repetitive k-mers:
#!/bin/bash module load winnowmap/2.0 samtools/1.11 winnowmap -W repetitive_k15.txt -t $SLURM_CPUS_PER_TASK -ax map-ont \ ${WINNOWMAP_TEST_DATA:-none}/Zymo-Isolates-SPAdes-Illumina.fasta \ ${WINNOWMAP_TEST_DATA:-none}/ont_zymo.fastq.gz \ | samtools sort -T /lscratch/$SLURM_JOB_ID/part -@2 -m1G -O BAM -o ont_zymo.bam
Submit this job using the Slurm sbatch command.
sbatch --cpus-per-task=6 --mem=8g --gres=lscratch:10 winnowmap.sh
Create a swarmfile (e.g. winnowmap.swarm). For example:
winnowmap -W repetitive_k15.txt -t $SLURM_CPUS_PER_TASK -ax map-ont \ genome.fasta sample1.fastq.gz \ | samtools sort -T /lscratch/$SLURM_JOB_ID/part -@2 -m1G -O BAM sample1.bam winnowmap -W repetitive_k15.txt -t $SLURM_CPUS_PER_TASK -ax map-ont \ genome.fasta sample2.fastq.gz \ | samtools sort -T /lscratch/$SLURM_JOB_ID/part -@2 -m1G -O BAM sample2.bam
Submit this job using the swarm command.
swarm -f winnowmap.swarm -g 8 -t 6 --module winnowmap/2.0,samtools/1.11where
-g # | Number of Gigabytes of memory required for each process (1 line in the swarm command file) |
-t # | Number of threads/CPUs required for each process (1 line in the swarm command file). |
--module winnowmap | Loads the winnowmap module for each subjob in the swarm |