Canu is a fork of the Celera Assembler designed for high-noise single-molecule sequencing (such as the PacBio RS II or Oxford Nanopore MinION). Canu will correct the reads, then trim suspicious regions (such as remaining SMRTbell adapter), then assemble the corrected and cleaned reads into unitigs.
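Canu's three stages (correction, trimming, and assembly) can also be run one at a time, which is convenient if you want to reuse a set of corrected reads in several assemblies. The sketch below is only a minimal illustration: it assumes the -correct/-trim/-assemble stage flags and default output names (prefix.correctedReads.fasta.gz, prefix.trimmedReads.fasta.gz) of Canu 1.x, and reads.fastq is a placeholder for your own input.

# Minimal sketch: run the Canu stages separately (assumes Canu 1.x stage
# flags and default output file names; reads.fastq is a placeholder input)
module load canu
canu -correct  -p ecoli -d ecoli-correct genomeSize=4.8m -pacbio-raw reads.fastq
canu -trim     -p ecoli -d ecoli-trim    genomeSize=4.8m \
     -pacbio-corrected ecoli-correct/ecoli.correctedReads.fasta.gz
canu -assemble -p ecoli -d ecoli-asm     genomeSize=4.8m \
     -pacbio-corrected ecoli-trim/ecoli.trimmedReads.fasta.gz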
Allocate an interactive session and run the program.
Sample session (user input in bold):
[user@biowulf]$ sinteractive --cpus-per-task=8 --mem=20g
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144 ~]$ module load canu
[+] Loading gnuplot 5.2.2 ...
[+] Loading canu 1.7

[user@cn3144 ~]$ canu \
    -pacbio-raw /usr/local/apps/canu/p6.25x.fastq \
    -usegrid=0 \
    -maxMemory=$(( SLURM_MEM_PER_NODE - 1 )) \
    -maxThreads=$SLURM_CPUS_PER_TASK \
    -p ecoli \
    -d ecoli-auto \
    -genomeSize=4.8m
-- Canu 1.7
--
-- CITATIONS
--
-- Koren S, Walenz BP, Berlin K, Miller JR, Phillippy AM.
-- Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation.
-- Genome Res. 2017 May;27(5):722-736.
-- http://doi.org/10.1101/gr.215087.116
[...]

[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
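When the run completes, the assembly is written to the directory given with -d (ecoli-auto above). The exact file set depends on the Canu version, but with the prefix ecoli the main outputs are typically the contig sequences and a summary report; a quick way to inspect them:

[user@biowulf ~]$ ls ecoli-auto/                              # assembly output directory (from -d)
[user@biowulf ~]$ less ecoli-auto/ecoli.report                # read, correction, and assembly statistics
[user@biowulf ~]$ grep -c '>' ecoli-auto/ecoli.contigs.fasta  # number of assembled contigs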
Create a batch input file (e.g. canu.sh). For example:
#!/bin/bash
# this file is called canu.sh

cd /data/user/canu
module load canu

canu \
    -p ecoli -d ecoli-auto \
    -genomeSize=4.8m \
    -pacbio-raw p6.25x.fastq \
    -usegrid=0 \
    -maxMemory=$(( SLURM_MEM_PER_NODE - 1 )) \
    -maxThreads=$SLURM_CPUS_PER_TASK
Submit this job using the Slurm sbatch command.
sbatch [--cpus-per-task=#] [--mem=#] canu.sh
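Because the script reads $SLURM_CPUS_PER_TASK and $SLURM_MEM_PER_NODE, the resources requested at submission time set the thread and memory limits that Canu will use. The values below are only illustrative; size them to your genome and coverage:

sbatch --cpus-per-task=16 --mem=32g canu.sh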
In most cases, users will want to use Canu's grid options to distribute the work across the cluster. Run the canu command on the Biowulf login node or in an interactive session, and Canu will submit its component jobs to Slurm itself. For example:
[user@biowulf canu]$ module load canu

[user@biowulf canu]$ canu -p asm -d lambda -genomeSize=50k -pacbio-raw \
> p6.25x.fastq \
> minReadLength=500 minOverlapLength=500 \
> usegrid=1 \
> gridOptions="--time=30:00 --partition quick" \
> gridOptionsJobName=lam
-- Detected Java(TM) Runtime Environment '1.8.0_11' (from 'java').
-- Detected 30 CPUs and 118 gigabytes of memory.
-- Detected Slurm with 'sinfo' binary in /usr/local/slurm/bin/sinfo.
--
-- Found 16 hosts with 12 cores and 45 GB memory under Slurm control.
-- Found 594 hosts with 32 cores and 124 GB memory under Slurm control.
-- Found 64 hosts with 32 cores and 124 GB memory under Slurm control.
-- Found 24 hosts with 32 cores and 251 GB memory under Slurm control.
-- Found 384 hosts with 24 cores and 22 GB memory under Slurm control.
-- Found 250 hosts with 8 cores and 6 GB memory under Slurm control.
-- Found 103 hosts with 32 cores and 30 GB memory under Slurm control.
-- Found 16 hosts with 16 cores and 69 GB memory under Slurm control.
-- Found 16 hosts with 16 cores and 69 GB memory under Slurm control.
-- Found 295 hosts with 16 cores and 22 GB memory under Slurm control.
-- Found 4 hosts with 64 cores and 1008 GB memory under Slurm control.
-- Found 64 hosts with 32 cores and 61 GB memory under Slurm control.
-- Found 295 hosts with 32 cores and 61 GB memory under Slurm control.
--
-- Allowed to run under grid control, and use up to 4 compute threads and 3 GB memory for stage 'bogart (unitigger)'.
-- Allowed to run under grid control, and use up to 8 compute threads and 6 GB memory for stage 'mhap (overlapper)'.
-- Allowed to run under grid control, and use up to 8 compute threads and 6 GB memory for stage 'mhap (overlapper)'.
-- Allowed to run under grid control, and use up to 8 compute threads and 6 GB memory for stage 'mhap (overlapper)'.
-- Allowed to run under grid control, and use up to 4 compute threads and 3 GB memory for stage 'read error detection (overlap error adjustment)'.
-- Allowed to run under grid control, and use up to 1 compute thread and 2 GB memory for stage 'overlap error adjustment'.
-- Allowed to run under grid control, and use up to 4 compute threads and 8 GB memory for stage 'utgcns (consensus)'.
-- Allowed to run under grid control, and use up to 1 compute thread and 2 GB memory for stage 'overlap store parallel bucketizer'.
-- Allowed to run under grid control, and use up to 1 compute thread and 2 GB memory for stage 'overlapper'.
-- Allowed to run under grid control, and use up to 8 compute threads and 6 GB memory for stage 'overlapper'.
-- Allowed to run under grid control, and use up to 8 compute threads and 6 GB memory for stage 'overlapper'.
-- Allowed to run under grid control, and use up to 4 compute threads and 4 GB memory for stage 'meryl (k-mer counting)'.
-- Allowed to run under grid control, and use up to 4 compute threads and 6 GB memory for stage 'falcon_sense (read correction)'.
-- Allowed to run under grid control, and use up to 8 compute threads and 6 GB memory for stage 'minimap (overlapper)'.
-- Allowed to run under grid control, and use up to 8 compute threads and 6 GB memory for stage 'minimap (overlapper)'.
-- Allowed to run under grid control, and use up to 8 compute threads and 6 GB memory for stage 'minimap (overlapper)'.
----------------------------------------
-- Starting command on Fri Mar 25 14:37:09 2016 with 114.6 GB free disk space

    sbatch \
      --mem=4g \
      --cpus-per-task=1 \
      --time=30:00 \
      --partition quick \
      -D `pwd` \
      -J "canu_asm_lam" \
      -o /data/user/canu/lambda/canu-scripts/canu.01.out /data/user/canu/lambda/canu-scripts/canu.01.sh

16390160

-- Finished on Fri Mar 25 14:37:09 2016 (lickety-split) with 114.6 GB free disk space
----------------------------------------
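The gridOptions string is passed along to every sbatch command that Canu issues, so it is the place to set the walltime, partition, and similar limits for the whole pipeline. For a larger genome, the 30-minute walltime requested above will not be enough; the following is only an illustrative sketch, where the genome size, walltime, partition, and reads.fastq input are placeholders to adapt to your own run:

[user@biowulf canu]$ canu -p asm -d bigasm -genomeSize=150m -pacbio-raw reads.fastq \
> useGrid=1 \
> gridOptions="--time=1-00:00:00 --partition=norm" \
> gridOptionsJobName=big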
At various times, the 'sjobs' command will show the different Canu jobs that are running or pending. For example:
[user@biowulf canu]$ sjobs
User    JobId         JobName       Part   St  Reason      Runtime  Walltime  Nodes  CPUs  Memory    Dependency           Nodelist
===================================================================================================================================
user    16390181      canu_asm_lam  quick  PD  Dependency  0:00     30:00     1      1     4GB/node  afterany:16390180_*
===================================================================================================================================

[user@biowulf canu]$ sjobs
User    JobId         JobName       Part   St  Reason      Runtime  Walltime  Nodes  CPUs  Memory    Dependency           Nodelist
===================================================================================================================================
user    16390277_[1]  meryl_asm_la  quick  PD  ---         0:00     30:00     1      4     4GB/node
user    16390278      canu_asm_lam  quick  PD  Dependency  0:00     30:00     1      1     4GB/node  afterany:16390277_*
===================================================================================================================================
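Canu chains its grid jobs together with dependencies (the afterany: entries above) and returns to the prompt almost immediately, so progress is best followed through the log files in the assembly directory. A rough sketch, assuming the lambda/asm run shown above and the canu-scripts/ log layout from the sbatch output:

[user@biowulf canu]$ sjobs                                   # Canu jobs currently queued or running
[user@biowulf canu]$ tail lambda/canu-scripts/canu.*.out     # most recent Canu driver logs
[user@biowulf canu]$ ls -l lambda/asm.contigs.fasta          # final contigs, once the run has finished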