MEGAHIT is a single node assembler for large and complex metagenomics NGS reads, such as soil. It makes use of succinct de Bruijn graph (SdBG) to achieve low memory assembly.
Allocate an interactive session and run the program.
Sample session (user input in bold):
[user@biowulf]$ sinteractive --mem 8g --cpus-per-task 2 --gres lscratch:10 salloc.exe: Pending job allocation 46116226 salloc.exe: job 46116226 queued and waiting for resources salloc.exe: job 46116226 has been allocated resources salloc.exe: Granted job allocation 46116226 salloc.exe: Waiting for resource configuration salloc.exe: Nodes cn3144 are ready for job [user@cn3144 ~]$ module load megahit [+] Loading megahit, version 1.1.4... [user@cn3144 ~]$ cd /lscratch/$SLURM_JOB_ID [user@cn3144 46116226]$ cp -a $MEGAHIT_HOME/share/megahit/test_data/* . [user@cn3144 46116226]$ SLURM_MEM_PER_NODE_BYTES="${SLURM_MEM_PER_NODE}000000" [user@cn3144 46116226]$ megahit --12 r1.il.fa.gz --memory ${SLURM_MEM_PER_NODE_BYTES} `# max memory to use` -t $SLURM_CPUS_PER_TASK --tmp-dir /lscratch/$SLURM_JOB_ID --- [Fri Feb 1 19:59:29 2019] Start assembly. Number of CPU threads 2 --- --- [Fri Feb 1 19:59:29 2019] Available memory: 270168522752, used: 8192000000 --- [Fri Feb 1 19:59:29 2019] Converting reads to binaries --- [read_lib_functions-inl.h : 209] Lib 0 (readsInterleaved1.fa.gz): interleaved, 200 reads, 100 max length [utils.h : 126] Real: 0.0035 user: 0.0010 sys: 0.0029 maxrss: 6964 --- [Fri Feb 1 19:59:30 2019] k-max reset to: 119 --- --- [Fri Feb 1 19:59:30 2019] k list: 21,29,39,59,79,99,119 --- --- [Fri Feb 1 19:59:30 2019] Extracting solid (k+1)-mers for k = 21 --- --- [Fri Feb 1 19:59:30 2019] Building graph for k = 21 --- --- [Fri Feb 1 19:59:30 2019] Assembling contigs from SdBG for k = 21 --- --- [Fri Feb 1 19:59:30 2019] Local assembling k = 21 --- --- [Fri Feb 1 19:59:30 2019] Extracting iterative edges from k = 21 to 29 --- --- [Fri Feb 1 19:59:30 2019] Building graph for k = 29 --- --- [Fri Feb 1 19:59:30 2019] Assembling contigs from SdBG for k = 29 --- --- [Fri Feb 1 19:59:30 2019] Local assembling k = 29 --- --- [Fri Feb 1 19:59:30 2019] Extracting iterative edges from k = 29 to 39 --- --- [Fri Feb 1 19:59:30 2019] Building graph for k = 39 --- --- [Fri Feb 1 19:59:30 2019] Assembling contigs from SdBG for k = 39 --- --- [Fri Feb 1 19:59:30 2019] Local assembling k = 39 --- --- [Fri Feb 1 19:59:31 2019] Extracting iterative edges from k = 39 to 59 --- --- [Fri Feb 1 19:59:31 2019] Building graph for k = 59 --- --- [Fri Feb 1 19:59:31 2019] Assembling contigs from SdBG for k = 59 --- --- [Fri Feb 1 19:59:31 2019] Local assembling k = 59 --- --- [Fri Feb 1 19:59:31 2019] Extracting iterative edges from k = 59 to 79 --- --- [Fri Feb 1 19:59:31 2019] Building graph for k = 79 --- --- [Fri Feb 1 19:59:31 2019] Assembling contigs from SdBG for k = 79 --- --- [Fri Feb 1 19:59:31 2019] Local assembling k = 79 --- --- [Fri Feb 1 19:59:31 2019] Extracting iterative edges from k = 79 to 99 --- --- [Fri Feb 1 19:59:31 2019] Merging to output final contigs --- --- [STAT] 1 contigs, total 1207 bp, min 1207 bp, max 1207 bp, avg 1207 bp, N50 1207 bp --- [Fri Feb 1 19:59:31 2019] ALL DONE. Time elapsed: 1.405510 seconds --- [user@cn3144 ~]$ exit salloc.exe: Relinquishing job allocation 46116226 [user@biowulf ~]$
Create a batch input file (e.g. megahit.sh). For example:
#!/bin/bash set -e module load megahit test -n "$SLURM_CPUS_PER_TASK" || SLURM_CPUS_PER_TASK=2 test -n "$SLURM_MEM_PER_NODE" || SLURM_MEM_PER_NODE=1500 SLURM_MEM_PER_NODE_BYTES=${SLURM_MEM_PER_NODE}000000 megahit --12 readsInterleaved1.fa.gz --memory ${SLURM_MEM_PER_NODE_BYTES} -t $SLURM_CPUS_PER_TASK --tmp-dir /lscratch/$SLURM_JOB_ID
Submit this job using the Slurm sbatch command.
sbatch [--cpus-per-task=#] [--mem=#] --gres lscratch:10 megahit.sh
Create a swarmfile (e.g. megahit.swarm). For example:
megahit --12 readsInterleaved1.fa.gz --memory ${SLURM_MEM_PER_NODE}000000 -t $SLURM_CPUS_PER_TASK --tmp-dir /lscratch/$SLURM_JOB_ID megahit --12 readsInterleaved2.fa.gz --memory ${SLURM_MEM_PER_NODE}000000 -t $SLURM_CPUS_PER_TASK --tmp-dir /lscratch/$SLURM_JOB_ID megahit --12 readsInterleaved3.fa.gz --memory ${SLURM_MEM_PER_NODE}000000 -t $SLURM_CPUS_PER_TASK --tmp-dir /lscratch/$SLURM_JOB_ID megahit --12 readsInterleaved4.fa.gz --memory ${SLURM_MEM_PER_NODE}000000 -t $SLURM_CPUS_PER_TASK --tmp-dir /lscratch/$SLURM_JOB_ID
Submit this job using the swarm command.
swarm -f megahit.swarm -g # -t # --gres lscratch:# --module megahitwhere
-g # | Number of Gigabytes of memory required for each process (1 line in the swarm command file) |
-t # | Number of threads/CPUs required for each process (1 line in the swarm command file). |
--gres lscratch:# | Number of gigabytes of local scratch space for each subjob in the swarm. |
--module megahit | Loads the MEGAHIT module for each subjob in the swarm |