From the repository:
Minimizer-based sparse de Bruijn graph constructor. Homopolymer-compress input sequences, pick syncmers from hpc-compressed sequences, connect syncmers with an edge if they are adjacent in a read, unitigify and homopolymer-decompress. Suggested input is PacBio HiFi/CCS reads, or ONT duplex reads. May or may not work with Illumina reads. Not suggested for PacBio CLR or regular ONT reads.
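To illustrate the first step named above, homopolymer compression collapses every run of identical bases into a single base. The following is a minimal sketch of the concept only, not MBG's implementation; the function name `hpc` is hypothetical, and the sketch assumes the input contains only the bases A, C, G and T.

```shell
# Homopolymer-compress a sequence: squeeze each run of identical
# bases (upper or lower case) down to a single base.
# "hpc" is an illustrative name, not part of MBG.
hpc() {
    printf '%s\n' "$1" | tr -s 'ACGTacgt'
}

hpc GATTTACCA   # prints GATACA
```

Because the run lengths are discarded, the compressed sequence is insensitive to homopolymer-length errors in the reads, which is presumably why the graph is built on compressed sequences and decompressed afterwards.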
Test data is available in $MBG_TEST_DATA.
Allocate an interactive session and run the program. Sample session:
[user@biowulf]$ sinteractive --gres=lscratch:10 --cpus-per-task=2 --mem=3g
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job
[user@cn3144]$ cd /lscratch/$SLURM_JOB_ID
[user@cn3144]$ module load mbg
[user@cn3144]$ cp ${MBG_TEST_DATA:-none}/SRR10971019.fasta .
[user@cn3144]$ MBG -t $SLURM_CPUS_PER_TASK -i SRR10971019.fasta -o SRR10971019_graph.gfa -k 1501 -w 1450 -a 1 -u 3
Parameters: k=1501,w=1450,a=1,u=3,t=2,r=0,R=0,hpcvariantcov=0,errormasking=hpc,endkmers=no,blunt=no,keepgaps=no,guesswork=no,cache=no
Collecting selected k-mers
Reading sequences from SRR10971019.fasta
1210730 total selected k-mers in reads
265228 distinct selected k-mers in reads
Unitigifying
Filtering by unitig coverage
3513 distinct selected k-mers in unitigs after filtering
Getting read paths
Reading sequences from SRR10971019.fasta
Building unitig sequences
Reading sequences from SRR10971019.fasta
Writing graph to SRR10971019_graph.gfa
selecting k-mers and building graph topology took 19,594 s
unitigifying took 0,81 s
filtering unitigs took 0,4 s
getting read paths took 19,186 s
building unitig sequences took 36,835 s
forcing edge consistency took 0,24 s
writing the graph and calculating stats took 0,94 s
nodes: 567
edges: 730
assembly size 5346906 bp, N50 29122
approximate number of k-mers ~ 4495839
[user@cn3144]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf]$
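The session above writes the graph to a GFA file. As a quick sanity check, the segment (`S`) and link (`L`) records in that file can be counted with standard tools. This is a minimal sketch: the helper name `gfa_counts` is hypothetical, it assumes tab-separated GFA1 output, and it assumes that the `nodes` and `edges` figures in MBG's summary correspond to `S` and `L` records, which this page does not confirm.

```shell
# Count segment (S) and link (L) records in a GFA1 file.
# "gfa_counts" is an illustrative helper, not part of MBG.
gfa_counts() {
    awk -F'\t' '
        $1 == "S" { nodes++ }   # one S record per graph node
        $1 == "L" { edges++ }   # one L record per link between nodes
        END { printf "nodes: %d edges: %d\n", nodes, edges }
    ' "$1"
}
```

Running `gfa_counts SRR10971019_graph.gfa` in the session directory prints the two counts in the same form as the summary lines, so they can be compared by eye.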
Create a batch script (e.g. mbg.sh). For example:
#!/bin/bash
module load mbg/1.0.16
cp ${MBG_TEST_DATA:-none}/SRR10971019.fasta .
MBG -t $SLURM_CPUS_PER_TASK -i SRR10971019.fasta -o SRR10971019_graph.gfa -k 1501 -w 1450 -a 1 -u 3
Submit this job using the Slurm sbatch command.
sbatch [--cpus-per-task=#] [--mem=#] mbg.sh