phasebook is a novel approach for reconstructing the haplotypes of diploid genomes from long reads de novo, that is, without the need for a reference genome.
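For orientation, the invocation used throughout this page is shown below with its options annotated. The interpretations are inferred from the sample session output; consult the program's help output for the authoritative option list.
# -i  input long reads (FASTA/FASTQ)
# -t  number of threads
# -p  sequencing-technology preset (here: PacBio HiFi)
# -g  genome-size setting (here: small)
# -x  use preset parameters (echoed in the log as "use preset parameters...")
phasebook.py -i reads.fa -t $SLURM_CPUS_PER_TASK -p hifi -g small -x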
Allocate an interactive session and run the program.
Sample session (user input in bold):
[user@biowulf ~]$ sinteractive -c8 --mem=4g --gres=lscratch:10
salloc: Pending job allocation 33141417
salloc: job 33141417 queued and waiting for resources
salloc: job 33141417 has been allocated resources
salloc: Granted job allocation 33141417
salloc: Waiting for resource configuration
salloc: Nodes cn0881 are ready for job
srun: error: x11: no local DISPLAY defined, skipping
error: unable to open file /tmp/slurm-spank-x11.33141417.0
slurmstepd: error: x11: unable to read DISPLAY value

[user@cn0881 ~]$ module load phasebook
[+] Loading phasebook 1.0.0 on cn0881
[+] Loading singularity 3.8.5-1 on cn0881

[user@cn0881 ~]$ cd /lscratch/$SLURM_JOB_ID

[user@cn0881 33141417]$ cp -r $PHASEBOOK_TESTDATA .

[user@cn0881 33141417]$ cd TESTDATA

[user@cn0881 TESTDATA]$ phasebook.py -i reads.fa -t $SLURM_CPUS_PER_TASK -p hifi -g small -x
use preset parameters...
2022-02-25 13:20:59,226 - /opt/phasebook/scripts/phasebook.py[line:262] - INFO: splitting input fastx file into 1 subfiles...
2022-02-25 13:20:59,361 - /opt/phasebook/scripts/phasebook.py[line:270] - INFO: splitting finished.
[...snip]
start polishing...
2022-02-25 13:22:26,529 - /opt/phasebook/scripts/phasebook.py[line:374] - INFO: All has been finished successfully.
2022-02-25 13:22:26,529 - /opt/phasebook/scripts/phasebook.py[line:375] - INFO: The final output haplotype aware contigs are here: ./contigs.fa
2022-02-25 13:22:26,529 - /opt/phasebook/scripts/phasebook.py[line:376] - INFO: Thank you for using phasebook!

[user@cn0881 TESTDATA]$ ls
1.split_fastx  3.cluster        5.polish          clustered_reads.list  phasebook.log  reference.fa
2.overlap      4.asm_supereads  all.supereads.fa  contigs.fa            reads.fa

[user@cn0881 TESTDATA]$ exit
exit
salloc: Relinquishing job allocation 33141417

[user@biowulf ~]$
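Note that /lscratch is deleted when the interactive job ends, so copy any results you want to keep back to your /data area before exiting. A minimal sketch (destination path illustrative):

[user@cn0881 TESTDATA]$ mkdir -p /data/$USER/phasebook_results
[user@cn0881 TESTDATA]$ cp contigs.fa phasebook.log /data/$USER/phasebook_results/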
Create a batch input file (e.g. phasebook.sh). For example:
#!/bin/bash
set -e
mkdir -p /data/${USER}/phasebook/${SLURM_JOB_ID}
cd /data/${USER}/phasebook/${SLURM_JOB_ID}
module load phasebook
phasebook.py -i /path/to/reads.fa -t $SLURM_CPUS_PER_TASK -p hifi -g small -x
Submit this job using the Slurm sbatch command.
sbatch [--cpus-per-task=#] [--mem=#] phasebook.sh
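For example, requesting 8 CPUs and 16 GB of memory (illustrative values; scale to the size of your read set):

sbatch --cpus-per-task=8 --mem=16g phasebook.sh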
Create a swarmfile (e.g. phasebook.swarm). For example:
mkdir -p /data/${USER}/phasebook/${SLURM_JOB_ID}; cd /data/${USER}/phasebook/${SLURM_JOB_ID}; phasebook.py -i /path/to/reads1.fa -t $SLURM_CPUS_PER_TASK -p hifi -g small -x
mkdir -p /data/${USER}/phasebook/${SLURM_JOB_ID}; cd /data/${USER}/phasebook/${SLURM_JOB_ID}; phasebook.py -i /path/to/reads2.fa -t $SLURM_CPUS_PER_TASK -p hifi -g small -x
mkdir -p /data/${USER}/phasebook/${SLURM_JOB_ID}; cd /data/${USER}/phasebook/${SLURM_JOB_ID}; phasebook.py -i /path/to/reads3.fa -t $SLURM_CPUS_PER_TASK -p hifi -g small -x
mkdir -p /data/${USER}/phasebook/${SLURM_JOB_ID}; cd /data/${USER}/phasebook/${SLURM_JOB_ID}; phasebook.py -i /path/to/reads4.fa -t $SLURM_CPUS_PER_TASK -p hifi -g small -x
Submit this job using the swarm command.
swarm -f phasebook.swarm [-g #] [-t #] --module phasebook
where
-g #                Number of Gigabytes of memory required for each process (1 line in the swarm command file)
-t #                Number of threads/CPUs required for each process (1 line in the swarm command file)
--module phasebook  Loads the phasebook module for each subjob in the swarm
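For example, allocating 16 GB of memory and 8 CPUs per subjob (illustrative values):

swarm -f phasebook.swarm -g 16 -t 8 --module phasebook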