phasebook is a novel approach for reconstructing the haplotypes of diploid genomes de novo from long reads, that is, without the need for a reference genome.


Interactive job
[user@biowulf ~]$ sinteractive -c8 --mem=4g --gres=lscratch:10
salloc: Pending job allocation 33141417
salloc: job 33141417 queued and waiting for resources
salloc: job 33141417 has been allocated resources
salloc: Granted job allocation 33141417
salloc: Waiting for resource configuration
salloc: Nodes cn0881 are ready for job

[user@cn0881 ~]$ module load phasebook
[+] Loading phasebook  1.0.0  on cn0881
[+] Loading singularity  3.8.5-1  on cn0881

[user@cn0881 ~]$ cd /lscratch/$SLURM_JOB_ID

[user@cn0881 33141417]$ cp -r $PHASEBOOK_TESTDATA .

[user@cn0881 33141417]$ cd TESTDATA

[user@cn0881 TESTDATA]$ -i reads.fa -t $SLURM_CPUS_PER_TASK -p hifi -g small -x
use preset parameters...
2022-02-25 13:20:59,226 - /opt/phasebook/scripts/[line:262] - INFO: splitting input fastx file into 1 subfiles...
2022-02-25 13:20:59,361 - /opt/phasebook/scripts/[line:270] - INFO: splitting finished.
start polishing...
2022-02-25 13:22:26,529 - /opt/phasebook/scripts/[line:374] - INFO: All has been finished successfully.

2022-02-25 13:22:26,529 - /opt/phasebook/scripts/[line:375] - INFO: The final output haplotype aware contigs are here: ./contigs.fa

2022-02-25 13:22:26,529 - /opt/phasebook/scripts/[line:376] - INFO: Thank you for using phasebook!

[user@cn0881 TESTDATA]$ ls
1.split_fastx  3.cluster        5.polish          clustered_reads.list  phasebook.log  reference.fa
2.overlap      4.asm_supereads  all.supereads.fa  contigs.fa            reads.fa

[user@cn0881 TESTDATA]$ exit
salloc: Relinquishing job allocation 33141417

[user@biowulf ~]$
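phasebook reports the location of its final haplotype-aware contigs (./contigs.fa). Before exiting the session, a quick sanity check is to count the contigs and their total length. The sketch below uses plain awk; the helper name fasta_stats is hypothetical and not part of phasebook.

```shell
# Hypothetical helper: print the number of sequences and total bases in a FASTA file.
fasta_stats() {
  # Header lines start with '>'; every other line is sequence.
  # Whitespace and carriage returns are stripped before counting bases.
  # %.0f avoids 32-bit overflow in some awk builds for genome-sized totals.
  awk '/^>/ { n++; next }
       { gsub(/[ \t\r]/, ""); len += length($0) }
       END { printf "contigs=%d bases=%.0f\n", n, len }' "$1"
}
```

Run it as `fasta_stats contigs.fa` from the TESTDATA directory.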

Batch job
#!/bin/bash
set -e
mkdir -p /data/${USER}/phasebook/${SLURM_JOB_ID}
cd /data/${USER}/phasebook/${SLURM_JOB_ID}
module load phasebook
-i /path/to/reads.fa -t $SLURM_CPUS_PER_TASK -p hifi -g small -x

Submit this job using the Slurm sbatch command.

sbatch [--cpus-per-task=#] [--mem=#]
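The batch and swarm examples pass $SLURM_CPUS_PER_TASK to -t. Slurm only sets that variable when --cpus-per-task is requested, so a submission without it leaves -t with no value. A minimal guard is sketched below; the helper name phasebook_threads and the single-thread fallback are assumptions, not phasebook defaults.

```shell
# Hypothetical helper: thread count for phasebook's -t option.
# SLURM_CPUS_PER_TASK is only exported when --cpus-per-task is requested,
# so fall back to 1 thread (an assumed default) if it is unset or invalid.
phasebook_threads() {
  local n=${SLURM_CPUS_PER_TASK:-1}
  case $n in
    ''|0|*[!0-9]*) n=1 ;;   # empty, zero, or non-numeric -> 1
  esac
  printf '%s\n' "$n"
}
```

In the batch script, `-t "$(phasebook_threads)"` would then replace `-t $SLURM_CPUS_PER_TASK`.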

Swarm of Jobs
Create a swarm command file (e.g. phasebook.swarm) with one command per line:
mkdir -p /data/${USER}/phasebook/${SLURM_JOB_ID}; cd /data/${USER}/phasebook/${SLURM_JOB_ID}; -i /path/to/reads1.fa -t $SLURM_CPUS_PER_TASK -p hifi -g small -x
mkdir -p /data/${USER}/phasebook/${SLURM_JOB_ID}; cd /data/${USER}/phasebook/${SLURM_JOB_ID}; -i /path/to/reads2.fa -t $SLURM_CPUS_PER_TASK -p hifi -g small -x
mkdir -p /data/${USER}/phasebook/${SLURM_JOB_ID}; cd /data/${USER}/phasebook/${SLURM_JOB_ID}; -i /path/to/reads3.fa -t $SLURM_CPUS_PER_TASK -p hifi -g small -x
mkdir -p /data/${USER}/phasebook/${SLURM_JOB_ID}; cd /data/${USER}/phasebook/${SLURM_JOB_ID}; -i /path/to/reads4.fa -t $SLURM_CPUS_PER_TASK -p hifi -g small -x

Submit this job using the swarm command.

swarm -f phasebook.swarm [-g #] [-t #] --module phasebook
-g # Number of gigabytes of memory required for each process (1 line in the swarm command file).
-t # Number of threads/CPUs required for each process (1 line in the swarm command file).
--module phasebook Loads the phasebook module for each subjob in the swarm.
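Writing the swarm file by hand becomes tedious with many read files. The sketch below prints one swarm line per input; the function name make_phasebook_swarm is hypothetical, and its first argument stands in for the phasebook command used on this page. It joins the steps with && rather than ; so that a failed mkdir or cd stops the line instead of running phasebook in the wrong directory.

```shell
# Hypothetical helper: print one phasebook swarm line per read file.
# Usage: make_phasebook_swarm <phasebook command> reads1.fa [reads2.fa ...]
make_phasebook_swarm() {
  if [ $# -lt 2 ]; then
    echo "usage: make_phasebook_swarm <command> <reads.fa>..." >&2
    return 1
  fi
  local cmd=$1 reads
  shift
  for reads in "$@"; do
    # Single quotes keep ${USER}, ${SLURM_JOB_ID} and $SLURM_CPUS_PER_TASK
    # literal so each swarm subjob expands them at run time.
    # %q shell-quotes the read path in case it contains spaces.
    printf 'mkdir -p /data/${USER}/phasebook/${SLURM_JOB_ID} && cd /data/${USER}/phasebook/${SLURM_JOB_ID} && %s -i %q -t $SLURM_CPUS_PER_TASK -p hifi -g small -x\n' \
      "$cmd" "$reads"
  done
}
```

For example, `make_phasebook_swarm "<phasebook command>" /path/to/reads*.fa > phasebook.swarm`, followed by the swarm command shown above.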