DISCOVAR is a new variant caller and DISCOVAR de novo is a new genome assembler, both designed for state-of-the-art data. Their inputs are chosen to optimize quality while keeping costs low. Currently they take as input Illumina reads of length 250 or longer, produced on a MiSeq or HiSeq 2500 from a single PCR-free library. These data enable a level of completeness and continuity that was not previously possible.
Allocate an interactive session and run the program. Sample session:
[user@biowulf]$ sinteractive -c 4 --mem 10g
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job
[user@cn3144 ~]$ mkdir -p /data/$USER/discovar && cd /data/$USER/discovar
[user@cn3144 ~]$ module load discovar
[user@cn3144 ~]$ cp $DISCOVAR_TEST/* .
[user@cn3144 ~]$ ./run-discovar-assembly.sh
running Discovar READS=sample-reads.bam REGIONS='10:30892106-30933760' OUT_HEAD=./discovar-assembly/assembly TMP=./discovar-assembly/tmp
Performing re-exec to adjust stack size.
--------------------------------------------------------------------------------
Tue Jul 24 08:57:12 2018 run on cn3206, pid=56652 [Jul 23 2018 16:14:14 R52488 ]
Discovar READS=sample-reads.bam REGIONS=10:30892106-30933760 \
         OUT_HEAD=./discovar-assembly/assembly TMP=./discovar-assembly/tmp
--------------------------------------------------------------------------------
Tue Jul 24 08:57:12 2018: there are 9,644 reads
Tue Jul 24 08:57:12 2018: mean read length = 250.0
Tue Jul 24 08:57:12 2018: mean base quality = 25.8
[...]
DISCOVAR SUMMARY STATS
1 components
45 edges
45553 kmers
Tue Jul 24 08:57:31 2018: done, time used = 19.1 seconds, peak mem used = 0.7 GB
====================================================================================
Discovar has completed correctly. See the output in ./discovar-assembly
[user@cn3144 ~]$ ./run-discovar-variants.sh
running Discovar READS=sample-reads.bam REFERENCE=sample-genome.fasta REGIONS='10:30892106-30933760' OUT_HEAD=./discovar-variants/assembly TMP=./discovar-variants/tmp
Performing re-exec to adjust stack size.
[...]
DISCOVAR SUMMARY STATS
1 components
45 edges
45553 kmers
Tue Jul 24 08:58:20 2018: done, time used = 23.3 seconds, peak mem used = 1.5 GB
====================================================================================
Discovar has completed correctly. See the output in ./discovar-variants
[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
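Both runs in the sample session end with the line "Discovar has completed correctly." When a run is not watched interactively, that line can be checked for in a saved log. The following is a minimal sketch only: the helper name discovar_completed and the log file name discovar-assembly.log are assumptions made for illustration, and it assumes the output of the run was redirected to that file.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if a captured Discovar log contains the
# completion message shown in the sample session.
discovar_completed() {
    local log=$1
    grep -qF 'Discovar has completed correctly.' "$log"
}

# Usage sketch. discovar-assembly.log is an assumed file name, written e.g. by
#   ./run-discovar-assembly.sh > discovar-assembly.log 2>&1
log=discovar-assembly.log
if [ -f "$log" ] && discovar_completed "$log"; then
    echo "assembly finished: $log"
else
    echo "no completion message found in $log" >&2
fi
```

The check uses a fixed-string match (grep -F), so the trailing period in the message is not interpreted as a regular-expression wildcard.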
Create a batch input file (e.g. discovar.sh). For example:
#!/bin/bash
module load discovar
cd /data/$USER/
Discovar READS=sample-reads.bam REGIONS='10:30892106-30933760' OUT_HEAD=./output/assembly TMP=/lscratch/$SLURM_JOBID NUM_THREADS=$SLURM_CPUS_PER_TASK
Submit this job using the Slurm sbatch command.
sbatch --cpus-per-task=10 --mem=20g --gres=lscratch:10 discovar.sh
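The batch script takes its scratch directory and thread count from $SLURM_JOBID and $SLURM_CPUS_PER_TASK, which are only set inside a Slurm job. As a rough sketch, the hypothetical helper discovar_args below builds the same argument list and falls back to ./tmp and a single thread when those variables are unset, so the command line can be inspected before the job is submitted. The helper name and both fallback values are assumptions for illustration, not part of Discovar or Slurm.

```shell
#!/bin/bash
# Hypothetical helper: print the Discovar arguments used in the batch script,
# one per line, taking scratch space and thread count from Slurm if available.
discovar_args() {
    local reads=$1 regions=$2 out_head=$3
    # Expands to /lscratch/<jobid> only when SLURM_JOBID is set and non-empty.
    local tmp=${SLURM_JOBID:+/lscratch/$SLURM_JOBID}
    tmp=${tmp:-./tmp}                        # assumed fallback outside a job
    local threads=${SLURM_CPUS_PER_TASK:-1}  # assumed fallback outside a job
    printf '%s\n' "READS=$reads" "REGIONS=$regions" "OUT_HEAD=$out_head" \
        "TMP=$tmp" "NUM_THREADS=$threads"
}

# Inside discovar.sh the output could be used as:
#   mapfile -t args < <(discovar_args sample-reads.bam '10:30892106-30933760' ./output/assembly)
#   Discovar "${args[@]}"
# Here the arguments are only printed for inspection.
discovar_args sample-reads.bam '10:30892106-30933760' ./output/assembly
```

Printing one argument per line keeps each KEY=value pair intact when it is read back into an array with mapfile.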
Create a swarmfile (e.g. discovar.swarm). For example:
Discovar READS=sample1-reads.bam [...] OUT_HEAD=./output1 TMP=/lscratch/$SLURM_JOBID NUM_THREADS=$SLURM_CPUS_PER_TASK
Discovar READS=sample2-reads.bam [...] OUT_HEAD=./output2 TMP=/lscratch/$SLURM_JOBID NUM_THREADS=$SLURM_CPUS_PER_TASK
Discovar READS=sample3-reads.bam [...] OUT_HEAD=./output3 TMP=/lscratch/$SLURM_JOBID NUM_THREADS=$SLURM_CPUS_PER_TASK
Submit this job using the swarm command.
swarm -f discovar.swarm --gres=lscratch:10 -g 10 -t 10 --module discovar
where
-g # | Number of gigabytes of memory required for each process (1 line in the swarm command file)
-t # | Number of threads/CPUs required for each process (1 line in the swarm command file)
--module discovar | Loads the discovar module for each subjob in the swarm
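For more than a handful of samples, the swarmfile can be generated rather than written by hand. The sketch below assumes one BAM file per sample. The helper name make_swarmfile and the per-sample OUT_HEAD layout (./<sample>/assembly) are illustrative choices, not something swarm or Discovar prescribes. Any further arguments a run needs (such as REGIONS or REFERENCE) are not filled in here and would have to be added to the printf template.

```shell
#!/bin/bash
# Hypothetical helper: write one Discovar command per BAM file to a swarmfile.
make_swarmfile() {
    local swarmfile=$1
    shift
    : > "$swarmfile"                       # create or truncate the swarmfile
    local bam sample
    for bam in "$@"; do
        sample=$(basename "$bam" .bam)     # e.g. sample1-reads
        # The single-quoted format keeps $SLURM_JOBID and $SLURM_CPUS_PER_TASK
        # literal, so they are expanded later, inside each subjob.
        printf 'Discovar READS=%s OUT_HEAD=./%s/assembly TMP=/lscratch/$SLURM_JOBID NUM_THREADS=$SLURM_CPUS_PER_TASK\n' \
            "$bam" "$sample" >> "$swarmfile"
    done
}

make_swarmfile discovar.swarm sample1-reads.bam sample2-reads.bam sample3-reads.bam
cat discovar.swarm
```

Each line of the resulting file is one independent process, so the -g and -t values passed to swarm apply to every generated command.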