Roary is a high speed pan genome pipeline, which takes annotated assemblies in GFF3 format (produced by Prokka) and calculates the pan genome.
The roary module sets the environment variable ROARY_TEST_DATA, which points to the test data used below.
For this interactive session we will follow the Roary tutorial. In addition to roary, we will also need prokka to annotate the bacterial genomes.
[user@biowulf]$ sinteractive --cpus-per-task=6 --mem=20g --gres=lscratch:20
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job
[user@cn3144]$ module load roary prokka
[user@cn3144]$ cd /lscratch/$SLURM_JOB_ID
[user@cn3144]$ cp ${ROARY_TEST_DATA:-none}/* .
[user@cn3144]$ ls -1
GCA_000008285.1_ASM828v1_genomic.fna
GCA_000021185.1_ASM2118v1_genomic.fna
GCA_000026705.1_ASM2670v1_genomic.fna
GCA_000168635.2_ASM16863v2_genomic.fna
GCA_000168815.1_ASM16881v1_genomic.fna
GCA_000196035.1_ASM19603v1_genomic.fna
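The `${ROARY_TEST_DATA:-none}` expansion in the session above falls back to the literal string `none` when the variable is unset or empty, so `cp` fails on a nonexistent path rather than expanding to `/*`. A minimal sketch of the same guard with a more explicit error message follows; the function name `test_data_dir` is hypothetical and is not provided by the roary module.

```shell
#!/bin/bash
# Hypothetical helper (not part of the roary module): print the test data
# directory, or fail with a message if ROARY_TEST_DATA is unset or empty,
# which usually means the module has not been loaded.
test_data_dir() {
    local dir="${ROARY_TEST_DATA:-}"
    if [[ -z "$dir" ]]; then
        echo "ROARY_TEST_DATA is not set; run 'module load roary' first" >&2
        return 1
    fi
    printf '%s\n' "$dir"
}
```

It could be used as `dir=$(test_data_dir) && cp "$dir"/* .`, so that the copy is skipped entirely when the variable is missing.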
Annotate the genomes with prokka and create a pan genome with roary.
[user@cn3144]$ fastas=( *.fna )
[user@cn3144]$ for fasta in ${fastas[@]}; do
    acc=${fasta:0:13}
    prokka --kingdom Bacteria --outdir prokka_$acc --genus Listeria \
        --locustag $acc --prefix $acc --cpus $SLURM_CPUS_PER_TASK $fasta
done
[user@cn3144]$ roary -p $SLURM_CPUS_PER_TASK -f ./roary_demo -e -n -v -r */*.gff
[...snip...]
[user@cn3144]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf]$
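In the loop above, `${fasta:0:13}` takes the first 13 characters of each file name, which for these files is the assembly accession without its version suffix. The sketch below prints the prokka command that would be run for one file without executing it, which may help to check the derived names before annotating. The function name `prokka_cmd` and the default of 1 CPU are assumptions made for illustration; the prokka options are the ones used in the loop.

```shell
#!/bin/bash
# Dry run: print the prokka command for one FASTA file instead of running it.
# prokka_cmd is a hypothetical helper; the options mirror the loop above.
prokka_cmd() {
    local fasta="$1"
    local cpus="${2:-1}"          # assumed default of 1 CPU outside of Slurm
    local acc="${fasta:0:13}"     # first 13 characters, e.g. GCA_000008285
    printf 'prokka --kingdom Bacteria --outdir prokka_%s --genus Listeria --locustag %s --prefix %s --cpus %s %s\n' \
        "$acc" "$acc" "$acc" "$cpus" "$fasta"
}

prokka_cmd GCA_000008285.1_ASM828v1_genomic.fna 6
```

Each annotation is written to its own `prokka_<accession>` directory, which is why the roary call can collect all inputs with `*/*.gff`.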
Create a batch script (e.g. roary.sh). For example:
#! /bin/bash
# this file is roary.sh

function die {
    echo "$@" >&2
    exit 1
}

wd=$PWD
module load roary/3.12.0 || die "Could not load modules"
cd /lscratch/$SLURM_JOB_ID || die "no lscratch"
# read the GFF files from the submission directory; write output to lscratch
roary -p ${SLURM_CPUS_PER_TASK} \
    -f roary_out -e -n -r -v "$wd"/gff/*.gff \
&& mv roary_out "$wd"
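Because the batch script passes a glob of GFF files to roary, a job can fail late if the input directory is empty or misnamed. The following is a minimal sketch of a pre-flight check that could run before the roary call; `count_gff` is a hypothetical helper, not part of roary or the cluster environment.

```shell
#!/bin/bash
# Hypothetical pre-flight check: print the number of .gff files in a
# directory, or fail if there are none, so the job can stop early.
count_gff() {
    local dir="$1"
    local files=()
    shopt -s nullglob             # an unmatched glob expands to nothing
    files=( "$dir"/*.gff )
    shopt -u nullglob
    if (( ${#files[@]} == 0 )); then
        echo "no .gff files found in $dir" >&2
        return 1
    fi
    echo "${#files[@]}"
}
```

In the script above it might be called as `count_gff "$wd/gff" >/dev/null || die "no GFF input"` before changing into lscratch.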
Submit this job using the Slurm sbatch command.
sbatch --gres=lscratch:10 --cpus-per-task=6 --mem=6g roary.sh
Create a swarmfile (e.g. roary.swarm). For example:
roary -p $SLURM_CPUS_PER_TASK -f ./roary_species1 -e -n -v -r species1/*.gff
roary -p $SLURM_CPUS_PER_TASK -f ./roary_species2 -e -n -v -r species2/*.gff
roary -p $SLURM_CPUS_PER_TASK -f ./roary_species3 -e -n -v -r species3/*.gff
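With many species directories, the swarmfile can be generated rather than typed by hand. The sketch below prints one roary command per directory in the same form as the example above; `make_swarm_lines` is a hypothetical helper, and it assumes each directory directly contains the `.gff` files for one species.

```shell
#!/bin/bash
# Print one roary command per species directory given as an argument.
# The single-quoted format string keeps $SLURM_CPUS_PER_TASK and the *.gff
# glob literal, so they are expanded inside each swarm subjob rather than
# when the swarmfile is generated.
make_swarm_lines() {
    local dir name
    for dir in "$@"; do
        name="${dir%/}"           # drop a trailing slash, if any
        printf 'roary -p $SLURM_CPUS_PER_TASK -f ./roary_%s -e -n -v -r %s/*.gff\n' \
            "$name" "$name"
    done
}
```

For example, `make_swarm_lines species*/ > roary.swarm` would write one line per matching directory.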
Submit this job using the swarm command.
swarm -f roary.swarm -g 6 -t 6 --module roary/3.12.0
where
-g # | Number of gigabytes of memory required for each process (1 line in the swarm command file) |
-t # | Number of threads/CPUs required for each process (1 line in the swarm command file) |
--module roary | Loads the roary module for each subjob in the swarm |