Roary is a high speed pan genome pipeline, which takes annotated assemblies in GFF3 format (produced by Prokka) and calculates the pan genome.
Example data is available in $ROARY_TEST_DATA.
For this interactive session we will follow the Roary tutorial. In addition to roary, we also need prokka to annotate the bacterial genomes.
[user@biowulf]$ sinteractive --cpus-per-task=6 --mem=20g --gres=lscratch:20
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job
[user@cn3144]$ module load roary prokka
[user@cn3144]$ cd /lscratch/$SLURM_JOB_ID
[user@cn3144]$ cp ${ROARY_TEST_DATA:-none}/* .
[user@cn3144]$ ls -1
GCA_000008285.1_ASM828v1_genomic.fna
GCA_000021185.1_ASM2118v1_genomic.fna
GCA_000026705.1_ASM2670v1_genomic.fna
GCA_000168635.2_ASM16863v2_genomic.fna
GCA_000168815.1_ASM16881v1_genomic.fna
GCA_000196035.1_ASM19603v1_genomic.fna
Annotate the genomes with prokka and create a pan genome with roary.
[user@cn3144]$ fastas=( *.fna )
[user@cn3144]$ for fasta in ${fastas[@]}; do
acc=${fasta:0:13}
prokka --kingdom Bacteria --outdir prokka_$acc --genus Listeria \
--locustag $acc --prefix $acc --cpus $SLURM_CPUS_PER_TASK $fasta
done
[user@cn3144]$ roary -p $SLURM_CPUS_PER_TASK -f ./roary_demo -e -n -v -r */*.gff
[...snip...]
[user@cn3144]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf]$
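The loop above takes the first 13 characters of each file name as the accession, which works because every GCA accession in the test data has the same length. A minimal sketch of a length-independent alternative, assuming the file names keep the ACCESSION.VERSION_... pattern shown in the listing (the helper name accession_of is hypothetical):

```shell
# Hypothetical helper: strip everything from the first '.' onward,
# so GCA_000008285.1_ASM828v1_genomic.fna becomes GCA_000008285.
accession_of() {
    printf '%s\n' "${1%%.*}"
}
```

Inside the loop, acc=$(accession_of "$fasta") could then stand in for acc=${fasta:0:13}.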
Create a batch script (e.g. roary.sh). For example:
#!/bin/bash
# this file is roary.sh
function die {
    echo "$@" >&2
    exit 1
}
# remember the submission directory; inputs are read from it and results moved back to it
wd=$PWD
module load roary/3.12.0 || die "Could not load modules"
cd /lscratch/$SLURM_JOB_ID || die "no lscratch"
roary -p ${SLURM_CPUS_PER_TASK} \
    -f roary_out -e -n -r -v "$wd"/gff/*.gff \
  && mv roary_out "$wd"
Submit this job using the Slurm sbatch command.
sbatch --gres=lscratch:10 --cpus-per-task=6 --mem=6g roary.sh
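Since roary works on a set of GFF files, it can be worth confirming that the input directory actually contains some before the job is queued. A minimal sketch, assuming the inputs sit in a directory named gff as in the batch script (the helper name count_gff is hypothetical):

```shell
# Hypothetical helper: count the *.gff files directly inside directory $1.
count_gff() {
    n=0
    for f in "$1"/*.gff; do
        # with no matches the glob stays literal, so test that the file exists
        if [ -e "$f" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}
```

For example, [ "$(count_gff gff)" -gt 1 ] && sbatch ... would only submit when at least two annotated genomes are present.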
Create a swarmfile (e.g. roary.swarm). For example:
roary -p $SLURM_CPUS_PER_TASK -f ./roary_species1 -e -n -v -r species1/*.gff
roary -p $SLURM_CPUS_PER_TASK -f ./roary_species2 -e -n -v -r species2/*.gff
roary -p $SLURM_CPUS_PER_TASK -f ./roary_species3 -e -n -v -r species3/*.gff
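With many species, the swarmfile can be generated rather than typed by hand. A minimal sketch, assuming one directory of GFF files per species as in the example above (the helper name make_swarm_lines is hypothetical):

```shell
# Hypothetical helper: print one roary command per species directory.
make_swarm_lines() {
    for dir in "$@"; do
        dir=${dir%/}   # drop a trailing slash, if any
        # The single-quoted format keeps $SLURM_CPUS_PER_TASK and the glob
        # literal, so they are expanded later inside each subjob.
        printf 'roary -p $SLURM_CPUS_PER_TASK -f ./roary_%s -e -n -v -r %s/*.gff\n' "$dir" "$dir"
    done
}
```

Running make_swarm_lines species1 species2 species3 > roary.swarm would reproduce the three-line swarmfile shown above.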
Submit this job using the swarm command.
swarm -f roary.swarm -g 6 -t 6 --module roary/3.12.0
where
| -g # | Number of Gigabytes of memory required for each process (1 line in the swarm command file) |
| -t # | Number of threads/CPUs required for each process (1 line in the swarm command file). |
| --module roary | Loads the roary module for each subjob in the swarm |