Prokka is a pipeline for rapidly annotating prokaryotic genomes. It produces GFF3, GBK and SQN files that are ready for editing in Sequin and, ultimately, for submission to GenBank/DDBJ/ENA.
Allocate an interactive session and run the program. Sample session:
[user@biowulf]$ sinteractive --mem=2g --cpus-per-task=4
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job
[user@cn3144 ~]$
node$ module load prokka
node$ prokka --listdb
[16:11:09] Looking for databases in: /opt/anaconda/bin/../db
[16:11:09] * Kingdoms: Archaea Bacteria Mitochondria Viruses
[16:11:09] * Genera: Enterococcus Staphylococcus
[16:11:09] * HMMs: HAMAP
[16:11:09] * CMs: Bacteria Viruses
node$ cp /usr/local/apps/prokka/TEST_DATA/GCA_000021185.1_ASM2118v1_genomic.fna .
node$ prokka --cpus 4 --force \
    --kingdom Bacteria \
    --outdir prokka_GCA_000021185 \
    --genus Listeria \
    --locustag GCA_000021185 GCA_000021185.1_ASM2118v1_genomic.fna
[...snip...]
[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
Create a batch script (e.g. prokka.sh) that copies the test genome and runs prokka on it. For example:
#! /bin/bash

function die {
    echo "$@" >&2
    exit 1
}

module load prokka/1.13 || die "Could not load prokka module"
cp /usr/local/apps/prokka/TEST_DATA/GCA_000021185.1_ASM2118v1_genomic.fna . \
    || die "Could not find test data"
prokka --cpus ${SLURM_CPUS_PER_TASK} --force \
    --kingdom Bacteria \
    --outdir prokka_GCA_000021185 \
    --genus Listeria \
    --locustag GCA_000021185 GCA_000021185.1_ASM2118v1_genomic.fna
Submit this job using the Slurm sbatch command.
sbatch --cpus-per-task=6 --mem=3g --time=10 prokka.sh
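The batch script takes its thread count from SLURM_CPUS_PER_TASK, which Slurm sets only when --cpus-per-task is given at submission. The following is a minimal sketch of a fallback for the case where the variable is unset. It assumes that a default of one thread is acceptable, and the function name prokka_cpus is hypothetical, not part of prokka or Slurm:

```shell
# Hypothetical helper: print the thread count to pass to prokka --cpus.
# Uses SLURM_CPUS_PER_TASK when Slurm has set it, otherwise falls back to 1
# (the fallback value is an assumption; adjust as needed).
prokka_cpus() {
    printf '%s\n' "${SLURM_CPUS_PER_TASK:-1}"
}

# Show the argument that would be passed, without running prokka.
printf '%s\n' "--cpus $(prokka_cpus)"
```

In a batch script this could replace the direct variable reference, i.e. --cpus "$(prokka_cpus)".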
This should create the following output directory:
./prokka_GCA_000021185
|-- GCA_000021185_10252016.err
|-- GCA_000021185_10252016.faa
|-- GCA_000021185_10252016.ffn
|-- GCA_000021185_10252016.fna
|-- GCA_000021185_10252016.fsa
|-- GCA_000021185_10252016.gbk
|-- GCA_000021185_10252016.gff
|-- GCA_000021185_10252016.log
|-- GCA_000021185_10252016.sqn
|-- GCA_000021185_10252016.tbl
`-- GCA_000021185_10252016.txt
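After the job finishes, it can be useful to confirm that every file type in the listing above was written. The following is a minimal sketch of such a check. The function name missing_prokka_outputs is hypothetical, and the extension list is copied from the listing above rather than from the prokka documentation:

```shell
# Hypothetical helper: report which expected prokka output files are missing.
# Usage: missing_prokka_outputs OUTDIR
# Prints one missing extension per line; prints nothing when all are present.
missing_prokka_outputs() {
    outdir=$1
    # Extensions taken from the example output directory listing.
    for ext in err faa ffn fna fsa gbk gff log sqn tbl txt; do
        found=no
        for f in "$outdir"/*."$ext"; do
            # An unmatched glob is left as a literal pattern, so test for a real file.
            if [ -f "$f" ]; then
                found=yes
                break
            fi
        done
        if [ "$found" = no ]; then
            printf '%s\n' "$ext"
        fi
    done
}
```

For the example above, the call would be missing_prokka_outputs prokka_GCA_000021185; empty output means all eleven file types were found.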
Copy the example data
biowulf$ cp -r /usr/local/apps/prokka/TEST_DATA .
Create a swarmfile (e.g. prokka.swarm). For example:
prokka --cpus ${SLURM_CPUS_PER_TASK} --force --kingdom Bacteria --outdir prokka_GCA_000008285 \
    --genus Listeria --locustag GCA_000008285 TEST_DATA/GCA_000008285.1_ASM828v1_genomic.fna
prokka --cpus ${SLURM_CPUS_PER_TASK} --force --kingdom Bacteria --outdir prokka_GCA_000021185 \
    --genus Listeria --locustag GCA_000021185 TEST_DATA/GCA_000021185.1_ASM2118v1_genomic.fna
prokka --cpus ${SLURM_CPUS_PER_TASK} --force --kingdom Bacteria --outdir prokka_GCA_000026705 \
    --genus Listeria --locustag GCA_000026705 TEST_DATA/GCA_000026705.1_ASM2670v1_genomic.fna
prokka --cpus ${SLURM_CPUS_PER_TASK} --force --kingdom Bacteria --outdir prokka_GCA_000168635 \
    --genus Listeria --locustag GCA_000168635 TEST_DATA/GCA_000168635.2_ASM16863v2_genomic.fna
prokka --cpus ${SLURM_CPUS_PER_TASK} --force --kingdom Bacteria --outdir prokka_GCA_000168815 \
    --genus Listeria --locustag GCA_000168815 TEST_DATA/GCA_000168815.1_ASM16881v1_genomic.fna
prokka --cpus ${SLURM_CPUS_PER_TASK} --force --kingdom Bacteria --outdir prokka_GCA_000196035 \
    --genus Listeria --locustag GCA_000196035 TEST_DATA/GCA_000196035.1_ASM19603v1_genomic.fna
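Because each line of the swarmfile differs only in the accession and input file, the file can also be generated rather than written by hand. The following is a minimal sketch, assuming every FASTA file name begins with the assembly accession followed by a dot (as in the TEST_DATA files) and that all genomes are Listeria. The function name swarm_line is hypothetical:

```shell
# Hypothetical helper: print one swarm-file line for a genome FASTA file.
# Assumes names like GCA_000008285.1_ASM828v1_genomic.fna, where the text
# before the first dot is the accession used for --outdir and --locustag.
swarm_line() {
    fasta=$1
    base=${fasta##*/}   # strip the directory part
    acc=${base%%.*}     # keep everything before the first dot
    # \$ keeps ${SLURM_CPUS_PER_TASK} literal so it is expanded inside each subjob.
    printf '%s\n' "prokka --cpus \${SLURM_CPUS_PER_TASK} --force --kingdom Bacteria --outdir prokka_${acc} --genus Listeria --locustag ${acc} ${fasta}"
}

# Write one line per FASTA file found in TEST_DATA to prokka.swarm.
for f in TEST_DATA/*.fna; do
    if [ -f "$f" ]; then
        swarm_line "$f"
    fi
done > prokka.swarm
```

Each generated command is on a single line, so no backslash continuations are needed in the resulting file.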
Submit this job using the swarm command.
swarm -f prokka.swarm -g 2 -t 6 --module prokka/1.13
where
-g # | Number of Gigabytes of memory required for each process (1 line in the swarm command file) |
-t # | Number of threads/CPUs required for each process (1 line in the swarm command file). |
--module prokka | Loads the prokka module for each subjob in the swarm |