Biowulf High Performance Computing at the NIH
braker: a pipeline for fully automated prediction of protein coding gene structures with GeneMark-ES/ET and AUGUSTUS in novel eukaryotic genomes

BRAKER3 is the latest pipeline in the BRAKER suite. It uses RNA-seq and protein data in a fully automated pipeline to train GeneMark-ETP and AUGUSTUS and to predict highly reliable genes. The pipeline's output is the combined gene set of both gene prediction tools, containing only genes with very high support from extrinsic evidence.

Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program. Sample session:

[user@biowulf]$ sinteractive -c 8 --mem=4g --gres=lscratch:10 
[user@cn3144 ~]$ module load braker 
Loading braker  3

[user@cn3144 ~]$ cp -r $BRAKER_TEST_DATA/*.fa .
# Run BRAKER on the test data
[user@cn3144 ~]$ braker.pl --genome=genome.fa --prot_seq=proteins.fa --threads $SLURM_CPUS_PER_TASK
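
The session above requests 10 GB of node-local scratch (--gres=lscratch:10), but the sample command writes its output to the current directory. The following is a minimal sketch of pointing BRAKER at local scratch instead. It assumes the allocation appears as /lscratch/$SLURM_JOB_ID and uses braker.pl's --workingdir option; pick_workdir is a hypothetical helper written for this illustration and is not part of braker or Biowulf.

```shell
#!/bin/bash
# pick_workdir: hypothetical helper, not part of braker or Biowulf.
# Prints /lscratch/$SLURM_JOB_ID when that directory exists and is writable
# (assumed scratch layout); otherwise falls back to the current directory.
pick_workdir() {
    local scratch="/lscratch/${SLURM_JOB_ID:-}"
    if [ -n "${SLURM_JOB_ID:-}" ] && [ -d "$scratch" ] && [ -w "$scratch" ]; then
        echo "$scratch"
    else
        pwd
    fi
}

# Usage inside an allocated session (commented out so the helper can be
# sourced without starting a run):
# braker.pl --genome=genome.fa --prot_seq=proteins.fa \
#     --workingdir="$(pick_workdir)" --threads="$SLURM_CPUS_PER_TASK"
```

Local scratch is cleared when the job ends, so copy any results you want to keep to a permanent directory before leaving the session.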

Batch job
Most jobs should be run as batch jobs.

Create a batch input file (e.g. braker.sh). For example:

#!/bin/bash

module load braker || exit 1
# genome.fa must already be in the submission directory (e.g. copied from $BRAKER_TEST_DATA)
wget http://bioinf.uni-greifswald.de/augustus/datasets/RNAseq.bam
braker.pl --genome genome.fa --bam RNAseq.bam --threads $SLURM_CPUS_PER_TASK

Submit this job using the Slurm sbatch command.

sbatch -c 8 --mem=10g braker.sh
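
A batch job fails late and unattended if an input file is missing or if $SLURM_CPUS_PER_TASK is unset. The sketch below shows two small guards that could be placed near the top of braker.sh. Both function names are hypothetical helpers introduced here for illustration; they are not provided by braker or by Slurm.

```shell
#!/bin/bash
# Hypothetical helpers for a braker batch script; not part of braker or Slurm.

# Print the thread count to use: SLURM_CPUS_PER_TASK when set,
# otherwise 1 (e.g. when the script is run outside a Slurm allocation).
braker_threads() {
    echo "${SLURM_CPUS_PER_TASK:-1}"
}

# Return non-zero, naming the first file that is missing or empty.
check_inputs() {
    local f
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty input: $f" >&2
            return 1
        fi
    done
}

# Usage in braker.sh (commented out so the helpers can be sourced on their own):
# check_inputs genome.fa RNAseq.bam || exit 1
# braker.pl --genome genome.fa --bam RNAseq.bam --threads "$(braker_threads)"
```

Checking inputs before calling braker.pl lets the job stop within seconds, with a clear message in the Slurm output file, rather than partway through a run.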