braker: automated gene structure prediction in eukaryotic genomes

braker: a pipeline for fully automated prediction of protein-coding gene structures with GeneMark-ES/ET and AUGUSTUS in novel eukaryotic genomes

BRAKER3 is the latest pipeline in the BRAKER suite. It uses RNA-seq and protein data in a fully automated workflow to train GeneMark-ETP and AUGUSTUS and to predict highly reliable genes with them. The pipeline outputs the combined gene set of both gene prediction tools, which contains only genes with very high support from extrinsic evidence.

References:

BRAKER2: Automatic Eukaryotic Genome Annotation with GeneMark-EP+ and AUGUSTUS Supported by a Protein Database

PubMed

NAR Genomics and Bioinformatics

Documentation

braker GitHub Page

Important Notes

- Module Name: braker (see the modules page for more information)
- Unusual environment variables set
  - BRAKER_TEST_DATA: sample data for running braker
- BRAKER runs some steps on a single thread, while others can take advantage of multiple threads. Please do not request more than 8 threads for BRAKER; any additional threads will sit idle.
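
The 8-thread limit noted above can be enforced in a job script rather than remembered at submission time. The following is a minimal sketch, not part of BRAKER or Slurm: `cap_threads` and `THREADS` are hypothetical names introduced here for illustration, and the fallback to 1 thread outside a Slurm allocation is an assumption.

```shell
#!/bin/bash
# cap_threads is a hypothetical helper (not provided by BRAKER or Slurm).
# It prints the requested thread count, limited to a maximum (default 8).
cap_threads() {
    local requested="${1:-1}"
    local max="${2:-8}"
    if [ "$requested" -gt "$max" ]; then
        echo "$max"
    else
        echo "$requested"
    fi
}

# SLURM_CPUS_PER_TASK is set by Slurm when -c/--cpus-per-task is given;
# assume 1 thread when the script runs outside an allocation.
THREADS=$(cap_threads "${SLURM_CPUS_PER_TASK:-1}")
echo "BRAKER would be started with --threads $THREADS"
```

In a job script, `$THREADS` could then be passed to `braker.pl --threads` in place of `$SLURM_CPUS_PER_TASK`, so that an allocation larger than 8 CPUs does not translate into a larger thread request.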

Interactive job

Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program. Sample session:

[user@biowulf]$ sinteractive -c 8 --mem=4g --gres=lscratch:10 
[user@cn3144 ~]$ module load braker 
Loading braker  3

[user@cn3144 ]$ cp -r $BRAKER_TEST_DATA/*.fa .

Run BRAKER on the test data:

[user@cn3144 ]$ braker.pl --genome=genome.fa --prot_seq=proteins.fa --threads $SLURM_CPUS_PER_TASK

Batch job

Most jobs should be run as batch jobs.

Create a batch input file (e.g. braker.sh). For example:

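#!/bin/bash

module load braker || exit 1
# Copy the sample genome (genome.fa) into the working directory
cp $BRAKER_TEST_DATA/*.fa .
# Download the sample RNA-seq alignments
wget http://bioinf.uni-greifswald.de/augustus/datasets/RNAseq.bam
braker.pl --genome genome.fa --bam RNAseq.bam --threads $SLURM_CPUS_PER_TASK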

Submit this job using the Slurm sbatch command.

sbatch -c 8 --mem=10g braker.sh