braker: a pipeline for fully automated prediction of protein-coding gene structures with GeneMark-ES/ET and AUGUSTUS in novel eukaryotic genomes
BRAKER3 is the latest pipeline in the BRAKER suite. It uses RNA-seq and protein data in a fully automated pipeline to train GeneMark-ETP and AUGUSTUS and predict highly reliable genes. The result of the pipeline is a combined gene set from both gene prediction tools that contains only genes with very high support from extrinsic evidence.
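BRAKER picks its mode from the evidence you supply. According to the BRAKER README, a genome alone runs GeneMark-ES, RNA-seq alignments alone (--bam) run GeneMark-ET, proteins alone (--prot_seq) run GeneMark-EP+, and both together run the BRAKER3 GeneMark-ETP pipeline described above. The shell function below, braker_mode, is hypothetical and not part of BRAKER. It is only a sketch of that decision that shows which inputs lead to the BRAKER3 path; braker.pl makes the real choice internally.

```bash
#!/bin/bash
# Hypothetical helper (NOT part of BRAKER): report which BRAKER mode a given
# combination of evidence corresponds to, following the BRAKER README.
# The genome FASTA is always required, so it is not an argument here.
#   $1 = BAM file(s) with RNA-seq alignments (may be empty)
#   $2 = protein FASTA file (may be empty)
braker_mode() {
    local bam="$1" prot="$2"
    if [[ -n "$bam" && -n "$prot" ]]; then
        echo "ETP"   # BRAKER3: RNA-seq + proteins (GeneMark-ETP + AUGUSTUS)
    elif [[ -n "$bam" ]]; then
        echo "ET"    # BRAKER1: RNA-seq only (GeneMark-ET + AUGUSTUS)
    elif [[ -n "$prot" ]]; then
        echo "EP"    # BRAKER2: proteins only (GeneMark-EP+ + AUGUSTUS)
    else
        echo "ES"    # genome only (self-trained GeneMark-ES + AUGUSTUS)
    fi
}

braker_mode "RNAseq.bam" "proteins.fa"   # prints ETP
```

Under this reading, supplying both kinds of evidence, for example braker.pl --genome=genome.fa --prot_seq=proteins.fa --bam=RNAseq.bam --threads $SLURM_CPUS_PER_TASK, should run the BRAKER3 (ETP) pipeline.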
References:
- Bruna, T., Hoff, K.J., Lomsadze, A., Stanke, M., & Borodovsky, M. BRAKER2: Automatic Eukaryotic Genome Annotation with GeneMark-EP+ and AUGUSTUS Supported by a Protein Database. NAR Genomics and Bioinformatics 2021, 3(1):lqaa108. doi: 10.1093/nargab/lqaa108.
Important Notes
- Module Name: braker (see the modules page for more information)
- Unusual environment variables set:
  - BRAKER_TEST_DATA: sample data for running braker
- BRAKER runs some steps on a single thread, while other steps can use multiple threads. Do not request more than 8 threads for BRAKER; any threads beyond 8 will sit idle.
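If a job is allocated more than 8 CPUs, one way to respect the 8-thread limit is to cap the value passed to --threads. The function name braker_threads below is hypothetical; it is a minimal sketch that only assumes Slurm sets SLURM_CPUS_PER_TASK inside the job, and it falls back to 1 when the variable is unset.

```bash
#!/bin/bash
# Hypothetical helper: cap the thread count handed to braker.pl at 8,
# since BRAKER cannot make use of more than that.
#   $1 = number of CPUs allocated (defaults to 1 if omitted)
braker_threads() {
    local cpus=${1:-1} max=8
    if (( cpus > max )); then
        echo "$max"
    else
        echo "$cpus"
    fi
}

# SLURM_CPUS_PER_TASK is set by Slurm inside a job; use 1 outside a job.
threads=$(braker_threads "${SLURM_CPUS_PER_TASK:-1}")
echo "Using ${threads} threads for braker.pl"
```

The capped value would then be passed as braker.pl ... --threads "$threads". Requesting no more than 8 CPUs in the first place (as in the sinteractive and sbatch examples with -c 8) avoids wasting the extra allocation entirely.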
Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.
Allocate an interactive session and run the program. Sample session:
[user@biowulf]$ sinteractive -c 8 --mem=4g --gres=lscratch:10
[user@cn3144 ~]$ module load braker
Loading braker 3
[user@cn3144 ~]$ cp -r $BRAKER_TEST_DATA/*.fa .
# run on the testing data
[user@cn3144 ~]$ braker.pl --genome=genome.fa --prot_seq=proteins.fa --threads $SLURM_CPUS_PER_TASK
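The session above requests local scratch (--gres=lscratch:10) but runs in the home directory. The following sketch assumes the usual Biowulf convention that the allocated scratch space appears as /lscratch/$SLURM_JOB_ID, changes into it when it exists, and otherwise falls back to a temporary directory so the snippet still runs outside a job.

```bash
#!/bin/bash
# Sketch: work in node-local scratch when it is available.
# Assumption: --gres=lscratch:N exposes a directory at /lscratch/$SLURM_JOB_ID.
if [[ -n "${SLURM_JOB_ID:-}" && -d "/lscratch/${SLURM_JOB_ID}" ]]; then
    workdir="/lscratch/${SLURM_JOB_ID}"
else
    # Not inside a job with lscratch: use a throwaway temporary directory.
    workdir=$(mktemp -d)
fi
cd "$workdir" || exit 1
echo "Working in ${workdir}"

# From here, copy the test data and run BRAKER as in the session above:
#   cp $BRAKER_TEST_DATA/*.fa .
#   braker.pl --genome=genome.fa --prot_seq=proteins.fa --threads $SLURM_CPUS_PER_TASK
```

Local scratch is cleared when the job ends, so copy the BRAKER results back to your /data area before exiting the session.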
Batch job
Most jobs should be run as batch jobs.
Create a batch input file (e.g. braker.sh). For example:
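#!/bin/bash
module load braker || exit 1
# copy the sample genome from the test data
cp $BRAKER_TEST_DATA/genome.fa .
# download example RNA-seq alignments (BAM)
wget http://bioinf.uni-greifswald.de/augustus/datasets/RNAseq.bam || exit 1
braker.pl --genome genome.fa --bam RNAseq.bam --threads $SLURM_CPUS_PER_TASK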
Submit this job using the Slurm sbatch command.
sbatch -c 8 --mem=10g braker.sh
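Because the batch script fetches its RNA-seq alignments with wget before calling braker.pl, an interrupted download could leave a missing or empty file that fails the run only later. The function check_inputs below is hypothetical and not part of BRAKER; it is a minimal sketch of a pre-flight check that could be added to braker.sh just before the braker.pl line.

```bash
#!/bin/bash
# Hypothetical pre-flight check (NOT part of BRAKER): verify that every
# input file exists and is non-empty before starting a long braker.pl run.
# Returns 0 if all inputs are present, 1 otherwise.
check_inputs() {
    local f missing=0
    for f in "$@"; do
        if [[ ! -s "$f" ]]; then
            echo "ERROR: input '$f' is missing or empty" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example use inside braker.sh, after the wget step:
#   check_inputs genome.fa RNAseq.bam || exit 1
```

The check reports every missing input rather than stopping at the first one, so a single failed submission shows all the files that need attention.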