Biowulf High Performance Computing at the NIH
busco on Biowulf

BUSCO completeness assessments employ sets of Benchmarking Universal Single-Copy Orthologs from OrthoDB (www.orthodb.org) to provide quantitative measures of the completeness of genome assemblies, annotated gene sets, and transcriptomes in terms of expected gene content.
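BUSCO expresses these measures as percentages of the n BUSCO groups in the chosen lineage dataset. The categories are Complete (C), which is split into Complete and single-copy (S) and Complete and duplicated (D), Fragmented (F), and Missing (M). Up to rounding of the reported values, they are related as follows:

```latex
C = S + D, \qquad C + F + M = 100\%
```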

Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program.
Sample session (commands follow the $ prompt; lines beginning with ##### are explanatory comments, not commands):

[user@biowulf]$ sinteractive
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144 ~]$ module load busco
[+] Loading busco  3.0.2
Busco Config file /home/susanc/busco.config  does not exist
   copying from /usr/local/apps/busco/config/config.ini.default
 You should edit this file to set your own parameters
Augustus Config directory /home/susanc/augustus.config  does not exist
   copying from /usr/local/apps/busco/Augustus/config

##### Edit ~/busco.config to set any additional parameters.

[user@cn3144 ~]$ nano ~/busco.config

##### Download and unpack the required lineage-specific profile library.
[user@cn3144 ~]$ wget https://busco.ezlab.org/datasets/bacteria_odb9.tar.gz
[user@cn3144 ~]$ tar xvzf bacteria_odb9.tar.gz

##### Copy the sample data set

[user@cn3144 ~]$ cp -r /usr/local/apps/busco/sample_data .
[user@cn3144 ~]$ cd sample_data

##### Run BUSCO
[user@cn3144 sample_data]$ run_BUSCO.py -i target.fa -o test -l ../bacteria_odb9/ -m geno
INFO	****************** Start a BUSCO 3.0.2 analysis, current time: 10/01/2018 11:32:08 ******************
INFO	Configuration loaded from /home/susanc/busco.config
INFO	Init tools...
INFO	Check dependencies...
INFO	Check input file...
INFO	To reproduce this run: python /usr/local/apps/busco/3.0.2/run_BUSCO.py -i target.fa -o test -l ../bacteria_odb9/ -m genome -c 1 -sp E_coli_K12
INFO	Mode is: genome
INFO	The lineage dataset is: bacteria_odb9 (prokaryota)
[....]
INFO	Results written in /spin1/users/susanc/busco/sample_data/run_test/

[user@cn3144 sample_data]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
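The wget and tar steps in the session above only need to be done once per lineage. Below is a minimal sketch of a helper that skips the download when the library is already unpacked. The function name fetch_lineage is hypothetical, and it assumes that the datasets URL used above is still valid and that each archive unpacks into a directory with the same name (as bacteria_odb9.tar.gz does).

```shell
#!/bin/bash
# fetch_lineage NAME [BASE_URL]
# Hypothetical helper: download and unpack NAME.tar.gz into ./NAME,
# skipping the download if ./NAME already exists.
# Assumption: the archive unpacks into a directory called NAME.
fetch_lineage() {
    local name="$1"
    local base_url="${2:-https://busco.ezlab.org/datasets}"

    # Nothing to do if the library is already unpacked
    if [ -d "$name" ]; then
        echo "$name already present, skipping download"
        return 0
    fi

    wget "${base_url}/${name}.tar.gz" || return 1
    tar xzf "${name}.tar.gz" || return 1

    # Check that the archive produced the expected directory
    if [ ! -d "$name" ]; then
        echo "error: ${name}.tar.gz did not unpack into ./${name}" >&2
        return 1
    fi
}
```

For example, running fetch_lineage bacteria_odb9 in the home directory downloads and unpacks the same library as the session above, and a second call does nothing.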

Batch job
Most jobs should be run as batch jobs.

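Create a batch input file (e.g. busco.sh). The example below assumes that you have downloaded the lineage-specific profile library, edited the BUSCO config as desired, and will submit the job from the sample_data directory, so that target.fa is in the working directory and the unpacked bacteria_odb9 library is one level up (as in the interactive session).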

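#!/bin/bash
set -e
module load busco
# Use as many CPUs as were allocated with --cpus-per-task (1 if unset)
run_BUSCO.py -i target.fa -o test -l ../bacteria_odb9/ -m geno -c "${SLURM_CPUS_PER_TASK:-1}"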
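A mistyped input file or library path in a batch script only shows up after the job has been scheduled. One option is to add a small argument check to busco.sh ahead of the BUSCO call. In the sketch below, the function run_busco_checked and the DRY_RUN variable are hypothetical, and the list of accepted -m values is an assumption based on BUSCO 3's help text; confirm it with run_BUSCO.py -h for the installed version.

```shell
#!/bin/bash
# run_busco_checked INPUT OUT LINEAGE_DIR MODE [EXTRA_ARGS...]
# Hypothetical wrapper: sanity-check the arguments, then call run_BUSCO.py.
# Any EXTRA_ARGS (e.g. -c 8) are passed through unchanged.
# Set DRY_RUN=1 to print the command instead of running it.
run_busco_checked() {
    if [ $# -lt 4 ]; then
        echo "usage: run_busco_checked INPUT OUT LINEAGE_DIR MODE [EXTRA_ARGS...]" >&2
        return 1
    fi
    local input="$1" out="$2" lineage="$3" mode="$4"
    shift 4

    # The input sequence file must exist and be non-empty
    if [ ! -s "$input" ]; then
        echo "error: input file '$input' is missing or empty" >&2
        return 1
    fi

    # The unpacked lineage library must be a directory
    if [ ! -d "$lineage" ]; then
        echo "error: lineage directory '$lineage' not found" >&2
        return 1
    fi

    # Accepted modes: an assumption based on BUSCO 3's help text
    case "$mode" in
        geno|genome|tran|transcriptome|prot|proteins) ;;
        *)
            echo "error: unknown mode '$mode'" >&2
            return 1
            ;;
    esac

    # Rebuild the positional parameters as the full command line
    set -- run_BUSCO.py -i "$input" -o "$out" -l "$lineage" -m "$mode" "$@"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}
```

With this function defined in busco.sh after module load busco, the last line of the script would become run_busco_checked target.fa test ../bacteria_odb9/ geno -c "${SLURM_CPUS_PER_TASK:-1}".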

Submit this job using the Slurm sbatch command.

sbatch [--cpus-per-task=#] [--mem=#] busco.sh
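When the job finishes, the overall score is recorded in a short summary file inside the run directory. Assuming BUSCO 3's naming convention run_<name>/short_summary_<name>.txt (run_test/short_summary_test.txt for the example above) and its usual one-line score notation C:x%[S:x%,D:x%],F:x%,M:x%,n:N, the hypothetical helper below extracts the Complete (C) percentage, for example to compare several assemblies in a script.

```shell
#!/bin/bash
# busco_complete_pct SUMMARY_FILE
# Hypothetical helper: print the Complete (C) percentage from a BUSCO
# short summary. Assumes the score line has the form
#   C:x%[S:x%,D:x%],F:x%,M:x%,n:N
busco_complete_pct() {
    local summary="$1"
    local score
    # Grab the first score line; fail if none is found or the file is missing
    score=$(grep -m 1 -oE 'C:[0-9.]+%\[S:[^]]+],F:[0-9.]+%,M:[0-9.]+%,n:[0-9]+' "$summary") || return 1
    # Keep only the number between "C:" and the first "%"
    echo "$score" | sed 's/^C:\([0-9.]*\)%.*/\1/'
}
```

For example, busco_complete_pct run_test/short_summary_test.txt prints the C value for the test run.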