BUSCO completeness assessments employ sets of Benchmarking Universal Single-Copy Orthologs from OrthoDB to provide quantitative measures of the completeness of genome assemblies, annotated gene sets, and transcriptomes in terms of expected gene content.
BUSCO configuration files between different BUSCO versions will likely be incompatible.
Please rename the configuration file in your home directory, busco.config to, say, busco.config.OLD. When you load different versions of BUSCO, a new configuration file will be created in your home directory ~/busco.config but only if it doesn't already exist. As a result it is important to rename previous configuration files when switching between different BUSCO versions. You can also set the contents of the environment variable BUSCO_CONFIG_FILE to a path and file name of your choosing if you are planning to go back and forth between different versions of BUSCO.
Allocate an interactive session and run the program.
Sample session (user input in bold):
[user@biowulf]$ sinteractive -c 6 --mem=6G salloc.exe: Pending job allocation 46116226 salloc.exe: job 46116226 queued and waiting for resources salloc.exe: job 46116226 has been allocated resources salloc.exe: Granted job allocation 46116226 salloc.exe: Waiting for resource configuration salloc.exe: Nodes cn3144 are ready for job [user@cn3144 ~]$ module load busco $ module load busco [+] Loading singularity 3.7.1 on cn3144 [+] Loading busco 4.1.3 Busco Config file /home/user/busco.config does not exist copying from /usr/local/apps/busco/4.1.3/share/config.ini.v4.1.3 You should edit this file to set your own parameters Augustus Config directory /home/user/augustus.config does not exist copying from /usr/local/apps/busco/4.1.3/share/augustus_config #####Edit the busco.config file to set any additional parameters desired [user@cn3144 ~]$ nano ~/busco.config ##### Copy the sample data set [user@cn3144 ~]$ cp $BUSCO_EXAMPLE_DATA/bacteria/genome.fna . ##### Run BUSCO [user@cn3144 ~]$ busco -i genome.fna -c 6 -m geno -f --out test_bacteria INFO: ***** Start a BUSCO v4.1.3 analysis, current time: 03/09/2021 12:22:58 ***** INFO: Configuring BUSCO with /home/user/busco.config INFO: Mode is genome INFO: Input file is genome.fna INFO: Downloading information on latest versions of BUSCO data... WARNING: Running Auto Lineage Selector as no lineage dataset was specified. This will take a little longer than normal. If you know what lineage dataset you want to use, please specify this in the config file or using the -l (--lineage-dataset) flag in the command line. INFO: No lineage specified. Running lineage auto selector. INFO: ***** Starting Auto Select Lineage ***** This process runs BUSCO on the generic lineage datasets for the domains archaea, bacteria and eukaryota. Once the optimal domain is selected, BUSCO automatically attempts to find the most appropriate BUSCO dataset to use based on phylogenetic placement. --auto-lineage-euk and --auto-lineage-prok are also available if you know your input assembly is, or is not, an eukaryote. See the user guide for more information. A reminder: Busco evaluations are valid when an appropriate dataset is used, i.e., the dataset belongs to the lineage of the species to test. Because of overlapping markers/spurious matches among domains, busco matches in another domain do not necessarily mean that your genome/proteome contains sequences from this domain. However, a high busco score in multiple domains might help you identify possible contaminations. INFO: Downloading file 'https://busco-data.ezlab.org/v4/data/lineages/archaea_odb10.2020-03-06.tar.gz' INFO: Decompressing file '/home/user/busco_downloads/lineages/archaea_odb10.tar.gz' INFO: Running BUSCO using lineage dataset archaea_odb10 (prokaryota, 2020-03-06) INFO: ***** Run Prodigal on input to predict and extract genes ***** INFO: Running Prodigal with genetic code 11 in single mode INFO: Running 1 job(s) on prodigal, starting at 03/09/2021 12:23:07 INFO: [prodigal] 1 of 1 task(s) completed INFO: Genetic code 11 selected as optimal INFO: ***** Run HMMER on gene sequences ***** [...] [user@cn3144 ~]$ exit salloc.exe: Relinquishing job allocation 46116226 [user@biowulf ~]$
Create a batch input file (e.g. busco.sh). The example below assumes that you have downloaded the lineage-specific profile library and edited the BUSCO config as desired.
#!/bin/bash set -e module load busco/4.1.3 busco -m genome -i target.fa -o test2 -l bacteria_odb10
Submit this job using the Slurm sbatch command.
sbatch [--cpus-per-task=#] [--mem=#] busco.sh