High-Performance Computing at the NIH
BuddySuite on NIH HPC Systems
The BuddySuite modules are designed to be 'one-stop-shop' command-line tools for common biological data file manipulations. They include SeqBuddy, AlignBuddy, PhyloBuddy, and DatabaseBuddy. BuddySuite was developed by Steve Bond at NHGRI; see the BuddySuite website for details.

The short-form name of each toolkit is listed below. Frequent BuddySuite users will likely prefer the short names, which are used in some of the examples on this page.

SeqBuddy        sb
AlignBuddy      alb
PhyloBuddy      pb
DatabaseBuddy   db

On Helix

Sample session on Helix that converts a sequence file from uppercase to lowercase:

[susanc@helix ~]$ module load BuddySuite
[+] Loading BuddySuite 1.1 ...
[+] Loading python 3.3.5 ...
[+] Loading OpenMPI 1.8.1 for GCC 4.4.7 (ethernet) ...

[susanc@helix ~]$ seqbuddy aaa02816.fasta -lc
>AAA02816.1 AAA02816.1 dnaN protein

Batch job on Biowulf

The following example translates a GenBank-formatted nucleotide sequence file in all six reading frames, returns the output in FASTA format, and then generates a table of sequence compositions.

Create a batch script, e.g. myjob.bat, along the following lines:

#!/bin/bash

module load BuddySuite/1.1

sb NM_169638.4 -tr6 -o fasta | sb -cr

Submit this job using the Slurm sbatch command:

sbatch myjob.bat
In this example, the output appears in the file slurm-#####.out (where ##### is the job number). To write the output to a specific file instead, redirect it in the batch script:
sb NM_169638.4 -tr6 -o fasta | sb -cr > myfilename.out
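
As a minimal sketch, the complete batch script with the redirect could be written out from the shell as shown here. The file names myjob.bat and myfilename.out are the ones used above; the `set -o pipefail` line and the `bash -n` syntax check are additions that are not part of the original example, and the script has not been tested on Biowulf itself.

```shell
# Write the batch script to myjob.bat (the quoted 'EOF' keeps the text literal).
cat > myjob.bat <<'EOF'
#!/bin/bash
# Assumption: report failure if either sb command in the pipeline fails.
set -o pipefail

module load BuddySuite/1.1

# Translate in all six frames, emit FASTA, then tabulate sequence composition.
sb NM_169638.4 -tr6 -o fasta | sb -cr > myfilename.out
EOF

# Check the script for shell syntax errors without running it.
bash -n myjob.bat
```

The script is then submitted with sbatch myjob.bat as before. Because the pipeline is the last command, its exit status becomes the exit status of the job script.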

Swarm of jobs

On Biowulf, most users will want to use BuddySuite to run large numbers of jobs, e.g. on different input sequences. The swarm utility is ideal for this purpose.

Set up a swarm command file (swarm.file in this example) along the following lines. Each line converts the amino acid sequences in one input file (sequence1, sequence2, sequence3, etc.) into codons and writes the result to a matching .nt file.

sb sequence1 -btr ecoli > sequence1.nt
sb sequence2 -btr ecoli > sequence2.nt
sb sequence3 -btr ecoli > sequence3.nt

Submit this job with:

biowulf% swarm -f swarm.file
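
With many input files, the swarm command file can be generated rather than typed by hand. The following is a rough sketch only: make_swarm_commands is a hypothetical helper name, not part of BuddySuite or swarm, and it assumes input file names contain no whitespace.

```shell
#!/bin/bash
# Hypothetical helper: print one SeqBuddy command line per input file.
# Usage: make_swarm_commands FILE... > swarm.file
make_swarm_commands() {
    for f in "$@"; do
        # Same command as in the hand-written swarm file, one line per input.
        printf 'sb %s -btr ecoli > %s.nt\n' "$f" "$f"
    done
}

# Example: write the commands for three inputs into swarm.file.
make_swarm_commands sequence1 sequence2 sequence3 > swarm.file
```

The resulting swarm.file has the same three lines as the hand-written example and is submitted the same way, with swarm -f swarm.file.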