SonicParanoid on Biowulf

SonicParanoid is a stand-alone software tool for the identification of orthologous relationships among multiple species.

Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program.
Sample session (user input in bold):

[user@biowulf ~]$ sinteractive -c16 --mem=32g --gres=lscratch:10
salloc.exe: Pending job allocation 59415428
salloc.exe: job 59415428 queued and waiting for resources
salloc.exe: job 59415428 has been allocated resources
salloc.exe: Granted job allocation 59415428
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3201 are ready for job
srun: error: x11: no local DISPLAY defined, skipping

[user@cn3201 ~]$ module load sonicparanoid
[+] Loading sonicparanoid  1.3.2  on cn3201
[+] Loading singularity  3.5.3  on cn3201

[user@cn3201 ~]$ cd /lscratch/$SLURM_JOB_ID

[user@cn3201 59415428]$ sonicparanoid-get-test-data --output-directory .
/lscratch/59415428/sonicparanoid_test/
INFO: all test files were succesfully copied to
/lscratch/59415428/sonicparanoid_test/

Go inside the directory
/lscratch/59415428/sonicparanoid_test/
and type

sonicparanoid -i ./test_input -o ./test_output -m fast -t 4

[user@cn3201 59415428]$ cd sonicparanoid_test/

[user@cn3201 sonicparanoid_test]$ tree
.
|-- test_input
|   |-- chlamydia_trachomatis
|   |-- deinococcus_radiodurans
|   |-- gloeobacter_violaceus
|   `-- thermotoga_maritima
`-- test_output
    `-- README.txt

2 directories, 5 files

[user@cn3201 sonicparanoid_test]$ sonicparanoid \
    --input-directory ./test_input \
    --output-directory ./test_output  \
    --mode fast \
    --threads $SLURM_CPUS_ON_NODE

Run START:      Mon Jun  8 17:13:53 2020
SonicParanoid 1.3.2 will be executed with the following parameters:
Run ID: sonic_8620171353_fast_16cpus_ml05
Run directory: /lscratch/59415428/sonicparanoid_test/test_output/runs/sonic_8620171353_fast_16cpus_ml05
Input directory: /lscratch/59415428/sonicparanoid_test/test_input/
Input proteomes: 4
Output directory: /lscratch/59415428/sonicparanoid_test/test_output
Alignments directory: /lscratch/59415428/sonicparanoid_test/test_output/alignments/
Pairwise tables directory: /lscratch/59415428/sonicparanoid_test/test_output/runs/sonic_8620171353_fast_16cpus_ml05/pairwise_orthologs/
Directory with ortholog groups: /lscratch/59415428/sonicparanoid_test/test_output/runs/sonic_8620171353_fast_16cpus_ml05/ortholog_groups/
Pairwise tables database directory: /lscratch/59415428/sonicparanoid_test/test_output/orthologs_db/
Runs directory: /lscratch/59415428/sonicparanoid_test/test_output/runs/
Update run:     False
Create pre-filter indexes:      True
Complete overwrite:     False
Re-create ortholog tables:      False
Threads:        16
Memory per thread (Gigabytes):  15.73
Minimum memory per thread (Gigabytes):  1.75
Run mode:       fast (MMseqs2 s=2.5)
MCL inflation:  1.50
/usr/local/lib/python3.6/dist-packages/sklearn/base.py:334: UserWarning: Trying to unpickle estimator DecisionTreeClassifier from version 0.22.2.post1 when using version 0.23.1. This might lead to breaking code or invalid results. Use at your own risk.
  UserWarning)
/usr/local/lib/python3.6/dist-packages/sklearn/base.py:334: UserWarning: Trying to unpickle estimator AdaBoostClassifier from version 0.22.2.post1 when using version 0.23.1. This might lead to breaking code or invalid results. Use at your own risk.
  UserWarning)

For the 4 input species 6 combinations are possible.

16 MMseqs2 alignments will be performed...
Creating 4 MMseqs2 databases...

MMseqs2 databases creation elapsed time (seconds):      4.195

All-vs-all alignments elapsed time (seconds):   8.272

Predicting 6 ortholog tables...
Ortholog tables creation elapsed time (seconds):        0.26

Creating ortholog groups...

Creating orthology matrixes...
Ortholog matrixes creation elapsed time (seconds):      0.072

Merging inparalog matrixes...
Inparalogs merging elapsed time (seconds):      0.071

Creating input matrix for MCL...
MCL graph creation elapsed time (seconds):      0.091

Running MCL...
MCL execution elapsed time (seconds):   3.461

Generating final output files...
Elapsed time for the creation of final output (seconds):        0.673

Ortholog groups creation elapsed time (seconds):        4.408

Total elapsed time (seconds):   18.217

[user@cn3201 sonicparanoid_test]$ exit
exit
salloc.exe: Relinquishing job allocation 59415428
salloc.exe: Job allocation 59415428 has been revoked.

[user@biowulf ~]$ 

Batch job
Most jobs should be run as batch jobs.

Create a batch input file (e.g. sonicparanoid.sh). For example (the relative paths below assume the job is submitted from the directory containing test_input):

#!/bin/bash
set -e
module load sonicparanoid
sonicparanoid \
    --input-directory ./test_input \
    --output-directory ./test_output  \
    --mode fast \
    --threads $SLURM_CPUS_ON_NODE

Submit this job using the Slurm sbatch command.

sbatch [--cpus-per-task=#] [--mem=#] sonicparanoid.sh
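For example, to request the same resources used in the interactive session above (illustrative values; adjust them to your own data set):

sbatch --cpus-per-task=16 --mem=32g sonicparanoid.sh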
Swarm of Jobs
A swarm of jobs is an easy way to submit a set of independent commands requiring identical resources.

Create a swarmfile (e.g. sonicparanoid.swarm). For example:

sonicparanoid -i ./test_input1 -o ./test_output1 -m fast -t $SLURM_CPUS_PER_TASK
sonicparanoid -i ./test_input2 -o ./test_output2 -m fast -t $SLURM_CPUS_PER_TASK
sonicparanoid -i ./test_input3 -o ./test_output3 -m fast -t $SLURM_CPUS_PER_TASK
sonicparanoid -i ./test_input4 -o ./test_output4 -m fast -t $SLURM_CPUS_PER_TASK

Submit this job using the swarm command.

swarm -f sonicparanoid.swarm [-g #] [-t #] --module sonicparanoid
where
-g # Number of Gigabytes of memory required for each process (1 line in the swarm command file)
-t # Number of threads/CPUs required for each process (1 line in the swarm command file).
--module sonicparanoid Loads the sonicparanoid module for each subjob in the swarm
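
For example, the following submission requests 16 CPUs and 32 GB of memory for each subjob, matching the interactive session above (illustrative values; scale them to your input proteomes):

swarm -f sonicparanoid.swarm -g 32 -t 16 --module sonicparanoid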