DIAMOND on Biowulf

DIAMOND is a sequence aligner for protein and translated DNA searches, designed for high-performance analysis of big sequence data.

Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program. Sample session:

[user@biowulf ~]$ sinteractive -c8 --mem=10g --gres=lscratch:10
salloc.exe: Pending job allocation 12273309
salloc.exe: job 12273309 queued and waiting for resources
salloc.exe: job 12273309 has been allocated resources
salloc.exe: Granted job allocation 12273309
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn0885 are ready for job
srun: error: x11: no local DISPLAY defined, skipping
error: unable to open file /tmp/slurm-spank-x11.12273309.0
slurmstepd: error: x11: unable to read DISPLAY value

[user@cn0885 ~]$ module load diamond
[+] Loading diamond  2.0.8  on cn0885

[user@cn0885 ~]$ cd /lscratch/$SLURM_JOB_ID

[user@cn0885 12273309]$ cp /usr/local/apps/diamond/TEST_DATA/* .

[user@cn0885 12273309]$ diamond makedb --in uniprot_sprot.fasta.gz -d uniprot_sprot -p $SLURM_CPUS_PER_TASK
diamond v2.0.8.146 (C) Max Planck Society for the Advancement of Science
Documentation, support and updates available at http://www.diamondsearch.org

#CPU threads: 8
Scoring parameters: (Matrix=BLOSUM62 Lambda=0.267 K=0.041 Penalties=11/1)
Database input file: uniprot_sprot.fasta.gz
Opening the database file...  [0s]
Loading sequences...  [3.297s]
Masking sequences...  [2.902s]
Writing sequences...  [0.348s]
Hashing sequences...  [0.118s]
Loading sequences...  [0s]
Writing trailer...  [0.066s]
Closing the input file...  [0.017s]
Closing the database file...  [0.013s]
Database hash = 7190f6d1af560ffacdb1351b89d36883
Processed 556568 sequences, 199530821 letters.
Total time = 6.765s

[user@cn0885 12273309]$ diamond blastx -d uniprot_sprot.dmnd -q reads.fna -p ${SLURM_CPUS_PER_TASK} -o matches.m8
diamond v2.0.8.146 (C) Max Planck Society for the Advancement of Science
Documentation, support and updates available at http://www.diamondsearch.org

#CPU threads: 8
Scoring parameters: (Matrix=BLOSUM62 Lambda=0.267 K=0.041 Penalties=11/1)
Temporary directory:
#Target sequences to report alignments for: 25
Opening the database...  [0.115s]
Database: uniprot_sprot.dmnd (type: Diamond database, sequences: 556568, letters: 199530821)
Block size = 2000000000
Opening the input file...  [0s]
Opening the output file...  [0s]
Loading query sequences...  [0s]
Masking queries...  [0.003s]
Building query seed set...  [0.001s]
The host system is detected to have 405 GB of RAM. It is recommended to increase the block size for better performance using these parameters : -b8 -c1
Algorithm: Query-indexed
Building query histograms...  [0s]
Allocating buffers...  [0s]
Loading reference sequences...  [0.565s]
Masking reference...  [2.883s]
Initializing temporary storage...  [0s]
Building reference histograms...  [0.154s]
Allocating buffers...  [0s]
Processing query block 1, reference block 1/1, shape 1/1.
Building reference seed array...  [0.154s]
Building query seed array...  [0s]
Computing hash join...  [0s]
Building seed filter...  [0s]
Searching alignments...  [0.003s]
Deallocating buffers...  [0s]
Clearing query masking...  [0s]
Computing alignments...  [0.005s]
Deallocating reference...  [0.036s]
Loading reference sequences...  [0s]
Deallocating buffers...  [0s]
Deallocating queries...  [0s]
Loading query sequences...  [0s]
Closing the input file...  [0s]
Closing the output file...  [0s]
Closing the database file...  [0.008s]
Deallocating taxonomy...  [0s]
Total time = 4.089s
Reported 25 pairwise alignments, 25 HSPs.
1 queries aligned.
The host system is detected to have 405 GB of RAM. It is recommended to increase the block size for better performance using these parameters : -b8 -c1

[user@cn0885 12273309]$ exit
exit
salloc.exe: Relinquishing job allocation 12273309
salloc.exe: Job allocation 12273309 has been revoked.

[user@biowulf ~]$ 
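
With no --outfmt option, DIAMOND writes matches.m8 in BLAST tabular format: 12 tab-separated columns (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). The following is a minimal sketch of post-filtering such a file. The function name filter_m8 and the default thresholds are illustrative assumptions, not part of DIAMOND.

```shell
# filter_m8 FILE [MAX_EVALUE] [MIN_PIDENT]
# Hypothetical helper (not part of DIAMOND): print the rows of a 12-column
# BLAST tabular file whose e-value (column 11) is at most MAX_EVALUE and
# whose percent identity (column 3) is at least MIN_PIDENT.
# The defaults (1e-5 and 0) are arbitrary illustrative choices.
filter_m8() {
    local file="$1"
    local max_evalue="${2:-1e-5}"
    local min_pident="${3:-0}"
    # Adding 0 forces awk to compare the fields as numbers, not strings.
    awk -F'\t' -v e="$max_evalue" -v p="$min_pident" \
        '($11 + 0) <= (e + 0) && ($3 + 0) >= (p + 0)' "$file"
}
```

For example, filter_m8 matches.m8 1e-10 50 > matches.filtered.m8 would keep only hits with an e-value of at most 1e-10 and at least 50% identity.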

Batch job
Most jobs should be run as batch jobs.

Create a batch input file (e.g. diamond.sh). For example:

#!/bin/bash
set -e
module load diamond
diamond blastx -d uniprot_sprot.dmnd -q reads.fna -p ${SLURM_CPUS_PER_TASK} -o matches.m8

Submit this job using the Slurm sbatch command.

sbatch --cpus-per-task=8 --mem=10g diamond.sh
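
DIAMOND's log output suggests raising the block size (-b) on hosts with a lot of memory, but the figure it reports (405 GB in the session above) is the node's total RAM, not the 10 GB the job was allocated. The sketch below derives a -b value from the memory Slurm actually granted. It assumes the rough rule of thumb from the DIAMOND manual that memory use is about six times the block size in GB, and that SLURM_MEM_PER_NODE holds the allocation in MB. The function name diamond_block_size is hypothetical.

```shell
# diamond_block_size MEM_MB
# Hypothetical helper (not part of DIAMOND): suggest a whole-number value
# for diamond's -b option from a memory allocation given in MB.
# Assumption: roughly 6 GB of RAM per unit of block size. This is a rule
# of thumb, not a guarantee. The result is never less than 1.
diamond_block_size() {
    local mem_mb="$1"
    local b=$(( mem_mb / 1024 / 6 ))   # MB -> GB, then GB -> block units
    if [ "$b" -lt 1 ]; then
        b=1
    fi
    echo "$b"
}
```

Inside a batch script this could be used as -b "$(diamond_block_size "$SLURM_MEM_PER_NODE")". For the 10 GB allocation used here it yields 1, which is below DIAMOND's default of 2.0; whether that matters depends on the database size, so treat the result as a starting point rather than a tuned setting.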

Swarm of Jobs
A swarm of jobs is an easy way to submit a set of independent commands requiring identical resources.

Create a swarmfile (e.g. diamond.swarm). For example:

diamond blastx -d db_name -q read1.fna -p ${SLURM_CPUS_PER_TASK} -o out1
diamond blastx -d db_name -q read2.fna -p ${SLURM_CPUS_PER_TASK} -o out2
diamond blastx -d db_name -q read3.fna -p ${SLURM_CPUS_PER_TASK} -o out3
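
For more than a handful of queries, the swarmfile can be generated rather than typed by hand. A minimal sketch, assuming the query files end in .fna, sit in one directory, and have no spaces in their paths. The function name make_diamond_swarm and the output naming scheme (query base name plus .m8) are hypothetical.

```shell
# make_diamond_swarm QUERY_DIR DB_NAME
# Hypothetical helper: print one "diamond blastx" command per *.fna file
# found in QUERY_DIR. ${SLURM_CPUS_PER_TASK} is written out literally so
# that it is expanded later, inside each swarm subjob.
make_diamond_swarm() {
    local dir="$1"
    local db="$2"
    local q base
    for q in "$dir"/*.fna; do
        [ -e "$q" ] || continue          # no matches: the glob stays unexpanded
        base=$(basename "$q" .fna)       # read1.fna -> read1
        printf 'diamond blastx -d %s -q %s -p ${SLURM_CPUS_PER_TASK} -o %s.m8\n' \
            "$db" "$q" "$base"
    done
}
```

Redirecting the output, as in make_diamond_swarm /path/to/queries db_name > diamond.swarm, produces a file with the same shape as the example above.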

Submit this job using the swarm command.

swarm -f diamond.swarm -g 10 -t 8 --module diamond

where
-g #              Number of gigabytes of memory required for each process (1 line in the swarm command file)
-t #              Number of threads/CPUs required for each process (1 line in the swarm command file)
--module diamond  Loads the diamond module for each subjob in the swarm