DIAMOND is a sequence aligner for protein and translated DNA searches, designed for high performance analysis of big sequence data. The key features are:
Allocate an interactive session and run the program. Sample session:
[user@biowulf ~]$ sinteractive -c8 --mem=10g --gres=lscratch:10 salloc.exe: Pending job allocation 12273309 salloc.exe: job 12273309 queued and waiting for resources salloc.exe: job 12273309 has been allocated resources salloc.exe: Granted job allocation 12273309 salloc.exe: Waiting for resource configuration salloc.exe: Nodes cn0885 are ready for job srun: error: x11: no local DISPLAY defined, skipping error: unable to open file /tmp/slurm-spank-x11.12273309.0 slurmstepd: error: x11: unable to read DISPLAY value [user@cn0885 ~]$ module load diamond [+] Loading diamond 2.0.8 on cn0885 [user@cn0885 ~]$ cd /lscratch/$SLURM_JOB_ID [user@cn0885 12273309]$ cp /usr/local/apps/diamond/TEST_DATA/* . [user@cn0885 12273309]$ diamond makedb --in uniprot_sprot.fasta.gz -d uniprot_sprot -p $SLURM_CPUS_PER_TASK diamond v2.0.8.146 (C) Max Planck Society for the Advancement of Science Documentation, support and updates available at http://www.diamondsearch.org #CPU threads: 8 Scoring parameters: (Matrix=BLOSUM62 Lambda=0.267 K=0.041 Penalties=11/1) Database input file: uniprot_sprot.fasta.gz Opening the database file... [0s] Loading sequences... [3.297s] Masking sequences... [2.902s] Writing sequences... [0.348s] Hashing sequences... [0.118s] Loading sequences... [0s] Writing trailer... [0.066s] Closing the input file... [0.017s] Closing the database file... [0.013s] Database hash = 7190f6d1af560ffacdb1351b89d36883 Processed 556568 sequences, 199530821 letters. Total time = 6.765s [user@cn0885 12273309]$ diamond blastx -d uniprot_sprot.dmnd -q reads.fna -p ${SLURM_CPUS_PER_TASK} -o matches.m8 diamond v2.0.8.146 (C) Max Planck Society for the Advancement of Science Documentation, support and updates available at http://www.diamondsearch.org #CPU threads: 8 Scoring parameters: (Matrix=BLOSUM62 Lambda=0.267 K=0.041 Penalties=11/1) Temporary directory: #Target sequences to report alignments for: 25 Opening the database... [0.115s] Database: uniprot_sprot.dmnd (type: Diamond database, sequences: 556568, letters: 199530821) Block size = 2000000000 Opening the input file... [0s] Opening the output file... [0s] Loading query sequences... [0s] Masking queries... [0.003s] Building query seed set... [0.001s] The host system is detected to have 405 GB of RAM. It is recommended to increase the block size for better performance using these parameters : -b8 -c1 Algorithm: Query-indexed Building query histograms... [0s] Allocating buffers... [0s] Loading reference sequences... [0.565s] Masking reference... [2.883s] Initializing temporary storage... [0s] Building reference histograms... [0.154s] Allocating buffers... [0s] Processing query block 1, reference block 1/1, shape 1/1. Building reference seed array... [0.154s] Building query seed array... [0s] Computing hash join... [0s] Building seed filter... [0s] Searching alignments... [0.003s] Deallocating buffers... [0s] Clearing query masking... [0s] Computing alignments... [0.005s] Deallocating reference... [0.036s] Loading reference sequences... [0s] Deallocating buffers... [0s] Deallocating queries... [0s] Loading query sequences... [0s] Closing the input file... [0s] Closing the output file... [0s] Closing the database file... [0.008s] Deallocating taxonomy... [0s] Total time = 4.089s Reported 25 pairwise alignments, 25 HSPs. 1 queries aligned. The host system is detected to have 405 GB of RAM. It is recommended to increase the block size for better performance using these parameters : -b8 -c1 [user@cn0885 12273309]$ exit exit salloc.exe: Relinquishing job allocation 12273309 salloc.exe: Job allocation 12273309 has been revoked. [user@biowulf ~]$
Create a batch input file (e.g. diamond.sh). For example:
#!/bin/bash set -e module load diamond diamond blastx -d uniprot_sprot.dmnd -q reads.fna -p ${SLURM_CPUS_PER_TASK} -o matches.m8
Submit this job using the Slurm sbatch command.
sbatch --cpus-per-task=8 --mem=10g diamond.sh
Create a swarmfile (e.g. diamond.swarm). For example:
diamond blastx -d db_name -q read1.fna -p ${SLURM_CPUS_PER_TASK} -o out1 diamond blastx -d db_name -q read2.fna -p ${SLURM_CPUS_PER_TASK} -o out2 diamond blastx -d db_name -q read3.fna -p ${SLURM_CPUS_PER_TASK} -o out3
Submit this job using the swarm command.
swarm -f diamond.swarm -g 10 -t 8 --module diamondwhere
-g # | Number of Gigabytes of memory required for each process (1 line in the swarm command file) |
-t # | Number of threads/CPUs required for each process (1 line in the swarm command file). |
--module diamond | Loads the diamond module for each subjob in the swarm |