The three-dimensional coordinates of each protein are used to calculate residue-residue distance matrices.
#!/bin/bash
#SBATCH -J dali_test --ntasks=4 --nodes=1
rm -rf test
ml dali
$DALI_HOME/test.csh
Allocate an interactive session and run the program.
Sample session (user input in bold):
[user@biowulf]$ sinteractive --ntasks=4 --nodes=1
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144 ~]$ module load dali
[user@cn3144 ~]$ cp /pdb/pdb/pp/pdb1ppt.ent.gz .
[user@cn3144 ~]$ cp /pdb/pdb/bb/pdb1bba.ent.gz .
[user@cn3144 ~]$ import.pl --pdbfile pdb1ppt.ent.gz --pdbid 1ppt --dat ./
[user@cn3144 ~]$ import.pl --pdbfile pdb1bba.ent.gz --pdbid 1bba --dat ./
[user@cn3144 ~]$ dali.pl --pdbfile1 pdb1ppt.ent.gz --pdbfile2 pdb1bba.ent.gz --dat1 ./ --dat2 ./ --outfmt "summary,alignments"
[user@cn3144 ~]$ cat mol1A.txt
# Job: test
# Query: mol1A
# No:  Chain   Z    rmsd lali nres  %id PDB  Description
   1:  mol2-A  3.6  1.8   33   36   39       MOLECULE: BOVINE PANCREATIC POLYPEPTIDE;

# Pairwise alignments
No 1: Query=mol1A Sbjct=mol2A Z-score=3.6

DSSP  LLLLLLLLLLLLLHHHHHHHHHHHHHHHHHHLLlll
Query GPSQPTYPGDDAPVEDLIRFYDNLQQYLNVVTRhry 36
ident | | |||| | | | | | ||
Sbjct APLEPEYPGDNATPEQMAQYAAELRRYINMLTRpry 36
DSSP  LLLLLLLLLLLLLLLHHHHHHHHHHHHHHHHLLlll

[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
Create a batch input file (e.g. dali.sh). For example:
#!/bin/bash
module load dali
import.pl --pdbfile pdb1ppt.ent.gz --pdbid 1ppt --dat ./
import.pl --pdbfile pdb1bba.ent.gz --pdbid 1bba --dat ./
dali.pl --pdbfile1 pdb1ppt.ent.gz --pdbfile2 pdb1bba.ent.gz --dat1 ./ --dat2 ./ --outfmt "summary,alignments"
Submit this job using the Slurm sbatch command.
sbatch dali.sh
In certain circumstances, dali can be accelerated using MPI. To do so, include --np $SLURM_NTASKS in the dali.pl command, and submit the job with --ntasks=# --nodes=1, where # is the number of MPI tasks requested. MPI parallelism is limited to a single node, so # must not exceed the number of CPUs available on one node. At present the maximum is 128; however, most nodes have only 56 CPUs, so jobs requesting more than 56 CPUs may wait a considerable time in the queue.
...
dali.pl --np $SLURM_NTASKS ...
...
Submit this job using the Slurm sbatch command.
sbatch --ntasks=32 --nodes=1 dali.sh
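Putting the pieces together, an MPI-enabled version of the dali.sh batch script above (a sketch combining the earlier example with the --np option) would look like:

```shell
#!/bin/bash
# dali.sh -- MPI-parallel Dali run
# submit with: sbatch --ntasks=32 --nodes=1 dali.sh
module load dali
import.pl --pdbfile pdb1ppt.ent.gz --pdbid 1ppt --dat ./
import.pl --pdbfile pdb1bba.ent.gz --pdbid 1bba --dat ./
# $SLURM_NTASKS is set by Slurm to the value of --ntasks
dali.pl --np $SLURM_NTASKS \
    --pdbfile1 pdb1ppt.ent.gz --pdbfile2 pdb1bba.ent.gz \
    --dat1 ./ --dat2 ./ --outfmt "summary,alignments"
```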
Running with the AlphaFold database:
#!/bin/bash
module load dali
zcat /pdb/pdb/fd/pdb1fd3.ent.gz > 1fd3.pdb
import.pl --pdbfile 1fd3.pdb --pdbid 1fd3 --dat ./ --clean
dali.pl \
    --title "my search" \
    --cd1 1fd3B \
    --dat1 ./ \
    --db ${DALI_AF}/Digest/HUMAN.list \
    --BLAST_DB ${DALI_AF}/Digest/AF.fasta \
    --repset ${DALI_AF}/Digest/HUMAN_70.list \
    --dat2 ${DALI_AF}/DAT/ \
    --clean \
    --hierarchical \
    --oneway \
    --np ${SLURM_NTASKS}
Type ls ${DALI_AF}/Digest to see all the lists.
NOTES:
import.pl writes one .dat file per chain. After the import step above, the working directory contains: 1fd3A.dat 1fd3B.dat 1fd3C.dat 1fd3D.dat 1fd3.pdb
Running dali in a swarm presents some complications because dali creates temporary files with identical names in the current working directory. Running multiple dali instances simultaneously in the same directory will cause file clobbering.
To prevent this, each step must be run within its own unique directory. To simplify this, a convenience script auto.sh should be used, rather than calling dali directly in a swarm. Here is the code:
#!/bin/bash
# this script is 'auto.sh'

# require a 5 character pdb string
pdb5c=$1
[[ -n ${pdb5c} ]] || { echo "No 5 character pdb string given"; exit 1; }

# base directory where everything is stored
here=$(pwd)

# Check if a .dat file exists, and if not run import.pl
if ls ${here}/DAT/${pdb5c:0:4}/${pdb5c:0:4}*.dat 1> /dev/null 2>&1; then
    echo import.pl ${pdb5c:0:4} done
else
    # run import.pl in a unique temporary subdirectory to prevent file clobbering
    mkdir -p ${here}/DAT/${pdb5c:0:4}
    t=$(mktemp -d) && cd ${t}
    # use an absolute path to the pdb file, since we have cd'd into the temp dir
    import.pl --pdbfile ${here}/${pdb5c:0:4}.pdb --pdbid ${pdb5c:0:4} --dat ${here}/DAT/${pdb5c:0:4}/ --clean
    cd ${here} && rm -rf ${t}
fi

# check if dali has already been run
if ls ${here}/OUT/${pdb5c:0:5}/*.blast 1> /dev/null 2>&1; then
    echo dali.pl ${pdb5c:0:5} done
    exit
fi

# check if the .dat file actually exists
if [[ ! -f ${here}/DAT/${pdb5c:0:4}/${pdb5c:0:5}.dat ]]; then
    echo "${here}/DAT/${pdb5c:0:4}/${pdb5c:0:5}.dat does not exist"
    exit 1
fi

# run dali.pl in a unique subdirectory to prevent file clobbering
rm -f ${here}/OUT/${pdb5c:0:5}.txt
mkdir -p ${here}/OUT/${pdb5c:0:5} && cd $_
dali.pl \
    --title "mysearch_${pdb5c:0:5}" \
    --cd1 ${pdb5c:0:5} \
    --dat1 ${here}/DAT/${pdb5c:0:4}/ \
    --db ${DALI_AF}/Digest/AF.list \
    --BLAST_DB ${DALI_AF}/Digest/AF.fasta \
    --repset ${DALI_AF}/Digest/AF.list \
    --dat2 ${DALI_AF}/DAT/ \
    --clean \
    --hierarchical \
    --oneway \
    --np ${SLURM_NTASKS:-2}
This script assumes that the pdb file (e.g. 1efe.pdb) exists in the current working directory, and can be run like so:
module load dali; bash auto.sh 1efeA
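The script relies on bash substring expansion, ${var:offset:length}, to split the 5-character argument into a 4-character PDB entry and a 1-character chain identifier. A quick illustration:

```shell
#!/bin/bash
# Bash substring expansion: ${var:offset:length}
pdb5c="1efeA"
pdbid=${pdb5c:0:4}   # first 4 characters: PDB entry
chain=${pdb5c:4:1}   # 5th character: chain identifier
echo "entry=${pdbid} chain=${chain}"   # -> entry=1efe chain=A
```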
The script will create files in a distinct hierarchical layout in the current working directory:
.
├── 1efe.pdb
├── DAT
│   └── 1efe
│       └── 1efeA.dat
└── OUT
    └── 1efeA
        ├── 1840494.blast
        ├── 1840494.fasta
        └── 1efeA.txt
This script can be used to run a swarm job. A swarm file swarm.txt is created:
#SWARM --sbatch='--ntasks 16' --module dali --logdir LOG --time 120
bash auto.sh 1efeA
bash auto.sh 1fd3B
bash auto.sh 1fldA
bash auto.sh 1aieA
bash auto.sh 1c26A
bash auto.sh 1hgvA
bash auto.sh 1hgzA
bash auto.sh 1hh0A
bash auto.sh 1ifpA
bash auto.sh 1oiaA
Again, this assumes all pdb files exist in the current working directory. The swarm can be launched as follows:
swarm swarm.txt
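For a large number of structures, the swarm file can be generated rather than written by hand. A minimal sketch, assuming a hypothetical file chains.txt listing one 5-character chain ID per line:

```shell
#!/bin/bash
# Sketch: generate swarm.txt from a list of 5-character chain IDs.
# chains.txt is a hypothetical input file; here we create an example one.
printf '1efeA\n1fd3B\n1fldA\n' > chains.txt

# header line with the swarm options, then one auto.sh call per chain
echo "#SWARM --sbatch='--ntasks 16' --module dali --logdir LOG --time 120" > swarm.txt
while read -r id; do
    echo "bash auto.sh ${id}" >> swarm.txt
done < chains.txt
```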
NOTES:
The #SWARM directive allocates 16 MPI tasks (--ntasks=16) and 120 minutes (--time 120) for each run. Swarm log files are written to the LOG subdirectory.