The three-dimensional coordinates of each protein are used to calculate residue-residue distance matrices.
#!/bin/bash
#SBATCH -J dali_test
#SBATCH --ntasks=4
#SBATCH --nodes=1

rm -rf test
ml dali
$DALI_HOME/test.csh
Allocate an interactive session and run the program.
Sample session (user input in bold):
[user@biowulf]$ sinteractive --ntasks=4 --nodes=1
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144 ~]$ module load dali
[user@cn3144 ~]$ cp /pdb/pdb/pp/pdb1ppt.ent.gz .
[user@cn3144 ~]$ cp /pdb/pdb/bb/pdb1bba.ent.gz .
[user@cn3144 ~]$ import.pl --pdbfile pdb1ppt.ent.gz --pdbid 1ppt --dat ./
[user@cn3144 ~]$ import.pl --pdbfile pdb1bba.ent.gz --pdbid 1bba --dat ./
[user@cn3144 ~]$ dali.pl --pdbfile1 pdb1ppt.ent.gz --pdbfile2 pdb1bba.ent.gz --dat1 ./ --dat2 ./ --outfmt "summary,alignments"
[user@cn3144 ~]$ cat mol1A.txt
# Job: test
# Query: mol1A
# No:  Chain   Z    rmsd lali nres  %id PDB  Description
   1:  mol2-A  3.6  1.8   33   36   39   MOLECULE: BOVINE PANCREATIC POLYPEPTIDE;

# Pairwise alignments

No 1: Query=mol1A Sbjct=mol2A Z-score=3.6

DSSP  LLLLLLLLLLLLLHHHHHHHHHHHHHHHHHHLLlll
Query GPSQPTYPGDDAPVEDLIRFYDNLQQYLNVVTRhry   36
ident  |  | |||| |   |        |  | | ||
Sbjct APLEPEYPGDNATPEQMAQYAAELRRYINMLTRpry   36
DSSP  LLLLLLLLLLLLLLLHHHHHHHHHHHHHHHHLLlll

[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
Create a batch input file (e.g. dali.sh). For example:
#!/bin/bash
module load dali
import.pl --pdbfile pdb1ppt.ent.gz --pdbid 1ppt --dat ./
import.pl --pdbfile pdb1bba.ent.gz --pdbid 1bba --dat ./
dali.pl --pdbfile1 pdb1ppt.ent.gz --pdbfile2 pdb1bba.ent.gz --dat1 ./ --dat2 ./ --outfmt "summary,alignments"
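If you prefer to create the batch file from the command line rather than an editor, the same script can be written with a quoted heredoc (a sketch; the heredoc delimiter is quoted so nothing is expanded at creation time):

```shell
#!/bin/bash
# Write the batch script shown above to dali.sh; 'EOF' (quoted) prevents
# any shell expansion while the file is being written.
cat > dali.sh <<'EOF'
#!/bin/bash
module load dali
import.pl --pdbfile pdb1ppt.ent.gz --pdbid 1ppt --dat ./
import.pl --pdbfile pdb1bba.ent.gz --pdbid 1bba --dat ./
dali.pl --pdbfile1 pdb1ppt.ent.gz --pdbfile2 pdb1bba.ent.gz --dat1 ./ --dat2 ./ --outfmt "summary,alignments"
EOF
chmod +x dali.sh
```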
Submit this job using the Slurm sbatch command.
sbatch dali.sh
In certain circumstances, Dali can be accelerated using MPI. To do so, include --np $SLURM_NTASKS in the dali.pl command, and submit the job with --ntasks=# --nodes=1, where # is the number of MPI tasks requested. MPI only works on a single node, so # must not exceed the number of CPUs available on a single node. At present the maximum is 128; however, most nodes have only 56 CPUs, so jobs requesting more than 56 CPUs may wait a considerable time in the queue.
...
dali.pl --np $SLURM_NTASKS ...
...
Submit this job using the Slurm sbatch command.
sbatch --ntasks=32 --nodes=1 dali.sh
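Putting the pieces together, a complete MPI batch script might look like the following (a sketch combining the serial example above with the --np flag; adjust filenames and options to your own data):

```shell
#!/bin/bash
# dali.sh -- MPI variant of the serial batch script above.
# $SLURM_NTASKS is set by Slurm from the --ntasks value given at submission.
module load dali
import.pl --pdbfile pdb1ppt.ent.gz --pdbid 1ppt --dat ./
import.pl --pdbfile pdb1bba.ent.gz --pdbid 1bba --dat ./
dali.pl --pdbfile1 pdb1ppt.ent.gz --pdbfile2 pdb1bba.ent.gz \
        --dat1 ./ --dat2 ./ \
        --np $SLURM_NTASKS \
        --outfmt "summary,alignments"
```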
Running with the AlphaFold database:
#!/bin/bash
module load dali
zcat /pdb/pdb/fd/pdb1fd3.ent.gz > 1fd3.pdb
import.pl --pdbfile 1fd3.pdb --pdbid 1fd3 --dat ./ --clean
dali.pl \
  --title "my search" \
  --cd1 1fd3B \
  --dat1 ./ \
  --db ${DALI_AF}/Digest/HUMAN.list \
  --BLAST_DB ${DALI_AF}/Digest/AF.fasta \
  --repset ${DALI_AF}/Digest/HUMAN_70.list \
  --dat2 ${DALI_AF}/DAT/ \
  --clean \
  --hierarchical \
  --oneway \
  --np ${SLURM_NTASKS}
Type ls ${DALI_AF}/Digest to see all the lists.
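To get a quick sense of how large each representative set is, the lists can be counted with a short loop (a sketch; it assumes one structure id per line, and that DALI_AF has been set by "module load dali" -- the existence check makes the loop a no-op otherwise):

```shell
#!/bin/bash
# Count the entries in each Digest list. The fallback value keeps the glob
# harmless when DALI_AF is unset; [ -e ] skips an unmatched glob pattern.
for f in "${DALI_AF:-/nonexistent}"/Digest/*.list; do
    [ -e "$f" ] || continue
    printf '%s: %d entries\n' "$(basename "$f")" "$(wc -l < "$f")"
done
```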
NOTES:
After the import.pl step in the AlphaFold example above, the working directory contains one .dat file per chain of 1fd3, plus the extracted PDB file:

1fd3A.dat  1fd3B.dat  1fd3C.dat  1fd3D.dat  1fd3.pdb