OmegaFold is the first computational method to successfully predict high-resolution protein structure from a single primary sequence alone. Using a new combination of a protein language model that allows us to make predictions from single sequences and a geometry-inspired transformer model trained on protein structures, OmegaFold outperforms RoseTTAFold and achieves similar prediction accuracy to AlphaFold2 on recently released structures.
The OmegaFold application installed on Biowulf is intended to be run on a GPU node. As a first step, allocate an interactive session and load the module:
[user@biowulf]$ sinteractive --gres=gpu:a100:1,lscratch:10 --mem=100g -c16
[user@cn1113 ~]$ module load OmegaFold
[+] Loading singularity 3.8.5-1 on cn1113
[+] Loading CUDA Toolkit 11.3.0 ...
[+] Loading cuDNN/8.2.1/CUDA-11.3 libraries...
[+] Loading OmegaFold 1.1.0
[user@cn1113 user]$ omegafold --help
usage: omegafold [-h] [--num_cycle NUM_CYCLE] [--subbatch_size SUBBATCH_SIZE] [--device DEVICE]
[--weights_file WEIGHTS_FILE] [--weights WEIGHTS]
[--pseudo_msa_mask_rate PSEUDO_MSA_MASK_RATE] [--num_pseudo_msa NUM_PSEUDO_MSA]
[--allow_tf32 ALLOW_TF32]
input_file output_dir
Launch OmegaFold and perform inference on the data. Some examples (both the input and output
files) are included in the Examples folder, where each folder contains the output of each
available model from model1 to model3. All of the results are obtained by issuing the general
command with only model number chosen (1-3).
positional arguments:
input_file The input fasta file
output_dir The output directory to write the output pdb files. If the directory does
not exist, we just create it. The output file name follows its unique
identifier in the rows of the input fasta file"
optional arguments:
-h, --help show this help message and exit
--num_cycle NUM_CYCLE
The number of cycles for optimization, default to 10
--subbatch_size SUBBATCH_SIZE
The subbatching number, the smaller, the slower, the less GRAM
requirements. Default is the entire length of the sequence. This one takes
priority over the automatically determined one for the sequences
--device DEVICE The device on which the model will be running, default to the accelerator
that we can find
--weights_file WEIGHTS_FILE
The model cache to run
--weights WEIGHTS The url to the weights of the model
--pseudo_msa_mask_rate PSEUDO_MSA_MASK_RATE
The masking rate for generating pseudo MSAs
--num_pseudo_msa NUM_PSEUDO_MSA
The number of pseudo MSAs
--allow_tf32 ALLOW_TF32
if allow tf32 for speed if available, default to True
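According to the help text above, a smaller --subbatch_size lowers GPU memory requirements at the cost of speed. The following is a minimal sketch of retrying a prediction with progressively smaller values. It assumes that omegafold exits with a non-zero status when a run fails (for example, when GPU memory is exhausted), which the help text does not confirm. The function name run_with_fallback and the OMEGAFOLD_BIN variable are hypothetical and introduced only for illustration.

```shell
#!/bin/bash
# Sketch only: retry OmegaFold with progressively smaller --subbatch_size values.
# ASSUMPTION: omegafold returns a non-zero exit status when a run fails
# (e.g. when it runs out of GPU memory).
# OMEGAFOLD_BIN is a hypothetical override for the executable name.

run_with_fallback() {
    local input=$1 outdir=$2
    shift 2                      # remaining arguments are the sizes to try, in order
    local size
    for size in "$@"; do
        echo "Trying --subbatch_size $size" >&2
        if "${OMEGAFOLD_BIN:-omegafold}" --subbatch_size "$size" "$input" "$outdir"; then
            return 0             # stop at the first size that succeeds
        fi
    done
    echo "All subbatch sizes failed" >&2
    return 1
}

# Example usage (the sizes are illustrative, not recommended values):
# run_with_fallback pi3k.fa outdir 1024 512 256
```

Trying the largest value first keeps the run as fast as the available memory allows, since smaller subbatches are slower.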
Copy testing data to your current folder:
[user@cn1113 user]$ cp $OMEGAFOLD_DATA/* .
Run the omegafold executable on the testing data:
[user@cn1113 user]$ omegafold pi3k.fa outdir &
[1] 530515
[user@cn1113 OmegaFold]$ INFO:root:Loading weights from /home/user/.cache/omegafold_ckpt/model.pt
INFO:root:Constructing OmegaFold
INFO:root:Reading pi3k.fa
INFO:root:Predicting 1th chain in pi3k.fa
INFO:root:724 residues in this chain.
...
[user@cn1113 OmegaFold]$ nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.82.01    Driver Version: 470.82.01    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A100-SXM...  On   | 00000000:C7:00.0 Off |                    0 |
| N/A   61C    P0   367W / 400W |  23136MiB / 81251MiB |    100%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A    530539      C   /usr/bin/python3                23133MiB |
+-----------------------------------------------------------------------------+
...
INFO:root:Finished prediction in 185.76 seconds.
INFO:root:Saving prediction to outdir/sp|P27986|P85A_HUMAN Phosphatidylinositol 3-kinase regulatory subunit alpha OS=Homo sapiens OX=9606 GN=PIK3R1 PE=1 SV=2.pdb
INFO:root:Saved
INFO:root:Predicting 2th chain in pi3k.fa
INFO:root:1068 residues in this chain.
INFO:root:Finished prediction in 529.51 seconds.
INFO:root:Saving prediction to outdir/sp|P42336|PK3CA_HUMAN Phosphatidylinositol 4,5-bisphosphate 3-kinase catalytic subunit alpha isoform OS=Homo sapiens OX=9606 GN=PIK3CA PE=1 SV=2.pdb
INFO:root:Saved
INFO:root:Done!
[user@cn1113 OmegaFold]$ ls outdir
'sp|P27986|P85A_HUMAN Phosphatidylinositol 3-kinase regulatory subunit alpha OS=Homo sapiens OX=9606 GN=PIK3R1 PE=1 SV=2.pdb'
'sp|P42336|PK3CA_HUMAN Phosphatidylinositol 4,5-bisphosphate 3-kinase catalytic subunit alpha isoform OS=Homo sapiens OX=9606 GN=PIK3CA PE=1 SV=2.pdb'
End the interactive session:
[user@cn1113 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
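The output file names in the example above are taken from the full FASTA header lines, so they contain spaces and pipe characters that are awkward to handle in the shell. One possible workaround, sketched below, is to shorten the headers before running omegafold. The function name shorten_fasta_headers and the file name pi3k_short.fa are hypothetical. The sketch assumes UniProt-style headers of the form ">sp|ACCESSION|NAME ...", and it assumes that OmegaFold names each output file after the header text, as the log above suggests.

```shell
#!/bin/bash
# Sketch only: shorten FASTA headers so that output file names stay short.
# ASSUMPTION: UniProt-style headers look like ">sp|P27986|P85A_HUMAN description".

shorten_fasta_headers() {
    # For headers with at least three '|'-separated fields, keep only the
    # accession (second field). Other headers are cut at the first whitespace.
    # Sequence lines are printed unchanged.
    awk -F'|' '
        /^>/ {
            if (NF >= 3) {
                print ">" $2
            } else {
                split($0, parts, /[ \t]/)
                print parts[1]
            }
            next
        }
        { print }
    ' "$1"
}

# Example usage:
# shorten_fasta_headers pi3k.fa > pi3k_short.fa
# omegafold pi3k_short.fa outdir
```

Keeping only the accession drops the descriptive text from the header, so retain the original FASTA file if that information is needed later.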