OmegaFold is the first computational method to successfully predict high-resolution protein structure from a single primary sequence alone. Using a new combination of a protein language model that allows us to make predictions from single sequences and a geometry-inspired transformer model trained on protein structures, OmegaFold outperforms RoseTTAFold and achieves similar prediction accuracy to AlphaFold2 on recently released structures.
The OmegaFold application installed on Biowulf is intended to be run on a GPU node. As the first step, please allocate an interactive session:
[user@biowulf]$ sinteractive --gres=gpu:a100:1,lscratch:10 --mem=100g -c16
[user@cn1113 ~]$ module load OmegaFold
[+] Loading singularity 3.8.5-1 on cn1113
[+] Loading CUDA Toolkit 11.3.0 ...
[+] Loading cuDNN/8.2.1/CUDA-11.3 libraries...
[+] Loading OmegaFold 1.1.0
[user@cn1113 user]$ omegafold --help
usage: omegafold [-h] [--num_cycle NUM_CYCLE] [--subbatch_size SUBBATCH_SIZE]
                 [--device DEVICE] [--weights_file WEIGHTS_FILE]
                 [--weights WEIGHTS]
                 [--pseudo_msa_mask_rate PSEUDO_MSA_MASK_RATE]
                 [--num_pseudo_msa NUM_PSEUDO_MSA] [--allow_tf32 ALLOW_TF32]
                 input_file output_dir

Launch OmegaFold and perform inference on the data. Some examples (both the
input and output files) are included in the Examples folder, where each folder
contains the output of each available model from model1 to model3. All of the
results are obtained by issuing the general command with only model number
chosen (1-3).

positional arguments:
  input_file            The input fasta file
  output_dir            The output directory to write the output pdb files.
                        If the directory does not exist, we just create it.
                        The output file name follows its unique identifier in
                        the rows of the input fasta file"

optional arguments:
  -h, --help            show this help message and exit
  --num_cycle NUM_CYCLE
                        The number of cycles for optimization, default to 10
  --subbatch_size SUBBATCH_SIZE
                        The subbatching number, the smaller, the slower, the
                        less GRAM requirements. Default is the entire length
                        of the sequence. This one takes priority over the
                        automatically determined one for the sequences
  --device DEVICE       The device on which the model will be running, default
                        to the accelerator that we can find
  --weights_file WEIGHTS_FILE
                        The model cache to run
  --weights WEIGHTS     The url to the weights of the model
  --pseudo_msa_mask_rate PSEUDO_MSA_MASK_RATE
                        The masking rate for generating pseudo MSAs
  --num_pseudo_msa NUM_PSEUDO_MSA
                        The number of pseudo MSAs
  --allow_tf32 ALLOW_TF32
                        if allow tf32 for speed if available, default to True

Copy testing data to your current folder:
[user@cn1113 user]$ cp $OMEGAFOLD_DATA/* .

Run the omegafold executable on the testing data:
[user@cn1113 user]$ omegafold pi3k.fa outdir &
[1] 530515
[user@cn1113 OmegaFold]$ INFO:root:Loading weights from /home/user/.cache/omegafold_ckpt/model.pt
INFO:root:Constructing OmegaFold
INFO:root:Reading pi3k.fa
INFO:root:Predicting 1th chain in pi3k.fa
INFO:root:724 residues in this chain.
...
[user@cn1113 OmegaFold]$ nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.82.01    Driver Version: 470.82.01    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A100-SXM...  On   | 00000000:C7:00.0 Off |                    0 |
| N/A   61C    P0   367W / 400W |  23136MiB / 81251MiB |    100%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A    530539      C   /usr/bin/python3                23133MiB |
+-----------------------------------------------------------------------------+
...
INFO:root:Finished prediction in 185.76 seconds.
INFO:root:Saving prediction to outdir/sp|P27986|P85A_HUMAN Phosphatidylinositol 3-kinase regulatory subunit alpha OS=Homo sapiens OX=9606 GN=PIK3R1 PE=1 SV=2.pdb
INFO:root:Saved
INFO:root:Predicting 2th chain in pi3k.fa
INFO:root:1068 residues in this chain.
INFO:root:Finished prediction in 529.51 seconds.
INFO:root:Saving prediction to outdir/sp|P42336|PK3CA_HUMAN Phosphatidylinositol 4,5-bisphosphate 3-kinase catalytic subunit alpha isoform OS=Homo sapiens OX=9606 GN=PIK3CA PE=1 SV=2.pdb
INFO:root:Saved
INFO:root:Done!
[user@cn1113 OmegaFold]$ ls outdir
'sp|P27986|P85A_HUMAN Phosphatidylinositol 3-kinase regulatory subunit alpha OS=Homo sapiens OX=9606 GN=PIK3R1 PE=1 SV=2.pdb'
'sp|P42336|PK3CA_HUMAN Phosphatidylinositol 4,5-bisphosphate 3-kinase catalytic subunit alpha isoform OS=Homo sapiens OX=9606 GN=PIK3CA PE=1 SV=2.pdb'

End the interactive session:
[user@cn1113 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
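
Because each output file is named after the full FASTA header, the file names produced in the test run contain spaces and `|` characters and must be quoted in the shell. One possible workaround is to shorten the headers before running the prediction. This is a minimal sketch and an assumption, not a documented OmegaFold feature. The function name `shorten_fasta_headers` is hypothetical. The sketch assumes UniProt-style headers of the form `>db|ACCESSION|NAME description`; any other header is simply cut at the first whitespace.

```shell
#!/bin/bash
# Hypothetical helper (not part of OmegaFold): rewrite FASTA headers so that
# they contain only a short identifier without spaces or '|' characters.
#   - UniProt-style headers (>db|ACCESSION|NAME description) become >ACCESSION
#   - any other header is truncated at the first whitespace
# Sequence lines are passed through unchanged. Output goes to stdout.
shorten_fasta_headers() {
    awk '
        /^>/ {
            # $1 is the header text up to the first whitespace
            n = split($1, parts, "[|]")
            if (n >= 3) {
                print ">" parts[2]   # keep only the accession
            } else {
                print $1             # keep the first token as-is
            }
            next
        }
        { print }
    ' "$1"
}
```

For example, `shorten_fasta_headers pi3k.fa > pi3k_short.fa` (the output file name is arbitrary) writes a copy whose headers are `>P27986` and `>P42336`. Running `omegafold pi3k_short.fa outdir` on that copy should then name the output files after the shortened identifiers, since the help text states that the output file name follows the identifier in the input FASTA file.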