ReLeaSE (Reinforcement Learning for Structural Evolution) is an application for de novo drug design
based on reinforcement learning. It integrates two deep neural networks, a generative one and a
predictive one, which are trained separately but used jointly to generate novel targeted chemical libraries.
ReLeaSE represents molecules simply by their simplified molecular input line entry specification (SMILES)
strings.
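Before either network sees a molecule, the SMILES string has to be turned into a sequence of integer token indices for an embedding layer. The following is a minimal sketch of such a character-level encoding; the helper names and the tiny vocabulary are illustrative, not the actual ReLeaSE code:

    # A minimal sketch of character-level SMILES encoding (illustrative only,
    # not the actual ReLeaSE tokenizer): each distinct character becomes an
    # integer index suitable for a Keras Embedding layer.
    smiles_strings = ["CCO", "NCCS", "c1ccccc1"]

    # Build a character vocabulary from the training set; index 0 is reserved
    # for padding.
    charset = sorted(set("".join(smiles_strings)))
    char_to_idx = {c: i + 1 for i, c in enumerate(charset)}

    def encode(smiles, max_length=120):
        """Map a SMILES string to a fixed-length list of integer token indices."""
        idx = [char_to_idx[c] for c in smiles]
        return idx + [0] * (max_length - len(idx))  # right-pad with zeros

    print(encode("NCCS", max_length=10))  # [3, 2, 2, 5, 0, 0, 0, 0, 0, 0]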
This application, reimplemented in Keras from the original version developed in PyTorch,
is used as a biological example in class #5 of the course
"Deep Learning by Example on Biowulf".
Allocate an interactive session and run the program. Sample session:
[user@biowulf]$ sinteractive --mem=160g --gres=gpu:k80,lscratch:10 -c8
or, in order to use 4 GPUs,
[user@biowulf]$ sinteractive --mem=160g --gres=gpu:k80:4,lscratch:20 -c14
[user@cn4204 ~]$ module load release
[+] Loading cuDNN/7.6.5/CUDA-10.0 libraries...
[+] Loading CUDA Toolkit 10.0.130 ...
[+] Loading release 20200516
The ReLeaSE application, as it is currently implemented on Biowulf, comprises the following source files:
[user@cn4204]$ ls $RELEASE_SRC
data.py  options.py  predict.py  visualize.py  stackAugmentedRNN.py  models.py  train.py  smiles.py  utils.py
The executables are:
[user@cn4204]$ ls $RELEASE_BIN
predict.py  train.py  visualize.py
In particular, the command-line options of the executable train.py are as follows:
[user@cn4204]$ train.py -h
Using TensorFlow backend.
usage: train.py [-h] [-a ALPHA] [-b BATCH_SIZE] [-d DATA_TYPE] [-D]
                [--dropout DROPOUT_FRACTION] [--embedding_dim EMBEDDING_DIM]
                [--epochs NUM_EPOCHS] [-f NUM_FOLDS] [-g num_gpus]
                [--delta DELTA] [--hidden_size HIDDEN_SIZE]
                [-i input_checkpoint] [--lr LR] -m TRAINING_MODE
                [--max_length MAX_LENGTH] [--min_length MIN_LENGTH]
                [-n N_ROLLOUT] [-N NUM_EPISODES] [-p PREDICT_LEN]
                [-o output_checkpoint] [-O OPTIMIZER] [-r RECURRENT_LAYER]
                [--stack_width STACK_WIDTH] [--stack_depth STACK_DEPTH]
                [--steps_per_epoch STEPS_PER_EPOCH] [-v]

optional arguments:
  -h, --help            show this help message and exit
  -a ALPHA, --alpha ALPHA
                        reinforcement learning rate; default = 0.001
  -b BATCH_SIZE, --batch_size BATCH_SIZE
                        input batch size; default=512
  -d DATA_TYPE, --data_type DATA_TYPE
                        Data type: jak2 | logp; default = jak2
  -D, --debug           output debugging info
  --dropout DROPOUT_FRACTION
                        the dropout fraction; default=0.2
  --embedding_dim EMBEDDING_DIM
                        the dimension of the embedding space; default=512
  --epochs NUM_EPOCHS, -e NUM_EPOCHS
                        number of iterations to train for; default = 51 for
                        jak2 and = 25 for logp
  -f NUM_FOLDS, --num_folds NUM_FOLDS
                        number of folds in cross-validation split; default=5
  -g num_gpus, --num_gpus num_gpus
                        number of gpus to use; default=1
  --delta DELTA         discount rate; default = 0.98
  --hidden_size HIDDEN_SIZE
                        hidden_size; default = 256
  -i input_checkpoint, --input_checkpoint input_checkpoint
                        The name of the input checkpoint file; default = None
  --lr LR               learning rate (for generator); default: 3.e-4
  --max_length MAX_LENGTH
                        max length of a SMILES string to be used for
                        training; default=None
  --min_length MIN_LENGTH
                        min length of a SMILES string to be used for
                        training; default=None
  -n N_ROLLOUT, --n_rollout N_ROLLOUT
                        number of rollout trajectories to average over;
                        default=16
  -N NUM_EPISODES, --num_episodes NUM_EPISODES
                        number of episodes in the REINFORCE algorithm;
                        default=100
  -p PREDICT_LEN, --predict_len PREDICT_LEN
                        max length of a sequence used to predict next
                        character; default=120
  -o output_checkpoint, --output_checkpoint output_checkpoint
                        The name of the output checkpoint file; default=None
  -O OPTIMIZER, --optimizer OPTIMIZER
                        Optimizer to use for training: adadelta | adam |
                        rmsprop; default=adam
  -r RECURRENT_LAYER, --recurrent_layer RECURRENT_LAYER
                        Recurrent layer for generator: SimpleRNN | GRU |
                        LSTM | SA_SimpleRNN | SA_LSTM | SA_GRU (default)
  --stack_width STACK_WIDTH
                        stack_width; default = 1
  --stack_depth STACK_DEPTH
                        stack_depth; default = 100
  --steps_per_epoch STEPS_PER_EPOCH
                        steps_per_epoch; default = None
  -v, --verbose         increase the verbosity level of output

required arguments:
  -m TRAINING_MODE, --training_mode TRAINING_MODE
                        Training mode: generator | predictor | reinforce

To perform training of the predictor network using this executable, copy the training data to the current folder:
[user@cn4204]$ cp -r $RELEASE_DATA/* .
ReLeaSE takes as input data in the SMILES (Simplified Molecular Input Line Entry Specification) format. For example, the data used to train the Generator model look like:
[user@cn4204]$ more data/chembl_22_clean_1576904_sorted_std_final.smi
CCO      CHEMBL545
C        CHEMBL17564
CO       CHEMBL14688
NCCS     CHEMBL602
NCCN     CHEMBL816
CN       CHEMBL43280
C=O      CHEMBL1255
CCN      CHEMBL14449
...
CCC[N+](C)(C)CCO     CHEMBL105059
CCCCCCCCN1CCCCC1     CHEMBL105218
ClCC(=O)c1ccccc1     CHEMBL105712
CCCCCCCNc1ccccc1     CHEMBL106128
...
CCOc1ccc(CN2CCN(Cc3ccon3)CC2CCO)cc1     CHEMBL1890577
Fc1ccc(Cc2nnc(o2)C(=O)NCC2CCCO2)cc1     CHEMBL1889842
COc1ccc(CC(N)c2csc(Nc3ccccn3)n2)cc1     CHEMBL1884880
CN1CCN(CC1)NC(=O)c1cccc(c1)C(F)(F)F     CHEMBL1899891
O=C(NC1CCCCC1)C1CCCN1C(=O)Nc1ccccc1     CHEMBL1899656
CCc1nnc(NC(=O)CSc2nccn2Cc2ccccc2)s1     CHEMBL1901068
CCOC(=O)C(C)N1C=C(O)N(C1=O)c1ccccc1     CHEMBL1890667
...
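Each line of this file holds a SMILES string followed by a ChEMBL compound identifier. For reference, such a file can be read with a few lines of Python; the length filter below simply mirrors the --min_length/--max_length options of train.py (a sketch, not the actual ReLeaSE data loader):

    # Sketch of reading a whitespace-delimited .smi file (SMILES string
    # followed by a compound ID), with the kind of length filtering suggested
    # by the --min_length/--max_length options of train.py. Illustrative only.
    def load_smi(path, min_length=1, max_length=120):
        smiles = []
        with open(path) as f:
            for line in f:
                fields = line.split()
                if not fields:
                    continue
                s = fields[0]  # first column is the SMILES string
                if min_length <= len(s) <= max_length:
                    smiles.append(s)
        return smiles

    data = load_smi("data/chembl_22_clean_1576904_sorted_std_final.smi")
    print(len(data), data[:3])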
A sample command to train the Generator model on these data would be:
[user@cn4204]$ train.py -m generator -r SA_GRU -g 1 -b 1000 --lr 3.e-4
Using TensorFlow backend.
...
Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
embedding_1 (Embedding)      (None, 11, 512)           45568
_________________________________________________________________
stack_augmented_rnn_1 (Stack (None, 11, 256)           617216
_________________________________________________________________
dropout_1 (Dropout)          (None, 11, 256)           0
_________________________________________________________________
stack_augmented_rnn_2 (Stack (None, 11, 256)           420608
_________________________________________________________________
dropout_2 (Dropout)          (None, 11, 256)           0
_________________________________________________________________
stack_augmented_rnn_3 (Stack (None, 256)               420608
_________________________________________________________________
dropout_3 (Dropout)          (None, 256)               0
_________________________________________________________________
dense_1 (Dense)              (None, 89)                22873
=================================================================
Total params: 1,526,873
Trainable params: 1,526,873
Non-trainable params: 0
_________________________________________________________________
Model compiled with loss='categorical_crossentropy', lr= 0.0003
Training generator ...
...
or, using 4 GPUs,
[user@cn4204]$ train.py -m generator -r SA_GRU -g 4 -b 1000 --lr 3.e-4
...
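The model summary printed above shows the generator topology: an embedding layer, three stack-augmented recurrent layers (the custom layers implemented in stackAugmentedRNN.py) interleaved with dropout, and a softmax output over the 89-token vocabulary. The sketch below approximates the same topology with stock Keras GRU layers, so it is only a rough stand-in and its parameter counts will differ from the summary:

    # Hedged sketch of the generator topology printed above, with stock Keras
    # GRU layers standing in for the custom stack-augmented RNN layers defined
    # in stackAugmentedRNN.py (so the parameter counts will not match).
    from keras.models import Sequential
    from keras.layers import Embedding, GRU, Dropout, Dense

    VOCAB_SIZE = 89      # SMILES token vocabulary size, per the summary above
    EMBEDDING_DIM = 512  # --embedding_dim default
    HIDDEN_SIZE = 256    # --hidden_size default
    DROPOUT = 0.2        # --dropout default

    model = Sequential([
        Embedding(VOCAB_SIZE, EMBEDDING_DIM),
        GRU(HIDDEN_SIZE, return_sequences=True),  # stand-in for stack_augmented_rnn_1
        Dropout(DROPOUT),
        GRU(HIDDEN_SIZE, return_sequences=True),  # stand-in for stack_augmented_rnn_2
        Dropout(DROPOUT),
        GRU(HIDDEN_SIZE),                         # final recurrent layer returns last state only
        Dropout(DROPOUT),
        Dense(VOCAB_SIZE, activation="softmax"),  # distribution over the next token
    ])
    model.compile(loss="categorical_crossentropy", optimizer="adam")
    model.summary()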
The data for training the Predictor model look as follows:
- for predicting the jak2 property value:
[user@cn4204]$ more data/jak2_data.csv
SMILES,pIC50
O=S(=O)(Nc1cccc(-c2cnc3ccccc3n2)c1)c1cccs1,4.26
O=c1cc(-c2nc(-c3ccc(-c4cn(CCP(=O)(O)O)nn4)cc3)[nH]c2-c2ccc(F)cc2)cc[nH]1,4.34
NC(=O)c1ccc2c(c1)nc(C1CCC(O)CC1)n2CCCO,4.53
NCCCn1c(C2CCNCC2)nc2cc(C(N)=O)ccc21,4.56
CNC(=S)Nc1cccc(-c2cnc3ccccc3n2)c1,4.59
O=C(Nc1cccc(-c2cnc3ccccc3n2)c1)C1CC1,4.6
O=C(Nc1cccc(-c2cnc3ccccc3n2)c1)c1ccco1,4.61
COc1cc(Nc2nccc(-c3ccc(Cl)cc3)n2)cc(OC)c1OC,4.67
CN1C(=O)COc2c1cnc1ccc(Sc3nnc4c(F)cc(-c5cnn(C)c5)cn34)cc21,4.68
COc1ccc2c(c1)C(=O)N(c1nc(C(=O)Nc3cnccc3N3CCNCC3)cs1)C2,4.68
...
- for predicting the logP property value:
[user@cn4204]$ more data/logp_labels.csv
,SMILES,Kow
1,CC1CC2C3CCC4=CC(=O)C=CC4(C)C3(F)C(O)CC2(C)C1(O)C(=O)CO,1.885
2,CC(=O)OCC(=O)C1(O)CCC2C3CCC4=CC(=O)CCC4(C)C3C(O)CC21C,2.19
3,CC(=O)OCC(=O)C1(O)CCC2C3CCC4=CC(=O)CCC4(C)C3C(=O)CC21C,2.1
4,CCC1(c2ccccc2)C(=O)NC(=O)NC1=O,1.47
5,COC12C(COC(N)=O)C3=C(C(=O)C(C)=C(N)C3=O)N1CC1NC12,-0.4
6,CCC1(CC)C(=O)NC(=O)N(C)C1=O,1.15
7,CCC1(c2ccccc2)NC(=O)N(C)C1=O,1.69
8,O=P1(N(CCCl)CCCl)NCCCO1,0.63
9,CC(O)C(=O)O,-0.72
10,CC12CCC(=O)C=C1CCC1C2C(O)CC2(C)C(C(=O)CO)CCC12,1.94
11,CC12CCC(=O)C=C1CCC1C2C(O)CC2(C)C1CCC2(O)C(=O)CO,1.61
12,CC12C=CC(=O)C=C1CCC1C2C(O)CC2(C)C1CCC2(O)C(=O)CO,1.62
13,CC12CCC3c4ccc(O)cc4CCC3C1CC(O)C2O,2.45
14,CC12CCC3c4ccc(O)cc4CCC3C1CCC2O,4.01
15,Clc1ccc(C(c2ccc(Cl)cc2)C(Cl)(Cl)Cl)cc1,6.91
16,O=C(O)c1c(Cl)cccc1Cl,2.23
17,c1ccc2c(c1)cc1ccc3cccc4ccc2c1c34,6.13
18,CCCCC1C(=O)N(c2ccccc2)N(c2ccccc2)C1=O,3.16
...
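Both predictor datasets are plain CSV files, so they can be inspected directly with pandas; a brief sketch, with column names taken from the headers shown above:

    # Sketch of inspecting the predictor training data with pandas; the
    # column names come from the file headers shown above.
    import pandas as pd

    jak2 = pd.read_csv("data/jak2_data.csv")                 # columns: SMILES, pIC50
    logp = pd.read_csv("data/logp_labels.csv", index_col=0)  # columns: SMILES, Kow

    print(jak2.head())
    print(logp["Kow"].describe())  # summary statistics of the logP (Kow) labels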
Sample commands to train the Predictor model are:
[user@cn4204]$ train.py -d jak2 -m predictor -g 1 -b 128 --lr 0.0001 -e 500
...
Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
embedding_1 (Embedding)      (None, 120, 512)          45568
_________________________________________________________________
lstm_1 (LSTM)                (None, 128)               328192
_________________________________________________________________
dropout_1 (Dropout)          (None, 128)               0
_________________________________________________________________
batch_normalization_1 (Batch (None, 128)               512
_________________________________________________________________
dense_1 (Dense)              (None, 128)               16512
_________________________________________________________________
activation_1 (Activation)    (None, 128)               0
_________________________________________________________________
dense_2 (Dense)              (None, 1)                 129
_________________________________________________________________
activation_2 (Activation)    (None, 1)                 0
=================================================================
Total params: 390,913
Trainable params: 390,657
Non-trainable params: 256
_________________________________________________________________
Reading predictor data ...
...
[user@cn4204]$ train.py -d logp -m predictor -g 1 -b 128 --lr 0.0001 -e 500
...
[user@cn4204]$ train.py -m reinforce -d jak2
...
[user@cn4204]$ train.py -m reinforce -d jak2 -i checkpoints/generator.weights.SA_GRU.1.h5,checkpoints/predictor.weights.jak2.h5
...
After training the models, the checkpoints (model weights or the entire model) are saved in the checkpoints folder.
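In the reinforce runs above, the generator is fine-tuned with the REINFORCE policy-gradient algorithm: SMILES strings sampled from the generator are scored by the trained predictor, and each token's log-likelihood is weighted by a discounted reward. The following is only a schematic sketch of that objective; sample_smiles, token_probs, and predictor_score are illustrative placeholders, not the actual ReLeaSE API:

    # Schematic view of the REINFORCE fine-tuning step: sample SMILES strings
    # from the generator, score each with the trained predictor, and weight
    # the per-token log-likelihoods by a discounted reward. All callables here
    # (sample_smiles, token_probs, predictor_score) are placeholders, not the
    # ReLeaSE API.
    import numpy as np

    DELTA = 0.98     # --delta: discount rate
    N_ROLLOUT = 16   # --n_rollout: number of trajectories averaged per update

    def episode_loss(sample_smiles, token_probs, predictor_score):
        """Average reward-weighted negative log-likelihood over rollouts."""
        losses = []
        for _ in range(N_ROLLOUT):
            smiles = sample_smiles()              # one generated trajectory
            reward = predictor_score(smiles)      # e.g. predicted pIC50 or logP
            loss, discount = 0.0, 1.0
            for t, token in enumerate(smiles):
                p = token_probs(smiles[:t], token)     # P(token | prefix)
                loss -= discount * reward * np.log(p)  # REINFORCE term
                discount *= DELTA
            losses.append(loss)
        return np.mean(losses)  # minimized w.r.t. generator weights (lr = --alpha)

Minimizing this loss biases the generator toward SMILES strings with high predicted property values, which is how the targeted chemical libraries are produced.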
To generate novel SMILES strings and predict property values with the trained checkpoints, run the executable predict.py:
[user@cn4204]$ predict.py -i checkpoints/generator.weights.SA_GRU.1.h5
...
generated SMILES = C(NCCN2CCOCC2)=NC(=S)N(CCC)C

[user@cn4204]$ predict.py -i checkpoints/generator.weights.SA_GRU.2.h5 --stack_width 2
...
Generated SMILES= 'CIOC(=O)CCCCCCc1nnno1'

[user@cn4204]$ predict.py -r LSTM -i checkpoints/generator.weights.LSTM.h5
...
Generated SMILES= 'CC(O)=C(C(O)=O)N(C)C(=O)C(CCC(O)=O)NC(=O)COC(C)=O'

[user@cn4204]$ predict.py -i checkpoints/reinforce.weights.jak2.SA_GRU.1.h5,checkpoints/predictor.weights.jak2.h5
...
Generated SMILES= 'C[CH2]C=C(C)CCSSSCCCN=C=N'
jak2 property value= 0.4330919
...

[user@cn4204]$ predict.py -s 'CC(Cl)=Cc3ccc(Cl)cc3' -i checkpoints/predictor.model.jak2.h5
...
Input SMILES= 'CC(Cl)=Cc3ccc(Cl)cc3'
jak2 property value= 0.03461966

NOTE: since each SMILES token is sampled randomly from the respective distribution, different runs of the same command may produce different SMILES strings.
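The randomness noted above comes from the decoding step itself: at every position the next token is drawn from the generator's softmax distribution rather than taken greedily. A minimal illustration (not the actual predict.py code):

    # Why repeated runs differ: at each step the next token is *sampled* from
    # the generator's softmax output rather than chosen greedily. Illustrative
    # sketch, not the actual predict.py code.
    import numpy as np

    rng = np.random.default_rng()

    def sample_next_token(probs):
        """Draw one token index from a softmax probability vector."""
        return rng.choice(len(probs), p=probs)

    probs = np.array([0.7, 0.2, 0.1])  # toy next-token distribution over 3 tokens
    print([sample_next_token(probs) for _ in range(5)])  # varies from run to run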
[user@cn4204]$ visualize.py -s "C(NCCN2CCOCC2)=NC(=S)N(CCC)C"
[user@cn4204]$ visualize.py -s "CIOC(=O)CCCCCCc1nnno1"
[user@cn4204]$ visualize.py -s "CC(O)=C(C(O)=O)N(C)C(=O)C(CCC(O)=O)NC(=O)COC(C)=O"
End the interactive session:
[user@cn4204 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$