This introductory course teaches the basics of deep learning and of different types of deep learning networks
through a set of hands-on biological examples implemented in Keras, one example per class.
It is intended for NIH researchers using the Biowulf compute cluster. Each class comprises two parts:
an introductory part, which employs simple intuitive/prototype example(s) to illustrate the class topic,
and a biological part that discusses a more realistic biological example.
All the examples have been implemented as Biowulf modules. The code for each biological example
has been adapted and/or reimplemented in Keras from the original code available on GitHub.
Instructions for running the examples are provided in the lecture slides, as well as in the software manuals.
Important Notes
Each class is part of a series, but can also be taken as a stand-alone unit.
Expected knowledge: basic Python, basic Linux/Unix, and some math.
Class #1: Introduction to deep learning with Keras.
Convolutional Neural Networks and their application to semantic segmentation of bioimages.
This class provides an introduction to deep learning with Keras.
It starts with a discussion of the simplest neural network models: the Perceptron and the Multilayer Perceptron,
and introduces the basic terminology used in deep learning, including tensors, network layers, parameters, hyperparameters,
network model, compiling and training a model, loss function, optimizer, hidden layer and deep network.
It also discusses two approaches to building neural network models in Keras:
the Sequential construct and the Functional API approach. The biological example of this class
focuses on semantic segmentation of bioimages using the U-Net model, as applied to the fly brain connectome project.
It involves discussion of relevant network layers (Convolutional, MaxPooling, UpSampling and Concatenation)
and of the notions of model overfitting and data augmentation.
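As a minimal sketch (with illustrative layer sizes and input shapes, not the exact code used in class), the same Dense stack can be built with the Sequential construct, while the Functional API additionally permits branched topologies, such as the skip connections used in U-Net:

    from tensorflow import keras
    from tensorflow.keras import layers

    # Sequential construct: layers are simply stacked in order.
    seq_model = keras.Sequential([
        layers.Dense(32, activation="relu", input_shape=(64,)),
        layers.Dense(10, activation="softmax"),
    ])

    # Functional API: tensors are passed between layers explicitly,
    # which also allows branched topologies such as U-Net skip connections.
    inputs = keras.Input(shape=(32, 32, 1))
    x = layers.Conv2D(16, 3, activation="relu", padding="same")(inputs)
    skip = x                                   # saved for the skip connection
    x = layers.MaxPooling2D(2)(x)
    x = layers.Conv2D(32, 3, activation="relu", padding="same")(x)
    x = layers.UpSampling2D(2)(x)
    x = layers.Concatenate()([x, skip])        # U-Net-style concatenation
    outputs = layers.Conv2D(1, 1, activation="sigmoid")(x)
    func_model = keras.Model(inputs, outputs)

    # Compiling attaches the loss function and the optimizer before training.
    func_model.compile(optimizer="adam", loss="binary_crossentropy")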
Taught on: 03 March 2021; 31 October 2019; 23 April 2019
Class #2: Recurrent and 1D-Convolutional neural networks
and their application to prediction of the function of non-coding DNA.
This class provides an introduction to Recurrent Neural Networks (RNNs)
and 1D Convolutional Neural Networks (1D-CNNs). To this end, it employs three simple/prototype examples illustrating
the motif detection task, the motif discovery task and the vanishing gradients issue.
It also introduces the notion of memory possessed by a recurrent network
(a 1D-convolutional network, by contrast, is memoryless),
as well as backpropagation, the procedure used for training any neural network.
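A minimal sketch of this contrast, on a toy sequence with arbitrary shapes: a recurrent layer carries a hidden state along the sequence, whereas a 1D-convolutional layer sees only a fixed window at each position:

    import tensorflow as tf
    from tensorflow.keras import layers

    x = tf.random.normal((1, 20, 4))   # one toy sequence: 20 steps, 4 features

    # SimpleRNN propagates a hidden state from step to step ("memory"),
    # so the output at step t depends on all preceding steps.
    rnn_out = layers.SimpleRNN(8, return_sequences=True)(x)

    # Conv1D applies the same filter over a sliding window (here 3 steps),
    # so the output at step t depends only on that local window (memoryless).
    cnn_out = layers.Conv1D(8, kernel_size=3, padding="same")(x)

    print(rnn_out.shape, cnn_out.shape)   # (1, 20, 8) (1, 20, 8)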
The biological example of this class focuses on the application of RNNs and 1D-CNNs
to predicting the function of non-coding DNA directly from its sequence. The
class involves discussion of the recurrent layers SimpleRNN and LSTM (Long Short-Term Memory),
as well as other relevant types of network layers: Conv1D, MaxPooling1D, Dropout, and Flatten;
and the Stochastic Gradient Descent optimizer used for training neural networks.
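A minimal sketch of a model combining these layers, assuming one-hot-encoded DNA (the 1000-bp length, layer sizes and single binary label are illustrative assumptions, not the actual setup used in class):

    from tensorflow import keras
    from tensorflow.keras import layers

    # Hypothetical setup: 1000-bp one-hot-encoded DNA (A, C, G, T),
    # predicting a single binary functional label.
    model = keras.Sequential([
        layers.Conv1D(32, kernel_size=8, activation="relu",
                      input_shape=(1000, 4)),
        layers.MaxPooling1D(pool_size=4),
        layers.Dropout(0.2),
        layers.LSTM(32),                      # recurrent layer with memory
        layers.Dense(1, activation="sigmoid"),
    ])

    # Stochastic Gradient Descent, the optimizer discussed in this class.
    model.compile(optimizer=keras.optimizers.SGD(learning_rate=0.01),
                  loss="binary_crossentropy", metrics=["accuracy"])
    model.summary()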
Taught on: 05 May 2021; 13 November 2019; 29 May 2019
Class #3: Autoencoders, hyperparameter optimization and their application to dimensionality reduction of the cancer transcriptome.
This class employs two simple/prototype examples to provide an introduction
to the reduction of data dimensionality using Autoencoder networks and
to optimization/tuning of network hyperparameters, such as the number of network layers,
the dimensions of intermediate data tensors (including the code tensor), the types of layer activations, etc.
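A minimal sketch of an autoencoder (the 784/64/2 dimensions are illustrative assumptions): the network is trained to reproduce its input, and the low-dimensional code tensor in the middle provides the reduced representation:

    from tensorflow import keras
    from tensorflow.keras import layers

    inputs = keras.Input(shape=(784,))                  # hypothetical input size
    encoded = layers.Dense(64, activation="relu")(inputs)
    code = layers.Dense(2, activation="relu")(encoded)  # the code tensor
    decoded = layers.Dense(64, activation="relu")(code)
    outputs = layers.Dense(784, activation="sigmoid")(decoded)

    autoencoder = keras.Model(inputs, outputs)
    encoder = keras.Model(inputs, code)  # used later to extract the latent space
    autoencoder.compile(optimizer="adam", loss="mse")
    # training: autoencoder.fit(X, X, ...), i.e. the input is its own target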
The biological example of this class focuses on extraction of a biologically meaningful latent space
from cancer transcriptomes using Denoising and Variational autoencoders.
It involves discussion of the Lambda layer and the reparameterization trick used for network implementation of
a Variational Autoencoder; two hyperparameter optimization (HPO) packages available on Biowulf (KerasTuner and CANDLE);
four different HPO algorithms (Grid Search, Random Search, Bayesian Optimization and Hyperband); and
visualization of high-dimensional data using the t-Distributed Stochastic Neighbor Embedding (t-SNE) approach.
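A minimal sketch of the reparameterization trick, assuming a two-dimensional latent space (the encoder body is omitted for brevity): the Lambda layer wraps the sampling step z = mu + sigma * epsilon, so that gradients can flow through the mean and log-variance:

    import tensorflow as tf
    from tensorflow import keras
    from tensorflow.keras import layers

    latent_dim = 2   # illustrative dimension of the code tensor

    def sampling(args):
        # z = mu + sigma * epsilon, with epsilon ~ N(0, 1); the randomness is
        # confined to epsilon, so mu and log_var remain differentiable.
        z_mean, z_log_var = args
        epsilon = tf.random.normal(shape=tf.shape(z_mean))
        return z_mean + tf.exp(0.5 * z_log_var) * epsilon

    h = keras.Input(shape=(64,))          # output of some encoder (assumed)
    z_mean = layers.Dense(latent_dim)(h)
    z_log_var = layers.Dense(latent_dim)(h)
    z = layers.Lambda(sampling)([z_mean, z_log_var])   # the Lambda layer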
Taught on: 25 August 2021; 26 November 2019; 23 July 2019
Class #4: Generative Adversarial Networks and their application to bioimage synthesis.
This class employs a simple example to introduce a Generative Adversarial Network (GAN), which is
a composite network comprising two subnetworks, the Generator and the Discriminator.
The discussion of this example introduces the Transposed Convolutional layer (Conv2DTranspose)
used by the network model, the minimax optimization objective and
the procedure for training an adversarial network, as opposed to a traditional neural network.
Another simple example illustrates the mode collapse issue,
which often occurs when training GANs on a dataset comprising data
of two or more different types, a.k.a. modes.
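A minimal sketch of the two subnetworks (image and layer sizes are illustrative assumptions): the Generator upsamples a noise vector into an image with Conv2DTranspose layers, while the Discriminator scores images as real or fake; under the minimax objective, the Discriminator is trained to tell the two apart while the Generator is trained to fool it:

    from tensorflow import keras
    from tensorflow.keras import layers

    # Generator: 100-dimensional noise -> 28x28 single-channel image.
    generator = keras.Sequential([
        layers.Dense(7 * 7 * 64, activation="relu", input_shape=(100,)),
        layers.Reshape((7, 7, 64)),
        layers.Conv2DTranspose(32, kernel_size=4, strides=2, padding="same",
                               activation="relu"),      # -> 14x14x32
        layers.Conv2DTranspose(1, kernel_size=4, strides=2, padding="same",
                               activation="sigmoid"),   # -> 28x28x1
    ])

    # Discriminator: image -> probability that the image is real.
    discriminator = keras.Sequential([
        layers.Conv2D(32, kernel_size=4, strides=2, padding="same",
                      activation="relu", input_shape=(28, 28, 1)),
        layers.Flatten(),
        layers.Dense(1, activation="sigmoid"),
    ])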
The biological example of this class focuses on
a computational task arising in developmental biology: generation/synthesis of fluorescence microscopy
images that visualize localization patterns of certain proteins in a yeast cell.
These patterns reflect an important biological function, but due to technical limitations
they cannot be visualized simultaneously for multiple proteins
at the same stage of a cell growth cycle, so the task of "synchronization" of the patterns
can only be solved computationally.
To this end, three different types of GAN algorithms, namely
the (vanilla) GAN, the Wasserstein GAN (WGAN) and the WGAN with Gradient Penalty (WGAN-GP),
have been implemented for training any of the three relevant network architectures:
the regular deep convolutional, the separable and the star-shaped architecture.
In addition to these algorithms and architectures, the class also discusses
the RMSprop optimizer used for training the models of this class.
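As one concrete piece of these algorithms, here is a minimal sketch of the WGAN-GP gradient penalty (the critic network and the image tensors are assumed to be defined elsewhere): the penalty pushes the norm of the critic's gradient toward 1 on random interpolations between real and generated images:

    import tensorflow as tf

    def gradient_penalty(critic, real, fake):
        # Random interpolation between real and generated images.
        eps = tf.random.uniform([tf.shape(real)[0], 1, 1, 1], 0.0, 1.0)
        interp = eps * real + (1.0 - eps) * fake
        with tf.GradientTape() as tape:
            tape.watch(interp)
            score = critic(interp, training=True)
        grads = tape.gradient(score, interp)
        norm = tf.sqrt(tf.reduce_sum(tf.square(grads), axis=[1, 2, 3]) + 1e-12)
        return tf.reduce_mean((norm - 1.0) ** 2)  # penalize ||grad|| far from 1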
Class #5: Deep Reinforcement Learning Networks
and their application to drug molecule design.
The introductory part of this class starts with a discussion of the basic terms used in Reinforcement Learning (RL),
such as Agent, Environment, Action, State, Reward, and Policy. It then employs three simple/prototype examples
to introduce value-based RL (Q-learning) and policy-based RL, as well as their
application to the problems of sequence optimization and de novo sequence generation using deep networks.
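A minimal sketch of the value-based idea, for a toy environment with hypothetical numbers of states and actions: tabular Q-learning nudges each state-action value toward the observed reward plus the discounted best value of the next state:

    import numpy as np

    n_states, n_actions = 10, 4          # toy environment sizes (assumed)
    Q = np.zeros((n_states, n_actions))  # state-action value table
    alpha, gamma = 0.1, 0.99             # learning rate and discount factor

    def q_update(state, action, reward, next_state):
        # Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        target = reward + gamma * np.max(Q[next_state])
        Q[state, action] += alpha * (target - Q[state, action])

    q_update(state=0, action=2, reward=1.0, next_state=1)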
The biological example focuses on a composite Deep Reinforcement Learning Network comprising two
subnetworks, the Generator and the Predictor, and its application to de novo generation of drug molecules. It involves
discussion of the Embedding, StackAugmentedRNN and GRU (Gated Recurrent Unit) network layers;
delayed rewards and their handling using a rollout procedure; and the Adam optimizer used for training the models
of this class.
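A minimal sketch of a Generator of this kind, assuming a hypothetical SMILES vocabulary (the StackAugmentedRNN layer is a custom layer and is not sketched here): an Embedding layer feeds a GRU that predicts the next token, trained with the Adam optimizer:

    from tensorflow import keras
    from tensorflow.keras import layers

    vocab_size = 45   # hypothetical number of SMILES tokens

    generator = keras.Sequential([
        layers.Embedding(vocab_size, 32),           # token -> dense vector
        layers.GRU(128, return_sequences=True),     # Gated Recurrent Unit
        layers.Dense(vocab_size, activation="softmax"),  # next-token probs
    ])

    generator.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-3),
                      loss="sparse_categorical_crossentropy")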
Class #6: Graph Convolutional Networks,
handling of imbalanced data, and their application to classification of cancer types.
The introductory part of this class focuses on two prototype examples,
both dealing with a binary classification task.
These examples introduce basic graph terminology, provide an overview of the
graph convolution/filtering and pooling procedures, and compare these procedures to
"regular" convolution and to the processing performed by a Dense layer.
The biological part of the class discusses the task of classification of the gene expression samples
from The Cancer Genome Atlas as normal or as one of 33 tumor types.
It makes use of four available gene expression datasets
and considers two possible types of association between genes.
It discusses in detail the GCNConv layer ("vanilla" Graph Convolution) and
the ChebConv layer (Chebyshev Convolution), both of which can be used by the models.
It also overviews the methods for balancing
data samples across different classes using the SMOTE variants library. Finally, it
shows that the accuracy of the classification performed by the models reimplemented in Keras specifically for this class
is comparable to the accuracy reported in the original publication,
and that this accuracy can be improved dramatically if
1) the training samples are balanced across different classes and
2) Chebyshev polynomials of higher degree are employed by the ChebConv layer.
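A minimal sketch of these two ingredients, using imbalanced-learn's SMOTE as a stand-in for the SMOTE variants library and assuming the Spektral implementation of the ChebConv layer (data shapes are toy assumptions):

    import numpy as np
    from imblearn.over_sampling import SMOTE   # stand-in for smote-variants
    from spektral.layers import ChebConv       # assuming Spektral's ChebConv

    # Hypothetical toy data: 100 samples x 50 genes, 90/10 class imbalance.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 50))
    y = np.array([0] * 90 + [1] * 10)

    # 1) Balance the training samples across classes by oversampling.
    X_bal, y_bal = SMOTE().fit_resample(X, y)

    # 2) A Chebyshev graph convolution; a larger polynomial degree K lets each
    #    gene aggregate information from more distant neighbors in the graph.
    cheb = ChebConv(channels=32, K=5, activation="relu")
    # applied as: x = cheb([node_features, scaled_laplacian])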
Class #7: Message Passing and Self Attention-based Networks,
data augmentation, transfer learning and their application to drug molecule property prediction.
The purpose of this class is to introduce and explore the deep learning techniques used for prediction of
properties of biologically active molecules represented by either graphs or SMILES strings. To this end,
the introductory part of the class discusses two prototype examples, which introduce
the Message Passing Network (MPN) and Self-Attention Network (SAN) models.
A detailed description is provided of the algorithms for the Message Passing and Self-Attention
transformations, which have been implemented as part of the MessagePassing and MultiHeadAttention layers.
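A minimal sketch of the self-attention transformation with Keras's built-in MultiHeadAttention layer (the batch of embedded SMILES tokens is a hypothetical stand-in):

    import tensorflow as tf
    from tensorflow.keras import layers

    # Hypothetical batch: 8 SMILES strings, each padded to 40 tokens and
    # already embedded into 64-dimensional vectors.
    x = tf.random.normal((8, 40, 64))

    # Self-attention: the same tensor serves as query, key and value, so each
    # token's new representation is a weighted mixture of all tokens.
    attention = layers.MultiHeadAttention(num_heads=4, key_dim=16)
    y = attention(query=x, value=x, key=x)
    print(y.shape)   # (8, 40, 64)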
The biological part of the class focuses on two approaches aimed at handling the issue of overfitting,
which typically arises in molecular property prediction, since the amount of available
data with ground truth labels/property values is limited. The class demonstrates for the first time that the
approach based on using the MPN model together with data augmented by SMILES enumeration
dramatically outperforms the SAN model and allows for a multi-fold reduction
in the error of molecular property prediction as compared to the case where the original/unaugmented data were used.
This finding has been reproduced consistently across all four labeled datasets considered in the class.
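A minimal sketch of the SMILES enumeration used for this augmentation, assuming RDKit's doRandom option (the example molecule is arbitrary): each molecule is rewritten as several different but chemically equivalent SMILES strings:

    from rdkit import Chem   # assuming RDKit for SMILES handling

    def enumerate_smiles(smiles, n=10):
        # Generate up to n alternative SMILES for the same molecule by
        # randomizing the atom order used when writing the string.
        mol = Chem.MolFromSmiles(smiles)
        variants = {Chem.MolToSmiles(mol, canonical=False, doRandom=True)
                    for _ in range(n)}
        return sorted(variants)

    print(enumerate_smiles("c1ccccc1O", n=5))  # phenol, random SMILES variants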
The second approach is based on transfer learning, as implemented in the SAN-BERT model, which follows the published SMILES-BERT study.
This approach exploits an analogy between the processing of SMILES strings/tokens and the processing
of sentences/words in a natural language. The approach may or may not succeed,
depending on how closely the source and target tasks are related.
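A minimal sketch of the transfer-learning pattern itself, with a generic stand-in encoder rather than the actual SAN-BERT model (all names and shapes are illustrative assumptions): the pretrained encoder is frozen, and only a small prediction head is trained on the limited labeled data:

    from tensorflow import keras
    from tensorflow.keras import layers

    # Stand-in for an encoder pretrained on a large unlabeled SMILES corpus.
    encoder = keras.Sequential([
        layers.Embedding(45, 32),   # hypothetical vocabulary of 45 tokens
        layers.GRU(64),
    ])
    encoder.trainable = False       # freeze the pretrained weights

    # Fine-tune only a small regression head on the labeled target task.
    inputs = keras.Input(shape=(40,), dtype="int32")  # padded token ids
    features = encoder(inputs)
    outputs = layers.Dense(1)(features)               # predicted property
    model = keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="mse")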