Deep Learning by Example on Biowulf: Course Syllabus.

This introductory course teaches the basics of deep learning and of different types of deep learning networks
through a set of hands-on biological examples implemented in Keras, one example per class.

It is intended for NIH researchers using the compute cluster Biowulf. Each class comprises two parts:
an introductory part, which employs simple intuitive/prototype example(s) to illustrate the class topic,
and a biological part that discusses a more realistic biological example.
All the examples have been implemented as Biowulf modules. The code for each biological example
has been adapted and/or reimplemented in Keras from the original code available on GitHub.
Instructions for running the examples are provided in the lecture slides, as well as in the software manuals.

Important Notes

- Each class is part of a series, but is stand-alone.
- Expected knowledge: basic Python, basic Linux/Unix, some math.

Class #1: Introduction to deep learning with Keras.
Convolutional Neural Networks and their application to semantic segmentation of bioimages.

This class provides an introduction to Deep Learning with Keras. It starts with a discussion of the simplest neural network models, the Perceptron and the Multilayer Perceptron, and introduces the basic terminology of Deep Learning, including tensors, network layers, parameters, hyperparameters, network model, compiling and training a model, loss function, optimizer, hidden layer and deep network. It also discusses two approaches to building neural network models in Keras: the Sequential construct and the Functional API. The biological example of this class focuses on semantic segmentation of bioimages using the U-Net model, as applied to the fly brain connectome project. It involves discussion of the relevant network layers (Convolutional, MaxPooling, UpSampling and Concatenation) and of the notions of model overfitting and data augmentation.
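As an illustration of the two model-building approaches named above, the following minimal sketch builds the same small Multilayer Perceptron twice, once with the Sequential construct and once with the Functional API. The layer sizes, the input dimension and the variable names are assumptions chosen for illustration; they are not taken from the course materials.

```python
from tensorflow import keras

layers = keras.layers

# Hypothetical sizes, for illustration only.
N_FEATURES = 4
N_HIDDEN = 8

# Approach 1: the Sequential construct (a linear stack of layers).
seq_model = keras.Sequential([
    keras.Input(shape=(N_FEATURES,)),
    layers.Dense(N_HIDDEN, activation="relu"),   # hidden layer
    layers.Dense(1, activation="sigmoid"),       # output layer
])

# Approach 2: the Functional API (layers are called on tensors).
inputs = keras.Input(shape=(N_FEATURES,))
hidden = layers.Dense(N_HIDDEN, activation="relu")(inputs)
outputs = layers.Dense(1, activation="sigmoid")(hidden)
func_model = keras.Model(inputs=inputs, outputs=outputs)

# Compiling attaches the loss function and the optimizer to each model.
for model in (seq_model, func_model):
    model.compile(optimizer="sgd", loss="binary_crossentropy")

# Both models have the same number of trainable parameters.
print(seq_model.count_params(), func_model.count_params())
```

The Sequential construct is the shorter of the two for a plain stack of layers. A model with branches that are merged later, as done by the Concatenation layers of U-Net, cannot be written as a single linear stack and would need the Functional API.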

Class #2: Recurrent and 1D-Convolutional neural networks
and their application to prediction of the function of non-coding DNA.

This class provides an introduction to Recurrent Neural Networks (RNNs) and 1D Convolutional Neural Networks (1D-CNNs). To this end, it employs three simple/prototype examples illustrating the motif detection task, the motif discovery task and the vanishing gradients issue. It also introduces the notion of memory possessed by a Recurrent network (a 1D-Convolutional network, by contrast, is memoryless), as well as the notion of backpropagation for training any neural network. The biological example of this class focuses on the application of RNNs and 1D-CNNs to predicting the function of non-coding DNA directly from its sequence. The class involves discussion of the recurrent layers SimpleRNN and LSTM (Long Short-Term Memory); other relevant types of network layers (Conv1D, MaxPooling1D, Dropout and Flatten); and the Stochastic Gradient Descent optimizer used for training neural networks.
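The motif detection task mentioned above can be sketched in a few lines of NumPy. This is not the course code: the sequence, the motif and the helper names (`one_hot`, `conv1d_valid`) are made up for illustration, and the filter is set by hand, whereas a Conv1D layer would learn its filter weights during training.

```python
import numpy as np

ALPHABET = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a (length, 4) array of 0s and 1s."""
    encoded = np.zeros((len(seq), len(ALPHABET)))
    for i, base in enumerate(seq):
        encoded[i, ALPHABET.index(base)] = 1.0
    return encoded

def conv1d_valid(x, kernel):
    """Slide a (width, 4) kernel along a (length, 4) input, no padding."""
    width = kernel.shape[0]
    n_positions = x.shape[0] - width + 1
    return np.array([np.sum(x[i:i + width] * kernel) for i in range(n_positions)])

# Hypothetical toy data: the motif and the sequence are made up.
motif = "TATA"
sequence = "ACGTTATAAGC"

kernel = one_hot(motif)           # hand-set filter; a Conv1D layer would learn it
scores = conv1d_valid(one_hot(sequence), kernel)

best_position = int(np.argmax(scores))   # where the filter responds most strongly
best_score = float(np.max(scores))       # global max pooling over positions
motif_found = best_score == len(motif)   # a perfect match scores one per base

print(best_position, best_score, motif_found)
```

Each score counts the bases that agree with the motif at one position, so the filter output depends only on the current window. This is the sense in which a 1D-Convolutional network is memoryless, in contrast to a Recurrent network.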

Class #3: Autoencoders, hyperparameter optimization and their application to dimensionality reduction of the cancer transcriptome.

This class employs two simple/prototype examples to provide an introduction to the reduction of data dimensionality using Autoencoder networks and to optimization/tuning of network hyperparameters, such as the number of network layers, the dimensions of intermediate data tensors (including the code tensor), the types of the layer activations, etc. The biological example of this class focuses on extraction of a biologically meaningful latent space from cancer transcriptomes using Denoising and Variational autoencoders. It involves discussion of the Lambda layer and the reparameterization trick used for network implementation of a Variational Autoencoder; two hyperparameter optimization (HPO) packages available on Biowulf (KerasTuner and CANDLE); four different HPO algorithms (Grid Search, Random Search, Bayesian Optimization and Hyperband); and visualization of high-dimensional data using the t-Distributed Stochastic Neighbor Embedding (t-SNE) approach.
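The reparameterization trick mentioned above can be sketched as follows. NumPy is used here only to keep the example self-contained; in a Keras model the same sampling step would typically be wrapped in a layer such as Lambda. The function names, the batch size and the 2-D code tensor are assumptions for illustration.

```python
import numpy as np

def sample_latent(z_mean, z_log_var, rng):
    """Reparameterization trick: z = mean + std * eps, with eps ~ N(0, I)."""
    eps = rng.standard_normal(z_mean.shape)
    return z_mean + np.exp(0.5 * z_log_var) * eps

def kl_to_standard_normal(z_mean, z_log_var):
    """KL divergence between N(mean, var) and N(0, I), one value per sample."""
    return -0.5 * np.sum(1.0 + z_log_var - z_mean**2 - np.exp(z_log_var), axis=-1)

rng = np.random.default_rng(0)

# Hypothetical encoder outputs for a batch of 3 samples and a 2-D code tensor.
z_mean = np.array([[0.0, 0.0], [1.0, -1.0], [2.0, 0.5]])
z_log_var = np.zeros_like(z_mean)        # log-variance 0 means unit variance

z = sample_latent(z_mean, z_log_var, rng)
kl = kl_to_standard_normal(z_mean, z_log_var)
print(z.shape, kl)
```

Because the randomness enters only through `eps`, the sample `z` is a deterministic, differentiable function of the encoder outputs, which is what allows gradients to flow back through the sampling step during training.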

Class #4: Generative Adversarial Networks and their application to bioimage synthesis.

This class employs a simple example to introduce the Generative Adversarial Network (GAN), a composite network comprising two subnetworks, the Generator and the Discriminator. The discussion of this example introduces the Transposed Convolutional layer (Conv2DTranspose) used by the network model, the minimax optimization objective, and the procedure for training an adversarial neural network, as opposed to a traditional one. Another simple example illustrates the mode collapse issue, which often occurs in training GANs when the training dataset comprises data of two or more different types, a.k.a. modes. The biological example of this class focuses on a computational task arising in Developmental Biology: generation/synthesis of fluorescence microscopy images that visualize localization patterns of certain proteins in a yeast cell. These patterns reflect an important biological function, but due to technical limitations they cannot be visualized simultaneously for multiple proteins at the same stage of the cell growth cycle, so the task of "synchronization" of the patterns can only be solved computationally. To this end, three types of GAN algorithms, namely the (vanilla) GAN, the Wasserstein GAN (WGAN) and the WGAN with Gradient Penalty (WGAN-GP), have been implemented for training any of three relevant network architectures: the regular deep convolutional, the separable and the star-shaped architecture. In addition to these algorithms/architectures, the class discusses the RMSprop optimizer used for training the models of this class.
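The minimax objective mentioned above can be made concrete with a short NumPy sketch. It evaluates the standard GAN value function V(D, G) = E[log D(x)] + E[log(1 - D(G(z)))] on made-up Discriminator outputs, and adds the basic Wasserstein critic loss for comparison. The numbers and function names are assumptions for illustration, and the gradient penalty term of WGAN-GP is deliberately left out.

```python
import numpy as np

def discriminator_loss(d_real, d_fake):
    """Negative of the minimax value V(D, G): the Discriminator maximizes V."""
    return -(np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake)))

def generator_loss(d_fake):
    """Minimax Generator loss: the Generator minimizes log(1 - D(G(z)))."""
    return np.mean(np.log(1.0 - d_fake))

def wasserstein_critic_loss(c_real, c_fake):
    """WGAN critic loss: unbounded scores, no logarithm."""
    return np.mean(c_fake) - np.mean(c_real)

# Hypothetical Discriminator outputs (probabilities that the input is real).
d_real = np.array([0.9, 0.8, 0.95])   # on real images
d_fake = np.array([0.1, 0.3, 0.2])    # on generated images

print(discriminator_loss(d_real, d_fake), generator_loss(d_fake))

# An undecided Discriminator outputs 0.5 everywhere; its loss is then 2*ln(2).
undecided = np.full(3, 0.5)
print(discriminator_loss(undecided, undecided))
```

The two subnetworks pull the same quantity in opposite directions, which is why adversarial training alternates between a Discriminator update and a Generator update instead of minimizing a single fixed loss.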

Class #5: Deep Reinforcement Learning Networks
and their application to drug molecule design.

The introductory part of this class starts with a discussion of the basic terms used in Reinforcement Learning (RL), such as Agent, Environment, Action, State, Reward, and Policy. It then employs three simple/prototype examples to provide an introduction to value-based RL (Q-learning) and policy-based RL, as well as their application to the problems of sequence optimization and de novo sequence generation using deep networks. The biological example focuses on a composite Deep Reinforcement Learning Network comprising two subnetworks, the Generator and the Predictor, and its application to de novo generation of drug molecules. It involves discussion of the Embedding, StackAugmentedRNN and GRU (Gated Recurrent Unit) network layers; delayed rewards and their handling using a rollout procedure; and the Adam optimizer used for training the models of this class.
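The basic RL terms and the Q-learning update can be illustrated with a tabular sketch on a toy Environment. This is not one of the course examples: the five-state chain, the reward, and the learning-rate and discount values are assumptions, and a table stands in for the deep network that a Deep RL model would use to represent the values.

```python
import numpy as np

N_STATES = 5             # states 0..4 on a line; state 4 is terminal
LEFT, RIGHT = 0, 1
ALPHA, GAMMA = 0.5, 0.9  # learning rate and discount factor (assumed values)

def step(state, action):
    """Toy Environment: move left or right; reward 1 for reaching the goal."""
    next_state = max(state - 1, 0) if action == LEFT else state + 1
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done

rng = np.random.default_rng(0)
Q = np.zeros((N_STATES, 2))   # one value per (State, Action) pair

for episode in range(500):
    state, done = 0, False
    while not done:
        # Q-learning is off-policy, so the Agent may explore at random.
        action = int(rng.integers(2))
        next_state, reward, done = step(state, action)
        # Bootstrapped target: reward plus discounted best next value.
        target = reward if done else reward + GAMMA * np.max(Q[next_state])
        Q[state, action] += ALPHA * (target - Q[state, action])
        state = next_state

greedy_policy = np.argmax(Q[:-1], axis=1)   # best Action in each non-terminal State
print(greedy_policy)
```

The Reward arrives only at the last step, yet the discounted bootstrapped target propagates it back to earlier States, so the greedy Policy read off the table moves toward the goal from every State.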

Class #6: Graph Convolutional Networks,
handling imbalanced data and their application to classification of cancer types.

The introductory part of this class focuses on two prototype examples, both dealing with a binary classification task. These examples allow for the introduction of basic graph terminology and an overview of the graph convolution/filtering and pooling procedures, together with their comparison to the "regular" convolution and to the processing performed by a Dense layer. The biological part of the class discusses the task of classifying gene expression samples from The Cancer Genome Atlas as normal or as one of 33 tumor types. It makes use of four available gene expression datasets and considers two possible types of association between genes. It discusses in detail the GCNConv layer ("vanilla" Graph Convolution) and the ChebConv layer (Chebyshev Convolution), both of which can be used by the models. It also gives an overview of the methods for balancing data samples across different classes using the SMOTE variants library. Finally, it shows that the accuracy of the classification performed by the models reimplemented in Keras specifically for this class is comparable to the accuracy reported in the original publication, and that this accuracy can be improved dramatically if 1) the training samples are balanced across different classes and 2) Chebyshev polynomials of higher degree are employed by the ChebConv layer.
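The comparison between a graph convolution and a Dense layer can be sketched in NumPy. The example assumes that the "vanilla" graph convolution follows the widely used propagation rule ReLU(D^-1/2 (A + I) D^-1/2 H W); it is an illustration of that rule, not the GCNConv implementation used in the course. The three-node graph, the features and the weights are made up.

```python
import numpy as np

def normalized_adjacency(adj):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2 with self-loops added."""
    adj_self = adj + np.eye(adj.shape[0])
    inv_sqrt_deg = 1.0 / np.sqrt(adj_self.sum(axis=1))
    return adj_self * inv_sqrt_deg[:, None] * inv_sqrt_deg[None, :]

def graph_conv(adj, features, weights):
    """One "vanilla" graph convolution followed by a ReLU activation."""
    return np.maximum(normalized_adjacency(adj) @ features @ weights, 0.0)

# Hypothetical toy graph: a path 0 - 1 - 2 (e.g. three associated genes).
adj = np.array([[0.0, 1.0, 0.0],
                [1.0, 0.0, 1.0],
                [0.0, 1.0, 0.0]])
features = np.array([[1.0], [2.0], [3.0]])   # one feature per node
weights = np.array([[1.0, -1.0]])            # maps 1 input to 2 output channels

out = graph_conv(adj, features, weights)
print(out.shape)
```

A Dense layer would compute only `features @ weights`, treating every node independently. The graph convolution first mixes each node's features with those of its neighbors through the normalized adjacency matrix, which is how the associations between genes enter the model.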