Book: Generative Deep Learning by David Foster

Author

Carol Willing

Published

June 4, 2023

Chapter 1: Generative Modeling

Generative Modeling

  • Model the probability of an observation x occurring
  • p(x)
flowchart LR
    training_data -- training --> generative_model
    generative_model -- "sampling (plus random noise)" --> generated_sample
Figure 1: Generative model where an observation has many features
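
As a toy illustration of the "training then sampling" flow in Figure 1, the sketch below fits a multivariate Gaussian to some data and draws new samples from it. All data and variable names here are hypothetical, not from the book.

import numpy as np

# Hypothetical training data: 1,000 observations, each with 2 features.
rng = np.random.default_rng(seed=42)
training_data = rng.normal(loc=[2.0, -1.0], scale=[0.5, 1.5], size=(1000, 2))

# Training: estimate the parameters of a simple generative model of p(x),
# here a multivariate Gaussian fit to the data.
mu = training_data.mean(axis=0)
cov = np.cov(training_data, rowvar=False)

# Sampling (plus random noise): draw brand-new observations from p(x).
generated_samples = rng.multivariate_normal(mu, cov, size=5)
print(generated_samples)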

Discriminative Modeling

  • Model the probability of a label y given an observation x
  • p(y|x)
flowchart LR
    training_data["Training Data
    label
    observation"]
    training_data -- training --> discriminative_model
    result["Prediction
    0.83
    likely to be van Gogh"]
    discriminative_model -- prediction --> result
Figure 2: Discriminative model
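
A minimal sketch of the discriminative flow in Figure 2, using a scikit-learn logistic-regression classifier to model p(y|x). The features and labels are made up for illustration (1 = van Gogh, 0 = not van Gogh).

import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical training set: each row is an observation's feature vector;
# each label says whether the painting is by van Gogh (1) or not (0).
rng = np.random.default_rng(seed=0)
observations = rng.normal(size=(200, 4))
labels = (observations[:, 0] + 0.5 * observations[:, 1] > 0).astype(int)

# Training: learn p(y|x) directly from (observation, label) pairs.
model = LogisticRegression().fit(observations, labels)

# Prediction: probability that a new observation has label y = 1,
# e.g. "0.83 -> likely to be van Gogh".
new_observation = rng.normal(size=(1, 4))
print(model.predict_proba(new_observation)[0, 1])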

Conditional Generative Model

  • Model the probability of an observation x given a label y
  • p(x|y)
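
One way to make p(x|y) concrete is a class-conditional version of the Gaussian sketch above: fit a separate Gaussian to each label's observations, then sample from the one matching the requested label. Data and names are again hypothetical.

import numpy as np

rng = np.random.default_rng(seed=1)

# Hypothetical labeled data: two classes with different feature distributions.
x0 = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(500, 2))  # label y = 0
x1 = rng.normal(loc=[5.0, 5.0], scale=1.0, size=(500, 2))  # label y = 1

# Fit one simple generative model per label: p(x|y) as a Gaussian per class.
params = {
    0: (x0.mean(axis=0), np.cov(x0, rowvar=False)),
    1: (x1.mean(axis=0), np.cov(x1, rowvar=False)),
}

def sample_given_label(y, n=3):
    # Sample new observations x conditioned on the label y.
    mu, cov = params[y]
    return rng.multivariate_normal(mu, cov, size=n)

print(sample_given_label(1))  # new observations that resemble class y = 1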

Representation Learning

Key terms (tied together in the sketch below):

  • high-dimensional data
  • representation
  • latent space
  • encoder-decoder
  • manifold
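
A minimal Keras sketch of the encoder-decoder idea: the encoder compresses high-dimensional data down to a low-dimensional latent space, and the decoder maps latent points back to data space. The input size and layer widths are arbitrary assumptions for illustration.

from tensorflow import keras
from tensorflow.keras import layers

# Encoder: compress a high-dimensional observation (e.g. a flattened
# 28x28 image, 784 values) to a 2-dimensional latent representation.
encoder = keras.Sequential([
    keras.Input(shape=(784,)),
    layers.Dense(128, activation="relu"),
    layers.Dense(2, name="latent_space"),
])

# Decoder: map a point in the latent space back to data space.
decoder = keras.Sequential([
    keras.Input(shape=(2,)),
    layers.Dense(128, activation="relu"),
    layers.Dense(784, activation="sigmoid"),
])

# Encoder-decoder pair trained end to end as an autoencoder.
autoencoder = keras.Sequential([encoder, decoder])
autoencoder.compile(optimizer="adam", loss="binary_crossentropy")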

Note

The fundamentals of representation learning are very similar to the mathematical concepts of non-linear behavior in electrical engineering and digital communications theory.

Chapter 2: Deep Learning

Multilayer Perceptron (MLP)

  • discriminative model
  • supervised learning
  • loss function: compares the predicted output to the actual output
  • optimizer: adjusts the weights of the neural network based on the gradient of the loss function (see the sketch after this list)
    • Adam (Adaptive Moment Estimation)
    • RMSProp (Root Mean Square Propagation)
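
A minimal sketch of such an MLP in Keras, wiring a loss function and the Adam optimizer together. The input size, layer widths, and learning rate are assumptions for illustration, not the book's exact values.

from tensorflow import keras
from tensorflow.keras import layers

# Discriminative MLP: supervised learning on (observation, label) pairs.
model = keras.Sequential([
    keras.Input(shape=(784,)),
    layers.Dense(200, activation="relu"),
    layers.Dense(150, activation="relu"),
    layers.Dense(10, activation="softmax"),  # class probabilities p(y|x)
])

# The loss function compares predicted labels to actual labels; the
# optimizer adjusts the weights based on the gradient of that loss.
model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=0.0005),
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)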

Convolutional Neural Network (CNN)

  • A convolutional layer is a collection of filters
  • strides: the step size used to move each filter across the input
  • padding: padding="same" pads the input with zeros so that the output is the same size as the input when strides=1
  • stacking: convolutional layers are applied in sequence, so deeper layers learn progressively higher-level features
  • Batch normalization - without it, the calculated gradients can grow too large, causing weights to oscillate wildly (exploding gradients)
    • covariate shift: as weights move farther away from their random initial values, the distribution of each layer's inputs changes during training
    • training using batch normalization reduces the covariate shift problem
    • prediction using batch normalization uses the moving averages of the mean and variance accumulated during training
    • trainable parameters
      • scale (gamma)
      • shift (beta)
    • nontrainable parameters
      • moving average of the mean
      • moving average of the standard deviation
  • Dropout
    • during training, choose a random set of units from the prior layer and set their output to zero
    • reduces reliance on any single unit, so the network generalizes better to unseen data
  • Modern approaches tend to favor batch normalization over dropout (both appear in the sketch below)
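
A minimal Keras sketch pulling these pieces together: stacked convolutional layers with explicit strides and padding, plus BatchNormalization and Dropout layers. The input shape, filter counts, and dropout rate are illustrative assumptions.

from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(32, 32, 3)),
    # Convolutional layer: a collection of 32 filters; strides=1 with
    # padding="same" keeps the output the same height/width as the input.
    layers.Conv2D(32, kernel_size=3, strides=1, padding="same", activation="relu"),
    layers.BatchNormalization(),  # trainable scale (gamma) and shift (beta)
    # Stacking: a second convolutional layer; strides=2 halves the spatial size.
    layers.Conv2D(64, kernel_size=3, strides=2, padding="same", activation="relu"),
    layers.BatchNormalization(),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.25),  # during training, zero out 25% of unit outputs
    layers.Dense(10, activation="softmax"),
])
model.summary()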

Chapter 3: Variational Autoencoders