Book: Generative Deep Learning by David Foster
Chapter 1: Generative Modeling
Generative Modeling
- model the probability of observing an observation x, p(x)
flowchart LR
training_data -- training --> generative_model
generative_model -- "sampling (plus random noise)" --> generated_sample
Discriminative Modeling
- model the probability of a label y given an observation x, p(y|x)
flowchart LR
training_data["Training Data
label
observation"]
training_data -- training --> discriminative_model
result["Prediction
0.83
likely to be van Gogh"]
discriminative_model -- prediction --> result
Conditional Generative Model
- model the probability of an observation x given a label y, p(x|y)
Representation Learning
- high-dimensional data
- representation
- latent space
- encoder-decoder (sketched below)
- manifold
The fundamentals of representation learning are very similar to the mathematics of non-linear behavior in electrical engineering and digital communications theory.
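These terms connect roughly as follows: an encoder compresses each high-dimensional observation into a point in a lower-dimensional latent space, and a decoder maps latent points back to the original space. A minimal Keras sketch, assuming a flattened 28x28 input and a 2-D latent space (both sizes are illustrative, not from the notes):

```python
from tensorflow.keras import layers, models

# Encoder: high-dimensional data -> low-dimensional latent representation
encoder = models.Sequential([
    layers.Input(shape=(784,)),           # e.g. a flattened 28x28 image
    layers.Dense(128, activation="relu"),
    layers.Dense(2),                      # 2-D latent space
])

# Decoder: latent representation -> reconstruction in the original space
decoder = models.Sequential([
    layers.Input(shape=(2,)),
    layers.Dense(128, activation="relu"),
    layers.Dense(784, activation="sigmoid"),
])

# Encoder-decoder pair trained to reconstruct its own input
autoencoder = models.Sequential([encoder, decoder])
autoencoder.compile(optimizer="adam", loss="binary_crossentropy")
```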
Chapter 2: Deep Learning
Multilayer Perceptron (MLP)
- discriminative model
- supervised learning
- loss function: compares the predicted output to the actual output
- optimizer: adjusts the weights in the neural network based on the gradient of the loss function (see the sketch after this list)
  - Adam (Adaptive Moment Estimation)
  - RMSProp (Root Mean Square Propagation)
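A minimal sketch of these pieces in Keras, assuming a generic 10-class classification task (the input shape, layer sizes, and learning rate are illustrative, not from the notes):

```python
from tensorflow.keras import layers, models, optimizers

# A small MLP for supervised classification
model = models.Sequential([
    layers.Input(shape=(784,)),
    layers.Dense(200, activation="relu"),
    layers.Dense(150, activation="relu"),
    layers.Dense(10, activation="softmax"),  # predicted label probabilities
])

# The loss function compares predicted to actual labels; the optimizer
# adjusts the weights based on the gradient of that loss.
model.compile(
    optimizer=optimizers.Adam(learning_rate=0.0005),  # or optimizers.RMSprop()
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)
```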
Convolutional Neural Network (CNN)
- convolutional layer: a collection of filters
- strides: the step size used to move the filter across the input
- padding: padding="same" pads the input data with zeros so the output layer is the same size as the input when strides=1 (see the sketch below)
- stacking
- Batch normalization: mitigates exploding gradients, where the calculated gradient grows too large and causes the weights to oscillate wildly
  - covariate shift: as training proceeds, the weights move farther away from their random initial values, shifting the distribution of each layer's inputs
  - training with batch normalization reduces the covariate shift problem
  - prediction with batch normalization uses the moving statistics accumulated during training (see the sketch after this list)
  - trainable parameters
    - scale (gamma)
    - shift (beta)
  - nontrainable parameters
    - moving average
    - standard deviation
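A hedged Keras sketch showing where these parameters live (the layer sizes are illustrative):

```python
from tensorflow.keras import layers, models

bn = layers.BatchNormalization()
model = models.Sequential([
    layers.Input(shape=(784,)),
    layers.Dense(64),
    bn,                           # normalizes the Dense layer's outputs
    layers.Activation("relu"),
])

# Trainable: gamma (scale) and beta (shift), learned during training.
print([w.name for w in bn.trainable_weights])
# Nontrainable: moving mean and moving variance, accumulated during
# training and used to normalize inputs at prediction time.
print([w.name for w in bn.non_trainable_weights])
```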
- Dropout
- during training, each dropout layer chooses a random set of units from the prior layer and sets their outputs to zero
- reduces reliance on any one unit, so the network is better at generalizing to unseen data
- modern approaches tend to favor batch normalization over dropout
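A minimal Keras sketch of a dropout layer (the 25% rate and layer sizes are illustrative):

```python
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(784,)),
    layers.Dense(128, activation="relu"),
    # Training: a random 25% of the previous layer's outputs are set to zero.
    # Prediction: the layer is a pass-through, so no units are dropped.
    layers.Dropout(rate=0.25),
    layers.Dense(10, activation="softmax"),
])
```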