IFT6135-H2022, Assignment 3, Programming Part
Generative Models and Self-supervised Learning
Prof: Aaron Courville

Problem 1

Variational Autoencoders (VAEs) are probabilistic generative models for modeling a data distribution p(x). In this question, you will be asked to train a VAE on the Binarised MNIST dataset, using the negative ELBO loss as shown in class. Note that each pixel in this image dataset is binary: the pixel is either black or white, which means each datapoint (image) is a collection of binary values. You have to model the likelihood pθ(x|z), i.e. the decoder, as a product of Bernoulli distributions.1

  1. (unittest, 4 pts) Implement the function ‘log_likelihood_bernoulli’ in ‘q1_solution.py’ to compute the log-likelihood log p(x) for a given binary sample x and Bernoulli distribution p(x). p(x) will be parameterized by the mean of the distribution, p(x = 1), which will be given as input to the function. A minimal sketch is given below.
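One possible implementation, sketched under the assumption that both arguments are PyTorch tensors of shape (batch, D) with D the number of pixels; the argument names and the exact interface expected by the unittest may differ.

```python
import torch

def log_likelihood_bernoulli(mu, target):
    # mu: p(x = 1) per pixel, shape (batch, D); target: binary sample x, same shape.
    # Shapes and argument names are assumptions, not the official interface.
    eps = 1e-8
    mu = mu.clamp(eps, 1.0 - eps)  # avoid log(0)
    # log p(x) = sum_d [ x_d * log(mu_d) + (1 - x_d) * log(1 - mu_d) ]
    ll = target * torch.log(mu) + (1.0 - target) * torch.log(1.0 - mu)
    return ll.sum(dim=-1)  # one log-likelihood per instance
```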

  2. (unittest, 4 pts) Implement the function ‘log_likelihood_normal’ in ‘q1_solution.py’ to compute the log-likelihood log p(x) for a given float vector x and isotropic Normal distribution p(z) = N(µ, diag(σ²)). Note that µ and log(σ²) will be given as inputs for the Normal distribution. A sketch in the same spirit follows.
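A minimal sketch, again assuming tensors of shape (batch, D) and that the second argument holds log(σ²) as stated in the question.

```python
import math
import torch

def log_likelihood_normal(mu, logvar, z):
    # Log-density of z under N(mu, diag(exp(logvar))), summed over dimensions.
    # log N(z; mu, sigma^2) = -0.5 * [ log(2*pi) + logvar + (z - mu)^2 / sigma^2 ]
    ll = -0.5 * (math.log(2.0 * math.pi) + logvar + (z - mu) ** 2 / logvar.exp())
    return ll.sum(dim=-1)
```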

  3. (unittest, 4 pts) Implement the function ‘log_mean_exp’ in ‘q1_solution.py’ to compute the following equation for each yᵢ in a given Y = {y₁, y₂, . . . , yᵢ, . . . , y_M}:

$$\log \frac{1}{K} \sum_{k=1}^{K} \exp\big(y_i^{(k)} - a_i\big) + a_i,$$

where $a_i = \max_k y_i^{(k)}$. Note that $y_i = [y_i^{(1)}, y_i^{(2)}, \ldots, y_i^{(k)}, \ldots, y_i^{(K)}]$.
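A sketch of the numerically stable computation above, assuming the input is a tensor of shape (M, K) with one row $y_i$ per sample; the expected layout may differ in the starter code.

```python
import torch

def log_mean_exp(y):
    # y: shape (M, K), K evaluations y_i^(k) per row (assumed layout).
    a, _ = y.max(dim=-1, keepdim=True)            # a_i = max_k y_i^(k)
    # log( (1/K) * sum_k exp(y_i^(k) - a_i) ) + a_i
    return (y - a).exp().mean(dim=-1).log() + a.squeeze(-1)
```

Subtracting the per-row maximum before exponentiating keeps the intermediate values bounded, which is the point of the $a_i$ term in the equation.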

1 The binarised MNIST dataset is not interchangeable with the MNIST dataset available in torchvision, so the data loader as well as the dataset will be provided.


  4. (unittest, 4 pts) Implement the function ‘kl_gaussian_gaussian_analytic’ in ‘q1_solution.py’ to compute the KL divergence $D_{KL}(q(z|x) \,\|\, p(z))$ via the analytic solution for given p and q. Note that p and q are multivariate normal distributions with diagonal covariance. A sketch of the closed form follows.
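A minimal sketch of the closed-form KL between two diagonal Gaussians, under the assumption that each distribution is passed as a (mean, log-variance) pair of shape (batch, D); the actual argument layout in ‘q1_solution.py’ may differ.

```python
import torch

def kl_gaussian_gaussian_analytic(mu_q, logvar_q, mu_p, logvar_p):
    # Per-dimension closed form:
    # 0.5 * [ logvar_p - logvar_q + (exp(logvar_q) + (mu_q - mu_p)^2) / exp(logvar_p) - 1 ]
    kl = 0.5 * (logvar_p - logvar_q
                + (logvar_q.exp() + (mu_q - mu_p) ** 2) / logvar_p.exp()
                - 1.0)
    return kl.sum(dim=-1)  # one KL value per instance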

  5. (unittest, 4 pts) Implement the function ‘kl_gaussian_gaussian_mc’ in ‘q1_solution.py’ to compute the KL divergence $D_{KL}(q(z|x) \,\|\, p(z))$ using a Monte Carlo estimate for given p and q. Note that p and q are multivariate normal distributions with diagonal covariance. A sketch follows below.
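A sketch of the Monte Carlo estimator: draw z ∼ q with the reparameterization trick and average log q(z) − log p(z). The `num_samples` argument and the (mean, log-variance) layout are assumptions, not the official interface.

```python
import math
import torch

def kl_gaussian_gaussian_mc(mu_q, logvar_q, mu_p, logvar_p, num_samples=1):
    std_q = (0.5 * logvar_q).exp()
    eps = torch.randn((num_samples,) + mu_q.shape, device=mu_q.device)
    z = mu_q + std_q * eps                        # z ~ q, shape (num_samples, batch, D)

    def log_normal(z, mu, logvar):
        return (-0.5 * (math.log(2.0 * math.pi) + logvar
                        + (z - mu) ** 2 / logvar.exp())).sum(dim=-1)

    # KL(q || p) = E_q[ log q(z) - log p(z) ], averaged over the drawn samples
    return (log_normal(z, mu_q, logvar_q) - log_normal(z, mu_p, logvar_p)).mean(dim=0)
```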

  6. (report, 15 pts) Consider a latent variable model $p_\theta(x) = \int p_\theta(x|z)\,p(z)\,dz$. The prior is defined as $p(z) = \mathcal{N}(0, I_L)$ and $z \in \mathbb{R}^L$. Train a VAE with a latent variable of 100 dimensions (L = 100). Use the provided network architecture and hyperparameters described in ‘vae.ipynb’. Use ADAM with a learning rate of 3 × 10⁻⁴, and train for 20 epochs. Evaluate the model on the validation set using the ELBO. Marks will neither be deducted nor awarded if you do not use the given architecture. Note that for this question you have to:

    (a) Train a model to achieve an average per-instance ELBO of at least −102 on the validation set, and report the ELBO of your model. The ELBO on the validation set is written as:

$$\frac{1}{|\mathcal{D}_{valid}|} \sum_{x_i \in \mathcal{D}_{valid}} \mathcal{L}_{ELBO}(x_i) \geq -102.$$

Feel free to modify the above hyperparameters (except the latent variable size) to ensure this works. A sketch of the training loop is given below.
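A minimal sketch of the training loop, assuming `model`, `train_loader`, and `valid_loader` come from the provided ‘vae.ipynb’; the model interface (returning the posterior mean, log-variance, and Bernoulli means) and the flattening of inputs are assumptions. It reuses the helpers sketched in the earlier items.

```python
import torch

optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)

def batch_elbo(x):
    mu, logvar, x_recon = model(x)                           # assumed outputs
    recon_ll = log_likelihood_bernoulli(x_recon, x.flatten(start_dim=1))
    kl = kl_gaussian_gaussian_analytic(mu, logvar,
                                       torch.zeros_like(mu),      # p(z) = N(0, I)
                                       torch.zeros_like(logvar))
    return recon_ll - kl                                     # per-instance ELBO

for epoch in range(20):
    model.train()
    for x in train_loader:
        loss = -batch_elbo(x).mean()                         # negative ELBO
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    model.eval()
    with torch.no_grad():
        elbo = torch.cat([batch_elbo(x) for x in valid_loader]).mean()
    print(f"epoch {epoch}: validation ELBO = {elbo.item():.2f}")
```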

  7. (report, 15 pts) Evaluate the log-likelihood of the trained VAE model by using importance sampling, which was covered in the lecture. Use the code described in ‘vae.ipynb’. The standard importance-sampling estimator is:

$$\log p_\theta(x_i) \approx \log \frac{1}{K} \sum_{k=1}^{K} \frac{p_\theta(x_i \mid z_{i,k})\, p(z_{i,k})}{q_\phi(z_{i,k} \mid x_i)}, \qquad z_{i,k} \sim q_\phi(z \mid x_i).$$
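A sketch of this estimator in log space, reusing the helpers from the earlier items; the `encode`/`decode` interface and K = 200 are assumptions about the notebook's code, not its actual API.

```python
import torch

def estimate_log_likelihood(model, x, K=200):
    mu, logvar = model.encode(x)                       # parameters of q(z|x), assumed
    std = (0.5 * logvar).exp()
    zeros = torch.zeros_like(mu)                       # p(z) = N(0, I)
    log_w = []
    for _ in range(K):
        z = mu + std * torch.randn_like(std)           # z ~ q(z|x)
        recon = model.decode(z)                        # Bernoulli means of p(x|z), assumed
        log_w.append(log_likelihood_bernoulli(recon, x.flatten(start_dim=1))
                     + log_likelihood_normal(zeros, zeros, z)   # log p(z)
                     - log_likelihood_normal(mu, logvar, z))    # log q(z|x)
    # log p(x) ≈ log-mean-exp over the K importance weights, one value per instance
    return log_mean_exp(torch.stack(log_w, dim=-1))
```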


Problem 2

Since it is hard to enforce the Lipschitz constraint directly, we will penalize violations of it as suggested by Petzka et al. In this question, you will first implement a function to estimate the Wasserstein distance between two distributions, given samples x ∼ µ and y ∼ ν (real and generated samples, respectively):

$$\mathbb{E}_{y \sim \nu}[f(y)] - \mathbb{E}_{x \sim \mu}[f(x)] \qquad (1)$$

and with an added regularization term

$$\mathbb{E}_{\hat{x} \sim \tau}\Big[\big(\max\{0, \|\nabla f(\hat{x})\|_2 - 1\}\big)^2\Big] \qquad (2)$$

then, in the network training process, minimize

$$\mathbb{E}_{y \sim \nu}[f(y)] - \mathbb{E}_{x \sim \mu}[f(x)] + \lambda\, \mathbb{E}_{\hat{x} \sim \tau}\Big[\big(\max\{0, \|\nabla f(\hat{x})\|_2 - 1\}\big)^2\Big] \qquad (3)$$

where $\hat{x} = t x + (1 - t) y$ for $t \sim U[0, 1]$.

Dataset & dataloader In this question, you will use the GAN framework to train a generator that generates a distribution of images $x \in \mathbb{R}^{32 \times 32 \times 3}$, namely the Street View House Numbers (SVHN) dataset.5 We will consider the prior distribution p(z) = N(0, I), the isotropic Gaussian distribution. We provide a function for sampling from the SVHN dataset in ‘q2_samplers.py’.

Hyperparameters & Training Pointers We provide code for the GAN network as well as the hyperparameters you should be using. We ask you to code the Wasserstein distance and the training procedure to train the GAN, as well as the qualitative exploration that you will include in your report.

  1. (unittest, 4 pts) Implement the functions ‘vf_wasserstein_distance’ and ‘lp_reg’ in ‘q2_solution.py’ to compute the objective function of the Wasserstein distance and the “Lipschitz Penalty”. Consider that the norm used in the regularizer is the L2-norm. A sketch of both functions follows.
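A minimal sketch of both functions implementing Eqs. (1) and (2), under the assumptions that the critic f is passed as a callable and that samples are 4-D image batches of shape (batch, 3, 32, 32); the argument order in ‘q2_solution.py’ may differ.

```python
import torch

def vf_wasserstein_distance(x, y, critic):
    # Eq. (1): E_{y~nu}[f(y)] - E_{x~mu}[f(x)], with f the critic network.
    return critic(y).mean() - critic(x).mean()

def lp_reg(x, y, critic):
    # Eq. (2): E_{x_hat~tau}[ (max{0, ||grad f(x_hat)||_2 - 1})^2 ],
    # with x_hat = t*x + (1-t)*y and t ~ U[0, 1].
    t = torch.rand(x.size(0), 1, 1, 1, device=x.device)   # assumes 4-D image batches
    x_hat = (t * x + (1 - t) * y).requires_grad_(True)
    grads, = torch.autograd.grad(critic(x_hat).sum(), x_hat, create_graph=True)
    grad_norm = grads.flatten(start_dim=1).norm(2, dim=1)  # L2 norm per sample
    return torch.clamp(grad_norm - 1.0, min=0.0).pow(2).mean()
```

Note that `create_graph=True` keeps the gradient computation differentiable, so the penalty can itself be backpropagated through during critic training.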

Qualitative Evaluation In your report:

  2. (report, 8 pts) Provide visual samples. Comment on the quality of the samples from each model (e.g. blurriness, diversity).

  3. (report, 8 pts) We want to see if the model has learned a disentangled representation in the latent space. Sample a random z from your prior distribution. Make small perturbations to your sample z for each dimension (e.g. for a dimension i, zᵢ = zᵢ + ϵ). ϵ has to be large enough to see some visual difference. For each dimension, observe whether the changes result in visual variations (that is, variations in g(z)). You do not have to show all dimensions, just a couple that result in interesting changes. A sketch of this procedure is given below.
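A minimal sketch of the perturbation experiment; `generator`, the latent size L = 100, and ϵ = 3.0 are assumptions to adapt to your trained model and prior.

```python
import torch

z = torch.randn(1, 100)                     # one sample from the prior p(z)
eps = 3.0                                   # large enough to see a visual change
with torch.no_grad():
    samples = []
    for i in range(z.size(1)):
        z_perturbed = z.clone()
        z_perturbed[0, i] += eps            # perturb dimension i: z_i <- z_i + eps
        samples.append(generator(z_perturbed))
# Inspect `samples` and report a couple of dimensions with interesting changes.
```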

5 The SVHN dataset can be downloaded at http://ufldl.stanford.edu/housenumbers/. Please note that the provided sampler can download the dataset, so you do not need to download it separately.

