IFT6135-H2022, Assignment 3, Programming Part
Generative Models and Self-supervised Learning
Prof: Aaron Courville

Problem 1

Variational Autoencoders (VAEs) are probabilistic generative models for modeling a data distribution p(x). In this question, you will be asked to train a VAE on the Binarised MNIST dataset, using the negative ELBO loss as shown in class. Note that each pixel in this image dataset is binary: the pixel is either black or white, which means each datapoint (image) is a collection of binary values. You have to model the likelihood pθ(x|z), i.e. the decoder, as a product of Bernoulli distributions.1

  1. (unittest, 4 pts) Implement the function ‘log_likelihood_bernoulli’ in ‘q1_solution.py’ to compute the log-likelihood log p(x) for a given binary sample x and Bernoulli distribution p(x). p(x) will be parameterized by the mean of the distribution, p(x = 1), which will be given as input to the function. A minimal sketch is given below.
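One possible implementation, sketched under the assumption that both arguments are PyTorch tensors of shape (batch, D) with D the number of pixels; the argument names and the exact interface expected by the unittest may differ.

```python
import torch

def log_likelihood_bernoulli(mu, target):
    # mu: p(x = 1) per pixel, shape (batch, D); target: binary sample x, same shape.
    # Shapes and argument names are assumptions, not the official interface.
    eps = 1e-8
    mu = mu.clamp(eps, 1.0 - eps)  # avoid log(0)
    # log p(x) = sum_d [ x_d * log(mu_d) + (1 - x_d) * log(1 - mu_d) ]
    ll = target * torch.log(mu) + (1.0 - target) * torch.log(1.0 - mu)
    return ll.sum(dim=-1)  # one log-likelihood per instance
```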

  2. (unittest, 4 pts) Implement the function ‘log_likelihood_normal’ in ‘q1_solution.py’ to compute the log-likelihood log p(x) for a given float vector x and isotropic Normal distribution p(z) = N(µ, diag(σ²)). Note that µ and log(σ²) will be given as inputs for the Normal distribution. A sketch in the same spirit follows.
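A minimal sketch, again assuming tensors of shape (batch, D) and that the second argument holds log(σ²) as stated in the question.

```python
import math
import torch

def log_likelihood_normal(mu, logvar, z):
    # Log-density of z under N(mu, diag(exp(logvar))), summed over dimensions.
    # log N(z; mu, sigma^2) = -0.5 * [ log(2*pi) + logvar + (z - mu)^2 / sigma^2 ]
    ll = -0.5 * (math.log(2.0 * math.pi) + logvar + (z - mu) ** 2 / logvar.exp())
    return ll.sum(dim=-1)
```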

  3. (unittest, 4 pts) Implement the function ‘log_mean_exp’ in ‘q1_solution.py’ to compute the following equation for each yᵢ in a given Y = {y₁, y₂, . . . , yᵢ, . . . , y_M}:

$$\log \frac{1}{K} \sum_{k=1}^{K} \exp\big(y_i^{(k)} - a_i\big) + a_i,$$

where $a_i = \max_k y_i^{(k)}$. Note that $y_i = [y_i^{(1)}, y_i^{(2)}, \ldots, y_i^{(k)}, \ldots, y_i^{(K)}]$.
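A sketch of the numerically stable computation above, assuming the input is a tensor of shape (M, K) with one row $y_i$ per sample; the expected layout may differ in the starter code.

```python
import torch

def log_mean_exp(y):
    # y: shape (M, K), K evaluations y_i^(k) per row (assumed layout).
    a, _ = y.max(dim=-1, keepdim=True)            # a_i = max_k y_i^(k)
    # log( (1/K) * sum_k exp(y_i^(k) - a_i) ) + a_i
    return (y - a).exp().mean(dim=-1).log() + a.squeeze(-1)
```

Subtracting the per-row maximum before exponentiating keeps the intermediate values bounded, which is the point of the $a_i$ term in the equation.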

1 The binarised MNIST dataset is not interchangeable with the MNIST dataset available in torchvision, so the data loader as well as the dataset will be provided.


  4. (unittest, 4 pts) Implement the function ‘kl_gaussian_gaussian_analytic’ in ‘q1_solution.py’ to compute the KL divergence $D_{KL}(q(z|x) \,\|\, p(z))$ via the analytic solution for given p and q. Note that p and q are multivariate normal distributions with diagonal covariance. A sketch of the closed form follows.
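A minimal sketch of the closed-form KL between two diagonal Gaussians, under the assumption that each distribution is passed as a (mean, log-variance) pair of shape (batch, D); the actual argument layout in ‘q1_solution.py’ may differ.

```python
import torch

def kl_gaussian_gaussian_analytic(mu_q, logvar_q, mu_p, logvar_p):
    # Per-dimension closed form:
    # 0.5 * [ logvar_p - logvar_q + (exp(logvar_q) + (mu_q - mu_p)^2) / exp(logvar_p) - 1 ]
    kl = 0.5 * (logvar_p - logvar_q
                + (logvar_q.exp() + (mu_q - mu_p) ** 2) / logvar_p.exp()
                - 1.0)
    return kl.sum(dim=-1)  # one KL value per instance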

  5. (unittest, 4 pts) Implement the function ‘kl_gaussian_gaussian_mc’ in ‘q1_solution.py’ to compute the KL divergence $D_{KL}(q(z|x) \,\|\, p(z))$ using a Monte Carlo estimate for given p and q. Note that p and q are multivariate normal distributions with diagonal covariance. A sketch follows below.
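A sketch of the Monte Carlo estimator: draw z ∼ q with the reparameterization trick and average log q(z) − log p(z). The `num_samples` argument and the (mean, log-variance) layout are assumptions, not the official interface.

```python
import math
import torch

def kl_gaussian_gaussian_mc(mu_q, logvar_q, mu_p, logvar_p, num_samples=1):
    std_q = (0.5 * logvar_q).exp()
    eps = torch.randn((num_samples,) + mu_q.shape, device=mu_q.device)
    z = mu_q + std_q * eps                        # z ~ q, shape (num_samples, batch, D)

    def log_normal(z, mu, logvar):
        return (-0.5 * (math.log(2.0 * math.pi) + logvar
                        + (z - mu) ** 2 / logvar.exp())).sum(dim=-1)

    # KL(q || p) = E_q[ log q(z) - log p(z) ], averaged over the drawn samples
    return (log_normal(z, mu_q, logvar_q) - log_normal(z, mu_p, logvar_p)).mean(dim=0)
```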

  6. (report, 15 pts) Consider a latent variable model $p_\theta(x) = \int p_\theta(x|z)\,p(z)\,dz$. The prior is defined as $p(z) = \mathcal{N}(0, I_L)$ and $z \in \mathbb{R}^L$. Train a VAE with a latent variable of 100 dimensions (L = 100). Use the provided network architecture and hyperparameters described in ‘vae.ipynb’. Use ADAM with a learning rate of 3 × 10⁻⁴, and train for 20 epochs. Evaluate the model on the validation set using the ELBO. Marks will neither be deducted nor awarded if you do not use the given architecture. Note that for this question you have to:

    (a) Train a model to achieve an average per-instance ELBO of at least −102 on the validation set, and report the ELBO of your model. The ELBO on the validation set is written as:

$$\frac{1}{|\mathcal{D}_{valid}|} \sum_{x_i \in \mathcal{D}_{valid}} \mathcal{L}_{ELBO}(x_i) \geq -102.$$

Feel free to modify the above hyperparameters (except the latent variable size) to ensure this works. A sketch of the training loop is given below.
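A minimal sketch of the training loop, assuming `model`, `train_loader`, and `valid_loader` come from the provided ‘vae.ipynb’; the model interface (returning the posterior mean, log-variance, and Bernoulli means) and the flattening of inputs are assumptions. It reuses the helpers sketched in the earlier items.

```python
import torch

optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)

def batch_elbo(x):
    mu, logvar, x_recon = model(x)                           # assumed outputs
    recon_ll = log_likelihood_bernoulli(x_recon, x.flatten(start_dim=1))
    kl = kl_gaussian_gaussian_analytic(mu, logvar,
                                       torch.zeros_like(mu),      # p(z) = N(0, I)
                                       torch.zeros_like(logvar))
    return recon_ll - kl                                     # per-instance ELBO

for epoch in range(20):
    model.train()
    for x in train_loader:
        loss = -batch_elbo(x).mean()                         # negative ELBO
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    model.eval()
    with torch.no_grad():
        elbo = torch.cat([batch_elbo(x) for x in valid_loader]).mean()
    print(f"epoch {epoch}: validation ELBO = {elbo.item():.2f}")
```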

  7. (report, 15 pts) Evaluate the log-likelihood of the trained VAE model by using importance sampling, which was covered in the lecture. Use the code described in ‘vae.ipynb’. The standard importance-sampling estimator is:

$$\log p_\theta(x_i) \approx \log \frac{1}{K} \sum_{k=1}^{K} \frac{p_\theta(x_i \mid z_{i,k})\, p(z_{i,k})}{q_\phi(z_{i,k} \mid x_i)}, \qquad z_{i,k} \sim q_\phi(z \mid x_i).$$
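A sketch of this estimator in log space, reusing the helpers from the earlier items; the `encode`/`decode` interface and K = 200 are assumptions about the notebook's code, not its actual API.

```python
import torch

def estimate_log_likelihood(model, x, K=200):
    mu, logvar = model.encode(x)                       # parameters of q(z|x), assumed
    std = (0.5 * logvar).exp()
    zeros = torch.zeros_like(mu)                       # p(z) = N(0, I)
    log_w = []
    for _ in range(K):
        z = mu + std * torch.randn_like(std)           # z ~ q(z|x)
        recon = model.decode(z)                        # Bernoulli means of p(x|z), assumed
        log_w.append(log_likelihood_bernoulli(recon, x.flatten(start_dim=1))
                     + log_likelihood_normal(zeros, zeros, z)   # log p(z)
                     - log_likelihood_normal(mu, logvar, z))    # log q(z|x)
    # log p(x) ≈ log-mean-exp over the K importance weights, one value per instance
    return log_mean_exp(torch.stack(log_w, dim=-1))
```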


Problem 2

Since it is hard to enforce the Lipschitz constraint directly, we will penalize violations of it as suggested by Petzka et al. In this question, you will first implement a function to estimate the Wasserstein distance between two distributions, given samples x ∼ µ and y ∼ ν (real and generated samples, respectively):

$$\mathbb{E}_{y \sim \nu}[f(y)] - \mathbb{E}_{x \sim \mu}[f(x)] \qquad (1)$$

and with an added regularization term

$$\mathbb{E}_{\hat{x} \sim \tau}\Big[\big(\max\{0, \|\nabla f(\hat{x})\|_2 - 1\}\big)^2\Big] \qquad (2)$$

then, in the network training process, minimize

$$\mathbb{E}_{y \sim \nu}[f(y)] - \mathbb{E}_{x \sim \mu}[f(x)] + \lambda\, \mathbb{E}_{\hat{x} \sim \tau}\Big[\big(\max\{0, \|\nabla f(\hat{x})\|_2 - 1\}\big)^2\Big] \qquad (3)$$

where $\hat{x} = t x + (1 - t) y$ for $t \sim U[0, 1]$.

Dataset & dataloader In this question, you will use the GAN framework to train a generator that generates a distribution of images $x \in \mathbb{R}^{32 \times 32 \times 3}$, namely the Street View House Numbers (SVHN) dataset.5 We will consider the prior distribution p(z) = N(0, I), the isotropic Gaussian distribution. We provide a function for sampling from the SVHN dataset in ‘q2_samplers.py’.

Hyperparameters & Training Pointers We provide code for the GAN network as well as the hyperparameters you should be using. We ask you to code the Wasserstein distance and the training procedure to train the GAN, as well as the qualitative exploration that you will include in your report.

  1. (unittest, 4 pts) Implement the functions ‘vf_wasserstein_distance’ and ‘lp_reg’ in ‘q2_solution.py’ to compute the objective function of the Wasserstein distance and the “Lipschitz Penalty”. Consider that the norm used in the regularizer is the L2-norm. A sketch of both functions follows.
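A minimal sketch of both functions implementing Eqs. (1) and (2), under the assumptions that the critic f is passed as a callable and that samples are 4-D image batches of shape (batch, 3, 32, 32); the argument order in ‘q2_solution.py’ may differ.

```python
import torch

def vf_wasserstein_distance(x, y, critic):
    # Eq. (1): E_{y~nu}[f(y)] - E_{x~mu}[f(x)], with f the critic network.
    return critic(y).mean() - critic(x).mean()

def lp_reg(x, y, critic):
    # Eq. (2): E_{x_hat~tau}[ (max{0, ||grad f(x_hat)||_2 - 1})^2 ],
    # with x_hat = t*x + (1-t)*y and t ~ U[0, 1].
    t = torch.rand(x.size(0), 1, 1, 1, device=x.device)   # assumes 4-D image batches
    x_hat = (t * x + (1 - t) * y).requires_grad_(True)
    grads, = torch.autograd.grad(critic(x_hat).sum(), x_hat, create_graph=True)
    grad_norm = grads.flatten(start_dim=1).norm(2, dim=1)  # L2 norm per sample
    return torch.clamp(grad_norm - 1.0, min=0.0).pow(2).mean()
```

Note that `create_graph=True` keeps the gradient computation differentiable, so the penalty can itself be backpropagated through during critic training.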

Qualitative Evaluation In your report:

  2. (report, 8 pts) Provide visual samples. Comment on the quality of the samples from each model (e.g. blurriness, diversity).

  3. (report, 8 pts) We want to see if the model has learned a disentangled representation in the latent space. Sample a random z from your prior distribution. Make small perturbations to your sample z for each dimension (e.g. for a dimension i, zᵢ = zᵢ + ϵ). ϵ has to be large enough to see some visual difference. For each dimension, observe whether the changes result in visual variations (that is, variations in g(z)). You do not have to show all dimensions, just a couple that result in interesting changes. A sketch of this procedure is given below.
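A minimal sketch of the perturbation experiment; `generator`, the latent size L = 100, and ϵ = 3.0 are assumptions to adapt to your trained model and prior.

```python
import torch

z = torch.randn(1, 100)                     # one sample from the prior p(z)
eps = 3.0                                   # large enough to see a visual change
with torch.no_grad():
    samples = []
    for i in range(z.size(1)):
        z_perturbed = z.clone()
        z_perturbed[0, i] += eps            # perturb dimension i: z_i <- z_i + eps
        samples.append(generator(z_perturbed))
# Inspect `samples` and report a couple of dimensions with interesting changes.
```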

5 The SVHN dataset can be downloaded at http://ufldl.stanford.edu/housenumbers/. Please note that the provided sampler can download the dataset, so you do not need to download it separately.

