Description
- (15 points) Backpropagation for autoencoders. In an autoencoder, we seek to reconstruct the original data after some operation that reduces the data's dimensionality. We may be interested in reducing the data's dimensionality to gain a more compact representation of the data.
  For example, consider $x \in \mathbb{R}^n$. Further, consider $W \in \mathbb{R}^{m \times n}$, where $m < n$, so that $Wx$ is of lower dimensionality than $x$. One way to design $W$ so that $Wx$ still contains key features of $x$ is to minimize the following expression with respect to $W$:

  $$L = \frac{1}{2} \left\lVert W^\top W x - x \right\rVert_2^2$$

  (To be complete, autoencoders also have a nonlinearity in each layer, i.e., the loss is $\frac{1}{2} \left\lVert f(W^\top f(Wx)) - x \right\rVert_2^2$. However, we'll work with the linear example.)
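  As a quick sanity check on the notation, here is a minimal NumPy sketch (shapes and data chosen arbitrarily for illustration; not part of the assignment) that evaluates this loss:

  ```python
  import numpy as np

  n, m = 8, 3                      # ambient and reduced dimensionality, m < n
  rng = np.random.default_rng(0)
  x = rng.standard_normal(n)       # data vector x in R^n
  W = rng.standard_normal((m, n))  # weight matrix W in R^{m x n}

  # L = 1/2 * ||W^T W x - x||_2^2
  r = W.T @ (W @ x) - x            # reconstruction residual
  L = 0.5 * (r @ r)
  print(L)
  ```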
  - (3 points) In words, describe why this minimization finds a $W$ that ought to preserve information about $x$.
  - (3 points) Draw the computational graph for $L$.
  - (3 points) In the computational graph, there should be two paths to $W$. How do we account for these two paths when calculating $\nabla_W L$? Your answer should include a mathematical argument.
  - (6 points) Calculate the gradient $\nabla_W L$. (A numerical check you can use to verify your result is sketched below.)
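  Not part of the assignment, but whatever expression you derive for $\nabla_W L$ can be verified numerically. A minimal finite-difference sketch (shapes and the loss definition simply repeat the setup above):

  ```python
  import numpy as np

  def loss(W, x):
      """L = 1/2 * ||W^T W x - x||_2^2, the linear autoencoder loss."""
      r = W.T @ (W @ x) - x
      return 0.5 * (r @ r)

  def numerical_grad(W, x, eps=1e-6):
      """Central finite differences over each entry of W."""
      G = np.zeros_like(W)
      for i in range(W.shape[0]):
          for j in range(W.shape[1]):
              Wp, Wm = W.copy(), W.copy()
              Wp[i, j] += eps
              Wm[i, j] -= eps
              G[i, j] = (loss(Wp, x) - loss(Wm, x)) / (2 * eps)
      return G

  rng = np.random.default_rng(0)
  W = rng.standard_normal((3, 8))
  x = rng.standard_normal(8)
  # Compare this against your analytic gradient, entry by entry.
  print(numerical_grad(W, x))
  ```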
- (20 points) Backpropagation for the Gaussian-process latent variable model. An important component of unsupervised learning is visualizing high-dimensional data in low-dimensional spaces. One such nonlinear algorithm is GP-LVM, from Lawrence, NIPS 2004. GP-LVM optimizes the maximum likelihood of a probabilistic model. We won't get into the details here, but rather get to the bottom line: in this paper, a log-likelihood has to be differentiated with respect to a matrix to derive the optimal parameters.

  To do so, we will apply the chain rule for multivariate derivatives via backpropagation. The log-likelihood is:
  $$L = c - \frac{D}{2} \log |K| - \frac{1}{2} \operatorname{tr}\!\left(K^{-1} Y Y^\top\right)$$

  where $K = XX^\top + \beta^{-1} I$ and $c$ is a constant. To solve this, we'll take the derivatives with respect to the two terms with dependencies on $X$:
  $$L_1 = -\frac{D}{2} \log \left| XX^\top + \beta^{-1} I \right|$$

  $$L_2 = -\frac{1}{2} \operatorname{tr}\!\left( \left( XX^\top + \beta^{-1} I \right)^{-1} Y Y^\top \right)$$
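  For intuition, here is a minimal NumPy sketch (shapes, $\beta$, and data are arbitrary illustrations, not from the assignment) that evaluates $L_1$ and $L_2$:

  ```python
  import numpy as np

  N, q, D = 20, 2, 5                # number of points, latent dim, data dim
  beta = 2.0                        # noise precision (illustrative value)
  rng = np.random.default_rng(1)
  X = rng.standard_normal((N, q))   # latent positions
  Y = rng.standard_normal((N, D))   # observed high-dimensional data

  K = X @ X.T + (1.0 / beta) * np.eye(N)

  _, logdetK = np.linalg.slogdet(K)                  # numerically stable log|K|
  L1 = -0.5 * D * logdetK
  L2 = -0.5 * np.trace(np.linalg.solve(K, Y @ Y.T))  # tr(K^{-1} Y Y^T) without forming K^{-1}
  print(L1, L2)
  ```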
  Hint: To receive full credit, you will be required to show all work. You may use the following matrix derivative without proof:

  $$\frac{\partial L}{\partial K} = -K^{-\top} \, \frac{\partial L}{\partial K^{-1}} \, K^{-\top}$$
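  For context only (the identity may be used without proof), it follows from the differential of the matrix inverse:

  $$d(K^{-1}) = -K^{-1} \, dK \, K^{-1}, \qquad dL = \operatorname{tr}\!\left( \left(\frac{\partial L}{\partial K^{-1}}\right)^{\!\top} d(K^{-1}) \right) = \operatorname{tr}\!\left( \left( -K^{-\top} \frac{\partial L}{\partial K^{-1}} K^{-\top} \right)^{\!\top} dK \right)$$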
  - (3 points) Draw a computational graph for $L_1$.
  - (6 points) Compute $\frac{\partial L_1}{\partial X}$.
  - (3 points) Draw a computational graph for $L_2$.
  - (6 points) Compute $\frac{\partial L_2}{\partial X}$.
  - (2 points) Compute $\frac{\partial L}{\partial X}$.
- (40 points) 2-layer neural network. Complete the two-layer neural network Jupyter notebook. Print out the entire workbook and relevant code and submit it as a PDF to Gradescope. Download the CIFAR-10 dataset, as you did in HW #2.
- (25 points) General FC neural network. Complete the FC Net Jupyter notebook. Print out the entire workbook and relevant code and submit it as a PDF to Gradescope.