Machine Learning Homework #1

  • [42 points] Linear regression on a polynomial

The files q1xTrain.npy, q1xTest.npy, q1yTrain.npy, and q1yTest.npy specify a linear regression problem for a polynomial. q1xTrain.npy represents the inputs (x(i) ∈ R) and q1yTrain.npy represents the outputs (y(i) ∈ R) of the training set, with one training example per row.

  (a) [15 points] You will compare the following two optimization methods in finding the coefficients of a polynomial of degree one (i.e., slope and intercept) that minimize the training loss:

      • Batch gradient descent

      • Stochastic gradient descent

    i. [12 points] Give the coefficients generated by each of the optimization methods. Report the hyperparameters used to generate the coefficients.

    ii. [3 points] We compare the two optimization methods in terms of the number of epochs required for convergence, where an "epoch" is defined as one pass through the entire set of training samples. Compare the number of epochs needed to converge for the two methods. Which method converges faster? Report the hyperparameters used for the comparison. [Hint: in this question, the training process can be viewed as having converged when the mean squared error E_MS = (1/N) Σ_{i=1}^{N} (w0 + w1 x(i) − y(i))² on the training dataset is consistently small enough (e.g., ≤ 0.2).]
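For illustration only, a minimal sketch of the two optimizers in Python, assuming the training files load as 1-D NumPy arrays; the learning rate, epoch cap, and the ≤ 0.2 stopping threshold below are arbitrary placeholder choices, not prescribed hyperparameters:

import numpy as np

# Illustrative sketch only: learning rate, epoch budget, and threshold are assumptions.
X = np.load('q1xTrain.npy')          # inputs x^(i), shape (N,)
y = np.load('q1yTrain.npy')          # outputs y^(i), shape (N,)
N = len(X)

def mse(w0, w1):
    return np.mean((w0 + w1 * X - y) ** 2)

def batch_gd(lr=0.05, max_epochs=10000, tol=0.2):
    w0 = w1 = 0.0
    for epoch in range(1, max_epochs + 1):
        err = w0 + w1 * X - y                 # residuals over the full training set
        w0 -= lr * 2 * np.mean(err)           # gradient of the MSE w.r.t. w0
        w1 -= lr * 2 * np.mean(err * X)       # gradient of the MSE w.r.t. w1
        if mse(w0, w1) <= tol:
            return w0, w1, epoch
    return w0, w1, max_epochs

def sgd(lr=0.05, max_epochs=10000, tol=0.2):
    w0 = w1 = 0.0
    for epoch in range(1, max_epochs + 1):
        for i in np.random.permutation(N):    # one pass over all samples = one epoch
            err = w0 + w1 * X[i] - y[i]
            w0 -= lr * 2 * err
            w1 -= lr * 2 * err * X[i]
        if mse(w0, w1) <= tol:
            return w0, w1, epoch
    return w0, w1, max_epochs

print('batch GD (w0, w1, epochs):', batch_gd())
print('SGD      (w0, w1, epochs):', sgd())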

  (b) [15 points] Next, you will investigate the problem of over-fitting. Recall the figure from lecture that explored over-fitting as a function of the degree of the polynomial M, where the Root-Mean-Square (RMS) error is defined as E_RMS = √(2E(w*)/N), with E(w*) denoting the sum-of-squares error of the fitted coefficients w* (so E_RMS is simply the square root of the mean squared error).
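As a rough illustration of how such a curve could be produced (the degree range M = 0, …, 9 and the use of np.polyfit are assumptions, not part of the assignment text):

import numpy as np

# Illustrative sketch: the range of degrees M and np.polyfit are assumed choices.
X_train, y_train = np.load('q1xTrain.npy'), np.load('q1yTrain.npy')
X_test,  y_test  = np.load('q1xTest.npy'),  np.load('q1yTest.npy')

def rms_error(w, X, y):
    # E_RMS = sqrt(2 E(w) / N) with E(w) = 0.5 * sum of squared residuals,
    # which equals the root of the mean squared residual.
    pred = np.polyval(w, X)
    return np.sqrt(np.mean((pred - y) ** 2))

for M in range(10):
    w = np.polyfit(X_train, y_train, deg=M)   # least-squares polynomial of degree M
    print(f'M={M}  train E_RMS={rms_error(w, X_train, y_train):.3f}  '
          f'test E_RMS={rms_error(w, X_test, y_test):.3f}')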

I.e., y(i) has mean wT x(i) and variance (σ(i))² (where the σ(i)'s are fixed, known constants). Show that finding the maximum-likelihood estimate of w reduces to solving a weighted linear regression problem. State clearly what the r(i)'s are in terms of the σ(i)'s.
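One possible starting point, assuming the Gaussian model y(i) ~ N(wT x(i), (σ(i))²) with independent samples (a sketch only; completing the argument and reading off the r(i)'s is the exercise):

% Sketch under the assumption y^{(i)} \sim \mathcal{N}(w^T x^{(i)}, (\sigma^{(i)})^2), independent across i.
\[
\ell(w) = \log \prod_{i=1}^{N} \frac{1}{\sqrt{2\pi}\,\sigma^{(i)}}
          \exp\!\left(-\frac{\bigl(y^{(i)} - w^T x^{(i)}\bigr)^2}{2(\sigma^{(i)})^2}\right)
        = \text{const} - \frac{1}{2}\sum_{i=1}^{N} \frac{\bigl(y^{(i)} - w^T x^{(i)}\bigr)^2}{(\sigma^{(i)})^2} ,
\]

so maximizing ℓ(w) is the same as minimizing a weighted sum of squared residuals, which is a weighted linear regression objective.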

  (c) [18 points] The following items use the files q2x.npy, which contains the inputs (x(i)), and q2y.npy, which contains the outputs (y(i)), for a linear regression problem, with one training example per row.

    i. [4 points] Implement (unweighted) linear regression (y = wT x) on this dataset using the closed-form solution we learned in lecture (remember to include the intercept term). Plot on the same figure the data (each data sample can be shown as a point (x(i), y(i))) and the straight line resulting from your fit. (A minimal implementation sketch for items i and ii appears after item iii below.)

    ii. [8 points] Implement locally weighted linear regression on this dataset (using the weighted normal equations you derived in part (b)), and plot on the same figure the data and the curve resulting from your fit. When evaluating local regression at a query point x (which is real-valued in this problem), use weights

r(i) = exp( −(x − x(i))² / (2τ²) )

with a bandwidth parameter τ = 0.8. (Again, remember to include the intercept term.)

    iii. [6 points] Repeat (ii) four times with τ = 0.1, 0.3, 2, and 10. Comment briefly on what happens to the fit when τ is too small or too large.
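For items (i) and (ii), a minimal sketch assuming q2x.npy holds 1-D inputs, that an intercept column is appended to form Φ, and that the weighted normal equations take the form w = (ΦT R Φ)−1 ΦT R y with R = diag(r(1), …, r(N)); the evaluation grid, plot details, and output filename are placeholder choices:

import numpy as np
import matplotlib.pyplot as plt

# Illustrative sketch: grid, plotting details, and filenames are assumptions.
x = np.load('q2x.npy').reshape(-1)            # inputs x^(i)
y = np.load('q2y.npy').reshape(-1)            # outputs y^(i)
Phi = np.column_stack([np.ones_like(x), x])   # design matrix with intercept term

# (i) Unweighted linear regression via the normal equations: w = (Phi^T Phi)^{-1} Phi^T y
w = np.linalg.solve(Phi.T @ Phi, Phi.T @ y)
print('unweighted coefficients (intercept, slope):', w)

def lwr_predict(x_query, tau):
    """Locally weighted prediction at one query point using Gaussian weights r^(i)."""
    r = np.exp(-(x_query - x) ** 2 / (2 * tau ** 2))
    R = np.diag(r)
    # Weighted normal equations: w_local = (Phi^T R Phi)^{-1} Phi^T R y
    w_local = np.linalg.solve(Phi.T @ R @ Phi, Phi.T @ R @ y)
    return w_local[0] + w_local[1] * x_query

grid = np.linspace(x.min(), x.max(), 200)
plt.scatter(x, y, s=10, label='data')
plt.plot(grid, w[0] + w[1] * grid, label='unweighted fit')
for tau in (0.8, 0.1, 0.3, 2, 10):
    plt.plot(grid, [lwr_predict(q, tau) for q in grid], label=f'LWR tau={tau}')
plt.legend()
plt.savefig('q2-fits.png')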

  • [22 points] Derivation and Proof

(a) [8 points] Consider the linear regression problem for 1D data, where we would like to learn a function h(x) = w1 x + w0 with parameters w0 and w1 that minimize the sum of squared errors L = (1/2) Σ_{i=1}^{N} (y(i) − h(x(i)))² for N pairs of data samples (x(i), y(i)). Show the derivation to obtain the solution

w0 = Ȳ − w1 X̄   and   w1 = ( (1/N) Σ_{i=1}^{N} x(i) y(i) − X̄ Ȳ ) / ( (1/N) Σ_{i=1}^{N} (x(i))² − X̄² ),

where X̄ is the mean of {x(1), x(2), · · · , x(N)} and Ȳ is the mean of {y(1), y(2), · · · , y(N)}.
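A brief sketch of the stationarity conditions one would typically start from (the remaining algebra is the exercise):

% Setting the partial derivatives of L = (1/2) \sum_i (y^{(i)} - w_1 x^{(i)} - w_0)^2 to zero:
\[
\frac{\partial L}{\partial w_0} = -\sum_{i=1}^{N}\bigl(y^{(i)} - w_1 x^{(i)} - w_0\bigr) = 0
\;\Rightarrow\; w_0 = \bar{Y} - w_1 \bar{X},
\qquad
\frac{\partial L}{\partial w_1} = -\sum_{i=1}^{N} x^{(i)}\bigl(y^{(i)} - w_1 x^{(i)} - w_0\bigr) = 0 .
\]

Substituting the first condition into the second and dividing by N yields the stated expression for w1.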

(b) [14 points] Consider the definition and properties of positive (semi-)definite matrices. Let A be a real, symmetric d × d matrix. A is positive semi-definite (PSD) if, for all z ∈ Rd, zT Az ≥ 0. A is positive definite (PD) if, for all z ≠ 0, zT Az > 0. We write A ⪰ 0 when A is PSD, and A ≻ 0 when A is PD. The spectral theorem says that every real symmetric matrix A can be expressed via the spectral decomposition A = UΛUT, where U is a d × d matrix such that UUT = UT U = I and Λ = diag(λ1, λ2, · · · , λd). Multiplying on the right by U, we see that AU = UΛ. If we let ui denote the i-th column of U, we have Aui = λi ui for each i. The λi are thus the eigenvalues of A, and the corresponding columns ui are the eigenvectors associated with λi. The eigenvalues constitute the "spectrum" of A, and the spectral decomposition is also called the eigenvalue decomposition of A.

    i. [6 points] Prove that A is PD iff λi > 0 for each i.

    ii. [8 points] Consider the linear regression problem where Φ and y are as defined in class and the closed-form solution is (ΦT Φ)−1 ΦT y. We can obtain the eigenvalues of the symmetric matrix ΦT Φ using its spectral decomposition. When we apply ridge regression, the symmetric matrix in the solution becomes ΦT Φ + βI. Prove that ridge regression has the effect of shifting all eigenvalues (singular values) by the constant β, and that for any β > 0 it makes the matrix ΦT Φ + βI PD.
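A small numerical illustration of this eigenvalue shift (the random Φ and the value of β below are arbitrary; this check illustrates, but does not replace, the proof):

import numpy as np

# Sketch: the random design matrix and beta are assumptions, used only to illustrate the claim.
rng = np.random.default_rng(0)
Phi = rng.standard_normal((20, 5))
beta = 0.5

A = Phi.T @ Phi                                   # symmetric PSD matrix
eig_A = np.sort(np.linalg.eigvalsh(A))
eig_ridge = np.sort(np.linalg.eigvalsh(A + beta * np.eye(5)))

print('eigenvalues of Phi^T Phi          :', eig_A)
print('eigenvalues of Phi^T Phi + beta I :', eig_ridge)
print('difference (should all equal beta):', eig_ridge - eig_A)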

Code Submission Instructions

Your solutions to Q1 and Q2 should be written in two Python programs, q1.py and q2.py, respectively. Each program should be runnable with the following command line: python3 q1.py (or python3 q2.py) and should produce the outputs and plots for all subproblems. The program should read the data files (i.e., *.npy) from the same (current) working directory, for example: X_train = np.load('q1xTrain.npy').

The program should print all necessary outputs (e.g., the computed coefficients) to standard output (stdout). For plots, it is fine to save figures as multiple files (e.g., q1-b.png). There are no requirements on the filenames or formats, as long as the program produces valid outputs. Be sure to include all outputs in your writeup.
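A minimal skeleton consistent with these requirements (the function body and figure contents are placeholders; only the data filenames come from the assignment):

# q1.py -- illustrative skeleton; the computations below are placeholders.
import numpy as np
import matplotlib.pyplot as plt

def main():
    # Read the data files from the current working directory.
    X_train = np.load('q1xTrain.npy')
    y_train = np.load('q1yTrain.npy')
    X_test = np.load('q1xTest.npy')
    y_test = np.load('q1yTest.npy')

    # ... compute coefficients for each subproblem ...
    coeffs = np.zeros(2)  # placeholder

    # Print the required outputs to stdout.
    print('Q1(a) coefficients:', coeffs)

    # Save any figures as separate files.
    plt.figure()
    plt.scatter(X_train, y_train, s=10)
    plt.savefig('q1-b.png')

if __name__ == '__main__':
    main()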

Please upload your code files (q1.py and q2.py) to Gradescope. Additional Python files are allowed.

It is fine to include your outputs in your submission, but this is not required (we will re-run the code).

However, please DO NOT include the data files (*.npy) in your Gradescope submission.

Credits

Some questions adopted/adapted from Stanford CS229 materials and from Bishop PRML.
