Machine Learning Homework #1

  • [42 points] Linear regression on a polynomial

The files q1xTrain.npy, q1xTest.npy, q1yTrain.npy, and q1yTest.npy specify a linear regression problem for a polynomial. q1xTrain.npy represents the inputs (x(i) ∈ R) and q1yTrain.npy represents the outputs (y(i) ∈ R) of the training set, with one training example per row.

  (a) [15 points] You will compare the following two optimization methods in finding the coefficients of a polynomial of degree one (i.e., slope and intercept) that minimize the training loss:

      • Batch gradient descent

      • Stochastic gradient descent

    i. [12 points] Give the coefficients generated by each of the optimization methods. Report the hyperparameters used to generate the coefficients.

    ii. [3 points] We compare the two optimization methods in terms of the number of epochs required for convergence, where an "epoch" is defined as one pass through the entire set of training samples. Compare the number of epochs needed to converge for the two methods. Which method converges faster? Report the hyperparameters used for the comparison. [Hint: in this question, the training process can be viewed as having converged when the mean squared error E_MS = (1/N) Σ_{i=1}^{N} (w0 + w1 x(i) − y(i))² on the training dataset is consistently small enough (e.g., ≤ 0.2).]
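For illustration only, a minimal sketch of the two optimizers in Python, assuming the training files load as 1-D NumPy arrays; the learning rate, epoch cap, and the ≤ 0.2 stopping threshold below are arbitrary placeholder choices, not prescribed hyperparameters:

import numpy as np

# Illustrative sketch only: learning rate, epoch budget, and threshold are assumptions.
X = np.load('q1xTrain.npy')          # inputs x^(i), shape (N,)
y = np.load('q1yTrain.npy')          # outputs y^(i), shape (N,)
N = len(X)

def mse(w0, w1):
    return np.mean((w0 + w1 * X - y) ** 2)

def batch_gd(lr=0.05, max_epochs=10000, tol=0.2):
    w0 = w1 = 0.0
    for epoch in range(1, max_epochs + 1):
        err = w0 + w1 * X - y                 # residuals over the full training set
        w0 -= lr * 2 * np.mean(err)           # gradient of the MSE w.r.t. w0
        w1 -= lr * 2 * np.mean(err * X)       # gradient of the MSE w.r.t. w1
        if mse(w0, w1) <= tol:
            return w0, w1, epoch
    return w0, w1, max_epochs

def sgd(lr=0.05, max_epochs=10000, tol=0.2):
    w0 = w1 = 0.0
    for epoch in range(1, max_epochs + 1):
        for i in np.random.permutation(N):    # one pass over all samples = one epoch
            err = w0 + w1 * X[i] - y[i]
            w0 -= lr * 2 * err
            w1 -= lr * 2 * err * X[i]
        if mse(w0, w1) <= tol:
            return w0, w1, epoch
    return w0, w1, max_epochs

print('batch GD (w0, w1, epochs):', batch_gd())
print('SGD      (w0, w1, epochs):', sgd())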

  (b) [15 points] Next, you will investigate the problem of over-fitting. Recall the figure from lecture that explored over-fitting as a function of the degree of the polynomial M, where the Root-Mean-Square (RMS) error is defined as E_RMS = √(2E(w*)/N), with E(w*) denoting the sum-of-squares error of the fitted coefficients w* (so E_RMS is simply the square root of the mean squared error).
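As a rough illustration of how such a curve could be produced (the degree range M = 0, …, 9 and the use of np.polyfit are assumptions, not part of the assignment text):

import numpy as np

# Illustrative sketch: the range of degrees M and np.polyfit are assumed choices.
X_train, y_train = np.load('q1xTrain.npy'), np.load('q1yTrain.npy')
X_test,  y_test  = np.load('q1xTest.npy'),  np.load('q1yTest.npy')

def rms_error(w, X, y):
    # E_RMS = sqrt(2 E(w) / N) with E(w) = 0.5 * sum of squared residuals,
    # which equals the root of the mean squared residual.
    pred = np.polyval(w, X)
    return np.sqrt(np.mean((pred - y) ** 2))

for M in range(10):
    w = np.polyfit(X_train, y_train, deg=M)   # least-squares polynomial of degree M
    print(f'M={M}  train E_RMS={rms_error(w, X_train, y_train):.3f}  '
          f'test E_RMS={rms_error(w, X_test, y_test):.3f}')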

I.e., y(i) has mean wT x(i) and variance (σ(i))² (where the σ(i)'s are fixed, known constants). Show that finding the maximum-likelihood estimate of w reduces to solving a weighted linear regression problem. State clearly what the r(i)'s are in terms of the σ(i)'s.
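One possible starting point, assuming the Gaussian model y(i) ~ N(wT x(i), (σ(i))²) with independent samples (a sketch only; completing the argument and reading off the r(i)'s is the exercise):

% Sketch under the assumption y^{(i)} \sim \mathcal{N}(w^T x^{(i)}, (\sigma^{(i)})^2), independent across i.
\[
\ell(w) = \log \prod_{i=1}^{N} \frac{1}{\sqrt{2\pi}\,\sigma^{(i)}}
          \exp\!\left(-\frac{\bigl(y^{(i)} - w^T x^{(i)}\bigr)^2}{2(\sigma^{(i)})^2}\right)
        = \text{const} - \frac{1}{2}\sum_{i=1}^{N} \frac{\bigl(y^{(i)} - w^T x^{(i)}\bigr)^2}{(\sigma^{(i)})^2} ,
\]

so maximizing ℓ(w) is the same as minimizing a weighted sum of squared residuals, which is a weighted linear regression objective.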

  (c) [18 points] The following items use the files q2x.npy, which contains the inputs (x(i)), and q2y.npy, which contains the outputs (y(i)), for a linear regression problem, with one training example per row.

    i. [4 points] Implement (unweighted) linear regression (y = wT x) on this dataset using the closed-form solution we learned in lecture (remember to include the intercept term). Plot on the same figure the data (each data sample can be shown as a point (x(i), y(i))) and the straight line resulting from your fit. (A minimal implementation sketch for items i and ii appears after item iii below.)

    ii. [8 points] Implement locally weighted linear regression on this dataset (using the weighted normal equations you derived in part (b)), and plot on the same figure the data and the curve resulting from your fit. When evaluating local regression at a query point x (which is real-valued in this problem), use weights

r(i) = exp( −(x − x(i))² / (2τ²) )

with a bandwidth parameter τ = 0.8. (Again, remember to include the intercept term.)

    iii. [6 points] Repeat (ii) four times with τ = 0.1, 0.3, 2, and 10. Comment briefly on what happens to the fit when τ is too small or too large.
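For items (i) and (ii), a minimal sketch assuming q2x.npy holds 1-D inputs, that an intercept column is appended to form Φ, and that the weighted normal equations take the form w = (ΦT R Φ)−1 ΦT R y with R = diag(r(1), …, r(N)); the evaluation grid, plot details, and output filename are placeholder choices:

import numpy as np
import matplotlib.pyplot as plt

# Illustrative sketch: grid, plotting details, and filenames are assumptions.
x = np.load('q2x.npy').reshape(-1)            # inputs x^(i)
y = np.load('q2y.npy').reshape(-1)            # outputs y^(i)
Phi = np.column_stack([np.ones_like(x), x])   # design matrix with intercept term

# (i) Unweighted linear regression via the normal equations: w = (Phi^T Phi)^{-1} Phi^T y
w = np.linalg.solve(Phi.T @ Phi, Phi.T @ y)
print('unweighted coefficients (intercept, slope):', w)

def lwr_predict(x_query, tau):
    """Locally weighted prediction at one query point using Gaussian weights r^(i)."""
    r = np.exp(-(x_query - x) ** 2 / (2 * tau ** 2))
    R = np.diag(r)
    # Weighted normal equations: w_local = (Phi^T R Phi)^{-1} Phi^T R y
    w_local = np.linalg.solve(Phi.T @ R @ Phi, Phi.T @ R @ y)
    return w_local[0] + w_local[1] * x_query

grid = np.linspace(x.min(), x.max(), 200)
plt.scatter(x, y, s=10, label='data')
plt.plot(grid, w[0] + w[1] * grid, label='unweighted fit')
for tau in (0.8, 0.1, 0.3, 2, 10):
    plt.plot(grid, [lwr_predict(q, tau) for q in grid], label=f'LWR tau={tau}')
plt.legend()
plt.savefig('q2-fits.png')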

  • [22 points] Derivation and Proof

(a) [8 points] Consider the linear regression problem for 1D data, where we would like to learn a function h(x) = w1 x + w0 with parameters w0 and w1 that minimize the sum of squared errors L = (1/2) Σ_{i=1}^{N} (y(i) − h(x(i)))² for N pairs of data samples (x(i), y(i)). Show the derivation to obtain the solution

w0 = Ȳ − w1 X̄   and   w1 = ( (1/N) Σ_{i=1}^{N} x(i) y(i) − X̄ Ȳ ) / ( (1/N) Σ_{i=1}^{N} (x(i))² − X̄² ),

where X̄ is the mean of {x(1), x(2), · · · , x(N)} and Ȳ is the mean of {y(1), y(2), · · · , y(N)}.
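A brief sketch of the stationarity conditions one would typically start from (the remaining algebra is the exercise):

% Setting the partial derivatives of L = (1/2) \sum_i (y^{(i)} - w_1 x^{(i)} - w_0)^2 to zero:
\[
\frac{\partial L}{\partial w_0} = -\sum_{i=1}^{N}\bigl(y^{(i)} - w_1 x^{(i)} - w_0\bigr) = 0
\;\Rightarrow\; w_0 = \bar{Y} - w_1 \bar{X},
\qquad
\frac{\partial L}{\partial w_1} = -\sum_{i=1}^{N} x^{(i)}\bigl(y^{(i)} - w_1 x^{(i)} - w_0\bigr) = 0 .
\]

Substituting the first condition into the second and dividing by N yields the stated expression for w1.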

(b) [14 points] Consider the definition and properties of positive (semi-)definite matrices. Let A be a real, symmetric d × d matrix. A is positive semi-definite (PSD) if, for all z ∈ Rd, zT Az ≥ 0. A is positive definite (PD) if, for all z ≠ 0, zT Az > 0. We write A ⪰ 0 when A is PSD, and A ≻ 0 when A is PD. The spectral theorem says that every real symmetric matrix A can be expressed via the spectral decomposition A = UΛUT, where U is a d × d matrix such that UUT = UT U = I and Λ = diag(λ1, λ2, · · · , λd). Multiplying on the right by U, we see that AU = UΛ. If we let ui denote the i-th column of U, we have Aui = λi ui for each i. The λi are thus the eigenvalues of A, and the corresponding columns ui are the eigenvectors associated with λi. The eigenvalues constitute the "spectrum" of A, and the spectral decomposition is also called the eigenvalue decomposition of A.

    i. [6 points] Prove that A is PD iff λi > 0 for each i.

    ii. [8 points] Consider the linear regression problem where Φ and y are as defined in class and the closed-form solution is (ΦT Φ)−1 ΦT y. We can obtain the eigenvalues of the symmetric matrix ΦT Φ using its spectral decomposition. When we apply ridge regression, the symmetric matrix in the solution becomes ΦT Φ + βI. Prove that ridge regression has the effect of shifting all eigenvalues (singular values) by the constant β, and that for any β > 0 it makes the matrix ΦT Φ + βI PD.
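A small numerical illustration of this eigenvalue shift (the random Φ and the value of β below are arbitrary; this check illustrates, but does not replace, the proof):

import numpy as np

# Sketch: the random design matrix and beta are assumptions, used only to illustrate the claim.
rng = np.random.default_rng(0)
Phi = rng.standard_normal((20, 5))
beta = 0.5

A = Phi.T @ Phi                                   # symmetric PSD matrix
eig_A = np.sort(np.linalg.eigvalsh(A))
eig_ridge = np.sort(np.linalg.eigvalsh(A + beta * np.eye(5)))

print('eigenvalues of Phi^T Phi          :', eig_A)
print('eigenvalues of Phi^T Phi + beta I :', eig_ridge)
print('difference (should all equal beta):', eig_ridge - eig_A)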

Code Submission Instructions

Your solutions to Q1 and Q2 should be written in two Python programs, q1.py and q2.py, respectively. Each program should be runnable with the following command line: python3 q1.py (or python3 q2.py) and should produce the outputs and plots for all subproblems. The program should read the data files (i.e., *.npy) from the same (current) working directory, for example: X_train = np.load('q1xTrain.npy').

The program should print all necessary outputs (e.g., the computed coefficients) to standard output (stdout). For plots, it is fine to save figures as multiple files (e.g., q1-b.png). There are no requirements on the filenames or formats, as long as the program produces valid outputs. Be sure to include all outputs in your writeup.
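A minimal skeleton consistent with these requirements (the function body and figure contents are placeholders; only the data filenames come from the assignment):

# q1.py -- illustrative skeleton; the computations below are placeholders.
import numpy as np
import matplotlib.pyplot as plt

def main():
    # Read the data files from the current working directory.
    X_train = np.load('q1xTrain.npy')
    y_train = np.load('q1yTrain.npy')
    X_test = np.load('q1xTest.npy')
    y_test = np.load('q1yTest.npy')

    # ... compute coefficients for each subproblem ...
    coeffs = np.zeros(2)  # placeholder

    # Print the required outputs to stdout.
    print('Q1(a) coefficients:', coeffs)

    # Save any figures as separate files.
    plt.figure()
    plt.scatter(X_train, y_train, s=10)
    plt.savefig('q1-b.png')

if __name__ == '__main__':
    main()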

Please upload your code files (q1.py and q2.py) to Gradescope. Additional Python files are allowed.

It is fine to include your outputs in your submission, but this is not required (we will re-run the code).

However, please DO NOT include the data files (*.npy) in your Gradescope submission.

Credits

Some questions adopted/adapted from Stanford CS229 materials and from Bishop PRML.
