Machine Learning Homework #1


Short Answer and “True or False” Conceptual questions

  1. The answers to these questions should be answerable without referring to external materials. Briefly justify your answers with a few words.

     a. [2 points] What is the bias-variance tradeoff?

     b. [2 points] What typically happens to bias and variance when the model complexity increases/decreases?

     c. [1 point] True or False: A learning algorithm will always generalize better if we use fewer features to represent our data.

     d. [2 points] True or False: Hyperparameters should be tuned on the test set. Explain your choice and detail a procedure for hyperparameter tuning.

     e. [1 point] True or False: The training error of a function on the training set provides an overestimate of the true error of that function.

What to Submit:

  • Parts c-e: True or False

  • Parts a-e: True or False with brief (2-3 sentence) explanation

Maximum Likelihood Estimation (MLE)

  1. You’re the Reign FC manager, and the team is five games into its 2021 season. The number of goals scored by the team in each game so far are given below:

[2, 4, 6, 0, 1].

Let’s call these scores x_1, …, x_5. Based on your (assumed i.i.d.) data, you’d like to build a model to understand how many goals the Reign are likely to score in their next game. You decide to model the number of goals scored per game using a Poisson distribution. Recall that the Poisson distribution with parameter λ assigns every non-negative integer x = 0, 1, 2, … a probability given by

Poi(x | λ) = e^(−λ) λ^x / x!

a. [5 points] Derive an expression for the maximum-likelihood estimate of the parameter λ governing the Poisson distribution in terms of goal counts for the first n games: x_1, …, x_n. (Hint: remember that the log of the likelihood has the same maximizer as the likelihood function itself.)

b. [2 points] Give a numerical estimate of λ̂ after the first five games. Given this λ̂, what is the probability that the Reign score 6 goals in their next game?
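As a sanity check for part (b), the pmf above can be evaluated directly in plain Python. The sketch below assumes the result of part (a), namely that the Poisson MLE is the sample mean (you should still derive this yourself):

```python
import math

def poisson_pmf(x, lam):
    # Poi(x | lambda) = e^(-lambda) * lambda^x / x!
    return math.exp(-lam) * lam ** x / math.factorial(x)

goals = [2, 4, 6, 0, 1]
lam_hat = sum(goals) / len(goals)  # sample mean, 2.6; the MLE from part (a)
p6 = poisson_pmf(6, lam_hat)       # probability of scoring exactly 6 goals
```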

What to Submit:

  • Part a: An expression for the MLE of λ after n games and the relevant derivation

  • Part b: A numerical estimate for λ̂ and the probability that the Reign score 6 goals in their next game.

Polynomial Regression

Relevant Files:

  • polyreg.py

  • test_polyreg_learningCurve.py

  • linreg_closedform.py

  • data/polydata.dat

  • test_polyreg_univariate.py

  1. [10 points] Recall that polynomial regression learns a function h_θ(x) = θ_0 + θ_1 x + θ_2 x^2 + … + θ_d x^d, where d represents the polynomial’s highest degree. We can equivalently write this in the form of a linear model with d features

h_θ(x) = θ_0 + θ_1 φ_1(x) + θ_2 φ_2(x) + … + θ_d φ_d(x) ,    (1)

using the basis expansion φ_j(x) = x^j. Notice that, with this basis expansion, we obtain a linear model where the features are various powers of the single univariate x. We’re still solving a linear regression problem, but are fitting a polynomial function of the input.

Implement regularized polynomial regression in polyreg.py. You may implement it however you like, using gradient descent or a closed-form solution. However, I would recommend the closed-form solution since the data sets are small; for this reason, we’ve included an example closed-form implementation of linear regression in linreg_closedform.py. Please follow the API below. Note that all matrices are actually 2D numpy arrays in the implementation.

  • __init__(degree=1, regLambda=1E-8): constructor with arguments of d and λ

  • fit(X,Y): method to train the polynomial regression model

  • predict(X): method to use the trained polynomial regression model for prediction

  • polyfeatures(X, degree): expands the given n × 1 matrix X into an n × d matrix of polynomial features of degree d. Note that the returned matrix will not include the zero-th power.

Note that the polyfeatures(X, degree) function maps the original univariate data into its higher order powers. Specifically, X will be an n × 1 matrix (X ∈ R^{n×1}) and this function will return the polynomial expansion of this data, an n × d matrix. Note that this function will not add in the zero-th order feature (i.e., x_0 = 1). You should add the x_0 feature separately, outside of this function, before training the model.

By not including the x_0 column, polyfeatures() stays more general and could be applied to multivariate data as well. (If it did add the x_0 feature, we’d end up with multiple columns of 1’s for multivariate data.)
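A minimal sketch of what polyfeatures could look like under this spec (assuming X is an n × 1 numpy array; your own implementation may differ):

```python
import numpy as np

def polyfeatures(X, degree):
    # Expand an (n, 1) matrix into [X, X^2, ..., X^degree].
    # The zero-th power (all-ones bias column) is deliberately NOT included.
    return np.hstack([X ** d for d in range(1, degree + 1)])
```

For example, polyfeatures(np.array([[2.0], [3.0]]), 3) yields [[2, 4, 8], [3, 9, 27]].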

Also, notice that the resulting features will be badly scaled if we use them in raw form. For example, with a polynomial of degree d = 8 and x = 20, the basis expansion yields x^1 = 20 while x^8 = 2.56 × 10^10, an absolutely huge difference in range. Consequently, we will need to standardize the data before solving linear regression. In fit(), after you perform the polynomial feature expansion, you should standardize the values of each degree by subtracting their mean and dividing the result by their standard deviation. You’ll need to apply the same standardization transformation in predict() before you apply it to new data. After standardization, be sure to add the bias term before the column with the lowest degree (i.e., add a column of ones at the left-most position of the resulting matrix). You can achieve this with the pad() function in numpy.

[Figure 1: Fit of polynomial regression with λ = 0 and d = 8]
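The standardize-then-add-bias step described above might look like the following sketch (the function name and the passed-in mean/std are illustrative, not part of the required API; in practice fit() would compute mean and std from the training features and reuse them in predict()):

```python
import numpy as np

def standardize_and_add_bias(Phi, mean, std):
    # Standardize each polynomial-feature column using the training-set
    # mean/std, then prepend the all-ones bias column on the left via np.pad.
    Phi_std = (Phi - mean) / std
    return np.pad(Phi_std, ((0, 0), (1, 0)), mode="constant", constant_values=1)
```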

Run test_polyreg_univariate.py to test your implementation, which will plot the learned function. In this case, the script fits a polynomial of degree d = 8 with no regularization (λ = 0). From the plot, we see that the function fits the data well, but will not generalize well to new data points. Try increasing the amount of regularization, and in 1-2 sentences, describe the resulting effect on the function (you may also provide an additional plot to support your analysis).

What to Submit:

  • Write-up: A description of the resulting effect of increasing regularization.

  • Code: Implement all functions with NotImplementedError in polyreg.py and submit on Gradescope

Ridge Regression on MNIST

  1. In this problem we will implement a regularized least squares classifier for the MNIST data set. The task is to classify handwritten images of digits between 0 and 9.

You are NOT allowed to use any of the pre-built classifiers in sklearn. Feel free to use any method from numpy or scipy. Remember: if you are inverting a matrix in your code, you are probably doing something wrong (Hint: look at scipy.linalg.solve).

Each example has features x_i ∈ R^d (with d = 28 × 28 = 784) and label z_i ∈ {0, …, 9}. You can visualize a single example x_i with imshow after reshaping it to its original 28 × 28 image shape (and noting that the label z_i is accurate). We wish to learn a predictor f̂ that takes as input a vector in R^d and outputs an index in {0, …, 9}. We define our training and testing classification error on a predictor f as

ε̂_train(f) = (1 / N_train) Σ_{(x,z) ∈ Training Set} 1{f(x) ≠ z}

ε̂_test(f) = (1 / N_test) Σ_{(x,z) ∈ Test Set} 1{f(x) ≠ z}
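These error definitions translate directly into code; a sketch, assuming predictions and labels are integer numpy arrays of equal length:

```python
import numpy as np

def classification_error(preds, labels):
    # Fraction of examples where f(x) != z, matching the definitions above.
    return float(np.mean(preds != labels))
```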

We will use a one-hot encoding of the labels: for each observation (x, z), the original label z ∈ {0, …, 9} is mapped to the standard basis vector e_{z+1}, where e_i is a vector of size k containing all zeros except for a 1 in the i-th position (positions in these vectors are indexed starting at one, hence the z + 1 offset for the digit labels). We adopt the notation where we have n data points in our training objective with features x_i ∈ R^d and labels one-hot encoded as y_i ∈ {0, 1}^k. Here, k = 10 since there are 10 digits.

a. [10 points] In this problem we will choose a linear classifier to minimize the regularized least squares objective:

Ŵ = argmin_{W ∈ R^{d×k}} Σ_{i=1}^{n} ||W^T x_i − y_i||_2^2 + λ ||W||_F^2

  • Implement a function train that takes as input X ∈ R^{n×d}, Y ∈ {0,1}^{n×k}, λ > 0 and returns Ŵ ∈ R^{d×k}.

  • Implement a function one_hot that takes as input Y ∈ {0, …, k−1}^n and returns Y ∈ {0,1}^{n×k}.
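Taking the hint about scipy.linalg.solve, one possible shape for these two functions is sketched below. It assumes the standard closed form for this ridge objective, W = (X^T X + λI)^{−1} X^T Y, which you should verify by setting the gradient of the objective to zero; it is a sketch, not the graded solution:

```python
import numpy as np
from scipy.linalg import solve

def train(X, Y, reg_lambda):
    # Solve (X^T X + lambda * I) W = X^T Y for W in R^{d x k}.
    # A linear solve on a symmetric positive definite system, rather
    # than an explicit matrix inverse.
    d = X.shape[1]
    return solve(X.T @ X + reg_lambda * np.eye(d), X.T @ Y, assume_a="pos")

def one_hot(Y, k=10):
    # Map integer labels in {0, ..., k-1} to one-hot rows in {0,1}^{n x k}.
    return np.eye(k)[Y]
```

Prediction then amounts to taking the argmax over the k columns of XW.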

