Homework 1 (Week 2), Problem 2

Please turn this problem in with your other Homework 1 solutions, and also upload your code file as instructed.

Problem 2

After learning about regression and the different regularizations, Bob is interested in trying them out right away! He starts with the linear regression problem: given that the feature vector x and the observation y have a linear relationship y = w^T x + w_0 + ε, estimate the weight vector w = [w_1, w_2, ..., w_d] and the bias w_0 from multiple data points. For simplicity in writing, we can augment the feature space, and the parameters to be estimated can then be written as w = [w_0, w_1, w_2, ..., w_d]. Here ε is the observation noise on the output labels y.
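For concreteness, the augmentation and the unregularized least-squares fit can be sketched in NumPy as below (a minimal sketch on synthetic data; the variable names and dimensions are illustrative, not prescribed by the assignment):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 9                       # n samples, d features (before augmentation)
X = rng.normal(size=(n, d))         # feature vectors
w_true = rng.normal(size=d)         # true weights
b_true = 0.5                        # true bias w_0
y = X @ w_true + b_true + 0.1 * rng.normal(size=n)  # noisy observations

# Augment the feature space: prepend a constant-1 column so the bias
# becomes the first entry of a single parameter vector w = [w_0, w_1, ..., w_d].
X_aug = np.hstack([np.ones((n, 1)), X])

# Unregularized least squares (lstsq is more stable than the normal equations).
w_hat, *_ = np.linalg.lstsq(X_aug, y, rcond=None)
print(w_hat[0])   # estimated bias w_0, should be close to b_true
```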

Bob starts collecting samples to generate his dataset. He does the collection several times and gets several datasets with different numbers of samples:

            number of training    number of testing
            samples (n)           samples
Dataset1    10                    1000
Dataset2    100                   1000
Dataset3    1000                  1000

Could you help him analyze all the datasets above?

  (a) Given that the dimension of features is 9 (before augmentation), estimate w and try three regularization settings: [no regularization, L1 regularization, L2 regularization], and report the corresponding statistics. For each regularization setting, you need to search for a good regularization coefficient λ over the range −10 ≤ log2 λ ≤ 10 with a step size of 0.5 in log2 λ, and use the MSE (mean squared error) on the validation set to choose the best one. During the parameter search, you need to do 5-fold cross validation for each parameter value you try. Tip: after finding the best value of λ, use that value for one final training run on all training data points (nothing held out as a validation set) to get the weight vector and the training MSE.

    (i) Fill all your numerical results into the following table. (Each dataset should have a different table, so for this question you'll have 3 tables.)

    (ii) Based on the statistics on all datasets, answer the following questions:

      1. Compare the test MSE with no regularizer, the L1 regularizer, and the L2 regularizer for a given n (your answer might also depend on n).

      2. Does each regularizer lower the corresponding norm of w by very much? Please explain. Why are these answers different depending on n?

      3. Observe and explain the dependence of sparsity on the regression method, and on different values of n and λ.
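One way to implement the cross-validated search over λ is sketched below (using scikit-learn; the `search_lambda` helper, the mapping of λ to the `alpha` argument, and the synthetic data are illustrative assumptions, not part of the assignment — pass `Ridge` instead of `Lasso` for L2 regularization):

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge
from sklearn.model_selection import KFold

def search_lambda(X, y, model_cls, log2_grid=np.arange(-10, 10.5, 0.5)):
    """5-fold CV over log2(lambda); returns (best lambda, mean MSE, std MSE)."""
    kf = KFold(n_splits=5, shuffle=True, random_state=0)
    best = None
    for log2_lam in log2_grid:
        lam = 2.0 ** log2_lam
        fold_mse = []
        for tr, va in kf.split(X):
            model = model_cls(alpha=lam).fit(X[tr], y[tr])
            pred = model.predict(X[va])
            fold_mse.append(np.mean((pred - y[va]) ** 2))
        mean_mse, std_mse = np.mean(fold_mse), np.std(fold_mse)
        if best is None or mean_mse < best[1]:
            best = (lam, mean_mse, std_mse)
    return best

# Example on synthetic data. After the search, refit model_cls(alpha=best_lam)
# on ALL training points to report w, the training MSE, and the test MSE.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 9))
y = X @ rng.normal(size=9) + 0.1 * rng.normal(size=100)
best_lam, mean_mse, std_mse = search_lambda(X, y, Lasso)
```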

                Model selection              Performance
                Best param   Mean of  Std of   MSE on train   MSE on test
                log2 λ       MSE      MSE
Least square
  (show your estimated w)
  L1(w) =     L2(w) =     Spars =
LASSO
  (show your estimated w)
  L1(w) =     L2(w) =     Spars =
Ridge
  (show your estimated w)
  L1(w) =     L2(w) =     Spars =

Caption for statistics in the table:

  • Best param λ: the regularization coefficient you choose using cross validation.

  • Mean of MSE: the average MSE over the 5-fold cross validation process for your chosen λ.

  • Std of MSE: the standard deviation of the MSE over the 5-fold cross validation process for your chosen λ.

  • L1(w): the L1 norm of w.

  • L2(w): the L2 norm of w.

  • Spars: sparsity, i.e., the number of zeros in the augmented weight vector w.
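The per-method statistics above can be computed directly from the fitted augmented weight vector, e.g. (a minimal sketch; the `tol` zero-threshold is an assumption, since Lasso weights from an iterative solver are rarely exactly 0 in floating point):

```python
import numpy as np

def weight_stats(w, tol=1e-8):
    """Return the L1 norm, L2 norm, and sparsity (count of ~zero entries) of w."""
    l1 = np.sum(np.abs(w))
    l2 = np.sqrt(np.sum(w ** 2))
    sparsity = int(np.sum(np.abs(w) < tol))
    return l1, l2, sparsity

# Toy augmented weight vector: two entries are exactly zero.
w = np.array([0.0, 1.5, 0.0, -2.0])
l1, l2, spars = weight_stats(w)
print(l1, l2, spars)   # → 3.5 2.5 2
```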

  (b) Bob learned that L1 regularization can lead to more sparsity, and he really wants to visualize this. So he collects another bunch of datasets with 2-dimensional (before augmentation) features:

            number of training    number of testing
            samples (n)           samples
Dataset4    10                    1000
Dataset5    30                    1000
Dataset6    100                   1000
Dataset7    10                    1000
Dataset8    30                    1000
Dataset9    100                   1000

He tries them out and finds that the last three datasets (7, 8, 9) are “special cases” where the L1 norm might not provide the intended result.

    (i) Repeat (a)(i) for all new datasets. (You'll have 6 tables.)

    (ii) For each dataset, draw the following plot in the 2D space of w2 vs. w1, with w0 fixed at your estimated w0: (1) draw the curves MSE = training_MSE of your estimated w and MSE = 10 + training_MSE of your estimated w; (2) draw the curve ‖w‖1 = the L1 norm of your estimated w. Repeat this plot for the ridge regression results, except that in (2) you draw the curve ‖w‖2 = the L2 norm of your estimated w. (You therefore have 2 plots for each dataset. An example is shown below.)

    (iii) Based on the statistics and plots, answer the following questions:

      1. Observe and explain how the plots relate to sparsity.

      2. Can you explain, from looking at the plots, how much effect the regularizer has (i.e., how different the regularized performance (MSE) is from the unregularized performance)?

      3. Observe and explain how Lasso has a different effect on the “special case” datasets than on the other datasets.

Hint: please refer to the example code file in the homework folder on how to generate such plots.
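In the same spirit as that example code, the two curves can be drawn as level sets (contours) over a grid in the (w1, w2) plane. The sketch below uses matplotlib; the helper name `mse_on_grid`, the grid limits, the synthetic data, and the stand-in weight estimates are all assumptions for illustration:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")          # headless backend so the script runs without a display
import matplotlib.pyplot as plt

def mse_on_grid(X, y, w0, W1, W2):
    """Training MSE at every (w1, w2) grid point, with the bias fixed at w0."""
    # predictions broadcast to shape (grid, grid, n_samples)
    pred = w0 + W1[..., None] * X[:, 0] + W2[..., None] * X[:, 1]
    return np.mean((pred - y) ** 2, axis=-1)

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))
y = X @ np.array([1.0, -2.0]) + 0.3 + 0.1 * rng.normal(size=30)

w0_hat, w_hat = 0.3, np.array([1.0, -2.0])        # stand-in estimates
train_mse = mse_on_grid(X, y, w0_hat, w_hat[0:1], w_hat[1:2])[0]

g = np.linspace(-4, 4, 200)
W1, W2 = np.meshgrid(g, g)
Z = mse_on_grid(X, y, w0_hat, W1, W2)

fig, ax = plt.subplots()
# (1) MSE level sets at training_MSE and 10 + training_MSE.
ax.contour(W1, W2, Z, levels=[train_mse, train_mse + 10], colors="b")
# (2) Lasso version: the L1-norm level set at ||w_hat||_1.
#     For ridge, replace with np.sqrt(W1**2 + W2**2) and ||w_hat||_2.
ax.contour(W1, W2, np.abs(W1) + np.abs(W2),
           levels=[np.abs(w_hat).sum()], colors="r")
ax.set_xlabel("w1"); ax.set_ylabel("w2")
fig.savefig("lasso_contours.png")
```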
