Description
Problem 1 (30pts). Fundamental Knowledge
- Clearly state the difference between supervised learning and unsupervised learning.
- Explain the usage of the training set, validation set, and test set in a learning task. Also explain why we need a validation set.
- Suppose we have a dataset of people in which we record their heights h_i as well as the lengths of their left arms l_i and right arms r_i. Suppose h_i ∼ N(10, 2) (in unspecified units), l_i ∼ N(ρ_i h_i, 0.02), and r_i ∼ N(ρ_i h_i, σ^2), with ρ_i ∼ Unif(0.6, 0.7). Is using both arms necessarily a better choice than using only one arm to approximate h_i? What if σ^2 = 0.02? Explain the intuition.
- Let X ∈ R^{n×d} be a full column rank matrix. Explain why X^T X is positive definite using the SVD. (Hint: the singular matrices are orthonormal.)
- Consider the problem
      min_θ ∥Xθ − y∥_2^2 + λ∥θ∥_2^2.
  Suppose X is full column rank; write down its optimal solution θ∗.
Problem 2 (30pts). Least Squares without Full Column Rank
Consider the problem
    min_θ ∥Xθ − y∥_2^2,
where X ∈ R^{n×d}, θ ∈ R^d, y ∈ R^n.
(1) Given
    X = [1 0],  y = 1.
Draw the figure of the objective function using Python.
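For part (1), a minimal plotting sketch is given below. It assumes the reading X = [1 0] (a 1×2 matrix, so n = 1 < d = 2) and y = 1; with that choice the objective ∥Xθ − y∥_2^2 = (θ_1 − 1)^2 is a surface over (θ_1, θ_2), flat along the θ_2 direction.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless-safe backend
import matplotlib.pyplot as plt

# Assumed reading of the problem data: X = [1 0] (n = 1 < d = 2), y = 1.
X = np.array([[1.0, 0.0]])
y = np.array([1.0])

# Grid over (theta_1, theta_2) and the objective ||X theta - y||_2^2.
t1, t2 = np.meshgrid(np.linspace(-2, 2, 100), np.linspace(-2, 2, 100))
obj = (X[0, 0] * t1 + X[0, 1] * t2 - y[0]) ** 2  # = (theta_1 - 1)^2 here

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
ax.plot_surface(t1, t2, obj, cmap="viridis")
ax.set_xlabel(r"$\theta_1$")
ax.set_ylabel(r"$\theta_2$")
ax.set_zlabel("objective")
plt.savefig("p2_objective.png")
```

The flat valley along θ_2 previews part (2): the minimizer is not unique.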
(2) The thin SVD of X is given by
    X = V [Σ_1  0] [U_1  U_2]^T = V Σ_1 U_1^T.
Show that when n < d, the optimal solutions are non-unique. Derive the expression of the optimal solutions using the thin SVD. (Hint: Let A := V Σ_1 and z := U_1^T θ. Solve min_z ∥Az − y∥_2^2 first, then solve U_1^T θ = z.)
Problem 3 (50pts). A Robust LP Formulation
Suppose we have the generative linear regression model
    y = Xθ⋆ + ϵ,
where ϵ is the error term and ϵ ∼ N(0, Σ). The maximum likelihood estimator for θ is:
    θ̂_LS = argmin_{θ ∈ R^d} ∥Xθ − y∥_2^2 = (X^T X)^{−1} X^T y.
(a) Suppose the error term ϵ = [ε_1, ε_2, · · · , ε_n]^T follows the Laplace distribution, i.e. ε_i ∼ L(0, b) i.i.d., i = 1, 2, · · · , n, and the probability density function is P(ε_i) = (1/(2b)) e^{−|ε_i|/b} for some b > 0. Under the MLE principle, what is the learning problem? Please write out the derivation process. (15 points)
Figure 1: PDF of Laplace distribution
(b) Huber smoothing. L1-norm minimization
    θ̂_L1 = argmin_θ ∥Xθ − y∥_1
is one possible formulation for robust regression. However, it is nondifferentiable. We utilize a smoothing technique to approximately solve the L1-norm minimization; the Huber function is one possibility. Its definition and sketch are shown below:
    h_µ(z) = { |z|,               |z| ≥ µ
             { z^2/(2µ) + µ/2,    |z| ≤ µ
Then, for z ∈ R^n,
    H_µ(z) = Σ_{j=1}^n h_µ(z_j).
By Huber smoothing, the L1-norm minimization can be approximated by
    min_θ H_µ(Xθ − y).
Let
    f(θ) = H_µ(Xθ − y);
find the gradient ∇f(θ). (10 points)
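The symbolic derivation is the exercise itself, but a finite-difference comparison is a useful way to validate an answer. The sketch below assumes the piecewise Huber definition above (quadratic for |z| ≤ µ, |z| for |z| ≥ µ); `huber` and `grad_f` are illustrative names, not library functions.

```python
import numpy as np

def huber(z, mu):
    # h_mu(z): z^2/(2 mu) + mu/2 where |z| <= mu, and |z| where |z| >= mu
    return np.where(np.abs(z) <= mu, z**2 / (2 * mu) + mu / 2, np.abs(z))

def grad_f(theta, X, y, mu):
    # Chain rule: grad f(theta) = X^T psi(X theta - y), where
    # psi(z) = z/mu on |z| <= mu and sign(z) on |z| >= mu,
    # i.e. psi(z) = clip(z/mu, -1, 1) componentwise.
    z = X @ theta - y
    return X.T @ np.clip(z / mu, -1.0, 1.0)

# Compare against central finite differences on random data.
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3))
y = rng.standard_normal(5)
theta = rng.standard_normal(3)
mu = 0.1
f = lambda t: huber(X @ t - y, mu).sum()
g = grad_f(theta, X, y, mu)
eps = 1e-6
g_fd = np.array([(f(theta + eps * e) - f(theta - eps * e)) / (2 * eps)
                 for e in np.eye(3)])
print(np.max(np.abs(g - g_fd)))  # agreement up to finite-difference error
```

Because h_µ is continuously differentiable (the two branches match in value and slope at |z| = µ), the two gradients should agree closely.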
Figure 2: Huber smoothing
Gradient descent for minimizing f(θ). The gradient descent algorithm is shown in the following table.
    Input: observed data X, y, initialization θ_0, Huber smoothing parameter µ,
           total iteration number T, learning rate α.
    for k = 0, 1, · · · , T − 1, do
        θ_{k+1} = θ_k − α∇f(θ_k)
    end for
    return θ_T
The data set is generated by the linear model
    y = Xθ⋆ + ϵ_1 + ϵ_2,
where ϵ_1 ∈ R^n follows a Gaussian distribution and ϵ_2 contains outliers. Given the observed data (x, y) = {(x_1, y_1), (x_2, y_2), · · · , (x_n, y_n)} and the true value θ⋆,
(1) calculate the estimation θ̂_LS by using linear least squares and compute ∥θ̂_LS − θ⋆∥_2. (5 points)
(2) suppose n = 1000, d = 50; use Python to implement the gradient descent algorithm to minimize f(θ). The parameters are set as µ = 10^{−5}, α = 0.001, T = 1000. Plot the error ∥θ_k − θ⋆∥_2 as a function of the iteration number. You can download the data {y, X, θ⋆} from Blackboard. (20 points)
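The Blackboard data is not reproduced here, so the sketch below substitutes a synthetic stand-in of the same shape (n = 1000, d = 50, Gaussian noise plus a few large outliers) and uses the stated hyperparameters. The data-generation choices and all variable names are illustrative, not the official dataset.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless-safe backend
import matplotlib.pyplot as plt

# Synthetic stand-in for the Blackboard data {y, X, theta_star}.
rng = np.random.default_rng(42)
n, d = 1000, 50
X = rng.standard_normal((n, d))
theta_star = rng.standard_normal(d)
eps1 = 0.1 * rng.standard_normal(n)        # Gaussian noise eps_1
eps2 = np.zeros(n)                         # sparse outliers eps_2
out = rng.choice(n, size=20, replace=False)
eps2[out] = 10.0 * rng.standard_normal(20)
y = X @ theta_star + eps1 + eps2

mu, alpha, T = 1e-5, 0.001, 1000           # stated hyperparameters

def grad_f(theta):
    # gradient of H_mu(X theta - y): X^T clip((X theta - y)/mu, -1, 1)
    return X.T @ np.clip((X @ theta - y) / mu, -1.0, 1.0)

theta = np.zeros(d)                        # theta_0 = 0
errors = []
for k in range(T):
    theta = theta - alpha * grad_f(theta)
    errors.append(np.linalg.norm(theta - theta_star))

plt.plot(errors)
plt.xlabel("iteration k")
plt.ylabel(r"$\|\theta_k - \theta^\star\|_2$")
plt.savefig("huber_gd_error.png")
print(errors[0], errors[-1])               # the error should shrink
```

With µ this small, the Huber objective is essentially the L1 loss, so the iterates should be far less sensitive to the outliers in ϵ_2 than the least-squares estimate from part (1).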