Problem Set 5: Boosting and Unsupervised Learning


Submission instructions

Submit your solutions electronically on the course Gradescope site as PDF files.

If you plan to typeset your solutions, please use the LaTeX solution template. If you must submit scanned handwritten solutions, please use a black pen on blank white paper and a high-quality scanner app.

  • AdaBoost [5 pts]

In the lecture on ensemble methods, we said that in iteration $t$, AdaBoost picks $(h_t, \beta_t)$ to minimize the objective:

\[
(h_t(x), \beta_t) = \arg\min_{(h_t(x),\, \beta_t)} \sum_n w_t(n)\, e^{-y_n \beta_t h_t(x_n)}
\]
\[
= \arg\min_{(h_t(x),\, \beta_t)} \left( e^{\beta_t} - e^{-\beta_t} \right) \sum_n w_t(n)\, \mathbb{I}[y_n \neq h_t(x_n)] + e^{-\beta_t} \sum_n w_t(n)
\]

We define the weighted misclassification error at time $t$ to be $\epsilon_t = \sum_n w_t(n)\, \mathbb{I}[y_n \neq h_t(x_n)]$. Also, the weights are normalized so that $\sum_n w_t(n) = 1$.
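Before taking derivatives, it can help to check the algebraic rewriting above numerically. Below is a minimal sketch (not part of the assignment) that verifies, for random labels, predictions, and normalized weights, that the exponential-loss objective equals $(e^{\beta_t} - e^{-\beta_t})\,\epsilon_t + e^{-\beta_t}$; all variable names are illustrative.

```python
import numpy as np

# Minimal numerical check of the identity above; names are illustrative.
rng = np.random.default_rng(0)
N = 8
y = rng.choice([-1, 1], size=N)    # labels y_n
h = rng.choice([-1, 1], size=N)    # base-classifier predictions h_t(x_n)
w = rng.random(N)
w /= w.sum()                       # normalized weights, sum_n w_t(n) = 1
beta = 0.7                         # an arbitrary beta_t

lhs = np.sum(w * np.exp(-y * beta * h))   # original exponential-loss objective
eps = np.sum(w * (y != h))                # weighted misclassification error eps_t
rhs = (np.exp(beta) - np.exp(-beta)) * eps + np.exp(-beta)
assert np.isclose(lhs, rhs)               # the two forms of the objective agree
```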

1. Take the derivative of the above objective function with respect to $\beta_t$ and set it to zero to solve for $\beta_t$ and obtain the update for $\beta_t$.

2. Suppose the training set is linearly separable, and we use a hard-margin linear support vector machine (no slack) as the base classifier. In the first boosting iteration, what would the resulting $\beta_1$ be?

• K-means for single-dimensional data [5 pts]

In this problem, we will work through K-means for single-dimensional data.

1. Consider the case where $K = 3$ and we have 4 data points $x_1 = 1,\ x_2 = 2,\ x_3 = 5,\ x_4 = 7$. What is the optimal clustering for this data? What is the corresponding value of the objective?

Parts of this assignment are adapted from course material by Jenna Wiens (UMich) and Tommi Jaakkola (MIT).


2. One might be tempted to think that Lloyd's algorithm is guaranteed to converge to the global minimum when $d = 1$. Show that there exists a suboptimal cluster assignment (i.e., initialization) for the data in the above part that Lloyd's algorithm will not be able to improve; the sketch after this part gives one way to experiment. (To get full credit, you need to show the assignment, show why it is suboptimal, and explain why it will not be improved.)
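To experiment with different initializations, one can run Lloyd's algorithm directly on the four points. Below is a minimal sketch, assuming the usual squared-distance objective; the function name `lloyd_1d` and the example initialization are illustrative, not prescribed by the assignment.

```python
import numpy as np

def lloyd_1d(x, centers, iters=20):
    """Plain Lloyd's algorithm for 1-D data with squared-distance objective."""
    x = np.asarray(x, dtype=float)
    centers = np.asarray(centers, dtype=float)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        assign = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
        # Update step: each center moves to the mean of its assigned points.
        for k in range(len(centers)):
            if np.any(assign == k):
                centers[k] = x[assign == k].mean()
    obj = np.sum((x - centers[assign]) ** 2)
    return assign, centers, obj

x = [1, 2, 5, 7]
print(lloyd_1d(x, centers=[1.0, 2.0, 5.0]))  # try other initializations too
```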


  • Gaussian Mixture Models [8 pts]

We would like to cluster data $\{x_1, \dots, x_N\}$, $x_n \in \mathbb{R}^d$, using a Gaussian Mixture Model (GMM) with $K$ mixture components. To do this, we need to estimate the parameters of the GMM, i.e., we need to set the values $\theta = \{\omega_k, \mu_k, \Sigma_k\}_{k=1}^{K}$, where $\omega_k$ is the mixture weight associated with mixture component $k$, and $\mu_k$ and $\Sigma_k$ denote the mean and the covariance matrix of the Gaussian distribution associated with mixture component $k$.

If we knew which cluster each sample $x_n$ belongs to (i.e., we had complete data), we showed in the lecture on Clustering that the log-likelihood $\ell$ is given below, and we can compute the maximum likelihood estimate (MLE) of all the parameters.

\[
\ell(\theta) = \sum_n \log p(x_n, z_n) = \sum_n \sum_k \gamma_{nk} \log \omega_k + \sum_n \sum_k \gamma_{nk} \log \mathcal{N}(x_n \mid \mu_k, \Sigma_k) \tag{1}
\]

Since we do not have complete data, we use the EM algorithm. The EM algorithm works by iterating between setting each $\gamma_{nk}$ to the posterior probability $p(z_n = k \mid x_n)$ (step 1 on slide 26 of the lecture on Clustering) and then using $\gamma_{nk}$ to find the value of $\theta$ that maximizes $\ell$ (step 2 on slide 26). We will now derive updates for one of the parameters, i.e., $\mu_j$ (the mean parameter associated with mixture component $j$).
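As a concrete reference for step 1, below is a minimal sketch of the responsibility computation for a one-dimensional GMM; the helper name `e_step` is illustrative, and the computation is just Bayes' rule, $\gamma_{nk} \propto \omega_k\, \mathcal{N}(x_n \mid \mu_k, \sigma_k^2)$.

```python
import numpy as np
from scipy.stats import norm

def e_step(x, omega, mu, sigma):
    """Responsibilities gamma_nk = p(z_n = k | x_n) for a 1-D GMM.

    x: (N,) data; omega, mu, sigma: (K,) mixture weights, means, std devs.
    """
    # omega_k * N(x_n | mu_k, sigma_k^2) for every (n, k) pair
    unnorm = omega[None, :] * norm.pdf(x[:, None], loc=mu[None, :],
                                       scale=sigma[None, :])
    # Normalize over components so each row sums to one (Bayes' rule).
    return unnorm / unnorm.sum(axis=1, keepdims=True)

# Example (illustrative parameter values):
# e_step(np.array([5., 15.]), np.array([.5, .5]), np.array([10., 30.]), np.array([1., 1.]))
```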

(a) To maximize $\ell$, compute $\nabla_{\mu_j} \ell(\theta)$: the gradient of $\ell(\theta)$ with respect to $\mu_j$.

(b) Set the gradient to zero and solve for $\mu_j$ to show that $\mu_j = \frac{\sum_n \gamma_{nj} x_n}{\sum_n \gamma_{nj}}$.

(c) Suppose that we are fitting a GMM to data using $K = 2$ components. We have $N = 5$ samples in our training data, with $x_n,\ n \in \{1, \dots, N\}$, equal to $\{5, 15, 25, 30, 40\}$.

We use the EM algorithm to find the maximum likelihood estimates for the model parameters, which are the mixing weights for the two components, $\omega_1$ and $\omega_2$, and the means for the two components, $\mu_1$ and $\mu_2$. The standard deviations for the two components are fixed at 1. Suppose that at the end of step 1 of iteration 5 of the EM algorithm, the soft assignments $\gamma_{nk}$ for the five data items are as shown in Table 1.

        k = 1   k = 2
n = 1    0.2     0.8
n = 2    0.2     0.8
n = 3    0.8     0.2
n = 4    0.9     0.1
n = 5    0.9     0.1

Table 1: Entry in row $n$ and column $k$ of the table corresponds to $\gamma_{nk}$.

What are the updated values of the parameters $\omega_1$, $\omega_2$, $\mu_1$, and $\mu_2$ at the end of step 2 of the EM algorithm?
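As a sanity check for this part, below is a minimal sketch of the standard GMM M-step (mixing weights and means) applied to Table 1; it assumes the usual updates $\omega_k = \frac{1}{N}\sum_n \gamma_{nk}$ and $\mu_k = \frac{\sum_n \gamma_{nk} x_n}{\sum_n \gamma_{nk}}$, so use it only to verify a hand computation.

```python
import numpy as np

x = np.array([5.0, 15.0, 25.0, 30.0, 40.0])  # the N = 5 training samples
gamma = np.array([[0.2, 0.8],                # soft assignments from Table 1,
                  [0.2, 0.8],                # row n, column k
                  [0.8, 0.2],
                  [0.9, 0.1],
                  [0.9, 0.1]])

Nk = gamma.sum(axis=0)                       # effective count per component
omega = Nk / len(x)                          # mixing weights omega_k
mu = (gamma * x[:, None]).sum(axis=0) / Nk   # means mu_k
print(omega, mu)
```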
