Description
Exercise 1
Log into “cookdata.cn” and enroll in the course “êâ‰Æ Ú”. Finish the online exercise there.
Exercise 2 (Decision Tree)
You are trying to determine whether a boy finds a particular type of food appealing based on the food’s temperature, taste, and size.
Food Sample Id   Appealing   Temperature   Taste   Size
1                No          Hot           Salty   Small
2                No          Cold          Sweet   Large
3                No          Cold          Sweet   Large
4                Yes         Cold          Sour    Small
5                Yes         Hot           Sour    Small
6                No          Hot           Salty   Large
7                Yes         Hot           Sour    Large
8                Yes         Cold          Sweet   Small
9                Yes         Cold          Sweet   Small
10               No          Hot           Salty   Large
- What is the initial entropy of “Appealing”?
- Assume that “Taste” is chosen as the root of the decision tree. What is the information gain associated with this attribute?
- Draw the full decision tree learned from this data (without any pruning).
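For readers who want to double-check their hand computation of the first two questions, the following is a minimal Python sketch (my own encoding of the table above; the helper `entropy` is illustrative, not part of the exercise):

```python
# Sanity-check sketch for Exercise 2, not a substitute for the hand derivation:
# computes the entropy of "Appealing" and the information gain of "Taste".
from collections import Counter
from math import log2

# (appealing, temperature, taste, size) for the ten food samples above
samples = [
    ("No",  "Hot",  "Salty", "Small"),
    ("No",  "Cold", "Sweet", "Large"),
    ("No",  "Cold", "Sweet", "Large"),
    ("Yes", "Cold", "Sour",  "Small"),
    ("Yes", "Hot",  "Sour",  "Small"),
    ("No",  "Hot",  "Salty", "Large"),
    ("Yes", "Hot",  "Sour",  "Large"),
    ("Yes", "Cold", "Sweet", "Small"),
    ("Yes", "Cold", "Sweet", "Small"),
    ("No",  "Hot",  "Salty", "Large"),
]

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    counts = Counter(labels)
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in counts.values())

h0 = entropy([s[0] for s in samples])   # initial entropy of "Appealing"

# Conditional entropy after splitting on "Taste" (column index 2).
h_cond = 0.0
for v in {s[2] for s in samples}:
    subset = [s[0] for s in samples if s[2] == v]
    h_cond += len(subset) / len(samples) * entropy(subset)

print(f"H(Appealing) = {h0:.3f}, Gain(Taste) = {h0 - h_cond:.3f}")
```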
Exercise 3 (Maximum Likelihood Estimate, MLE)
Suppose that the samples $\{x_i\}_{i=1}^n$ are drawn from a normal distribution $N(\mu, \sigma^2)$ with p.d.f.
$$f_\theta(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right),$$
where $\theta = (\mu, \sigma^2)$. The maximum likelihood estimator (MLE) of $\theta$ is the one that maximizes the likelihood function
$$L(\theta) = \prod_{i=1}^n f_\theta(x_i).$$
1. Show that the MLE estimator of the parameters $(\mu, \sigma^2)$ is
$$\hat{\mu} = \bar{x} = \frac{1}{n}\sum_{i=1}^n x_i, \qquad \hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^n (x_i - \bar{x})^2.$$
2. Show that
$$E\hat{\mu} = \mu, \qquad E\left(\frac{n}{n-1}\,\hat{\sigma}^2\right) = \sigma^2,$$
where $E$ is the expectation. This means that $\hat{\mu}$ is an unbiased estimator of $\mu$, but $\hat{\sigma}^2$ is a biased estimator of $\sigma^2$.
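The bias statement can also be illustrated numerically. Below is a minimal Monte Carlo sketch in Python; the parameter values mu = 2, sigma^2 = 4, n = 5 are arbitrary assumptions chosen for the illustration:

```python
# Numerical illustration for Exercise 3 (not a proof): the MLE
# sigma_hat^2 = (1/n) * sum (x_i - xbar)^2 satisfies
# E[sigma_hat^2] = (n-1)/n * sigma^2, i.e., it is biased downward.
import numpy as np

rng = np.random.default_rng(0)
mu, sigma2, n, trials = 2.0, 4.0, 5, 200_000   # assumed values for illustration

x = rng.normal(mu, np.sqrt(sigma2), size=(trials, n))
sigma_hat2 = x.var(axis=1)   # ddof=0 (default): the MLE, i.e., the biased estimator

print("mean of sigma_hat^2:", sigma_hat2.mean())                 # ~ (n-1)/n * 4 = 3.2
print("bias-corrected:     ", (n / (n - 1)) * sigma_hat2.mean()) # ~ sigma^2 = 4.0
```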
Exercise 4 (MLE for Naive Bayes methods)
Suppose that $X$ and $Y$ are a pair of discrete random variables, i.e., $X \in \{1, 2, \ldots, t\}$, $Y \in \{1, 2, \ldots, c\}$. Then the probability distribution of $Y$ is solely determined by the set of parameters $\{p_k\}_{k=1}^c$, where $p_k = \Pr(Y = k)$ with $\sum_{k=1}^c p_k = 1$. Similarly, the conditional probability distribution of $X$ given $Y$ is solely determined by the set of parameters $\{p_{sk}\}_{s=1,\ldots,t,\ k=1,\ldots,c}$, where $p_{sk} = \Pr(X = s \mid Y = k)$ with $\sum_{s=1}^t p_{sk} = 1$. Now we have a set of samples $\{(x_i, y_i)\}_{i=1}^n$ drawn independently from the joint distribution $\Pr(X, Y)$. Prove that the MLE of the parameter $p_k$ (prior probability) is
$$\hat{p}_k = \frac{\sum_{i=1}^n I(y_i = k)}{n}, \qquad k = 1, \ldots, c;$$
and the MLE of the parameter $p_{sk}$ is
$$\hat{p}_{sk} = \frac{\sum_{i=1}^n I(x_i = s,\, y_i = k)}{\sum_{i=1}^n I(y_i = k)}, \qquad s = 1, \ldots, t,\ k = 1, \ldots, c.$$
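Intuitively, both MLE formulas are empirical frequencies: count how often each class occurs, and how often each feature value occurs within each class. The following is a minimal Python sketch on an assumed toy sample (the data and variable names are illustrative only):

```python
# Sketch for Exercise 4: the MLE formulas above, computed by counting.
from collections import Counter

# assumed toy sample of (x_i, y_i) pairs with t = 3 feature values, c = 2 classes
pairs = [(1, 1), (2, 1), (1, 2), (3, 2), (1, 1), (2, 2), (1, 2), (3, 1)]
n = len(pairs)

y_counts = Counter(y for _, y in pairs)   # counts of each class k
xy_counts = Counter(pairs)                # counts of each (s, k) combination

p_hat = {k: y_counts[k] / n for k in y_counts}            # prior MLE  p_hat_k
p_sk_hat = {(s, k): xy_counts[(s, k)] / y_counts[k]       # conditional MLE p_hat_sk
            for (s, k) in xy_counts}                      # unobserved (s, k): MLE is 0

print(p_hat)      # e.g. {1: 0.5, 2: 0.5} for this toy sample
print(p_sk_hat)
```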
Exercise 5 (Error bound for the 1-nearest-neighbor method) In class, we estimated that the error of the 1-nearest-neighbor rule is roughly twice the Bayes error. Now let us make this rigorous.
Let us consider the two-class classification problem with $\mathcal{X} = [0,1]^d$ and $\mathcal{Y} = \{0,1\}$. The underlying joint probability distribution on $\mathcal{X} \times \mathcal{Y}$ is $P(X, Y)$, from which we deduce that the marginal distribution of $X$ is $p_X(x)$ and the conditional probability distribution is $\eta(x) = P(Y = 1 \mid X = x)$. Assume that $\eta(x)$ is $c$-Lipschitz continuous: $|\eta(x) - \eta(x')| \le c\|x - x'\|$ for any $x, x' \in \mathcal{X}$. Recall that the Bayes rule is $f^*(x) = 1_{\{\eta(x) > 1/2\}}$. Given a training set $S = \{(x_i, y_i)\}_{i=1}^n$ with $(x_i, y_i) \overset{\text{i.i.d.}}{\sim} P$ (or equivalently $S \sim P^n$), the 1-nearest-neighbor rule is $f^{1NN}(x) = y_{\pi_S(x)}$, where $\pi_S(x) = \arg\min_i \|x - x_i\|$.
Define the generalization error of rule $f$ as $\mathcal{E}(f) = E_{(X,Y)\sim P}\, 1_{Y \ne f(X)}$. Show that
$$E_{S \sim P^n}\, \mathcal{E}(f^{1NN}) \le 2\,\mathcal{E}(f^*) + c\, E_{S \sim P^n} E_{x \sim p_X} \|x - x_{\pi_S(x)}\|.$$
(This means that we can have a precise error estimate for the 1-nearest-neighbor rule if we can bound $E_{S \sim P^n} E_{x \sim p_X} \|x - x_{\pi_S(x)}\|$.)
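To make the quantities in the bound concrete, here is a minimal Python sketch of the 1-nearest-neighbor rule together with a Monte Carlo estimate of the distance term $E_{x \sim p_X}\|x - x_{\pi_S(x)}\|$; the uniform marginal on $[0,1]^d$ and the toy labeling rule are assumptions made for illustration, not part of the exercise:

```python
# Illustration of the definitions in Exercise 5 (not the proof).
import numpy as np

rng = np.random.default_rng(0)
d, n, m = 2, 500, 2_000   # assumed dimension, training size, query count

xs = rng.random((n, d))                    # training inputs, uniform on [0,1]^d
ys = (xs.sum(axis=1) > 1).astype(int)      # assumed toy labeling, for illustration

def f_1nn(x):
    """The 1-NN rule: return the label of the nearest training point."""
    i = np.argmin(np.linalg.norm(xs - x, axis=1))   # pi_S(x)
    return ys[i]

print("f_1NN([0.3, 0.9]) =", f_1nn(np.array([0.3, 0.9])))

# Monte Carlo estimate of the nearest-neighbor distance term in the bound.
queries = rng.random((m, d))
dists = np.linalg.norm(xs[None, :, :] - queries[:, None, :], axis=2).min(axis=1)
print("estimated E||x - x_{pi_S(x)}|| =", dists.mean())
```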