Intro to Big Data Science: Assignment 2 Solution

$30.00 $24.00



You’ll get a zip file with the solution.

 

Description


Exercise 1

Log into “cookdata.cn”, enroll in the corresponding course, and finish the online exercise there.

Exercise 2 (Decision Tree)

You are trying to determine whether a boy finds a particular type of food appealing based on the food’s temperature, taste, and size.

Food Sample Id   Appealing   Temperature   Taste   Size
1                No          Hot           Salty   Small
2                No          Cold          Sweet   Large
3                No          Cold          Sweet   Large
4                Yes         Cold          Sour    Small
5                Yes         Hot           Sour    Small
6                No          Hot           Salty   Large
7                Yes         Hot           Sour    Large
8                Yes         Cold          Sweet   Small
9                Yes         Cold          Sweet   Small
10               No          Hot           Salty   Large

  1. What is the initial entropy of “Appealing”?

  2. Assume that “Taste” is chosen as the root of the decision tree. What is the information gain associated with this attribute? (See the sketch after this list.)

  3. Draw the full decision tree learned from this data (without any pruning).
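
The following is a minimal Python sketch (not part of the assignment; the data layout and helper names are illustrative) that computes the entropy of “Appealing” and the information gain of splitting on “Taste” directly from the table above, as a numerical check for parts 1 and 2:

```python
# Minimal sketch: check the entropy of "Appealing" and the information gain of "Taste".
import math
from collections import Counter

# Rows in table order: (Appealing, Temperature, Taste, Size).
data = [
    ("No",  "Hot",  "Salty", "Small"),
    ("No",  "Cold", "Sweet", "Large"),
    ("No",  "Cold", "Sweet", "Large"),
    ("Yes", "Cold", "Sour",  "Small"),
    ("Yes", "Hot",  "Sour",  "Small"),
    ("No",  "Hot",  "Salty", "Large"),
    ("Yes", "Hot",  "Sour",  "Large"),
    ("Yes", "Cold", "Sweet", "Small"),
    ("Yes", "Cold", "Sweet", "Small"),
    ("No",  "Hot",  "Salty", "Large"),
]

def entropy(labels):
    """Shannon entropy (base 2) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, attr_index):
    """Entropy of the class label minus its expected entropy after
    splitting on the attribute stored at column attr_index."""
    n = len(rows)
    gain = entropy([r[0] for r in rows])
    for value in {r[attr_index] for r in rows}:
        subset = [r for r in rows if r[attr_index] == value]
        gain -= len(subset) / n * entropy([r[0] for r in subset])
    return gain

print("Initial entropy of Appealing:", entropy([r[0] for r in data]))
print("Information gain of Taste:   ", information_gain(data, 2))
```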


Exercise 3 (Maximum Likelihood Estimate (MLE))

Suppose that the samples $\{x_i\}_{i=1}^n$ are drawn from the normal distribution $N(\mu, \sigma^2)$ with p.d.f.

$$f_\theta(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\Big(-\frac{(x-\mu)^2}{2\sigma^2}\Big),$$

where $\theta = (\mu, \sigma^2)$. The maximum likelihood estimator (MLE) of $\theta$ is the one that maximizes the likelihood function

$$L(\theta) = \prod_{i=1}^n f_\theta(x_i).$$

1. Show that the MLE of the parameters $(\mu, \sigma^2)$ is

$$\hat{\mu} = \bar{x} = \frac{1}{n}\sum_{i=1}^n x_i, \qquad \hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^n (x_i - \bar{x})^2.$$

2. Show that

$$E\,\hat{\mu} = \mu, \qquad E\Big[\frac{n}{n-1}\,\hat{\sigma}^2\Big] = \sigma^2,$$

where $E$ denotes the expectation. This means that $\hat{\mu}$ is an unbiased estimator of $\mu$, but $\hat{\sigma}^2$ is a biased estimator of $\sigma^2$.
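
One standard way to attack part 1 (a sketch of the usual approach, not the only admissible one) is to work with the log-likelihood instead of $L(\theta)$ itself:

$$\ell(\mu, \sigma^2) = \log L(\theta) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^n (x_i - \mu)^2,$$

then set $\partial\ell/\partial\mu = 0$ and $\partial\ell/\partial\sigma^2 = 0$ and solve for $\hat{\mu}$ and $\hat{\sigma}^2$. For part 2, the decomposition $\sum_{i=1}^n (x_i - \bar{x})^2 = \sum_{i=1}^n (x_i - \mu)^2 - n(\bar{x} - \mu)^2$, combined with $E(x_i - \mu)^2 = \sigma^2$ and $E(\bar{x} - \mu)^2 = \sigma^2/n$, gives the claimed expectation after taking expectations term by term.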

Exercise 4 (MLE for Naive Bayes methods)

Suppose that $X$ and $Y$ are a pair of discrete random variables, i.e., $X \in \{1, 2, \ldots, t\}$ and $Y \in \{1, 2, \ldots, c\}$. Then the probability distribution of $Y$ is solely determined by the set of parameters $\{p_k\}_{k=1}^c$, where $p_k = \Pr(Y = k)$ with $\sum_{k=1}^c p_k = 1$. Similarly, the conditional probability distribution of $X$ given $Y$ is solely determined by the set of parameters $\{p_{sk}\}_{s=1,\ldots,t;\,k=1,\ldots,c}$, where $p_{sk} = \Pr(X = s \mid Y = k)$ with $\sum_{s=1}^t p_{sk} = 1$. Now we have a set of samples $\{(x_i, y_i)\}_{i=1}^n$ drawn independently from the joint distribution $\Pr(X, Y)$. Prove that the MLE of the parameter $p_k$ (the prior probability) is

$$\hat{p}_k = \frac{\sum_{i=1}^n I(y_i = k)}{n}, \qquad k = 1, \ldots, c;$$

and the MLE of the parameter $p_{sk}$ is

$$\hat{p}_{sk} = \frac{\sum_{i=1}^n I(x_i = s,\, y_i = k)}{\sum_{i=1}^n I(y_i = k)}, \qquad s = 1, \ldots, t, \; k = 1, \ldots, c.$$
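
In words, both estimators are relative frequencies: $\hat{p}_k$ is the fraction of samples with label $k$, and $\hat{p}_{sk}$ is the fraction of those samples that also have $x_i = s$. A minimal Python sketch (the toy sample and variable names are illustrative, not part of the assignment) makes this concrete:

```python
# Minimal sketch: the MLE formulas above reduce to relative frequencies.
from collections import Counter

# Toy sample of (x_i, y_i) pairs, with x in {1,...,t} and y in {1,...,c}.
samples = [(1, 1), (2, 1), (1, 2), (3, 2), (1, 2), (2, 1)]
n = len(samples)

y_counts = Counter(y for _, y in samples)   # sum_i I(y_i = k)
xy_counts = Counter(samples)                # sum_i I(x_i = s, y_i = k)

p_hat = {k: y_counts[k] / n for k in y_counts}        # prior MLE, p_hat_k
p_sk_hat = {(s, k): xy_counts[s, k] / y_counts[k]     # conditional MLE, p_hat_sk
            for (s, k) in xy_counts}

print(p_hat)     # fraction of each label, e.g. {1: 0.5, 2: 0.5}
print(p_sk_hat)  # within-label fraction of each (x, y) pair
```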

Exercise 5 (Error bound for the 1-nearest-neighbor method) In class, we estimated that the error of the 1-nearest-neighbor rule is roughly twice the Bayes error. Now let us make this rigorous.

Let us consider the two-class classification problem with $\mathcal{X} = [0, 1]^d$ and $\mathcal{Y} = \{0, 1\}$. The underlying joint probability distribution on $\mathcal{X} \times \mathcal{Y}$ is $P(X, Y)$, from which we deduce that the marginal distribution of $X$ is $p_X(x)$ and the conditional probability distribution is $\eta(x) = P(Y = 1 \mid X = x)$. Assume that $\eta(x)$ is $c$-Lipschitz continuous: $|\eta(x) - \eta(x')| \le c\,\|x - x'\|$ for any $x, x' \in \mathcal{X}$. Recall that the Bayes rule is $f^*(x) = 1_{\{\eta(x) > 1/2\}}$. Given a training set $S = \{(x_i, y_i)\}_{i=1}^n$ with $(x_i, y_i) \overset{\text{i.i.d.}}{\sim} P$ (or, equivalently, $S \sim P^n$), the 1-nearest-neighbor rule is $f_{1NN}(x) = y_{\pi_S(x)}$, where $\pi_S(x) = \arg\min_i \|x - x_i\|$.

Define the generalization error of a rule $f$ as $\mathcal{E}(f) = E_{(X, Y)\sim P}\, 1_{\{Y \ne f(X)\}}$. Show that

$$E_{S\sim P^n}\, \mathcal{E}(f_{1NN}) \le 2\,\mathcal{E}(f^*) + c\, E_{S\sim P^n} E_{x\sim p_X} \|x - x_{\pi_S(x)}\|.$$

(This means that we can obtain a precise error estimate for the 1-nearest-neighbor rule if we can bound $E_{S\sim P^n} E_{x\sim p_X} \|x - x_{\pi_S(x)}\|$.)
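
To see what the rule $f_{1NN}$ does concretely, here is a minimal Python sketch with a made-up toy dataset (the variable names are illustrative and this is not part of the proof): the prediction at $x$ simply copies the label of the nearest training point.

```python
# Minimal sketch of the 1-nearest-neighbor rule on a toy training set in [0, 1]^2.
import numpy as np

rng = np.random.default_rng(0)
n, d = 20, 2
X_train = rng.random((n, d))                 # training inputs x_i in [0, 1]^d
y_train = (X_train[:, 0] > 0.5).astype(int)  # made-up labels in {0, 1}

def f_1nn(x):
    """Label of the training point nearest to x (the 1-NN prediction)."""
    nearest = np.argmin(np.linalg.norm(X_train - x, axis=1))
    return y_train[nearest]

x_test = rng.random(d)
print(x_test, "->", f_1nn(x_test))
```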

