Description
Exercise 1
Log into “cookdata.cn” and enroll in the course “êâ‰Æ Ú”. Finish the online exercise there.
Exercise 2 (Decision Tree)
You are trying to determine whether a boy finds a particular type of food appealing based on the food’s temperature, taste, and size.
Food Sample Id   Appealing   Temperature   Taste   Size
1                No          Hot           Salty   Small
2                No          Cold          Sweet   Large
3                No          Cold          Sweet   Large
4                Yes         Cold          Sour    Small
5                Yes         Hot           Sour    Small
6                No          Hot           Salty   Large
7                Yes         Hot           Sour    Large
8                Yes         Cold          Sweet   Small
9                Yes         Cold          Sweet   Small
10               No          Hot           Salty   Large
- What is the initial entropy of “Appealing”?
- Assume that “Taste” is chosen as the root of the decision tree. What is the information gain associated with this attribute?
- Draw the full decision tree learned from this data (without any pruning).
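For readers who want to double-check their hand computation of the first two questions, the following is a minimal Python sketch (my own encoding of the table above; the helper `entropy` is illustrative, not part of the exercise):

```python
# Sanity-check sketch for Exercise 2, not a substitute for the hand derivation:
# computes the entropy of "Appealing" and the information gain of "Taste".
from collections import Counter
from math import log2

# (appealing, temperature, taste, size) for the ten food samples above
samples = [
    ("No",  "Hot",  "Salty", "Small"),
    ("No",  "Cold", "Sweet", "Large"),
    ("No",  "Cold", "Sweet", "Large"),
    ("Yes", "Cold", "Sour",  "Small"),
    ("Yes", "Hot",  "Sour",  "Small"),
    ("No",  "Hot",  "Salty", "Large"),
    ("Yes", "Hot",  "Sour",  "Large"),
    ("Yes", "Cold", "Sweet", "Small"),
    ("Yes", "Cold", "Sweet", "Small"),
    ("No",  "Hot",  "Salty", "Large"),
]

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    counts = Counter(labels)
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in counts.values())

h0 = entropy([s[0] for s in samples])   # initial entropy of "Appealing"

# Conditional entropy after splitting on "Taste" (column index 2).
h_cond = 0.0
for v in {s[2] for s in samples}:
    subset = [s[0] for s in samples if s[2] == v]
    h_cond += len(subset) / len(samples) * entropy(subset)

print(f"H(Appealing) = {h0:.3f}, Gain(Taste) = {h0 - h_cond:.3f}")
```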
Exercise 3 (Maximum Likelihood Estimate, MLE)
Suppose that the samples $\{x_i\}_{i=1}^n$ are drawn from a normal distribution $N(\mu, \sigma^2)$ with p.d.f.
$$f_\theta(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right),$$
where $\theta = (\mu, \sigma^2)$. The maximum likelihood estimator (MLE) of $\theta$ is the one that maximizes the likelihood function
$$L(\theta) = \prod_{i=1}^n f_\theta(x_i).$$
1. Show that the MLE estimator of the parameters $(\mu, \sigma^2)$ is
$$\hat{\mu} = \bar{x} = \frac{1}{n}\sum_{i=1}^n x_i, \qquad \hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^n (x_i - \bar{x})^2.$$
2. Show that
$$E\hat{\mu} = \mu, \qquad E\left(\frac{n}{n-1}\,\hat{\sigma}^2\right) = \sigma^2,$$
where $E$ is the expectation. This means that $\hat{\mu}$ is an unbiased estimator of $\mu$, but $\hat{\sigma}^2$ is a biased estimator of $\sigma^2$.
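The bias statement can also be illustrated numerically. Below is a minimal Monte Carlo sketch in Python; the parameter values mu = 2, sigma^2 = 4, n = 5 are arbitrary assumptions chosen for the illustration:

```python
# Numerical illustration for Exercise 3 (not a proof): the MLE
# sigma_hat^2 = (1/n) * sum (x_i - xbar)^2 satisfies
# E[sigma_hat^2] = (n-1)/n * sigma^2, i.e., it is biased downward.
import numpy as np

rng = np.random.default_rng(0)
mu, sigma2, n, trials = 2.0, 4.0, 5, 200_000   # assumed values for illustration

x = rng.normal(mu, np.sqrt(sigma2), size=(trials, n))
sigma_hat2 = x.var(axis=1)   # ddof=0 (default): the MLE, i.e., the biased estimator

print("mean of sigma_hat^2:", sigma_hat2.mean())                 # ~ (n-1)/n * 4 = 3.2
print("bias-corrected:     ", (n / (n - 1)) * sigma_hat2.mean()) # ~ sigma^2 = 4.0
```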
Exercise 4 (MLE for Naive Bayes methods)
Suppose that $X$ and $Y$ are a pair of discrete random variables, i.e., $X \in \{1, 2, \ldots, t\}$, $Y \in \{1, 2, \ldots, c\}$. Then the probability distribution of $Y$ is solely determined by the set of parameters $\{p_k\}_{k=1}^c$, where $p_k = \Pr(Y = k)$ with $\sum_{k=1}^c p_k = 1$. Similarly, the conditional probability distribution of $X$ given $Y$ is solely determined by the set of parameters $\{p_{sk}\}_{s=1,\ldots,t,\ k=1,\ldots,c}$, where $p_{sk} = \Pr(X = s \mid Y = k)$ with $\sum_{s=1}^t p_{sk} = 1$. Now we have a set of samples $\{(x_i, y_i)\}_{i=1}^n$ drawn independently from the joint distribution $\Pr(X, Y)$. Prove that the MLE of the parameter $p_k$ (prior probability) is
$$\hat{p}_k = \frac{\sum_{i=1}^n I(y_i = k)}{n}, \qquad k = 1, \ldots, c;$$
and the MLE of the parameter $p_{sk}$ is
$$\hat{p}_{sk} = \frac{\sum_{i=1}^n I(x_i = s,\, y_i = k)}{\sum_{i=1}^n I(y_i = k)}, \qquad s = 1, \ldots, t,\ k = 1, \ldots, c.$$
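Intuitively, both MLE formulas are empirical frequencies: count how often each class occurs, and how often each feature value occurs within each class. The following is a minimal Python sketch on an assumed toy sample (the data and variable names are illustrative only):

```python
# Sketch for Exercise 4: the MLE formulas above, computed by counting.
from collections import Counter

# assumed toy sample of (x_i, y_i) pairs with t = 3 feature values, c = 2 classes
pairs = [(1, 1), (2, 1), (1, 2), (3, 2), (1, 1), (2, 2), (1, 2), (3, 1)]
n = len(pairs)

y_counts = Counter(y for _, y in pairs)   # counts of each class k
xy_counts = Counter(pairs)                # counts of each (s, k) combination

p_hat = {k: y_counts[k] / n for k in y_counts}            # prior MLE  p_hat_k
p_sk_hat = {(s, k): xy_counts[(s, k)] / y_counts[k]       # conditional MLE p_hat_sk
            for (s, k) in xy_counts}                      # unobserved (s, k): MLE is 0

print(p_hat)      # e.g. {1: 0.5, 2: 0.5} for this toy sample
print(p_sk_hat)
```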
Exercise 5 (Error bound for the 1-nearest-neighbor method) In class, we estimated that the error of the 1-nearest-neighbor rule is roughly twice the Bayes error. Now let us make this rigorous.
Let us consider the two-class classification problem with $\mathcal{X} = [0,1]^d$ and $\mathcal{Y} = \{0,1\}$. The underlying joint probability distribution on $\mathcal{X} \times \mathcal{Y}$ is $P(X, Y)$, from which we deduce that the marginal distribution of $X$ is $p_X(x)$ and the conditional probability distribution is $\eta(x) = P(Y = 1 \mid X = x)$. Assume that $\eta(x)$ is $c$-Lipschitz continuous: $|\eta(x) - \eta(x')| \le c\|x - x'\|$ for any $x, x' \in \mathcal{X}$. Recall that the Bayes rule is $f^*(x) = 1_{\{\eta(x) > 1/2\}}$. Given a training set $S = \{(x_i, y_i)\}_{i=1}^n$ with $(x_i, y_i) \overset{\text{i.i.d.}}{\sim} P$ (or equivalently $S \sim P^n$), the 1-nearest-neighbor rule is $f^{1NN}(x) = y_{\pi_S(x)}$, where $\pi_S(x) = \arg\min_i \|x - x_i\|$.
Define the generalization error of rule $f$ as $\mathcal{E}(f) = E_{(X,Y)\sim P}\, 1_{Y \ne f(X)}$. Show that
$$E_{S \sim P^n}\, \mathcal{E}(f^{1NN}) \le 2\,\mathcal{E}(f^*) + c\, E_{S \sim P^n} E_{x \sim p_X} \|x - x_{\pi_S(x)}\|.$$
(This means that we can have a precise error estimate for the 1-nearest-neighbor rule if we can bound $E_{S \sim P^n} E_{x \sim p_X} \|x - x_{\pi_S(x)}\|$.)
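To make the quantities in the bound concrete, here is a minimal Python sketch of the 1-nearest-neighbor rule together with a Monte Carlo estimate of the distance term $E_{x \sim p_X}\|x - x_{\pi_S(x)}\|$; the uniform marginal on $[0,1]^d$ and the toy labeling rule are assumptions made for illustration, not part of the exercise:

```python
# Illustration of the definitions in Exercise 5 (not the proof).
import numpy as np

rng = np.random.default_rng(0)
d, n, m = 2, 500, 2_000   # assumed dimension, training size, query count

xs = rng.random((n, d))                    # training inputs, uniform on [0,1]^d
ys = (xs.sum(axis=1) > 1).astype(int)      # assumed toy labeling, for illustration

def f_1nn(x):
    """The 1-NN rule: return the label of the nearest training point."""
    i = np.argmin(np.linalg.norm(xs - x, axis=1))   # pi_S(x)
    return ys[i]

print("f_1NN([0.3, 0.9]) =", f_1nn(np.array([0.3, 0.9])))

# Monte Carlo estimate of the nearest-neighbor distance term in the bound.
queries = rng.random((m, d))
dists = np.linalg.norm(xs[None, :, :] - queries[:, None, :], axis=2).min(axis=1)
print("estimated E||x - x_{pi_S(x)}|| =", dists.mean())
```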