Statistical Inference and Machine Learning Homework 2 Solution

  • This assignment can be solved in groups of 1 to 5 students. You must mention the names of all participants. Note that all the students in a group will get the same grade.

  • Deadline: 25 November 2020, 23:59 (No late submissions will be accepted)

  • Upload a single pdf file on Moodle containing your solution.

1 Feature Selection [60 pts]

Algorithm:

Given a dataset S = {(Y_i, X_i)}_{i=1}^n of n instances, where the features are X = (X_1, . . . , X_d) ∈ R^d and the labels are Y ∈ {1, . . . , K}.

    • For each value of the label Y = k

Estimate the density p(Y = k)

    • For each feature X_i, i ∈ {1, . . . , d}

Estimate its density p(X_i)

For each value of the label Y = k, estimate the density p(X_i | Y = k)

Score feature X_i, i ∈ {1, . . . , d}, using

$$
I(X_i, Y) = \sum_{x_i \in \mathcal{X},\, y \in \mathcal{Y}} p(x_i, y) \log_2\left(\frac{p(x_i, y)}{p(x_i)\, p(y)}\right) \qquad (1)
$$

where $\mathcal{X}$ and $\mathcal{Y}$ denote the support sets of X_i and Y.

  • Choose those features X_i with a high score I(X_i, Y); a minimal Python sketch of this scoring step is given right after this list.
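To make the scoring step concrete, here is a minimal sketch, assuming discrete (categorical) features so that every density in equation (1) can be estimated by a plug-in empirical frequency. The function name mutual_information_score and the toy data are our own illustration, not something prescribed by the assignment.

```python
import numpy as np

def mutual_information_score(x, y):
    """Plug-in estimate of I(X_i, Y) from equation (1) for one discrete feature x
    and a discrete label y, using empirical frequencies for p(x_i), p(y), p(x_i, y)."""
    x, y = np.asarray(x), np.asarray(y)
    score = 0.0
    for xv in np.unique(x):
        p_x = np.mean(x == xv)                         # estimate of p(x_i)
        for yv in np.unique(y):
            p_y = np.mean(y == yv)                     # estimate of p(y)
            p_xy = np.mean((x == xv) & (y == yv))      # estimate of p(x_i, y)
            if p_xy > 0.0:                             # skip empty cells (0 * log 0 := 0)
                score += p_xy * np.log2(p_xy / (p_x * p_y))
    return score

# Toy usage: X1 copies the label (maximally informative), X2 is independent noise.
rng = np.random.default_rng(0)
Y = rng.integers(0, 2, size=1000)
X1 = Y.copy()
X2 = rng.integers(0, 2, size=1000)
print(mutual_information_score(X1, Y))  # close to H(Y), i.e. about 1 bit
print(mutual_information_score(X2, Y))  # close to 0
```

Features would then be ranked by this score and the highest-scoring ones kept, which is the selection rule in the last bullet above.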

Insight: Informativeness of a feature

  • We are uncertain about label Y before seeing any input.

Suppose we quantify this uncertainty using the entropy H(Y), defined as

$$
H(Y) = -\sum_{y \in \mathcal{Y}} p(y) \log_2 p(y) \qquad (2)
$$

where $\mathcal{Y}$ denotes the support set of Y. A short worked example follows below.
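As a quick numerical illustration of equation (2) (our own example, not part of the assignment statement): a balanced binary label with p(Y = 0) = p(Y = 1) = 1/2 gives H(Y) = −(1/2 · log_2(1/2) + 1/2 · log_2(1/2)) = 1 bit, the maximum uncertainty over two classes, while a degenerate label with p(Y = 1) = 1 gives H(Y) = 0, i.e. no uncertainty at all. A feature X_i is informative to the extent that observing it reduces this uncertainty, and the mutual information in equation (1) can be written as exactly that reduction, I(X_i, Y) = H(Y) − H(Y | X_i).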
