Assignment One

Provide credit to any sources other than the course staff that helped you solve the problems. This includes the names of all students you talked to regarding the problems.

You can look up definitions/basics online (e.g., Wikipedia, Stack Exchange, etc.). The questions marked with an asterisk can be slightly involved.

Problem 1. (10 points). Design the decision tree for the tennis data using the Gini impurity measure. Compute the Gini measure for all attributes at each node, and continue until all the examples are correctly labeled by the tree.
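As a sanity-checking aid (not part of the assignment), here is a minimal Python sketch of how the Gini measure for a candidate split can be computed. The helper names `gini` and `gini_of_split` are hypothetical, and the `toy` rows are made-up placeholders; the actual tennis data is in the course handout.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum_i p_i^2."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_of_split(examples, attribute):
    """Weighted Gini impurity after splitting `examples` on `attribute`.

    `examples` is a list of (features_dict, label) pairs.
    """
    n = len(examples)
    groups = {}
    for feats, label in examples:
        groups.setdefault(feats[attribute], []).append(label)
    return sum(len(g) / n * gini(g) for g in groups.values())

# Hypothetical toy rows (NOT the tennis data from the handout):
toy = [({"Wind": "Weak"}, "+"), ({"Wind": "Strong"}, "-"),
       ({"Wind": "Weak"}, "+"), ({"Wind": "Strong"}, "+")]
print(gini_of_split(toy, "Wind"))  # 0.25: pure "Weak" group, mixed "Strong" group
```

At each node, the attribute whose split yields the lowest weighted Gini impurity is chosen.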

Problem 2 (20 points). Consider the training set given in Table 1. The attribute "Shirt Size Fine" is a refinement of the attribute "Shirt Size", wherein the value "Medium" has been further categorized into two values, "Small-Medium" and "Large-Medium".

1. What is the entropy of the labels?

2. Compute the information gain for "Shirt Size" and "Shirt Size Fine". Which is higher? (A sketch for checking these computations appears after part 5.)

3. A function $f$ is called concave if for all $x, y$ and any $0 \le \lambda \le 1$,

$$f(\lambda x + (1-\lambda)y) \;\ge\; \lambda f(x) + (1-\lambda) f(y). \qquad (1)$$

The logarithm function is concave; you can assume that as a fact for all other parts of this assignment. For this part, you have to show that Equation (1) holds for $\lambda = 1/2$ when $f$ is the logarithm function.

4. The following inequality is called the log-sum inequality. For positive $x_1, x_2, y_1, y_2$,

$$x_1 \log \frac{x_1}{y_1} + x_2 \log \frac{x_2}{y_2} \;\ge\; (x_1 + x_2) \log \frac{x_1 + x_2}{y_1 + y_2}. \qquad (2)$$

Prove this using the concavity of logarithms. (The sketch after part 5 includes a numeric spot-check of this inequality.)


Gender | Car Type | Shirt Size  | Shirt Size Fine | Label
M      | Family   | Small       | Small           | +
M      | Sports   | Medium      | Small-Medium    | +
M      | Family   | Large       | Large           | -
F      | Sports   | Small       | Small           | +
M      | Sports   | Extra Large | Extra Large     | +
F      | Luxury   | Small       | Small           | -
M      | Family   | Medium      | Large-Medium    | -
M      | Sports   | Extra Large | Extra Large     | +
M      | Sports   | Large       | Large           | +
F      | Luxury   | Medium      | Large-Medium    | -
F      | Sports   | Medium      | Large-Medium    | +
F      | Family   | Small       | Small           | +
F      | Luxury   | Large       | Large           | -
M      | Luxury   | Medium      | Small-Medium    | -
F      | Family   | Medium      | Small-Medium    | +

Table 1: Training Data

5. We will show that part 2 of this problem can be generalized as follows. Consider a training set of any size with the four features as in Table 1, again with the property that "Shirt Size Fine" is a refinement of the attribute "Shirt Size". Show that the information gain for "Shirt Size Fine" is always at least that for "Shirt Size". This is a heuristic justification for the fact that information gain tends to pick attributes that have more possible values.

(Hint: Suppose $n_m$ is the number of Medium examples, and $n_{ms}$ and $n_{ml}$ are the numbers of Small-Medium and Large-Medium examples, respectively; then $n_{ms} + n_{ml} = n_m$. You may also want to define terms such as $n_m^+$, the number of Medium examples with positive labels.) You may have to use Equation (2) carefully!
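For parts 1, 2, and 5, the following minimal Python sketch (not part of the assignment; `entropy` and `info_gain` are hypothetical helper names) reads the relevant columns off Table 1 and computes the quantities in question. It ends with a quick numeric spot-check of Equation (2); the base of the logarithm does not affect the inequality.

```python
from math import log2

def entropy(labels):
    """Shannon entropy H = -sum_i p_i log2 p_i of a label list."""
    n = len(labels)
    probs = (labels.count(v) / n for v in set(labels))
    return -sum(p * log2(p) for p in probs)

def info_gain(col, labels):
    """IG = H(labels) - sum_v (n_v / n) H(labels restricted to value v)."""
    n = len(labels)
    cond = sum(
        col.count(v) / n * entropy([l for c, l in zip(col, labels) if c == v])
        for v in set(col)
    )
    return entropy(labels) - cond

# Columns of Table 1 (Shirt Size, Shirt Size Fine, Label), row by row:
size = ["Small", "Medium", "Large", "Small", "Extra Large", "Small",
        "Medium", "Extra Large", "Large", "Medium", "Medium", "Small",
        "Large", "Medium", "Medium"]
fine = ["Small", "Small-Medium", "Large", "Small", "Extra Large", "Small",
        "Large-Medium", "Extra Large", "Large", "Large-Medium",
        "Large-Medium", "Small", "Large", "Small-Medium", "Small-Medium"]
label = ["+", "+", "-", "+", "+", "-", "-", "+", "+", "-", "+", "+", "-", "-", "+"]

print(entropy(label))          # entropy of the labels (part 1)
print(info_gain(size, label))  # IG for Shirt Size (part 2)
print(info_gain(fine, label))  # IG for Shirt Size Fine: at least as large (part 5)

# Numeric spot-check of the log-sum inequality, Equation (2):
x1, x2, y1, y2 = 1.0, 2.0, 3.0, 4.0
lhs = x1 * log2(x1 / y1) + x2 * log2(x2 / y2)
rhs = (x1 + x2) * log2((x1 + x2) / (y1 + y2))
assert lhs >= rhs
```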

Problem 3. (10 points). Consider the training set given in Table 2. There are nine examples, each with three features. Feature 1 and Feature 2 are binary, and Feature 3 is continuous.

1. For Feature 1 and Feature 2, compute the information gain with respect to the examples.

2. To use a feature that takes continuous values, we fix a threshold and categorize the continuous feature depending on whether it is greater than the threshold or not. For example, in the given data, if the threshold is fixed at 4.5, we convert Feature 3 into a categorical feature, $\{S, G, G, S, G, S, G, G, G\}$, where $S$ and $G$ indicate that the value is smaller than and greater than the threshold, respectively.

For Feature 3, compute the information gain with respect to the threshold values 2.5, 3.5, 4.5, 5.5, 6.5, and 7.5. Which threshold has the highest information gain? (A sketch of this thresholding step follows this list.)

3. Which feature will be chosen as the root node, if we use the threshold value with the highest information gain for the third feature?

4. Construct any decision tree that gives correct answers for all the training examples.
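As referenced in part 2, here is a minimal sketch of the threshold conversion for Feature 3, under the same assumptions as the earlier sketch (hypothetical helper names, repeated here so the snippet stands alone); `f3` and `label` are read off Table 2.

```python
from math import log2

def entropy(labels):
    """Shannon entropy of a label list."""
    n = len(labels)
    return -sum(labels.count(v) / n * log2(labels.count(v) / n)
                for v in set(labels))

def info_gain(col, labels):
    """Information gain of a categorical column with respect to the labels."""
    n = len(labels)
    cond = sum(col.count(v) / n * entropy([l for c, l in zip(col, labels) if c == v])
               for v in set(col))
    return entropy(labels) - cond

f3 = [1.0, 6.0, 5.0, 4.0, 7.0, 3.0, 8.0, 7.0, 5.0]     # Feature 3 from Table 2
label = ["+", "+", "-", "+", "-", "-", "-", "+", "-"]  # labels from Table 2

for t in [2.5, 3.5, 4.5, 5.5, 6.5, 7.5]:
    # Categorize Feature 3: "S" if below the threshold, "G" otherwise
    cat = ["S" if x < t else "G" for x in f3]
    print(t, info_gain(cat, label))
```

At threshold 4.5 the categorized column reproduces the sequence $\{S, G, G, S, G, S, G, G, G\}$ given in part 2.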


Feature 1 | Feature 2 | Feature 3 | Label
T         | T         | 1.0       | +
T         | T         | 6.0       | +
T         | F         | 5.0       | -
F         | F         | 4.0       | +
F         | T         | 7.0       | -
F         | T         | 3.0       | -
F         | F         | 8.0       | -
T         | F         | 7.0       | +
F         | T         | 5.0       | -

Table 2: Training Data

Problem 4. Decision Trees (30 points). See the attached Jupyter notebook for details.

