Intro to Data Mining HW #2

1. Consider the following dataset where the decision attribute is restaurant:

| mealPreference | gender | drinkPreference | restaurant |
|----------------|--------|-----------------|------------|
| hamburger | M | coke | mcdonalds |
| fish | M | pepsi | burgerKing |
| chicken | F | coke | mcdonalds |
| hamburger | M | coke | mcdonalds |
| chicken | M | pepsi | wendys |
| fish | F | coke | burgerKing |
| chicken | M | pepsi | burgerKing |
| chicken | F | coke | wendys |
| hamburger | F | coke | mcdonalds |

Use the 1-rule (1R) method to find the best single attribute to determine restaurant. In order to demonstrate that you actually know how this method works (and aren’t just guessing at which attribute is best), you must fill in ALL of the blank values in the table below; otherwise, you will not receive any credit for this problem. If there is a tie for most frequent value for restaurant, choose whichever of the tied values you want. (10 pts.)

| Attribute | Attribute Value | # Rows with Attribute Value | Most Frequent Value for restaurant | Errors | Total Errors |
|-----------|-----------------|-----------------------------|------------------------------------|--------|--------------|
| mealPreference | hamburger | 3 | | | |
| | fish | 2 | | | |
| | chicken | 4 | | | |
| gender | M | 5 | | | |
| | F | 4 | | | |
| drinkPreference | pepsi | 3 | | | |
| | coke | 6 | | | |

Based on these calculations, list the rules that would be generated by the 1R method for determining restaurant.
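For reference, a minimal sketch of the 1R procedure in Python on the dataset above (the one_r helper and its tie-breaking, which simply takes whichever most frequent class Counter yields first, are illustrative assumptions, not part of the assignment):

```python
from collections import Counter, defaultdict

# Dataset from problem 1: (mealPreference, gender, drinkPreference, restaurant).
rows = [
    ("hamburger", "M", "coke",  "mcdonalds"),
    ("fish",      "M", "pepsi", "burgerKing"),
    ("chicken",   "F", "coke",  "mcdonalds"),
    ("hamburger", "M", "coke",  "mcdonalds"),
    ("chicken",   "M", "pepsi", "wendys"),
    ("fish",      "F", "coke",  "burgerKing"),
    ("chicken",   "M", "pepsi", "burgerKing"),
    ("chicken",   "F", "coke",  "wendys"),
    ("hamburger", "F", "coke",  "mcdonalds"),
]
attributes = ["mealPreference", "gender", "drinkPreference"]

def one_r(rows, attributes):
    """Return (attribute, rules, errors) for the attribute with fewest errors."""
    best = None
    for i, attr in enumerate(attributes):
        # Tally the class (last column) for each value of this attribute.
        tallies = defaultdict(Counter)
        for row in rows:
            tallies[row[i]][row[-1]] += 1
        # One rule per attribute value: predict its most frequent class.
        rules = {v: t.most_common(1)[0][0] for v, t in tallies.items()}
        # Errors = rows in each group not covered by that group's rule.
        errors = sum(sum(t.values()) - max(t.values()) for t in tallies.values())
        if best is None or errors < best[2]:
            best = (attr, rules, errors)
    return best

attr, rules, errors = one_r(rows, attributes)
print(f"best attribute: {attr} (total errors {errors}/{len(rows)})")
for value, prediction in rules.items():
    print(f"  if {attr} = {value} then restaurant = {prediction}")
```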

2. Create the dataset given in problem 1 as an arff or csv file, and run DecisionStump on it in Weka. List the classification rules that are produced (you can just include a screenshot of your Weka output) AND draw a tree that corresponds to the rules. (1 pt.)
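For problems 2 and 4, one possible ARFF encoding of the problem 1 dataset (the relation name is arbitrary; the @relation/@attribute/@data directives are standard Weka ARFF syntax):

```
@relation restaurant

@attribute mealPreference {hamburger, fish, chicken}
@attribute gender {M, F}
@attribute drinkPreference {coke, pepsi}
@attribute restaurant {mcdonalds, burgerKing, wendys}

@data
hamburger,M,coke,mcdonalds
fish,M,pepsi,burgerKing
chicken,F,coke,mcdonalds
hamburger,M,coke,mcdonalds
chicken,M,pepsi,wendys
fish,F,coke,burgerKing
chicken,M,pepsi,burgerKing
chicken,F,coke,wendys
hamburger,F,coke,mcdonalds
```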

3. Statistical modeling can be used to compute the probability of occurrence of an attribute value. Based on the data given in the table below, if we have a new instance where ageGroup = youngAdult, gender = M, and bookPreference = nonFiction, what is the likelihood that musicPreference = country? Just set up the equation to compute this with the appropriate values; you don’t have to actually calculate the final answer. (1 pt.)
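For orientation, the likelihood the problem asks about has the standard naive Bayes shape (this general form is background; the probabilities themselves must be read off the table):

```latex
\Pr(\text{country} \mid \text{youngAdult}, \mathrm{M}, \text{nonFiction})
\;\propto\;
\Pr(\text{country}) \cdot
\Pr(\text{youngAdult} \mid \text{country}) \cdot
\Pr(\mathrm{M} \mid \text{country}) \cdot
\Pr(\text{nonFiction} \mid \text{country})
```

Each conditional probability is a count ratio, e.g. the fraction of country rows with ageGroup = youngAdult.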

4. Create the dataset given in problem 1 as an arff or csv file, and run Id3 on it in Weka. Show the decision tree output that is produced by Weka AND draw the tree by hand. (1 pt.)

Note: Id3 may not be installed with the initial download of Weka 3.8, in which case you will need to install the package named simpleEducationalLearningSchemes.

5. Consider the following dataset where the decision attribute is musicPreference:

| ageGroup | gender | bookPreference | musicPreference |
|----------|--------|----------------|-----------------|
| youngAdult | M | sciFiction | rock |
| senior | M | mystery | classical |
| middleAge | F | mystery | rock |
| youngAdult | M | nonFiction | country |
| middleAge | M | sciFiction | rock |
| senior | F | nonFiction | classical |
| middleAge | F | mystery | country |
| youngAdult | F | mystery | country |

If we want to make a decision tree for determining musicPreference, we must decide which of the three attributes (ageGroup, gender, or bookPreference) to use as the root of the tree.

    a. Set up the equation to compute what in lecture we called entropyBeforeSplit for musicPreference. You do not have to actually solve (i.e., calculate the terms in) the equation, just set up the equation. (1.5 pts.)

    b. Set up the equation to compute entropy for bookPreference when its value is mystery. That is, a tree with bookPreference at the root would have three branches (one for sciFiction, one for mystery, and one for nonFiction), requiring us to compute entropySciFiction, entropyMystery, and entropyNonFiction; here we only want you to set up the equation to compute entropyMystery. You do not have to actually solve (i.e., calculate the terms in) the equation, just set it up. (1.5 pts.)

    c. Suppose that instead of considering bookPreference to be the root of a decision tree for musicPreference, we had instead considered gender. Set up the equation to compute information gain for gender given the variables specified below (the general formulas are sketched after this list). (1.5 pts.)

entropy before any split: X
entropy for gender = M: Y
entropy for gender = F: Z
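For reference, the standard ID3-style definitions behind these sub-parts (these general formulas are background, not the required setup; the specific counts must come from the table above):

```latex
\mathrm{Entropy}(S) = -\sum_{c \in \text{classes}} p_c \log_2 p_c
\qquad
\mathrm{Gain}(S, A) = \mathrm{Entropy}(S) - \sum_{v \in \mathrm{values}(A)} \frac{|S_v|}{|S|}\,\mathrm{Entropy}(S_v)
```

In the notation of part c, the gain for gender therefore has the shape X minus a weighted sum of Y and Z, where the weights are the fractions of the 8 rows having gender = M and gender = F respectively.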

6. Consider the following dataset where the decision attribute is play:

| outlook | temperature | humidity | windy | play |
|---------|-------------|----------|-------|------|
| good | warm | high | FALSE | no |
| good | warm | high | TRUE | no |
| bad | warm | high | FALSE | no |
| bad | cool | normal | FALSE | yes |
| bad | cool | normal | TRUE | yes |
| good | cool | normal | FALSE | yes |
| good | warm | normal | TRUE | yes |
| bad | cool | high | TRUE | yes |
| bad | warm | normal | FALSE | yes |
| good | cool | high | TRUE | no |

  a. Do ONLY the necessary calculations to determine what the ROOT NODE would be for a CART decision tree. YOU MUST SHOW YOUR WORK!!! (6.5 pts.)

Note: If there’s a tie for which attribute you’d pick to be the root of the tree, just list those attributes and say that we could pick from them.

  b. Write a Python program that runs the CART algorithm on this dataset (a sketch of one possible approach follows after this list). Include both your source code and a screenshot showing the resulting tree. The dataset (hw2_prob6.csv) is posted on Canvas along with this assignment. (3.5 pts.)

  c. Run SimpleCart in Weka on this dataset specifying the options minNumObj = 1 and usePrune = False. Show a screenshot of the CART decision tree that it produces. (0.5 pt.)

Note: SimpleCart may not be installed with the initial download of Weka 3.8, in which case you will need to install the package.
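For part a, the quantity CART minimizes when choosing a split is the (weighted) Gini impurity; the standard definitions, as background:

```latex
\mathrm{Gini}(S) = 1 - \sum_{c} p_c^{\,2}
\qquad
\mathrm{Gini}_{\text{split}}(S, A) = \sum_{v \in \mathrm{values}(A)} \frac{|S_v|}{|S|}\,\mathrm{Gini}(S_v)
```

For part b, the sketch referenced above: a minimal Python program using scikit-learn, whose DecisionTreeClassifier with criterion="gini" builds a CART-style tree. This is only one possible approach, not the required implementation; the one-hot encoding step and the plotting choices are assumptions (hw2_prob6.csv and its columns come from the assignment).

```python
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.tree import DecisionTreeClassifier, plot_tree

# Load the dataset posted on Canvas
# (columns: outlook, temperature, humidity, windy, play).
df = pd.read_csv("hw2_prob6.csv")

# scikit-learn trees require numeric features, so one-hot encode the
# categorical attributes; the decision attribute stays as string labels.
X = pd.get_dummies(df.drop(columns="play"))
y = df["play"]

# criterion="gini" selects the CART splitting rule; leaving the tree
# unpruned with min_samples_leaf=1 mirrors the Weka options in part c.
clf = DecisionTreeClassifier(criterion="gini", min_samples_leaf=1)
clf.fit(X, y)

# Draw the induced tree for the screenshot.
plot_tree(clf, feature_names=X.columns, class_names=clf.classes_, filled=True)
plt.show()
```

Note that after one-hot encoding every split is a binary test on a single attribute value, which matches CART’s binary-split treatment of categorical attributes.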
