Intro to Data Mining HW #2

1. Consider the following dataset where the decision attribute is restaurant:

| mealPreference | gender | drinkPreference | restaurant |
|----------------|--------|-----------------|------------|
| hamburger | M | coke | mcdonalds |
| fish | M | pepsi | burgerKing |
| chicken | F | coke | mcdonalds |
| hamburger | M | coke | mcdonalds |
| chicken | M | pepsi | wendys |
| fish | F | coke | burgerKing |
| chicken | M | pepsi | burgerKing |
| chicken | F | coke | wendys |
| hamburger | F | coke | mcdonalds |

Use the 1-rule (1R) method to find the best single attribute to determine restaurant. In order to demonstrate that you actually know how this method works (and aren’t just guessing at which attribute is best), you must fill in ALL of the blank values in the table below; otherwise, you will not receive any credit for this problem. If there is a tie for most frequent value for restaurant, choose whichever of the tied values you want. (10 pts.)

| Attribute | Attribute Value | # Rows with Attribute Value | Most Frequent Value for restaurant | Errors | Total Errors |
|-----------|-----------------|-----------------------------|------------------------------------|--------|--------------|
| mealPreference | hamburger | 3 | | | |
| | fish | 2 | | | |
| | chicken | 4 | | | |
| gender | M | 5 | | | |
| | F | 4 | | | |
| drinkPreference | pepsi | 3 | | | |
| | coke | 6 | | | |

Based on these calculations, list the rules that would be generated by the 1R method for determining restaurant.
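For reference, a minimal sketch of the 1R procedure in Python on the dataset above (the one_r helper and its tie-breaking, which simply takes whichever most frequent class Counter yields first, are illustrative assumptions, not part of the assignment):

```python
from collections import Counter, defaultdict

# Dataset from problem 1: (mealPreference, gender, drinkPreference, restaurant).
rows = [
    ("hamburger", "M", "coke",  "mcdonalds"),
    ("fish",      "M", "pepsi", "burgerKing"),
    ("chicken",   "F", "coke",  "mcdonalds"),
    ("hamburger", "M", "coke",  "mcdonalds"),
    ("chicken",   "M", "pepsi", "wendys"),
    ("fish",      "F", "coke",  "burgerKing"),
    ("chicken",   "M", "pepsi", "burgerKing"),
    ("chicken",   "F", "coke",  "wendys"),
    ("hamburger", "F", "coke",  "mcdonalds"),
]
attributes = ["mealPreference", "gender", "drinkPreference"]

def one_r(rows, attributes):
    """Return (attribute, rules, errors) for the attribute with fewest errors."""
    best = None
    for i, attr in enumerate(attributes):
        # Tally the class (last column) for each value of this attribute.
        tallies = defaultdict(Counter)
        for row in rows:
            tallies[row[i]][row[-1]] += 1
        # One rule per attribute value: predict its most frequent class.
        rules = {v: t.most_common(1)[0][0] for v, t in tallies.items()}
        # Errors = rows in each group not covered by that group's rule.
        errors = sum(sum(t.values()) - max(t.values()) for t in tallies.values())
        if best is None or errors < best[2]:
            best = (attr, rules, errors)
    return best

attr, rules, errors = one_r(rows, attributes)
print(f"best attribute: {attr} (total errors {errors}/{len(rows)})")
for value, prediction in rules.items():
    print(f"  if {attr} = {value} then restaurant = {prediction}")
```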

2. Create the dataset given in problem 1 as an arff or csv file, and run DecisionStump on it in Weka. List the classification rules that are produced (you can just include a screenshot of your Weka output) AND draw a tree that corresponds to the rules. (1 pt.)
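For problems 2 and 4, one possible ARFF encoding of the problem 1 dataset (the relation name is arbitrary; the @relation/@attribute/@data directives are standard Weka ARFF syntax):

```
@relation restaurant

@attribute mealPreference {hamburger, fish, chicken}
@attribute gender {M, F}
@attribute drinkPreference {coke, pepsi}
@attribute restaurant {mcdonalds, burgerKing, wendys}

@data
hamburger,M,coke,mcdonalds
fish,M,pepsi,burgerKing
chicken,F,coke,mcdonalds
hamburger,M,coke,mcdonalds
chicken,M,pepsi,wendys
fish,F,coke,burgerKing
chicken,M,pepsi,burgerKing
chicken,F,coke,wendys
hamburger,F,coke,mcdonalds
```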

3. Statistical modeling can be used to compute the probability of occurrence of an attribute value. Based on the data given in the table below, if we have a new instance where ageGroup = youngAdult, gender = M, and bookPreference = nonFiction, what is the likelihood that musicPreference = country? Just set up the equation to compute this with the appropriate values; you don’t have to actually calculate the final answer. (1 pt.)
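For orientation, the likelihood the problem asks about has the standard naive Bayes shape (this general form is background; the probabilities themselves must be read off the table):

```latex
\Pr(\text{country} \mid \text{youngAdult}, \mathrm{M}, \text{nonFiction})
\;\propto\;
\Pr(\text{country}) \cdot
\Pr(\text{youngAdult} \mid \text{country}) \cdot
\Pr(\mathrm{M} \mid \text{country}) \cdot
\Pr(\text{nonFiction} \mid \text{country})
```

Each conditional probability is a count ratio, e.g. the fraction of country rows with ageGroup = youngAdult.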

4. Create the dataset given in problem 1 as an arff or csv file, and run Id3 on it in Weka. Show the decision tree output that is produced by Weka AND draw the tree by hand. (1 pt.)

Note: Id3 may not be installed with the initial download of Weka 3.8, in which case you will need to install the package named simpleEducationalLearningSchemes.

5. Consider the following dataset where the decision attribute is musicPreference:

| ageGroup | gender | bookPreference | musicPreference |
|----------|--------|----------------|-----------------|
| youngAdult | M | sciFiction | rock |
| senior | M | mystery | classical |
| middleAge | F | mystery | rock |
| youngAdult | M | nonFiction | country |
| middleAge | M | sciFiction | rock |
| senior | F | nonFiction | classical |
| middleAge | F | mystery | country |
| youngAdult | F | mystery | country |

If we want to make a decision tree for determining musicPreference, we must decide which of the three attributes (ageGroup, gender, or bookPreference) to use as the root of the tree.

    a. Set up the equation to compute what in lecture we called entropyBeforeSplit for musicPreference. You do not have to actually solve (i.e., calculate the terms in) the equation, just set up the equation. (1.5 pts.)

    b. Set up the equation to compute entropy for bookPreference when its value is mystery. That is, a tree with bookPreference at the root would have three branches (one for sciFiction, one for mystery, and one for nonFiction), requiring us to compute entropySciFiction, entropyMystery, and entropyNonFiction; here we only want you to set up the equation to compute entropyMystery. You do not have to actually solve (i.e., calculate the terms in) the equation, just set it up. (1.5 pts.)

    c. Suppose that instead of considering bookPreference to be the root of a decision tree for musicPreference, we had instead considered gender. Set up the equation to compute information gain for gender given the variables specified below (the general formulas are sketched after this list). (1.5 pts.)

entropy before any split: X
entropy for gender = M: Y
entropy for gender = F: Z
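For reference, the standard ID3-style definitions behind these sub-parts (these general formulas are background, not the required setup; the specific counts must come from the table above):

```latex
\mathrm{Entropy}(S) = -\sum_{c \in \text{classes}} p_c \log_2 p_c
\qquad
\mathrm{Gain}(S, A) = \mathrm{Entropy}(S) - \sum_{v \in \mathrm{values}(A)} \frac{|S_v|}{|S|}\,\mathrm{Entropy}(S_v)
```

In the notation of part c, the gain for gender therefore has the shape X minus a weighted sum of Y and Z, where the weights are the fractions of the 8 rows having gender = M and gender = F respectively.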

6. Consider the following dataset where the decision attribute is play:

| outlook | temperature | humidity | windy | play |
|---------|-------------|----------|-------|------|
| good | warm | high | FALSE | no |
| good | warm | high | TRUE | no |
| bad | warm | high | FALSE | no |
| bad | cool | normal | FALSE | yes |
| bad | cool | normal | TRUE | yes |
| good | cool | normal | FALSE | yes |
| good | warm | normal | TRUE | yes |
| bad | cool | high | TRUE | yes |
| bad | warm | normal | FALSE | yes |
| good | cool | high | TRUE | no |

  a. Do ONLY the necessary calculations to determine what the ROOT NODE would be for a CART decision tree. YOU MUST SHOW YOUR WORK!!! (6.5 pts.)

Note: If there’s a tie for which attribute you’d pick to be the root of the tree, just list those attributes and say that we could pick from them.

  b. Write a Python program that runs the CART algorithm on this dataset (a sketch of one possible approach follows after this list). Include both your source code and a screenshot showing the resulting tree. The dataset (hw2_prob6.csv) is posted on Canvas along with this assignment. (3.5 pts.)

  c. Run SimpleCart in Weka on this dataset specifying the options minNumObj = 1 and usePrune = False. Show a screenshot of the CART decision tree that it produces. (0.5 pt.)

Note: SimpleCart may not be installed with the initial download of Weka 3.8, in which case you will need to install the package.
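For part a, the quantity CART minimizes when choosing a split is the (weighted) Gini impurity; the standard definitions, as background:

```latex
\mathrm{Gini}(S) = 1 - \sum_{c} p_c^{\,2}
\qquad
\mathrm{Gini}_{\text{split}}(S, A) = \sum_{v \in \mathrm{values}(A)} \frac{|S_v|}{|S|}\,\mathrm{Gini}(S_v)
```

For part b, the sketch referenced above: a minimal Python program using scikit-learn, whose DecisionTreeClassifier with criterion="gini" builds a CART-style tree. This is only one possible approach, not the required implementation; the one-hot encoding step and the plotting choices are assumptions (hw2_prob6.csv and its columns come from the assignment).

```python
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.tree import DecisionTreeClassifier, plot_tree

# Load the dataset posted on Canvas
# (columns: outlook, temperature, humidity, windy, play).
df = pd.read_csv("hw2_prob6.csv")

# scikit-learn trees require numeric features, so one-hot encode the
# categorical attributes; the decision attribute stays as string labels.
X = pd.get_dummies(df.drop(columns="play"))
y = df["play"]

# criterion="gini" selects the CART splitting rule; leaving the tree
# unpruned with min_samples_leaf=1 mirrors the Weka options in part c.
clf = DecisionTreeClassifier(criterion="gini", min_samples_leaf=1)
clf.fit(X, y)

# Draw the induced tree for the screenshot.
plot_tree(clf, feature_names=X.columns, class_names=clf.classes_, filled=True)
plt.show()
```

Note that after one-hot encoding every split is a binary test on a single attribute value, which matches CART’s binary-split treatment of categorical attributes.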
