Description
Problems #1 (20 points)
Using an “Addiction” dataset, a researcher has prepared the following table of patient counts:
-
Ethnicity
Age Category
Alcohol
Cocaine
Heroin
Row Total
Black
Old
30
48
17
95
Young
25
72
13
110
Hispanic
Old
7
0
5
12
Young
8
7
19
34
White
Old
60
2
17
79
Young
26
10
34
70
Column Total
156
139
105
400
Use the table above and Excel to classify patient addiction type (Alcohol, Cocaine, Heroin) using Ethnicity and Age Category:
- Construct a C4.5 decision tree (two levels deep only).
Problem #2 (15 points)
Use R/Python and the NYNJ_zipcode_population.csv file to cluster the zip codes of New Jersey and New York into 3 clusters. Use “Low_income” percent and “Total_Pop” as clustering attributes. (Algorithm=hierarchical, linkage=simple).
- What are the members of each cluster?
- Is there an unusual zip code? If so, what is the zip code?
Problem #3 (10 points)
Use R/Python and the NYNJ_zipcode_population.csv file to cluster the zip codes of New Jersey and New York into 5 clusters. Use “Low_income” percent and “Total_Pop” as clustering attributes (Algorithm=kmeans).
- What are the members of each cluster?
Problem #4 (20 points)
Use an Artificial Neural Network, with 10 nodes in the hidden layer, learning rate of .001, and the standard scaler (StandardScaler) to develop a classification model for the heart_attack .CSV dataset using 30% of the data as training data. (See the data dictionary below).
- What is the model accuracy ?
Use Random Forest to develop a classification model for the heart_attack .CSV dataset using 30% of the data as training data. (See the data dictionary below).
- What is the most important feature? Why?
- Show your model metrics.
Problem #6 (15 points)
Using data in the table below, construct a Neural Network with one Output Layer (z) and one Hidden Layer (A and B).
- Calculate the predicted outcome if the inputs to the input nodes are (x=1, Node 1=.4, Node 2=.7 Node 3= .7 and Node 4=.2).
- Adjust the weight if the actual output is 0.8500
From |
To |
Weight |
X |
A |
0.5 |
Node 1 |
A |
0.6 |
Node 2 |
A |
0.8 |
Node 3 |
A |
0.6 |
Node 4 |
A |
0.2 |
x |
B |
0.7 |
Node 1 |
B |
0.9 |
Node 2 |
B |
0.8 |
Node 3 |
B |
0.4 |
Node 4 |
B |
0.2 |
xx |
z |
0.5 |
A |
z |
0.9 |
B |
z |
0.9 |
Data Dictionary
Attribute |
Description |
Target/Attribute |
Heart_Attack |
Type of Heart Attack (Mild, Light, Massive) |
Target |
RestHR |
Resting Heart Rate |
Attribute |
MaxHR |
Maximum Heart Rate |
Attribute |
RecHR |
Recovery Heart Rate |
Attribute |
BP |
Blood Pressure |
Attribute |
Data dependencies: NYNJ_zipcode_population.csv, Heart_attack.CSV