Description
Problem 1 – (20 points)
The “AL_NJ_Income_pct” CSV dataset on CANVAS categorizes the tax returns of families in the states of Alabama and New Jersey into six categories (Returns_pct1 to Returns_pct6). Use these six categories and Euclidean distance to perform the following analysis:
- Use the k-means clustering method to create two clusters for the “AL_NJ_Income_pct” dataset.
- Show the cross tabulation of the clusters versus the State feature.
- Use the hierarchical clustering method with single linkage to create four clusters for the “AL_NJ_Income_pct” dataset.
- Identify the outliers (if any).
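The clustering steps above can be sketched in plain Python to show the mechanics. This is only an illustration under stated assumptions: the rows are synthetic stand-ins (not the real dataset), k-means is seeded with the first k rows rather than a library's default initialization, and treating single-member clusters as outliers is one common heuristic, not the only valid reading of the task. The helper names `kmeans` and `single_linkage` are hypothetical.

```python
from collections import Counter
from math import dist  # Euclidean distance (Python 3.8+)

# Synthetic stand-in rows, NOT the real AL_NJ_Income_pct data:
# (State, [Returns_pct1 .. Returns_pct6])
rows = [
    ("AL", [30, 25, 20, 12, 10, 3]),
    ("NJ", [18, 17, 18, 15, 22, 10]),
    ("AL", [31, 25, 19, 12, 10, 3]),
    ("NJ", [17, 18, 17, 16, 22, 10]),
    ("AL", [29, 26, 20, 13, 9, 3]),
    ("NJ", [18, 18, 18, 15, 21, 10]),
    ("AL", [50, 30, 10, 5, 4, 1]),    # deliberately extreme row
    ("NJ", [5, 5, 10, 15, 30, 35]),   # deliberately extreme row
]
states = [s for s, _ in rows]
points = [p for _, p in rows]


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; the first k points seed the centroids (an assumption)."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assign each point to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def single_linkage(points, n_clusters):
    """Agglomerative clustering: repeatedly merge the two clusters whose
    closest pair of members is nearest, until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        merged = clusters.pop(b)  # b > a, so index a is unaffected
        clusters[a].extend(merged)
    return clusters


labels = kmeans(points, k=2)
crosstab = Counter(zip(labels, states))  # (cluster, State) -> count
for (cluster, state), n in sorted(crosstab.items()):
    print(f"cluster {cluster}  {state}  {n}")

groups = single_linkage(points, n_clusters=4)
# Heuristic (assumption): a cluster with a single member is a candidate outlier.
outliers = sorted(i for g in groups if len(g) == 1 for i in g)
print("single-linkage clusters (row indices):", groups)
print("candidate outlier rows:", outliers)
```

On the real data, a library routine (for example a k-means implementation with several random restarts) would normally replace the hand-rolled loop; the point here is only how the cross tabulation and the singleton check relate to the cluster labels.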
Problem 2 – (20 points)
Use the Random Forest methodology to develop a classification model for the “State” (target), using the Returns_pct1 to Returns_pct6 features in the “AL_NJ_Income_pct” dataset.
- Show the cross tabulation of the classification.
- What is the accuracy of your model?
- What is the precision of the model?
- What is the recall of the model?
- What is the F1 of the model?
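The four questions above can all be answered from the cross tabulation of actual versus predicted State. The sketch below shows the arithmetic on hypothetical labels (not real model output), and it assumes “NJ” is treated as the positive class; choosing “AL” instead would swap the roles of the error counts and change precision and recall.

```python
from collections import Counter

# Hypothetical actual and predicted labels, for illustration only.
y_true = ["AL", "AL", "AL", "AL", "AL", "NJ", "NJ", "NJ", "NJ", "NJ"]
y_pred = ["AL", "AL", "NJ", "NJ", "AL", "NJ", "NJ", "AL", "NJ", "NJ"]
POSITIVE = "NJ"  # assumption: which State counts as the positive class

# Cross tabulation: (actual, predicted) -> count
crosstab = Counter(zip(y_true, y_pred))

tp = crosstab[(POSITIVE, POSITIVE)]
fp = sum(n for (a, p), n in crosstab.items() if a != POSITIVE and p == POSITIVE)
fn = sum(n for (a, p), n in crosstab.items() if a == POSITIVE and p != POSITIVE)
tn = len(y_true) - tp - fp - fn

accuracy = (tp + tn) / len(y_true)   # share of all rows classified correctly
precision = tp / (tp + fp)           # of rows predicted positive, share correct
recall = tp / (tp + fn)              # of actual positives, share found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean

print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f}")
```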
Problem 3 – (20 points)
Use the C5.0 methodology to develop a classification model for the “State” (target), using the Returns_pct1 to Returns_pct6 features in the “AL_NJ_Income_pct” dataset.
- Show the cross tabulation of the classification.
- What is the accuracy of your model?
- What is the precision of the model?
- What is the recall of the model?
- What is the F1 of the model?
Problem 4 – (20 points)
Use the CART methodology to develop a classification model for the “State” (target), using the Returns_pct1 to Returns_pct6 features in the “AL_NJ_Income_pct” dataset.
- Show the cross tabulation of the classification.
- What is the accuracy of your model?
- What is the precision of the model?
- What is the recall of the model?
- What is the F1 of the model?
Problem 5 – (20 points)
Using the data in the table below, construct a neural network with one output layer (z) and one hidden layer (two nodes, A and B). Calculate the predicted outcome if the inputs to the input nodes are Node 1 = 0.4, Node 2 = 0.7, Node 3 = 0.7, and Node 4 = 0.2.
Use the actual value of 0.75 and a learning factor of 0.1 to adjust the weight of the connection from xx to z.
From | To | Weight
X | A | 0.5
Node 1 | A | 0.6
Node 2 | A | 0.8
Node 3 | A | 0.6
Node 4 | A | 0.2
x | B | 0.7
Node 1 | B | 0.9
Node 2 | B | 0.8
Node 3 | B | 0.4
Node 4 | B | 0.2
xx | z | 0.5
A | z | 0.9
B | z | 0.9
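The calculation for this network can be sketched as follows. The problem statement does not name an activation function or say what X, x, and xx are, so this sketch assumes a sigmoid activation at A, B, and z, and assumes X, x, and xx are constant (bias) inputs equal to 1. It also assumes the usual back-propagation rule for an output node: error responsibility = output × (1 − output) × (actual − output), and weight change = learning factor × error responsibility × input. If the course uses a different convention, the numbers will differ.

```python
from math import exp


def sigmoid(x):
    """Logistic activation (assumed; not stated in the problem)."""
    return 1.0 / (1.0 + exp(-x))


inputs = {"Node 1": 0.4, "Node 2": 0.7, "Node 3": 0.7, "Node 4": 0.2}
BIAS_INPUT = 1.0  # assumption: X, x and xx are constant inputs of 1

# Weights from the table: bias weight first, then Node 1..4.
bias_a, w_a = 0.5, {"Node 1": 0.6, "Node 2": 0.8, "Node 3": 0.6, "Node 4": 0.2}
bias_b, w_b = 0.7, {"Node 1": 0.9, "Node 2": 0.8, "Node 3": 0.4, "Node 4": 0.2}
bias_z, w_az, w_bz = 0.5, 0.9, 0.9

# Forward pass through the hidden layer.
net_a = bias_a * BIAS_INPUT + sum(w_a[n] * v for n, v in inputs.items())
net_b = bias_b * BIAS_INPUT + sum(w_b[n] * v for n, v in inputs.items())
out_a, out_b = sigmoid(net_a), sigmoid(net_b)

# Forward pass through the output node z.
net_z = bias_z * BIAS_INPUT + w_az * out_a + w_bz * out_b
out_z = sigmoid(net_z)

# Back-propagation step for the xx -> z weight only.
actual, learning_factor = 0.75, 0.1
delta_z = out_z * (1.0 - out_z) * (actual - out_z)  # error responsibility of z
new_bias_z = bias_z + learning_factor * delta_z * BIAS_INPUT

print(f"net_A={net_a:.4f} out_A={out_a:.4f}")
print(f"net_B={net_b:.4f} out_B={out_b:.4f}")
print(f"net_z={net_z:.4f} out_z={out_z:.4f}")
print(f"delta_z={delta_z:.6f} new xx->z weight={new_bias_z:.6f}")
```

Under these assumptions the prediction is above the actual value of 0.75, so the error term is negative and the xx to z weight is adjusted slightly downward from 0.5.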
Datasets: AL_NJ_Income_pct.csv