Name: Homework 4 part 1 Solution
SKU: 3237
Price: 30.00 USD
Availability: InStock

Description


1. Consider the training data shown in the table below:

Data point	x₁	x₂	class
p1	0.3	0.2	+
p2	0.2	0.45	+
p3	0.5	0.2	+
p4	0.1	0.1	+
p5	0.4	0.1	+
p6	0.25	0.8	–
p7	0.3	0.5	–
p8	0.4	0.8	–
p9	0.15	0.7	–
p10	0.3	0.7	–

Consider a test instance, (x₁ = 0.2, x₂ = 0.55).

(a) Compute the Euclidean distance between the test instance and each of the training instances.
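The distance computation can be sketched in a few lines of Python. This is a minimal illustration rather than a prescribed solution: the names `train`, `test_point`, and `euclidean` are chosen here for convenience, and the negative class is written as an ASCII `-`.

```python
import math

# Training points copied from the table above: name -> (x1, x2, class label).
train = {
    "p1": (0.3, 0.2, "+"),
    "p2": (0.2, 0.45, "+"),
    "p3": (0.5, 0.2, "+"),
    "p4": (0.1, 0.1, "+"),
    "p5": (0.4, 0.1, "+"),
    "p6": (0.25, 0.8, "-"),
    "p7": (0.3, 0.5, "-"),
    "p8": (0.4, 0.8, "-"),
    "p9": (0.15, 0.7, "-"),
    "p10": (0.3, 0.7, "-"),
}
test_point = (0.2, 0.55)


def euclidean(a, b):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


# Distance from every training point to the test instance.
distances = {
    name: euclidean((x1, x2), test_point) for name, (x1, x2, _) in train.items()
}

# Print nearest first so the ranking is easy to read off.
for name, d in sorted(distances.items(), key=lambda item: item[1]):
    print(f"{name}: {d:.4f}")
```

Sorting by distance is not required for the distance table itself, but it makes the neighbor ranking needed for nearest-neighbor classification immediately visible.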

(b) Based on your answer in part (a), classify the test instance using the 1-nearest neighbor approach.

(c) Based on your answer in part (a), classify the test instance using the 5-nearest neighbor approach.
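Both the 1-nearest-neighbor and 5-nearest-neighbor classifications reduce to the same procedure: rank the training points by distance and take a majority vote among the first k. The sketch below assumes a plain unweighted vote; `knn_predict` is an illustrative helper name, and `math.dist` requires Python 3.8 or later.

```python
import math
from collections import Counter

# (point, label) pairs copied from the training table; "-" stands for the negative class.
train = [
    ((0.3, 0.2), "+"), ((0.2, 0.45), "+"), ((0.5, 0.2), "+"),
    ((0.1, 0.1), "+"), ((0.4, 0.1), "+"),
    ((0.25, 0.8), "-"), ((0.3, 0.5), "-"), ((0.4, 0.8), "-"),
    ((0.15, 0.7), "-"), ((0.3, 0.7), "-"),
]
test_point = (0.2, 0.55)


def knn_predict(train, query, k):
    """Unweighted majority vote among the k training points closest to query."""
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


for k in (1, 5):
    print(f"k={k}: predicted class {knn_predict(train, test_point, k)}")
```

With two classes and an odd k the vote cannot tie, so no tie-breaking rule is needed for these two cases.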

(d) Which classification result (part (b) or (c)) do you think is more reliable for the given test point? Explain your reasoning.

Consider the following logistic regression model constructed from the training set shown in the table above:

$$\log \frac{P(\hat{y} = 1 \mid x_1, x_2)}{1 - P(\hat{y} = 1 \mid x_1, x_2)} = w_2 x_2 + w_1 x_1 + w_0$$

where ŷ is the predicted class, w₂ = 122.1774, w₁ = 73.36, and w₀ = 73.3023. Apply the logistic regression model to the given test instance to predict its class label. Show your computations clearly.
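Applying the model amounts to computing the linear score and passing it through the logistic function. The sketch below assumes the usual log-odds form, log(p / (1 − p)) = w₂x₂ + w₁x₁ + w₀, and a 0.5 decision threshold. The weights are used exactly as printed above, signs included; they are ordinary variables so that other values can be substituted. The helper name `sigmoid` is illustrative.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


# Weights exactly as printed in the problem statement (assumption: signs as shown).
w0, w1, w2 = 73.3023, 73.36, 122.1774

# Test instance from the nearest-neighbor question.
x1, x2 = 0.2, 0.55

z = w2 * x2 + w1 * x1 + w0   # linear score (log-odds)
p = sigmoid(z)               # P(y_hat = 1 | x1, x2)
label = 1 if p >= 0.5 else 0  # assumed 0.5 decision threshold

print(f"z = {z:.5f}, P(y=1) = {p:.6f}, predicted class = {label}")
```

Because the prediction depends only on the sign of z, the class label can also be read off the linear score directly without evaluating the exponential.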

Consider the following set of one-dimensional points: {0.1, 0.2, 0.8, 0.9, 1.0, 1.3, 1.8, 1.9}.
(a) Suppose we apply k-means clustering to obtain three clusters, A, B, and C. If the initial centroids of the three clusters are located at {0.1, 0.2, 1.9}, respectively, show the cluster assignments and locations of the centroids after the first three iterations by filling out the following table.

	Cluster assignment of data points (enter A, B, or C)								Centroid Locations
Iter	0.10	0.20	0.80	0.90	1.00	1.30	1.80	1.90	A	B	C
0	–	–	–	–	–	–	–	–	0.10	0.20	1.90
1
2
3
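Each row of the table corresponds to one pass of the standard k-means loop: assign every point to its nearest centroid, then move each centroid to the mean of its members. The sketch below is one way to generate the rows. It assumes that ties go to the earlier-listed cluster and that an empty cluster keeps its previous centroid; neither rule is specified in the problem, and `kmeans_1d` is an illustrative name.

```python
def kmeans_1d(points, centroids, iterations):
    """Run a fixed number of k-means iterations on 1-D data.

    Returns one (assignments, centroids) pair per iteration, where
    assignments are cluster letters and centroids are the updated means.
    """
    names = "ABC"
    centroids = list(centroids)
    history = []
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid (first one wins ties).
        assign = [
            min(range(len(centroids)), key=lambda j: abs(p - centroids[j]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its assigned points.
        for j in range(len(centroids)):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:  # assumption: an empty cluster keeps its old centroid
                centroids[j] = sum(members) / len(members)
        history.append(([names[a] for a in assign], list(centroids)))
    return history


points = [0.1, 0.2, 0.8, 0.9, 1.0, 1.3, 1.8, 1.9]
history = kmeans_1d(points, [0.1, 0.2, 1.9], iterations=3)

for it, (assign, cents) in enumerate(history, start=1):
    print(it, " ".join(assign), [round(c, 4) for c in cents])
```

The same function can be called with a different list of starting centroids and a different iteration count to fill in a table for another initialization.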

(b) Compute the sum-of-squared errors (SSE) for the clustering solution in part (a).
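SSE is the sum, over all points, of the squared distance from the point to the centroid of its cluster. A minimal sketch follows, assuming each point is charged to its nearest centroid (which matches the cluster assignment once k-means has converged). It is demonstrated on the iteration-0 centroids from the table only to show the mechanics; the function name `sse` is illustrative.

```python
def sse(points, centroids):
    """Sum of squared distances from each point to its nearest centroid."""
    return sum(min((p - c) ** 2 for c in centroids) for p in points)


points = [0.1, 0.2, 0.8, 0.9, 1.0, 1.3, 1.8, 1.9]

# Iteration-0 centroids from the table; substitute the centroids of any
# later iteration to score that clustering instead.
initial_centroids = [0.1, 0.2, 1.9]

print(round(sse(points, initial_centroids), 4))
```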

(c) Repeat part (a) using {0.8, 1.0, 1.8} as the initial centroids. Show the cluster assignments and locations of the centroids after the first four iterations by filling out the following table.

	Cluster assignment of data points (enter A, B, or C)								Centroid Locations
Iter	0.10	0.20	0.80	0.90	1.00	1.30	1.80	1.90	A	B	C
0	–	–	–	–	–	–	–	–	0.80	1.00	1.80
1
2
3
4

(d) Compute the sum-of-squared errors (SSE) for the clustering solution in part (c). Which solution is better in terms of SSE?
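Because k-means converges only to a local optimum, different starting centroids can end at different SSE values. The sketch below runs the algorithm to convergence from both sets of initial centroids and reports the SSE of each. It assumes ties go to the earlier-listed cluster and that an empty cluster keeps its centroid; the names `kmeans_until_stable` and `cluster_sse` are illustrative.

```python
def kmeans_until_stable(points, centroids):
    """1-D k-means, repeated until the assignments stop changing."""
    centroids = list(centroids)
    assignment = None
    while True:
        new_assignment = [
            min(range(len(centroids)), key=lambda j: abs(p - centroids[j]))
            for p in points
        ]
        if new_assignment == assignment:
            return assignment, centroids
        assignment = new_assignment
        for j in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # assumption: an empty cluster keeps its old centroid
                centroids[j] = sum(members) / len(members)


def cluster_sse(points, assignment, centroids):
    """Sum of squared distances from each point to its own cluster centroid."""
    return sum((p - centroids[a]) ** 2 for p, a in zip(points, assignment))


points = [0.1, 0.2, 0.8, 0.9, 1.0, 1.3, 1.8, 1.9]

sse_by_init = {}
for init in [(0.1, 0.2, 1.9), (0.8, 1.0, 1.8)]:
    assignment, centroids = kmeans_until_stable(points, init)
    sse_by_init[init] = cluster_sse(points, assignment, centroids)
    print(init, [round(c, 4) for c in centroids], round(sse_by_init[init], 4))
```

The solution with the lower SSE fits the data more tightly, which is the basis for the comparison the question asks for.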

Consider the transaction database shown in the table below.

Table 1: Transaction database.

Transaction ID	Items Purchased
1	Bread, Coffee, Sugar
2	Bread, Eggs, Milk
3	Bread, Butter, Milk
4	Coffee, Milk
5	Bread, Butter, Eggs, Cookies
6	Milk, Sugar
7	Bread, Butter, Eggs, Milk, Sugar
8	Bread, Butter, Milk, Cookies
9	Bread, Butter, Eggs, Milk
10	Butter, Coffee, Milk

(a) Assuming the minimum support threshold for frequent itemsets is 40%, list all the frequent 2-itemsets of the data along with their support values.
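Support is the fraction of transactions that contain every item of an itemset. For a database this small, the supports can be checked by brute-force counting, as in the sketch below. This is a direct enumeration, not the Apriori procedure, and `frequent_k_itemsets` is an illustrative name. The threshold is passed as an integer percentage so the comparison stays exact.

```python
from collections import Counter
from itertools import combinations

# Transactions copied from Table 1.
transactions = [
    {"Bread", "Coffee", "Sugar"},
    {"Bread", "Eggs", "Milk"},
    {"Bread", "Butter", "Milk"},
    {"Coffee", "Milk"},
    {"Bread", "Butter", "Eggs", "Cookies"},
    {"Milk", "Sugar"},
    {"Bread", "Butter", "Eggs", "Milk", "Sugar"},
    {"Bread", "Butter", "Milk", "Cookies"},
    {"Bread", "Butter", "Eggs", "Milk"},
    {"Butter", "Coffee", "Milk"},
]


def frequent_k_itemsets(transactions, k, min_support_pct):
    """Map each k-itemset meeting the support threshold to its support."""
    counts = Counter()
    for t in transactions:
        # Sorting keeps every itemset in alphabetical order.
        counts.update(combinations(sorted(t), k))
    n = len(transactions)
    return {
        itemset: c / n
        for itemset, c in sorted(counts.items())
        if c * 100 >= min_support_pct * n
    }


frequent_2 = frequent_k_itemsets(transactions, k=2, min_support_pct=40)
for itemset, support in frequent_2.items():
    print(itemset, support)
```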

(b) Based on your answer in part (a), generate all the candidate 3-itemsets using the candidate generation approach described in the lecture. You may assume items in an itemset are ordered in increasing alphabetical order.
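The lecture's candidate generation method is not reproduced here, so the sketch below assumes the common F(k−1) × F(k−1) strategy: merge two frequent (k−1)-itemsets that share their first k−2 items, then optionally prune any candidate that has an infrequent (k−1)-subset. The function name and the `prune` flag are illustrative, and the input list holds the 2-itemsets that reach 40% support when counted against the transaction table.

```python
from itertools import combinations


def generate_candidates(frequent_prev, prune=True):
    """F(k-1) x F(k-1) candidate generation over alphabetically sorted tuples."""
    prev = sorted(frequent_prev)
    prev_set = set(prev)
    candidates = []
    for i in range(len(prev)):
        for j in range(i + 1, len(prev)):
            a, b = prev[i], prev[j]
            # Merge only when all items except the last one agree.
            if a[:-1] != b[:-1]:
                continue
            cand = a + (b[-1],)
            # Optional pruning: every (k-1)-subset must itself be frequent.
            if prune and not all(
                sub in prev_set for sub in combinations(cand, len(cand) - 1)
            ):
                continue
            candidates.append(cand)
    return candidates


# 2-itemsets with support >= 40% in the transaction table (from direct counting).
frequent_2 = [
    ("Bread", "Butter"),
    ("Bread", "Eggs"),
    ("Bread", "Milk"),
    ("Butter", "Milk"),
]

print("merge only:  ", generate_candidates(frequent_2, prune=False))
print("merge + prune:", generate_candidates(frequent_2, prune=True))
```

Whether the pruning step counts as part of candidate generation depends on how the lecture defines it, which is why both variants are shown.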

(c) Assuming the minimum support threshold for frequent itemsets is 40%, which of the candidate 3-itemsets in part (b) are frequent?

(d) Extract all the candidate rules from the frequent itemsets found in part (c).

Based on your answer in part (e), find all the rules whose confidence is more than 70% and support is at least 40%. For this question, you need to focus only on the rules that can be extracted from the frequent 3-itemsets found in part (e). Note: you do not have to use the Apriori implementation to extract the rules.
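The confidence of a rule X → Y is support(X ∪ Y) / support(X), so every non-empty proper subset of a frequent itemset gives one candidate rule. The sketch below finds the frequent 3-itemsets by brute force and then filters their rules by confidence. It is a direct enumeration, not Apriori; the helper name `count` and the integer-percentage thresholds are illustrative choices that keep the comparisons exact.

```python
from itertools import combinations

# Transactions copied from Table 1.
transactions = [
    {"Bread", "Coffee", "Sugar"},
    {"Bread", "Eggs", "Milk"},
    {"Bread", "Butter", "Milk"},
    {"Coffee", "Milk"},
    {"Bread", "Butter", "Eggs", "Cookies"},
    {"Milk", "Sugar"},
    {"Bread", "Butter", "Eggs", "Milk", "Sugar"},
    {"Bread", "Butter", "Milk", "Cookies"},
    {"Bread", "Butter", "Eggs", "Milk"},
    {"Butter", "Coffee", "Milk"},
]
MIN_SUPPORT_PCT = 40      # support must be at least 40%
MIN_CONFIDENCE_PCT = 70   # confidence must be strictly more than 70%


def count(itemset):
    """Number of transactions that contain every item of itemset."""
    return sum(1 for t in transactions if set(itemset) <= t)


n = len(transactions)
items = sorted(set().union(*transactions))

# Brute-force search for frequent 3-itemsets.
frequent_3 = [
    c for c in combinations(items, 3) if count(c) * 100 >= MIN_SUPPORT_PCT * n
]

rules = []
for itemset in frequent_3:
    for size in range(1, len(itemset)):
        for lhs in combinations(itemset, size):
            rhs = tuple(i for i in itemset if i not in lhs)
            # confidence = count(itemset) / count(lhs), compared without rounding
            if count(itemset) * 100 > MIN_CONFIDENCE_PCT * count(lhs):
                rules.append((lhs, rhs, count(itemset) / count(lhs)))

for lhs, rhs, conf in rules:
    print(f"{set(lhs)} -> {set(rhs)}  confidence = {conf:.2f}")
```

Every rule drawn from one itemset has the same support as that itemset, so the support condition is already satisfied and only the confidence filter needs to be applied.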

