Name: Clustering
SKU: 21796
Price: 24.99 USD
Availability: InStock

Description


1. Write a function to perform k-means clustering of a given dataset. The function should take the following arguments: (30 marks)

the dataset for clustering

the number of clusters, k

the initial centroids (optional)

If the initial centroids are not provided, k random points are chosen as the initial centroids.

The function should return:

the final cluster centroids

cluster label associated with each datapoint

sum of squared errors

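The sum of squared errors (SSE) is computed as follows, where $k$ is the number of clusters, $C_j$ is the set of points assigned to the $j$-th cluster, and $\mu_j$ is the centroid of the $j$-th cluster:

$$\mathrm{SSE} = \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2$$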
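The specification above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not a reference solution: it assumes each data point is a tuple of numbers, and the `max_iter` and `rng` parameters, the helper `squared_distance`, and the choice to leave an empty cluster's centroid unchanged are all assumptions that the assignment does not state.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(data, k, initial_centroids=None, max_iter=100, rng=None):
    """Lloyd-style k-means on a list of equal-length numeric tuples.

    Returns (centroids, labels, sse). `max_iter` and `rng` are
    illustrative extras, not part of the assignment's argument list.
    """
    rng = rng or random.Random()
    if initial_centroids is None:
        # Sample k data points (without replacement) as starting centroids.
        centroids = [tuple(p) for p in rng.sample(data, k)]
    else:
        centroids = [tuple(c) for c in initial_centroids]

    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in data
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(data, labels) if lab == j]
            if members:  # assumption: an empty cluster keeps its centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))

    # SSE: squared distance from every point to its own cluster centroid.
    sse = sum(squared_distance(p, centroids[lab]) for p, lab in zip(data, labels))
    return centroids, labels, sse
```

A call such as `kmeans(points, 2, [(0.0,), (10.0,)])` starts from the given centroids, while omitting the third argument falls back to random data points.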

2. Generate a dataset by sampling 20 points each from uniform([-1,1]) and uniform([-0.5,1.5]). (10+10 marks)
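One possible way to build this dataset is sketched below. It assumes the points are one-dimensional and wraps each value in a 1-tuple so the result is a list of points; the function name `make_uniform_dataset` and the `seed` parameter are illustrative and not part of the assignment.

```python
import random


def make_uniform_dataset(n_per_range=20, seed=0):
    """Draw n_per_range values from each of the two uniform ranges."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    first = [rng.uniform(-1.0, 1.0) for _ in range(n_per_range)]
    second = [rng.uniform(-0.5, 1.5) for _ in range(n_per_range)]
    # Each value becomes a 1-tuple, i.e. a one-dimensional point.
    return [(x,) for x in first + second]
```

Because the two ranges overlap on [-0.5, 1], the two samples are not cleanly separable, which is worth keeping in mind when judging the clusters.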

(a) Run k-means with the following initial centroids:

i. $\mu_1^{\text{initial}} = -0.1$ and $\mu_2^{\text{initial}} = 0.1$

ii. $\mu_1^{\text{initial}} = 0$ and $\mu_2^{\text{initial}} = 3.5$

After each iteration, generate a scatterplot of the dataset in which points belonging to the same cluster share a colour and different clusters have different colours. Display the cluster centroids with * (asterisk symbol).

(b) Now add a random point generated from uniform([3,4]) to the dataset. Perform k-means clustering with k=2 for different sets of initial centroids. What do you observe? Are the clusters always found correctly?

3. There are three groups of people, say, Kids, Adults and Aliens. Each person has two features, height and weight; i.e., the data point $x_i$ is represented as $(x_i^{(1)}, x_i^{(2)})$, where $x_i^{(1)}$ represents height and $x_i^{(2)}$ represents weight. The features for each group are distributed as follows:

Group	Height	Weight	No. of samples

Kids	Normal(5,1.1)	Normal(60,7)	100

Adults	Normal(3,1)	Normal(30,5)	100

Aliens	Normal(7,1)	Normal(40,2)	50
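The table above could be turned into a dataset along the following lines. This sketch assumes the second parameter of each Normal is a standard deviation (the table does not say whether it is a standard deviation or a variance) and that height and weight are sampled independently; `GROUPS`, `make_people_dataset` and `seed` are illustrative names.

```python
import random

# (height mean, height spread), (weight mean, weight spread), sample count.
# Assumption: "spread" is a standard deviation, not a variance.
GROUPS = {
    "Kids":   ((5, 1.1), (60, 7), 100),
    "Adults": ((3, 1),   (30, 5), 100),
    "Aliens": ((7, 1),   (40, 2), 50),
}


def make_people_dataset(seed=0):
    """Return (points, groups): (height, weight) tuples and their true group."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    points, groups = [], []
    for name, ((h_mu, h_sd), (w_mu, w_sd), n) in GROUPS.items():
        for _ in range(n):
            points.append((rng.gauss(h_mu, h_sd), rng.gauss(w_mu, w_sd)))
            groups.append(name)
    return points, groups
```

The true group labels are returned only so the clustering can be compared against them afterwards; k-means itself would see just the points.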

Run k-means on this dataset with different sets of initial centroids. Display the clusters after each iteration, as mentioned in the previous question. (10+10+5+10 marks)

Generate a plot of the sum of squared errors (SSE) against iteration number: for each iteration number (x-axis), plot the SSE obtained in that iteration (y-axis).

Are you able to obtain distinct sets of final clusters when starting with different initial centroids? If yes, show at least two such clusterings. In each case, show the initial cluster centroids in the scatterplot using a + (plus symbol).

If you were to select one clustering result from among the different clusterings obtained, how would you make a choice?

How can you modify your k-means algorithm such that for a given value of k, different results are not obtained on successive runs over the same dataset?
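One common idea relevant to the last two questions, offered only as a sketch and not as the expected answer, is to repeat the clustering from several fixed random seeds and keep the run with the lowest SSE. The wrapper below assumes a hypothetical callable `run_once(rng)` that performs a single k-means run with the supplied `random.Random` and returns `(centroids, labels, sse)`; `best_of_restarts`, `n_restarts` and `base_seed` are illustrative names.

```python
import random


def best_of_restarts(run_once, n_restarts=10, base_seed=0):
    """Call run_once(rng) n_restarts times and keep the lowest-SSE result.

    `run_once` is a hypothetical callable returning (centroids, labels, sse).
    """
    best = None
    for r in range(n_restarts):
        # Fixed seeds: the same sequence of restarts on every call.
        rng = random.Random(base_seed + r)
        result = run_once(rng)
        if best is None or result[2] < best[2]:
            best = result
    return best
```

Because the seeds are fixed, calling the wrapper twice on the same dataset would return the same clustering, at the cost of running k-means `n_restarts` times.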

4. Plot the data in “Dataset.csv”. Visually identify the clusters. Let k be the number of clusters identified. (5+10 marks)

Run k-means on this dataset with the value of k identified above. Do you get the expected clusters? Why or why not?

Now run k-means for k = 2 to 10 on this dataset and for each value of k (x-axis), plot the best SSE obtained for that value of k (y-axis). How will you select a good value of k from this plot?
