Clustering

1. Write a function to perform k-means clustering of a given dataset. The function should take the following arguments: (30 marks)

  i. the dataset to be clustered

  ii. the number of clusters, k

  iii. the initial centroids (optional)

If the initial centroids are not provided, k random points are chosen as the initial centroids.

The function should return:

  i. the final cluster centroids

  ii. the cluster label associated with each data point

  iii. the sum of squared errors (SSE)

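The sum of squared errors (SSE) is computed as follows, where $k$ is the number of clusters, $C_j$ is the set of points assigned to the $j$-th cluster, and $\mu_j$ is the centroid of the $j$-th cluster:

$$\mathrm{SSE} = \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2$$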
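A minimal sketch of the function described above, in plain Python with no third-party libraries. Points are tuples of floats, so the same code serves the 1-D data of Question 2 and the 2-D data of Question 3. Several details are assumptions rather than part of the specification: "k random points" is read as k distinct points sampled from the dataset, an empty cluster keeps its previous centroid, and the `max_iter` and `rng` parameters are additions.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans(data, k, initial_centroids=None, max_iter=100, rng=None):
    """Minimal k-means sketch returning (centroids, labels, sse).

    data: list of points, each a tuple of floats.
    initial_centroids: optional list of k points; if omitted, k distinct data
    points are sampled at random (assumed reading of "k random points").
    """
    if rng is None:
        rng = random.Random()
    if initial_centroids is None:
        centroids = [tuple(p) for p in rng.sample(data, k)]
    else:
        if len(initial_centroids) != k:
            raise ValueError("expected exactly k initial centroids")
        centroids = [tuple(c) for c in initial_centroids]

    labels = None
    for _ in range(max_iter):
        # Assignment step: label each point with the index of its nearest centroid.
        new_labels = [min(range(k), key=lambda j: squared_distance(p, centroids[j]))
                      for p in data]
        if new_labels == labels:
            break  # no point changed cluster, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(data, labels) if lab == j]
            if members:  # assumption: an empty cluster keeps its old centroid
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))

    # SSE: squared distance of every point to the centroid of its own cluster.
    sse = sum(squared_distance(p, centroids[lab]) for p, lab in zip(data, labels))
    return centroids, labels, sse
```

For example, `kmeans([(0.0,), (0.2,), (5.0,), (5.2,)], 2, [(0.0,), (5.0,)])` converges with labels `[0, 0, 1, 1]`, centroids of about 0.1 and 5.1, and an SSE of about 0.04.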

2. Generate a dataset by sampling 20 points each from uniform([-1, 1]) and uniform([-0.5, 1.5]). (10+10 marks)
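The initial centroids in part (a) are scalars, which suggests the data are one-dimensional. Under that assumption, a sketch of the data generation in plain Python; the function name, the `seed` parameter, and the returned `source` list (recording which distribution each point came from) are illustrative additions.

```python
import random


def make_two_uniform_groups(n_per_group=20, seed=None):
    """Return 1-D points (as 1-tuples) drawn from uniform([-1, 1]) and
    uniform([-0.5, 1.5]), plus the index of the distribution each came from."""
    rng = random.Random(seed)  # pass a seed to make the dataset reproducible
    first = [(rng.uniform(-1.0, 1.0),) for _ in range(n_per_group)]
    second = [(rng.uniform(-0.5, 1.5),) for _ in range(n_per_group)]
    source = [0] * n_per_group + [1] * n_per_group
    return first + second, source


data, source = make_two_uniform_groups(seed=42)
print(len(data), min(data)[0], max(data)[0])
```

Representing each 1-D point as a 1-tuple keeps it compatible with code written for points of any dimension.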

  a. Run k-means with the following initial centroids:

    i. $\mu_1^{\text{initial}} = -0.1$ and $\mu_2^{\text{initial}} = 0.1$

    ii. $\mu_1^{\text{initial}} = 0$ and $\mu_2^{\text{initial}} = 3.5$

After each iteration, generate a scatterplot of the dataset in which points belonging to the same cluster share a colour and different clusters have different colours. Display the cluster centroids with a * (asterisk symbol).

  b. Now add a random point generated from uniform([3, 4]) to the dataset. Perform k-means clustering with k = 2 for different sets of initial centroids. What do you observe? Are the clusters always found correctly?
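A sketch of the experiment in part (b), assuming 1-D data as above. The seed and the three initial-centroid pairs are arbitrary illustrative choices, and the printed centroids and cluster sizes depend on the sample drawn, so this is a harness for exploring the question rather than an answer to it. `kmeans_1d` is a compact, hypothetical 1-D k-means helper in which an empty cluster keeps its previous centroid.

```python
import random


def kmeans_1d(xs, centroids, max_iter=100):
    """Compact 1-D k-means: returns (centroids, labels, sse)."""
    centroids = list(centroids)
    k = len(centroids)
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid by squared distance.
        new_labels = [min(range(k), key=lambda j: (x - centroids[j]) ** 2) for x in xs]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [x for x, lab in zip(xs, labels) if lab == j]
            if members:  # assumption: an empty cluster keeps its previous centroid
                centroids[j] = sum(members) / len(members)
    sse = sum((x - centroids[lab]) ** 2 for x, lab in zip(xs, labels))
    return centroids, labels, sse


rng = random.Random(0)  # fixed seed only so that reruns are repeatable
xs = [rng.uniform(-1, 1) for _ in range(20)] + [rng.uniform(-0.5, 1.5) for _ in range(20)]
xs.append(rng.uniform(3, 4))  # the extra point drawn from uniform([3, 4])

for init in [(-0.1, 0.1), (0.0, 3.5), (-1.0, 1.5)]:  # illustrative starting pairs
    cents, labels, sse = kmeans_1d(xs, init)
    sizes = [labels.count(j) for j in range(2)]
    print(f"init={init}: centroids={[round(c, 3) for c in cents]}, sizes={sizes}, SSE={sse:.3f}")
```

Comparing the cluster sizes across starting pairs shows directly whether the added point ends up in a cluster of its own.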

3. There are three groups of people, say, Kids, Adults and Aliens. Each person has two features, height and weight; that is, the data point $x_i$ is represented as $(x_i^{(1)}, x_i^{(2)})$, where $x_i^{(1)}$ represents height and $x_i^{(2)}$ represents weight. The features for each group are distributed as follows:

Group     Height           Weight          No. of samples
Kids      Normal(5, 1.1)   Normal(60, 7)   100
Adults    Normal(3, 1)     Normal(30, 5)   100
Aliens    Normal(7, 1)     Normal(40, 2)   50
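A sketch of the data generation for this question in plain Python. It assumes the second parameter of each Normal(a, b) is a standard deviation (the table does not say whether it is a standard deviation or a variance); `GROUPS` and `make_people` are hypothetical names.

```python
import random

# (name, height mean, height sd, weight mean, weight sd, sample count);
# the second Normal parameter is assumed to be a standard deviation.
GROUPS = [
    ("Kids", 5.0, 1.1, 60.0, 7.0, 100),
    ("Adults", 3.0, 1.0, 30.0, 5.0, 100),
    ("Aliens", 7.0, 1.0, 40.0, 2.0, 50),
]


def make_people(seed=None):
    """Return (data, names): 2-D points (height, weight) and each point's true group."""
    rng = random.Random(seed)
    data, names = [], []
    for name, h_mean, h_sd, w_mean, w_sd, count in GROUPS:
        for _ in range(count):
            data.append((rng.gauss(h_mean, h_sd), rng.gauss(w_mean, w_sd)))
            names.append(name)
    return data, names
```

Weight spans tens of units while height spans only a few, so Euclidean distances on the raw features are dominated by weight; whether to standardize the features before clustering is a choice the assignment leaves open.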

Run k-means on this dataset with different sets of initial centroids. Display the clusters after each iteration, as described in the previous question. (10+10+5+10 marks)
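Both the per-iteration scatterplots and the SSE-versus-iteration plot in part (a) below need the state of the algorithm after every iteration. A sketch that records it, assuming points are tuples of floats and that an empty cluster keeps its centroid; `kmeans_history` and `sse_is_non_increasing` are hypothetical helpers. Because neither the assignment step nor the update step can increase the SSE, the recorded values should never rise, which makes a cheap sanity check.

```python
def kmeans_history(data, initial_centroids, max_iter=100):
    """Run k-means from the given initial centroids and return one snapshot per
    iteration: (centroids, labels, sse), taken after that iteration's update step."""

    def sq(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    centroids = [tuple(c) for c in initial_centroids]
    k = len(centroids)
    labels, history = None, []
    for _ in range(max_iter):
        # Assignment step.
        new_labels = [min(range(k), key=lambda j: sq(p, centroids[j])) for p in data]
        if new_labels == labels:
            break  # converged: the last snapshot is the final clustering
        labels = new_labels
        # Update step (assumption: an empty cluster keeps its previous centroid).
        for j in range(k):
            members = [p for p, lab in zip(data, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
        sse = sum(sq(p, centroids[lab]) for p, lab in zip(data, labels))
        history.append((list(centroids), list(labels), sse))
    return history


def sse_is_non_increasing(history, tol=1e-9):
    """Sanity check: neither k-means step should ever raise the SSE."""
    sses = [sse for _, _, sse in history]
    return all(later <= earlier + tol for earlier, later in zip(sses, sses[1:]))
```

For the plots, each snapshot's labels can be passed as the colour argument of matplotlib's `scatter` (`c=labels`) and its centroids drawn with `marker='*'` (with y set to 0 for 1-D data); the SSE plot uses `range(1, len(history) + 1)` on the x-axis and the third element of each snapshot on the y-axis. As a small check, `kmeans_history([(-0.9,), (-0.8,), (0.9,), (1.0,)], [(0.0,), (3.5,)])` puts every point in the first cluster and leaves the second centroid at 3.5, so the second cluster stays empty.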

  a. Generate a plot of the sum of squared errors (SSE) against the iteration number: for each iteration number (x-axis), plot the SSE obtained in that iteration (y-axis).

  b. Are you able to obtain distinct sets of final clusters when starting with different initial centroids? If yes, show at least two such clusterings. In each case, mark the initial cluster centroids in the scatterplot with a + (plus symbol).

  c. If you were to select one clustering result from among the different clusterings obtained, how would you make the choice?

  d. How can you modify your k-means algorithm so that, for a given value of k, successive runs over the same dataset do not give different results?
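For part (c), one common criterion (a suggestion, not the required answer) is to keep the clustering with the lowest SSE, since SSE is the objective k-means tries to minimize; the comparison is only meaningful for a fixed k. For part (d), one simple option is to seed the random number generator that chooses the initial centroids; another is to replace random initialization with a deterministic rule such as farthest-first (maximin) selection. The sketch combines restarts with a seeded `random.Random`; the helper names and the default of 10 restarts are illustrative assumptions.

```python
import random


def kmeans_from(data, centroids, max_iter=100):
    """k-means from the given starting centroids; returns (centroids, labels, sse)."""

    def sq(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    centroids = [tuple(c) for c in centroids]
    k = len(centroids)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sq(p, centroids[j])) for p in data]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(data, labels) if lab == j]
            if members:  # assumption: an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    sse = sum(sq(p, centroids[lab]) for p, lab in zip(data, labels))
    return centroids, labels, sse


def best_of_restarts(data, k, n_restarts=10, seed=None):
    """Run k-means from n_restarts random starts (k distinct data points each)
    and return the run with the lowest SSE. With a fixed seed, repeated calls on
    the same data make the same random choices and so return the same result."""
    rng = random.Random(seed)
    runs = [kmeans_from(data, rng.sample(data, k)) for _ in range(n_restarts)]
    return min(runs, key=lambda run: run[2])
```

Restarts reduce, but do not remove, the chance of settling on a poor local optimum; the seed only makes whichever result is found reproducible.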

4. Plot the data in “Dataset.csv” and visually identify the clusters. Let k be the number of clusters identified. (5+10 marks)
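The layout of “Dataset.csv” is not described, so this sketch assumes every column is a numeric feature and that any header row is non-numeric; rows that fail to parse are skipped. An index or label column, if present, would have to be dropped first. `parse_points` and `load_points` are hypothetical names.

```python
import csv


def parse_points(lines):
    """Parse CSV text lines into tuples of floats, skipping blank and non-numeric rows."""
    points = []
    for row in csv.reader(lines):
        if not row:
            continue  # blank line
        try:
            points.append(tuple(float(value) for value in row))
        except ValueError:
            continue  # assumed header or other non-numeric row
    return points


def load_points(path):
    """Read every numeric row of a CSV file such as Dataset.csv."""
    with open(path, newline="") as handle:
        return parse_points(handle)
```

Splitting parsing from file access keeps `parse_points` usable on any iterable of lines, which also makes it easy to check on a few hand-written rows.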

  a. Run k-means on this dataset with the value of k identified above. Do you get the expected clusters? Why or why not?

  b. Now run k-means for k = 2 to 10 on this dataset and, for each value of k (x-axis), plot the best SSE obtained for that value of k (y-axis). How will you select a good value of k from this plot?
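A sketch of the loop for part (b): for each k from 2 to 10, run k-means from several random starts and keep the lowest SSE. The restart count of 10 and the helper names are assumptions. Because the best achievable SSE generally shrinks as k grows, the lowest value is not itself the criterion; a common heuristic is to look for the "elbow" where the curve stops dropping steeply.

```python
import random


def kmeans_sse(data, k, rng, max_iter=100):
    """SSE of one k-means run started from k distinct random data points."""

    def sq(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    centroids = [tuple(p) for p in rng.sample(data, k)]
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sq(p, centroids[j])) for p in data]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(data, labels) if lab == j]
            if members:  # assumption: an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return sum(sq(p, centroids[lab]) for p, lab in zip(data, labels))


def best_sse_by_k(data, k_values=range(2, 11), n_restarts=10, seed=None):
    """For each k, keep the lowest SSE over n_restarts random initializations."""
    rng = random.Random(seed)
    return {k: min(kmeans_sse(data, k, rng) for _ in range(n_restarts)) for k in k_values}
```

The returned dictionary maps each k to its best SSE, so its keys and values can be plotted directly as the x- and y-values of the requested curve.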
