P4-1. Hierarchical Clustering Dendrogram Solution



(a) Randomly generate the following data points:

import numpy as np

np.random.seed(0)

X1 = np.random.randn(50,2)+[2,2]

X2 = np.random.randn(50,2)+[6,10]

X3 = np.random.randn(50,2)+[10,2]

X = np.concatenate((X1,X2,X3))

(b) Use sklearn.cluster.AgglomerativeClustering to cluster the points generated in (a). Plot a dendrogram for each linkage in {"ward", "complete", "average", "single"}.

Instructions: Set distance_threshold=0, n_clusters=None in AgglomerativeClustering. The default metric used to compute the linkage is ‘euclidean’, so you do not need to change this parameter.
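One possible way to approach part (b) is sketched below. AgglomerativeClustering does not draw dendrograms itself, so this sketch assumes SciPy's dendrogram function is acceptable for plotting and builds the linkage matrix it expects from the fitted model. The helper name linkage_matrix, the 2x2 subplot layout, and the truncation settings (truncate_mode="level", p=3) are illustrative assumptions, not part of the assignment.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram
from sklearn.cluster import AgglomerativeClustering


def linkage_matrix(model):
    """Build a SciPy-style linkage matrix from a fitted model (hypothetical helper)."""
    n_samples = len(model.labels_)
    counts = np.zeros(model.children_.shape[0])
    for i, merge in enumerate(model.children_):
        # A child index below n_samples is an original point;
        # anything else refers to an earlier merge.
        counts[i] = sum(
            1 if child < n_samples else counts[child - n_samples]
            for child in merge
        )
    return np.column_stack(
        [model.children_, model.distances_, counts]
    ).astype(float)


# Data from part (a)
np.random.seed(0)
X1 = np.random.randn(50, 2) + [2, 2]
X2 = np.random.randn(50, 2) + [6, 10]
X3 = np.random.randn(50, 2) + [10, 2]
X = np.concatenate((X1, X2, X3))

fig, axes = plt.subplots(2, 2, figsize=(12, 8))
for ax, linkage in zip(axes.ravel(), ["ward", "complete", "average", "single"]):
    # distance_threshold=0 with n_clusters=None builds the full tree
    # and makes model.distances_ available.
    model = AgglomerativeClustering(
        distance_threshold=0, n_clusters=None, linkage=linkage
    ).fit(X)
    # Truncation only keeps the plot readable (an assumption, not required).
    dendrogram(linkage_matrix(model), truncate_mode="level", p=3, ax=ax)
    ax.set_title(f"linkage = {linkage}")
plt.tight_layout()
plt.show()
```

Dropping the truncate_mode and p arguments shows all 150 leaves, at the cost of a denser plot.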

P4-2. Clustering structured dataset

(a) Generate a swiss roll dataset:

from sklearn.datasets import make_swiss_roll

# Generate data (swiss roll dataset)

n_samples = 1500

noise = 0.05

X, _ = make_swiss_roll(n_samples, noise=noise)

# Make it thinner

X[:, 1] *= .5

(b) Use sklearn.cluster.AgglomerativeClustering to cluster the points generated in (a), with the parameters n_clusters=6, connectivity=connectivity, linkage='ward', where connectivity is built as follows:

from sklearn.neighbors import kneighbors_graph

connectivity = kneighbors_graph(X, n_neighbors=10, include_self=False)

Plot the clustered data in a 3D figure and use different colors for different clusters in your figure.
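A minimal sketch of part (b), assuming matplotlib's built-in 3D projection for the figure. The random_state=0 argument, the "tab10" colormap, and the marker size are assumptions added for reproducibility and readability; the assignment does not specify them.

```python
import matplotlib.pyplot as plt
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_swiss_roll
from sklearn.neighbors import kneighbors_graph

# Data from part (a); random_state is an assumption, not in the assignment
n_samples = 1500
noise = 0.05
X, _ = make_swiss_roll(n_samples, noise=noise, random_state=0)
X[:, 1] *= 0.5  # make it thinner

# The k-nearest-neighbors graph restricts merges to neighboring points,
# so clusters follow the rolled surface instead of cutting across it.
connectivity = kneighbors_graph(X, n_neighbors=10, include_self=False)
ward = AgglomerativeClustering(
    n_clusters=6, connectivity=connectivity, linkage="ward"
)
labels = ward.fit_predict(X)

fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(projection="3d")
ax.scatter(X[:, 0], X[:, 1], X[:, 2], c=labels, cmap="tab10", s=15)
ax.set_title("Ward linkage with k-NN connectivity (6 clusters)")
plt.show()
```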

(c) Use sklearn.cluster.DBSCAN to cluster the points generated in (a). Plot the clustered data in a 3D figure and use different colors for different clusters in your figure. Discuss and compare the results of DBSCAN with the results in (b).
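A rough sketch of the DBSCAN part follows. The values eps=1.5 and min_samples=5 are placeholder assumptions chosen only to make the example run; the number of clusters and noise points depends strongly on them, so they would need tuning before drawing any conclusions for the comparison. The random_state=0 argument is likewise an assumption.

```python
import matplotlib.pyplot as plt
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_swiss_roll

# Same data as part (a); random_state is an assumption
X, _ = make_swiss_roll(1500, noise=0.05, random_state=0)
X[:, 1] *= 0.5

# eps and min_samples are illustrative starting values, not tuned results
labels = DBSCAN(eps=1.5, min_samples=5).fit_predict(X)

# DBSCAN marks noise points with the label -1
noise_mask = labels == -1
n_clusters = len(set(labels.tolist())) - (1 if noise_mask.any() else 0)
print(f"clusters: {n_clusters}, noise points: {int(noise_mask.sum())}")

fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(projection="3d")
ax.scatter(
    X[~noise_mask, 0], X[~noise_mask, 1], X[~noise_mask, 2],
    c=labels[~noise_mask], cmap="tab10", s=15,
)
# Noise is drawn separately in black so it is not mistaken for a cluster
ax.scatter(
    X[noise_mask, 0], X[noise_mask, 1], X[noise_mask, 2],
    c="black", s=5, label="noise",
)
ax.set_title("DBSCAN on the swiss roll")
ax.legend()
plt.show()
```

Unlike the agglomerative model in (b), DBSCAN is not given a target number of clusters; it groups points that are density-connected, which is a useful starting point for the requested comparison.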

P4-3. Clustering the handwritten digits data

Use the hand-written digits dataset embedded in scikit-learn:

from sklearn import datasets

digits = datasets.load_digits()

(a) Use the following methods to cluster the data:

    • K-Means (sklearn.cluster.KMeans)

    • DBSCAN (sklearn.cluster.DBSCAN)

Optimize the parameters of these methods.

(b) Evaluate these methods against the true labels of the data and discuss which method gives the best results in terms of accuracy.
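The sketch below shows one way the evaluation could be set up. Cluster IDs are arbitrary, so it assumes a majority-vote mapping: each cluster is assigned the most common true digit among its members, and DBSCAN noise points count as errors. The helper cluster_accuracy, the parameter grid, and the fixed random_state are all illustrative assumptions rather than the required method.

```python
import numpy as np
from sklearn import datasets
from sklearn.cluster import DBSCAN, KMeans


def cluster_accuracy(y_true, y_pred):
    """Majority-vote accuracy (hypothetical helper); noise (-1) counts as wrong."""
    correct = 0
    for cluster_id in np.unique(y_pred):
        if cluster_id == -1:
            continue
        members = y_true[y_pred == cluster_id]
        # Points matching the cluster's most frequent true label are correct
        correct += np.bincount(members).max()
    return correct / len(y_true)


digits = datasets.load_digits()
X, y = digits.data, digits.target

# K-Means with one cluster per digit class
kmeans_labels = KMeans(n_clusters=10, n_init=10, random_state=0).fit_predict(X)
kmeans_acc = cluster_accuracy(y, kmeans_labels)
print(f"K-Means accuracy: {kmeans_acc:.3f}")

# Small grid search for DBSCAN; the ranges are guesses to be refined
best_acc, best_eps, best_min_samples = 0.0, None, None
for eps in (16, 18, 20, 22, 24):
    for min_samples in (3, 5, 10):
        dbscan_labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)
        acc = cluster_accuracy(y, dbscan_labels)
        if acc > best_acc:
            best_acc, best_eps, best_min_samples = acc, eps, min_samples
print(f"Best DBSCAN accuracy: {best_acc:.3f} "
      f"(eps={best_eps}, min_samples={best_min_samples})")
```

Majority-vote accuracy tends to reward solutions with many small clusters, so it is worth reporting the number of clusters and noise points alongside the score when comparing the two methods.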
