Description
-
In this lab, you will learn how to classify data points using the kNN (k-nearest neighbours) classifier. The Iris data set is provided; it consists of 3 classes and 150 data points.
-
Load libraries
import pandas
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score
from sklearn.neighbors import KNeighborsClassifier
-
-
Load the data set using the pandas library
-
names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'class']
dataset = pandas.read_csv("iris.data", names=names)
-
-
Print the size of the data set. It should be (150, 5): 4 features and 1 class column. Use dataset.shape to print it.
-
-
-
Display the class distribution
-
Use dataset.groupby('class').size()
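The two checks above (size and class distribution) can be sketched together as follows. This is only an illustration: it assumes scikit-learn's bundled copy of Iris (load_iris) in place of the iris.data file so that it runs without the download, which means the class labels are 'setosa', 'versicolor' and 'virginica' rather than the strings stored in the file.

```python
import pandas
from sklearn.datasets import load_iris

# Assumption: use scikit-learn's bundled Iris copy instead of reading iris.data.
iris = load_iris()
names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'class']

# Four feature columns plus one class column, mirroring the lab's layout.
dataset = pandas.DataFrame(iris.data, columns=names[:4])
dataset['class'] = [iris.target_names[t] for t in iris.target]

print(dataset.shape)                    # (rows, columns)
print(dataset.groupby('class').size())  # number of rows per class
```

Each of the three classes should appear with the same count, which is why plain accuracy is a reasonable metric for this data set.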
-
-
Now divide your data using the hold-out approach (80% for training and 20% for testing)
-
# train / test dataset
array = dataset.values
X = array[:,0:4]
Y = array[:,4]
t_size = 0.20
seed = 7
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=t_size, random_state=seed)
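A quick way to confirm the split is to print the shapes of the resulting arrays: with 150 rows and test_size = 0.20, 120 rows should go to training and 30 to testing. The sketch below assumes scikit-learn's bundled Iris copy (load_iris) rather than iris.data, purely so it can run on its own.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Assumption: bundled Iris copy instead of iris.data.
X, Y = load_iris(return_X_y=True)

# Same hold-out settings as in the lab: 20% test, fixed seed for repeatability.
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.20, random_state=7)

print(X_train.shape, X_test.shape)
```

Fixing random_state makes the split reproducible; changing it gives a different 120/30 partition of the same rows.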
-
Apply the kNN classifier. See the documentation below. Make sure the necessary classes are imported first.
http://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html
knn = KNeighborsClassifier()
knn.fit(X_train, Y_train)
predictions = knn.predict(X_test)
print(accuracy_score(Y_test, predictions))
print(confusion_matrix(Y_test, predictions))
print(classification_report(Y_test, predictions))
-
Repeat (e) by changing the value of k (k = 1, 2, 3, ..., 10). Print only the accuracy.
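One possible way to organise this step is a loop that passes each k to the n_neighbors parameter of KNeighborsClassifier. The sketch assumes the bundled Iris copy (load_iris) instead of iris.data; the dictionary name accuracy_by_k is only illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Assumption: bundled Iris copy instead of iris.data.
X, Y = load_iris(return_X_y=True)
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.20, random_state=7)

accuracy_by_k = {}
for k in range(1, 11):
    # n_neighbors is the "k" in kNN; the split stays fixed across iterations.
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, Y_train)
    accuracy_by_k[k] = accuracy_score(Y_test, knn.predict(X_test))
    print(k, accuracy_by_k[k])
```

Keeping the seed fixed while k varies means any change in accuracy comes from k alone.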
-
-
Repeat (e) by changing the value of seed (seed = 1, 2, 3, ..., 10). Print only the accuracy.
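This step can be sketched the same way, this time redoing the hold-out split inside the loop and leaving the classifier at its default settings (n_neighbors defaults to 5). As before, the bundled Iris copy (load_iris) stands in for iris.data, and accuracy_by_seed is an illustrative name.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Assumption: bundled Iris copy instead of iris.data.
X, Y = load_iris(return_X_y=True)

accuracy_by_seed = {}
for seed in range(1, 11):
    # A new seed produces a different 80/20 partition of the same rows.
    X_train, X_test, Y_train, Y_test = train_test_split(
        X, Y, test_size=0.20, random_state=seed)
    knn = KNeighborsClassifier()
    knn.fit(X_train, Y_train)
    accuracy_by_seed[seed] = accuracy_score(Y_test, knn.predict(X_test))
    print(seed, accuracy_by_seed[seed])
```

The spread of these values gives a rough sense of how much a single hold-out estimate depends on which rows land in the test set.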
-
-
Repeat (I) using the Occupancy Detection dataset. Ignore the date attribute. Of course, steps (d) and (g) are not applicable, since the training and test data are already given separately. http://archive.ics.uci.edu/ml/datasets/Occupancy+Detection+
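A minimal sketch of how the date attribute might be dropped before fitting is shown below. Several things here are assumptions rather than facts from this lab: the column names 'date' and 'Occupancy', the helper split_features_labels, and the file name mentioned in the comment. The rows are made-up toy values so that the snippet runs without the download; they are not taken from the real data.

```python
import pandas
from sklearn.metrics import accuracy_score
from sklearn.neighbors import KNeighborsClassifier

def split_features_labels(frame, label_col='Occupancy', ignore_cols=('date',)):
    """Hypothetical helper: return (X, y), leaving the label and ignored columns out of X."""
    X = frame.drop(columns=[label_col, *ignore_cols]).values
    y = frame[label_col].values
    return X, y

# Toy stand-in rows (made-up values). With the real files, load each one
# with pandas.read_csv instead, e.g. pandas.read_csv('datatraining.txt')
# (file name assumed).
train = pandas.DataFrame({
    'date': ['t1', 't2', 't3', 't4', 't5', 't6'],
    'Temperature': [23.0, 23.1, 22.9, 20.0, 20.1, 19.9],
    'Light': [450.0, 460.0, 440.0, 0.0, 5.0, 0.0],
    'Occupancy': [1, 1, 1, 0, 0, 0],
})
test = pandas.DataFrame({
    'date': ['t7', 't8'],
    'Temperature': [23.05, 20.05],
    'Light': [455.0, 2.0],
    'Occupancy': [1, 0],
})

X_train, Y_train = split_features_labels(train)
X_test, Y_test = split_features_labels(test)

# k is kept small only because the toy training set has six rows.
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, Y_train)
accuracy = accuracy_score(Y_test, knn.predict(X_test))
print(accuracy)
```

Because the training and test sets arrive as separate files, no call to train_test_split is needed here.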
-
Now, instead of using the built-in library, write your own code for a kNN classifier in any language and repeat I and II. You must use the following chi-squared distance function
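A minimal pure-Python sketch of such a classifier is given below. The distance in it is an assumption: it uses one common form of the chi-squared distance, d(x, y) = sum over i of (x_i - y_i)^2 / (x_i + y_i), skipping terms whose denominator is zero. If the form required by the lab differs (for example by a factor of 1/2), only chi_squared_distance needs to change. The function names and the hand-typed sample points are illustrative.

```python
from collections import Counter

def chi_squared_distance(x, y):
    """Assumed form: sum_i (x_i - y_i)^2 / (x_i + y_i); zero denominators are skipped."""
    total = 0.0
    for a, b in zip(x, y):
        denom = a + b
        if denom != 0:
            total += (a - b) ** 2 / denom
    return total

def knn_predict(X_train, Y_train, x, k=5, distance=chi_squared_distance):
    """Label x by majority vote among its k nearest training points."""
    neighbours = sorted(zip(X_train, Y_train),
                        key=lambda pair: distance(pair[0], x))
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]

def knn_accuracy(X_train, Y_train, X_test, Y_test, k=5):
    """Fraction of test points whose predicted label matches the true label."""
    correct = sum(knn_predict(X_train, Y_train, x, k) == y
                  for x, y in zip(X_test, Y_test))
    return correct / len(Y_test)

# A few hand-typed, Iris-like points, used only to exercise the functions.
X_train = [[5.1, 3.5, 1.4, 0.2], [4.9, 3.0, 1.4, 0.2],
           [6.3, 3.3, 6.0, 2.5], [5.8, 2.7, 5.1, 1.9]]
Y_train = ['Iris-setosa', 'Iris-setosa', 'Iris-virginica', 'Iris-virginica']
X_test = [[5.0, 3.4, 1.5, 0.2], [6.0, 3.0, 5.5, 2.0]]
Y_test = ['Iris-setosa', 'Iris-virginica']

print(knn_predict(X_train, Y_train, X_test[0], k=3))
print(knn_accuracy(X_train, Y_train, X_test, Y_test, k=3))
```

This form of the distance assumes non-negative feature values, which holds for the Iris measurements. Sorting all training points for every query is the simplest approach and is adequate for data sets of this size.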