Description
In this practical you are given data file “New_York_Neighborhoods.xlsx”. In the programming language of your choice read the file and get the numerical values of the features. The file contains 50 observations of 12 variables/features which are used to define neighbourhood of some suburbs in New York. Carry out the following data analysis and visualizations.
-
Compute, display and interpret the Pearson correlation matrix for the data.
-
Carry out PCA for the data. Python users may use sklearn implementation of PCA. Use n_components to be same as the number of variables in the data.
-
Visualize the percentage variance explained by each principal component.
-
Scatter plot each individuals/samples on the x & y axis as PC1 and PC2.
-
-
-
Graph the variables as unit vector using their projection values on PC1 and PC2.
-
-
-
Biplot both individuals/observations and the variables. Biplot is a combine scatter plots of the samples (b) and variable vectors (c).
-
-
Introduce following two outliers in the data.
-
70
70
700
80
83
71
600
70
65
900
45
800
77
600
72
82
800
73
65
900
62
75
-500
80
Examine how the biplot changes. You can also experiment with scaling the variables and not scaling the variables.
You may use Python/R for this exercise
1