Description
-
IRIS FLOWERS
In 1935, Edgar Anderson went to his favourite pasture and recorded the length and width of the sepals and petals on several flowers in the field. For whatever reason, this dataset became one of the oldest and most well-known “sanity-check” datasets around, being cited by countless papers. This class continues this time-honored tradition by using Iris Flowers to sanity-check your Python environment and plotting libraries.
-
CS5785 Fall 2019: Homework 0
Find and download the Iris Flowers dataset from the UC Irvine Machine Learning datasets archive at http://archive.ics.uci.edu/ml/datasets.html
Hint: The iris.names file describes the structure of the dataset.
How many features/attributes are there per sample? How many different species are there, and how many samples of each species did Anderson record?
-
Figure out how to parse the dataset you downloaded. Load the samples into an N × p array, where N is the number of samples and p is the number of attributes per sample. Additionally, create an N-dimensional vector containing each sample’s label (species).
Hint: Python has a built-in CSV parser in the csv library, or you can use the “string”.split(…) method.
Hint 2: Here is some code that prints each line in a file:
for line in open("/path/to/filename.txt"):
    print("Line contains: " + line)
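The parsing step can be sketched with the csv module mentioned in the first hint. This is only a minimal sketch, not the required solution: it assumes each row holds the numeric attributes followed by the species name, and it parses a small inline sample (`SAMPLE`, a hypothetical stand-in for the downloaded file) so that it runs without the dataset. The helper name `load_samples` is likewise an assumption.

```python
import csv
import io

import numpy

# Hypothetical stand-in for the downloaded file: rows in the assumed
# "numeric fields, then species name" comma-separated layout.
SAMPLE = """5.1,3.5,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
"""


def load_samples(lines):
    """Parse CSV rows into an N x p float array and an N-vector of labels."""
    features, labels = [], []
    for row in csv.reader(lines):
        if not row:  # skip blank lines, e.g. at the end of the file
            continue
        features.append([float(value) for value in row[:-1]])
        labels.append(row[-1])
    return numpy.array(features), numpy.array(labels)


X, y = load_samples(io.StringIO(SAMPLE))
print(X.shape, y.shape)
```

To read the real file, pass an open file object instead of the `io.StringIO` wrapper, for example `load_samples(open("/path/to/filename.txt"))`.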
-
To visualize this dataset, we would have to build a p-dimensional scatterplot. Unfortunately, we only have 2D displays so we must reduce the dataset’s dimensionality. The easiest way to view the set is to plot two attributes of the data against one another and repeat for each pair of attributes.
Create every possible scatterplot from all pairs of two attributes. (For example, one scatterplot would graph petal length vs. sepal width, another would graph petal length vs. sepal length, and so on.) Within each scatterplot, the color of each dot should correspond to the sample's species. Ideally, we’re looking for something like this figure from Wikipedia:
But your results do not have to be this ornate. Presenting six separate figures in your report is certainly fine. Be sure to include the source code for all plots!
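One way to enumerate the attribute pairs is `itertools.combinations`. The following is a sketch under assumptions: the list `ATTRIBUTES` and its column order are guesses based on the measurements named above and should be checked against iris.names.

```python
from itertools import combinations

# Assumed attribute names and column order; verify against iris.names.
ATTRIBUTES = ["sepal length", "sepal width", "petal length", "petal width"]

# Each unordered pair of distinct column indices yields one scatterplot.
pairs = list(combinations(range(len(ATTRIBUTES)), 2))

for i, j in pairs:
    print(ATTRIBUTES[i], "vs.", ATTRIBUTES[j])

print(len(pairs), "scatterplots in total")
```

With p attributes this gives p(p-1)/2 pairs, which is six for p = 4 and matches the six figures mentioned above.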
Hint: This is one way to draw a scatterplot. Use whatever works for you.
-
from matplotlib import pyplot as plt
import numpy

xs = numpy.array([1, 2, 3, 4, 5, 6, 7])
ys = numpy.array([3, 2, 5, 1, 3, 3, 2])
colors = ["r", "r", "r", "b", "b", "g", "g"]
plt.scatter(xs, ys, c=colors)
plt.savefig("plot.png")
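The hint hard-codes its color list. With real data, the colors would more likely be derived from the label vector. Here is a sketch under assumptions: the arrays `X` and `y` hold made-up placeholder values rather than real measurements, the label names are hypothetical, and the color assigned to each label is arbitrary.

```python
from matplotlib import pyplot as plt
import numpy

# Placeholder data (not real measurements): 6 samples, 2 attributes.
X = numpy.array([[1.0, 3.0], [2.0, 2.0], [3.0, 5.0],
                 [4.0, 1.0], [5.0, 3.0], [6.0, 3.0]])
y = numpy.array(["a", "a", "b", "b", "c", "c"])  # hypothetical labels

# Assign one color per distinct label, in sorted label order.
labels = y.tolist()
palette = ["r", "g", "b"]
color_of = dict(zip(sorted(set(labels)), palette))
colors = [color_of[label] for label in labels]

# Plot attribute 0 against attribute 1, colored by label.
plt.scatter(X[:, 0], X[:, 1], c=colors)
plt.xlabel("attribute 0")
plt.ylabel("attribute 1")
plt.savefig("labeled_plot.png")
```

The same `colors` list can be reused for every attribute pair, since it depends only on the labels and not on which columns are plotted.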
Hint: If you would like plots to appear right inside of your Jupyter Notebook, restart the kernel and evaluate the following before running anything else:
%matplotlib inline
Good luck!