Description
-
The goal of this problem is to design a classifier that will predict if a person, represented by measurements of their face, is happy or angry. A key to any classification task is to use good features that discriminate between the two categories.
Features extracted from m = 128 face images (like the two shown above) are provided in the m-by-n matrix X in the file face_emotion_data.mat. This file also includes the m × 1 vector of labels y. Here happy faces are labeled +1 and angry faces are labeled -1.
-
Your task is to find the weights for a linear classifier that will use the features to predict whether the emotion displayed on a face image is happy or angry.
-
Define a feature vector $x_i = [x_{1i}\ x_{2i}\ \cdots\ x_{9i}]^T$ and classifier weights $w = [w_1\ w_2\ \cdots\ w_9]^T$ so that the label $y_i \approx x_i^T w$.
-
Use the training data X and y and a least squares problem to train your classifier weights.
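One way to set up the least squares fit is sketched below. It is only an illustration under stated assumptions: the data here is synthetic (random features with labels generated from hypothetical weights `w_true`), standing in for the contents of face_emotion_data.mat. With the real file, X and y could presumably be read with `scipy.io.loadmat`, though the variable names stored in the file are an assumption.

```python
import numpy as np

# Synthetic stand-in for the real data (assumption): 128 samples, 9 features.
rng = np.random.default_rng(0)
m, n = 128, 9
X = rng.standard_normal((m, n))
w_true = rng.standard_normal(n)                 # hypothetical weights, illustration only
y = np.where(X @ w_true >= 0, 1.0, -1.0)        # labels in {+1, -1}

# Least squares: choose w to minimize ||X w - y||_2.
w, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)

print(w.shape)   # (9,)
```

Using `np.linalg.lstsq` avoids forming the inverse of X^T X explicitly, which tends to be the numerically safer choice.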
-
Explain how to use the weights you found to classify a new face image as happy or angry.
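Since the labels are +1 and -1, a natural decision rule is to take the sign of the inner product between the new feature vector and the weights. The sketch below uses a hypothetical three-element weight vector for brevity (the problem uses nine features), and treating a score of exactly zero as "happy" is an arbitrary assumption.

```python
import numpy as np

def classify(x_new, w):
    """Return +1 (happy) if x_new^T w >= 0, otherwise -1 (angry)."""
    return 1 if float(np.dot(x_new, w)) >= 0 else -1

# Hypothetical weights and feature vector, for illustration only.
w = np.array([0.5, -1.0, 0.25])
x_new = np.array([1.0, 0.2, 0.4])

label = classify(x_new, w)                  # score = 0.5 - 0.2 + 0.1 = 0.4
print("happy" if label == 1 else "angry")   # happy
```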
-
Which features seem to be most important? Justify your answer. Note that the nine columns of the training data feature matrix X have been normalized to have the same two-norm.
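Because the columns of X share the same two-norm, the magnitudes of the fitted weights are directly comparable, so one plausible way to judge importance is to rank features by |w_j|. The sketch below does this on synthetic data; the random features and the hypothetical weights `w_true` are assumptions for illustration, not the real face data. Strongly correlated features can make this ranking less reliable.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 128, 9
X = rng.standard_normal((m, n))
X /= np.linalg.norm(X, axis=0)              # give every column unit two-norm

# Hypothetical ground truth used only to generate labels.
w_true = np.zeros(n)
w_true[[0, 3, 6]] = [3.0, -2.0, 1.5]
y = np.where(X @ w_true >= 0, 1.0, -1.0)

w, *_ = np.linalg.lstsq(X, y, rcond=None)

# Sort feature indices from largest to smallest |w_j|.
order = np.argsort(-np.abs(w))
for j in order:
    print(f"feature {j + 1}: w = {w[j]:+.3f}")
```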
-
Design a classifier based on three of the nine features. Which three should you choose? Describe the procedure for designing your classifier.
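One possible procedure, sketched here as an assumption rather than the intended solution, is to fit all nine weights, keep the three features with the largest weight magnitudes, and then solve a second least squares problem using only those three columns. The helper name `fit_reduced` and the random placeholder data are hypothetical.

```python
import numpy as np

def fit_reduced(X, y, k=3):
    """Fit on all columns, keep the k largest-|w| columns, then refit on them."""
    w_full, *_ = np.linalg.lstsq(X, y, rcond=None)
    keep = np.sort(np.argsort(-np.abs(w_full))[:k])     # 0-based column indices
    w_small, *_ = np.linalg.lstsq(X[:, keep], y, rcond=None)
    return keep, w_small

# Placeholder data (assumption): random features and random labels.
rng = np.random.default_rng(2)
X = rng.standard_normal((128, 9))
X /= np.linalg.norm(X, axis=0)
y = np.where(rng.standard_normal(128) >= 0, 1.0, -1.0)

keep, w_small = fit_reduced(X, y)
y_hat = np.where(X[:, keep] @ w_small >= 0, 1.0, -1.0)  # predictions from 3 features
```

Refitting matters because the nine-feature weights are generally not optimal once six columns are dropped. An alternative would be to try all 84 three-feature subsets and compare their error rates.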
-
What percent of the training labels are incorrectly classified using all nine features? What percent of the training labels are incorrectly classified using your reduced set of three features?
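The training error percentage can be computed by comparing the sign of each prediction with its label. The small example below uses hand-picked, hypothetical numbers so the result can be checked by eye; the function name `percent_misclassified` is likewise an assumption.

```python
import numpy as np

def percent_misclassified(X, y, w):
    """Percentage of rows of X whose predicted sign differs from the label in y."""
    y_hat = np.where(X @ w >= 0, 1.0, -1.0)
    return 100.0 * np.mean(y_hat != y)

# Tiny hypothetical example: predictions are [+1, +1, -1, -1].
X = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, 1.0])
w = np.array([1.0, 1.0])

print(percent_misclassified(X, y, w))   # 25.0 (one of four labels is wrong)
```

For a reduced feature set, the same function applies if X is restricted to the chosen columns and w holds the corresponding refitted weights.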
-
Now use cross validation to assess your classifier performance. Divide the available data into eight subsets of sixteen samples (e.g., examples 1-16, 17-32, ..., 113-128). Use seven sets to design your classifier weights, then use the remaining hold-out set to evaluate the classifier performance. Compute the number of misclassifications made on this hold-out set and divide that number by 16 (the size of the set) to estimate the error rate for that hold-out set. Repeat this process eight times using the eight different possible divisions between training and hold-out sets and average the error rates to obtain a final performance estimate.
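The cross-validation loop described above can be sketched as follows. The data is synthetic (an assumption made so the block runs on its own), and `cross_validate` is a hypothetical helper name; the folds are the contiguous blocks of sixteen samples given in the text.

```python
import numpy as np

def cross_validate(X, y, n_folds=8):
    """Average hold-out error rate over contiguous, equally sized folds."""
    m = X.shape[0]
    folds = np.array_split(np.arange(m), n_folds)    # rows 0-15, 16-31, ..., 112-127
    rates = []
    for hold in folds:
        train = np.setdiff1d(np.arange(m), hold)     # the other seven folds
        w, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
        y_hat = np.where(X[hold] @ w >= 0, 1.0, -1.0)
        rates.append(float(np.mean(y_hat != y[hold])))  # mistakes / 16
    return float(np.mean(rates)), rates

# Synthetic stand-in data (assumption): 128 samples, 9 features.
rng = np.random.default_rng(3)
X = rng.standard_normal((128, 9))
y = np.where(X @ rng.standard_normal(9) >= 0, 1.0, -1.0)

avg_rate, rates = cross_validate(X, y)
print(f"estimated error rate: {avg_rate:.3f}")
```

For a three-feature classifier, choosing the features inside each fold, using only that fold's training rows, keeps the hold-out estimate from being optimistic.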