Description
1 Classification with Linear Regression
Consider the following 1-dimensional input $x = [-2.0, -1.0, 0.5, 0.6, 5.0, 7.0]$ with corresponding binary class labels $y = [0, 0, 1, 0, 1, 1]$. Use (least-squares) linear regression, as shown in the lecture, to train on these samples and classify them. Your model should include an intercept term.
- Provide the coefficients of the linear regression (on x and y) and briefly explain how you computed them.
- Classify each of the 6 samples with your linear regression model. Explain how you map the continuous output of the linear model to a class label.
- Discuss in your own words why linear regression is not suitable for classification.
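The first two tasks can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the lecture's reference solution. It assumes that the first two inputs are negative (-2.0 and -1.0) and that a score of at least 0.5 is mapped to class 1. The names `slope`, `intercept`, `scores` and `labels` are chosen here for illustration.

```python
# Sketch: 1-D least-squares regression with an intercept, then thresholding.
# Assumption: the first two inputs are negative (-2.0, -1.0).
x = [-2.0, -1.0, 0.5, 0.6, 5.0, 7.0]
y = [0, 0, 1, 0, 1, 1]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Closed-form solution of the normal equations for one feature plus intercept:
# slope = S_xy / S_xx, intercept = mean_y - slope * mean_x.
s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
s_xx = sum((xi - mean_x) ** 2 for xi in x)
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x

# Map the continuous output to a label by thresholding at 0.5, the midpoint
# between the two label values 0 and 1 (an assumed convention).
scores = [intercept + slope * xi for xi in x]
labels = [1 if s >= 0.5 else 0 for s in scores]

print(f"intercept={intercept:.4f}, slope={slope:.4f}")
print("scores:", [round(s, 3) for s in scores])
print("labels:", labels)
```

Under these assumptions the sample at x = 0.5 receives a score below 0.5 and is misclassified, while the sample at x = 7.0 receives a score above 1. Both observations are useful starting points for the discussion in the third task.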
2 Log-likelihood Gradient and Hessian
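Consider a binary classification problem with data $D = \{(x_i, y_i)\}_{i=1}^n$, $x_i \in \mathbb{R}^d$ and $y_i \in \{0, 1\}$. We define
$$f(x) = \phi(x)^\top \beta, \quad p(x) = \sigma(f(x)), \quad \sigma(z) = 1/(1 + e^{-z})$$
$$L^{\text{nll}}(\beta) = -\sum_{i=1}^n \Big[ y_i \log p(x_i) + (1 - y_i) \log[1 - p(x_i)] \Big]$$
where $\beta \in \mathbb{R}^d$ is a vector. (Note: $p(x)$ is a short-hand for $p(y = 1 | x)$.)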
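1. Compute the derivative $\frac{\partial}{\partial \beta} L(\beta)$. Tip: Use the fact that $\frac{\partial}{\partial z} \sigma(z) = \sigma(z)(1 - \sigma(z))$.
2. Compute the 2nd derivative $\frac{\partial^2}{\partial \beta^2} L(\beta)$.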
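A hand-derived gradient or Hessian can be checked numerically with central finite differences. The sketch below is one possible check, not part of the exercise. The data, the feature map (a constant 1 plus the raw input) and the parameter values are all made up for illustration. The formulas in `grad` and `hessian` are candidate results that you should replace with whatever your own derivation yields.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(beta, phi):
    """p(x) = sigmoid(phi(x)^T beta) for one feature row phi."""
    return sigmoid(sum(b * f for b, f in zip(beta, phi)))

def nll(beta, Phi, y):
    """Negative log-likelihood over feature rows Phi and labels y."""
    total = 0.0
    for phi, yi in zip(Phi, y):
        p = predict(beta, phi)
        total -= yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return total

def grad(beta, Phi, y):
    """Candidate gradient: sum_i (p_i - y_i) * phi_i."""
    g = [0.0] * len(beta)
    for phi, yi in zip(Phi, y):
        p = predict(beta, phi)
        for j, f in enumerate(phi):
            g[j] += (p - yi) * f
    return g

def hessian(beta, Phi):
    """Candidate Hessian: sum_i p_i (1 - p_i) * phi_i phi_i^T."""
    d = len(beta)
    H = [[0.0] * d for _ in range(d)]
    for phi in Phi:
        p = predict(beta, phi)
        w = p * (1 - p)
        for j in range(d):
            for k in range(d):
                H[j][k] += w * phi[j] * phi[k]
    return H

def numeric_derivative(fn, beta, j, eps=1e-6):
    """Central finite difference of fn with respect to beta[j]."""
    up = list(beta)
    dn = list(beta)
    up[j] += eps
    dn[j] -= eps
    return (fn(up) - fn(dn)) / (2 * eps)

# Illustrative data (hypothetical): features are (1, x) for six 1-D inputs.
Phi = [[1.0, v] for v in [-2.0, -1.0, 0.5, 0.6, 5.0, 7.0]]
y = [0, 0, 1, 0, 1, 1]
beta = [0.1, -0.3]
d = len(beta)

# Compare the analytic gradient with finite differences of the loss.
g = grad(beta, Phi, y)
grad_err = max(
    abs(g[j] - numeric_derivative(lambda b: nll(b, Phi, y), beta, j))
    for j in range(d)
)

# Compare the analytic Hessian with finite differences of the gradient.
H = hessian(beta, Phi)
hess_err = max(
    abs(H[j][k] - numeric_derivative(lambda b: grad(b, Phi, y)[j], beta, k))
    for j in range(d)
    for k in range(d)
)

print("max gradient error:", grad_err)
print("max Hessian error:", hess_err)
```

If both errors are close to zero, the analytic expressions agree with the numerical derivatives at this particular point. That is supporting evidence for a derivation, not a proof of it.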
3 Discriminative Function in Logistic Regression
Logistic Regression defines class probabilities as proportional to the exponential of a discriminative function:
$$P(y | x) = \frac{\exp f(x, y)}{\sum_{y'} \exp f(x, y')}$$
Prove that, in the binary classification case, you can assume $f(x, 0) = 0$ without loss of generality.
This results in
$$P(y = 1 | x) = \frac{\exp f(x, 1)}{1 + \exp f(x, 1)} = \sigma(f(x, 1)).$$
(Hint: First assume $f(x, y) = \phi(x, y)^\top \beta$, and then define a new discriminative function $f'$ as a function of the old one, such that $f'(x, 0) = 0$ and for which $P(y | x)$ maintains the same expressibility.)
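The claim can be sanity-checked numerically before attempting the proof. The sketch below assumes one natural choice for the new function, namely subtracting the class-0 value from both discriminative values. The numbers `f0` and `f1` are hypothetical values for a single input. This illustrates the idea at one point only and does not replace the general argument.

```python
import math

def class_probs(scores):
    """P(y|x) proportional to exp f(x, y), normalized over all classes y."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical discriminative values f(x, 0) and f(x, 1) for one input x.
f0, f1 = 1.7, -0.4

original = class_probs([f0, f1])

# Assumed construction: f'(x, y) = f(x, y) - f(x, 0), so that f'(x, 0) = 0.
shifted = class_probs([f0 - f0, f1 - f0])

print("original:", original)
print("shifted: ", shifted)
print("sigmoid: ", sigmoid(f1 - f0))
```

Both parameterizations give the same class probabilities here, and the class-1 probability matches the sigmoid of the shifted class-1 value.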