Homework 4 (Week 5)

$24.99 $18.99

Consider the email spam classification problem of Murphy Problem 8.1. Suppose you intend to use a linear perceptron classifier on that data (not logistic regression as directed in Problem 8.1). In the parts below, unless stated otherwise, assume the dataset of N = 4601 samples is split into NTr = 3000 for training and NTest…

5/5 – (2 votes)

You’ll get a: zip file solution

 

Description

5/5 – (2 votes)
  1. Consider the email spam classification problem of Murphy Problem 8.1. Suppose you intend to use a linear perceptron classifier on that data (not logistic regression as directed in Problem 8.1). In the parts below, unless stated otherwise, assume the

dataset of

N = 4601 samples is split into

NTr = 3000 for training and

NTest = 1601

for testing.

Also, for the tolerance δ in the VC generalization bound, use 0.1 (for a

certainty of 0.9). The parts below have short answers.

Hint: You may use the relation that if

H is a linear perceptron classifier in D

dimensions (D features),

dVC (H ) = D + 1 . ( This will be proved in Problem 2.)

a)

What is the VC dimension of the hypothesis set?

b)

Expressing

the

upper

bound

on

the

out-of-sample

error

as

Eout (hg ) Ein (hg )+ εvc

For Ein (hg )

measured on the training data, use dvc

from part (a) to get a value

for εvc .

  1. To get a lower εvc , suppose you reduce the number of features to D = 10 , and also increase the training set size to 10,000. Now what is εvc ?

  1. Suppose that you had control over the number of training samples NTr (by

collecting more email data). How many training samples would ensure a generalization error of εvc = 0.1 again with probability 0.9 (the same tolerance

    • = 0.1), and using the reduced feature set (10 features)?

  1. Instead suppose you use the test set to measure Ein (hg ) , so let’s call it Etest (hg ) . What is the hypothesis set now? What is its cardinality?

  1. Continuing from part (e), use the bound:

Eout (hg ) Etest (hg )+ ε

Use the original feature set and the original test set, so that NTest = 1601 . Give an appropriate expression for ε and calculate it numerically.

  1. AML Exercise 2.4 (page 52). In addition to the hints given in the book, you can solve the problem by following the steps outlined below.

Homework 4 (Week 5)
$24.99 $18.99