Description
Consider the email spam classification problem of Murphy Problem 8.1. Suppose you intend to use a linear perceptron classifier on that data (not logistic regression, as directed in Problem 8.1). In the parts below, unless stated otherwise, assume the dataset of N = 4601 samples is split into NTr = 3000 samples for training and NTest = 1601 samples for testing. Also, for the tolerance δ in the VC generalization bound, use δ = 0.1 (for a certainty of 0.9). The parts below have short answers.
Hint: You may use the fact that if H is the set of linear perceptron classifiers in D dimensions (D features), then dVC(H) = D + 1. (This will be proved in Problem 2.)
a) What is the VC dimension of the hypothesis set?
b) Express the upper bound on the out-of-sample error as

   Eout(hg) ≤ Ein(hg) + εvc.

   For Ein(hg) measured on the training data, use dVC from part (a) to get a value for εvc.
c) To get a lower εvc, suppose you reduce the number of features to D = 10, and also increase the training set size to NTr = 10,000. Now what is εvc?
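The εvc in parts (b) and (c) can be checked numerically from the AML-style VC bound εvc = sqrt((8/N) ln(4 mH(2N)/δ)) with the polynomial growth-function bound mH(N) ≤ N^dVC + 1. A minimal sketch; note that D = 57 features for the spam data (so dVC = 58) is an assumption taken from the UCI spambase dataset, not stated in the problem:

```python
from math import log, sqrt

def eps_vc(n, dvc, delta=0.1):
    """VC generalization bound with the polynomial growth-function
    bound m_H(N) <= N**dvc + 1, i.e.
    eps = sqrt((8/n) * ln(4*((2n)**dvc + 1)/delta))."""
    return sqrt((8.0 / n) * log(4.0 * ((2.0 * n) ** dvc + 1.0) / delta))

# Part (b): assuming D = 57 spambase features -> dVC = 58; NTr = 3000.
print(eps_vc(3000, 58))    # well above 1, i.e. a vacuous bound

# Part (c): reduced features D = 10 -> dVC = 11; NTr = 10,000.
print(eps_vc(10_000, 11))  # roughly 0.3
```

The D = 57 case illustrates why part (c) asks about shrinking the feature set: with dVC = 58 and only 3000 training samples, the bound exceeds 1 and says nothing.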
d) Suppose that you had control over the number of training samples NTr (by collecting more email data). How many training samples would ensure a generalization error of εvc = 0.1, again with probability 0.9 (the same tolerance δ = 0.1), and using the reduced feature set (10 features)?
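Part (d) requires solving the implicit inequality N ≥ (8/ε²) ln(4((2N)^dVC + 1)/δ), since N appears on both sides. A simple fixed-point iteration converges quickly; this is a sketch that carries over the polynomial growth-function bound from the hint:

```python
from math import log

def n_for_eps(eps, dvc, delta=0.1, n0=1000.0, iters=50):
    """Fixed-point iteration for the implicit sample-complexity bound
    N >= (8/eps**2) * ln(4*((2N)**dvc + 1)/delta).
    Starting guess n0 and iteration count are arbitrary choices."""
    n = n0
    for _ in range(iters):
        n = (8.0 / eps ** 2) * log(4.0 * ((2.0 * n) ** dvc + 1.0) / delta)
    return n

# Part (d): eps = 0.1, reduced feature set D = 10 -> dVC = 11, delta = 0.1.
print(round(n_for_eps(0.1, 11)))  # on the order of 10^5 samples
```

The iteration converges because the right-hand side grows only logarithmically in N.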
e) Instead, suppose you use the test set to measure Ein(hg), so let's call it Etest(hg). What is the hypothesis set now? What is its cardinality?
f) Continuing from part (e), use the bound:

   Eout(hg) ≤ Etest(hg) + ε.

   Use the original feature set and the original test set, so that NTest = 1601. Give an appropriate expression for ε and calculate it numerically.
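Since the test set evaluates only the single final hypothesis hg, the finite-hypothesis-set Hoeffding bound ε = sqrt((1/(2N)) ln(2M/δ)) applies with M = 1. A quick sketch of the computation:

```python
from math import log, sqrt

def eps_hoeffding(n, m=1, delta=0.1):
    """Hoeffding bound for a finite hypothesis set of size m:
    Eout <= Etest + sqrt((1/(2n)) * ln(2m/delta))."""
    return sqrt(log(2.0 * m / delta) / (2.0 * n))

# Part (f): M = 1 (only h_g is evaluated on the test set), NTest = 1601.
print(eps_hoeffding(1601))  # roughly 0.03
```

Note how much tighter this is than the VC bound of part (b), even with fewer samples, because no growth function is involved.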
AML Exercise 2.4 (page 52). In addition to the hints given in the book, you can solve the problem by following the steps outlined below.