Homework 4 Solution


All questions have multiple-choice answers ([a], [b], [c], …). You can collaborate with others, but do not discuss the selected or excluded choices in the answers. You can consult books and notes, but not other people’s solutions. Your solutions should be based on your own work. Definitions and notation follow the lectures.

Note about the homework

The goal of the homework is to facilitate a deeper understanding of the course material. The questions are not designed to be puzzles with catchy answers. They are meant to make you roll up your sleeves, face uncertainties, and approach the problem from different angles.

The problems range from easy to difficult, and from practical to theoretical. Some problems require running a full experiment to arrive at the answer.

The answer may not be obvious or numerically close to one of the choices, but one (and only one) choice will be correct if you follow the instructions precisely in each problem. You are encouraged to explore the problem further by experimenting with variations on these instructions, for the learning benefit.

You are also encouraged to take part in the forum at http://book.caltech.edu/bookforum, where there are many threads about each homework set. We hope that you will contribute to the discussion as well. Please follow the forum guidelines for posting answers (see the “BEFORE posting answers” announcement at the top there).

Generalization Error

In Problems 1-3, we look at generalization bounds numerically. For $N > d_{vc}$, use the simple approximate bound $N^{d_{vc}}$ for the growth function $m_{\mathcal{H}}(N)$.

1. For an $\mathcal{H}$ with $d_{vc} = 10$, if you want 95% confidence that your generalization error is at most 0.05, what is the closest numerical approximation of the sample size that the VC generalization bound predicts?

[a] 400,000

[b] 420,000

[c] 440,000

[d] 460,000

[e] 480,000
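
Because the sample size appears on both sides of the bound, one way to evaluate it is by fixed-point iteration. The following is a minimal sketch, assuming the VC bound from the lectures in the form $\epsilon \le \sqrt{\frac{8}{N} \ln \frac{4 m_{\mathcal{H}}(2N)}{\delta}}$ together with the approximation $m_{\mathcal{H}}(2N) \approx (2N)^{d_{vc}}$. The function name `vc_sample_size`, the starting guess, and the tolerance are illustrative choices, not part of the assignment.

```python
import math


def vc_sample_size(d_vc, epsilon, delta, n_start=1000.0, tol=1e-6, max_iter=1000):
    """Iterate N <- (8 / eps^2) * ln(4 * (2N)^d_vc / delta) until it settles.

    Assumes the VC bound from the lectures and m_H(2N) ~ (2N)^d_vc.
    The logarithm is expanded so that (2N)^d_vc is never formed explicitly.
    """
    n = n_start
    for _ in range(max_iter):
        log_term = math.log(4.0 / delta) + d_vc * math.log(2.0 * n)
        n_next = 8.0 / epsilon ** 2 * log_term
        if abs(n_next - n) <= tol * n:
            return n_next
        n = n_next
    return n


# 95% confidence corresponds to delta = 0.05.
print(round(vc_sample_size(d_vc=10, epsilon=0.05, delta=0.05)))
```

The iteration settles quickly here because the right-hand side grows only logarithmically in $N$; trying other starting guesses is a simple check that the result does not depend on them.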

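2. There are a number of bounds on the generalization error $\epsilon$, all holding with probability at least $1 - \delta$. Fix $d_{vc} = 50$ and $\delta = 0.05$ and plot these bounds as a function of $N$. Which bound is the smallest for very large $N$, say $N = 10{,}000$? Note that [c] and [d] are implicit bounds in $\epsilon$.

[a] Original VC bound: $\epsilon \le \sqrt{\frac{8}{N} \ln \frac{4 m_{\mathcal{H}}(2N)}{\delta}}$

[b] Rademacher Penalty Bound: $\epsilon \le \sqrt{\frac{2 \ln(2N m_{\mathcal{H}}(N))}{N}} + \sqrt{\frac{2}{N} \ln \frac{1}{\delta}} + \frac{1}{N}$

[c] Parrondo and Van den Broek: $\epsilon \le \sqrt{\frac{1}{N} \left( 2\epsilon + \ln \frac{6 m_{\mathcal{H}}(2N)}{\delta} \right)}$

[d] Devroye: $\epsilon \le \sqrt{\frac{1}{2N} \left( 4\epsilon(1 + \epsilon) + \ln \frac{4 m_{\mathcal{H}}(N^2)}{\delta} \right)}$

[e] They are all equal.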
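
The four bounds can be compared numerically. The sketch below is one possible way to do it, under two assumptions that are not in the problem text: the implicit bounds [c] and [d] are solved at equality as quadratics in $\epsilon$ (taking the positive root), and the growth function is handled through its logarithm, using the exact value $m_{\mathcal{H}}(N) = 2^N$ when $N \le d_{vc}$ and the approximation $N^{d_{vc}}$ otherwise. All function names are illustrative.

```python
import math


def log_growth(n, d_vc):
    """ln m_H(n): n*ln(2) for n <= d_vc, else the approximation d_vc*ln(n)."""
    return n * math.log(2.0) if n <= d_vc else d_vc * math.log(n)


def vc_bound(n, d_vc, delta):
    return math.sqrt(8.0 / n * (math.log(4.0 / delta) + log_growth(2 * n, d_vc)))


def rademacher_bound(n, d_vc, delta):
    return (math.sqrt(2.0 * (math.log(2.0 * n) + log_growth(n, d_vc)) / n)
            + math.sqrt(2.0 / n * math.log(1.0 / delta))
            + 1.0 / n)


def parrondo_bound(n, d_vc, delta):
    # eps^2 = (2*eps + L) / n  ->  positive root of eps^2 - (2/n)*eps - L/n = 0
    big_l = math.log(6.0 / delta) + log_growth(2 * n, d_vc)
    return 1.0 / n + math.sqrt(1.0 / n ** 2 + big_l / n)


def devroye_bound(n, d_vc, delta):
    # eps^2 = (4*eps*(1 + eps) + L) / (2n), rearranged to a*eps^2 - b*eps - c = 0.
    # The rearrangement needs n > 2 so that a > 0.
    big_l = math.log(4.0 / delta) + log_growth(n * n, d_vc)
    a = 1.0 - 2.0 / n
    b = 2.0 / n
    c = big_l / (2.0 * n)
    return (b + math.sqrt(b * b + 4.0 * a * c)) / (2.0 * a)


bounds = {
    "[a] VC": vc_bound,
    "[b] Rademacher": rademacher_bound,
    "[c] Parrondo and Van den Broek": parrondo_bound,
    "[d] Devroye": devroye_bound,
}
for name, bound in bounds.items():
    print(name, bound(10000, 50, 0.05))
```

Working with logarithms matters here: a term such as $m_{\mathcal{H}}(N^2) \approx N^{2 d_{vc}}$ overflows a floating-point number for $N = 10{,}000$ and $d_{vc} = 50$ if it is formed directly. The same functions accept any other value of $N$.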

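3. For the same values of $d_{vc}$ and $\delta$ of Problem 2, but for small $N$, say $N = 5$, which bound is the smallest?

[a] Original VC bound: $\epsilon \le \sqrt{\frac{8}{N} \ln \frac{4 m_{\mathcal{H}}(2N)}{\delta}}$

[b] Rademacher Penalty Bound: $\epsilon \le \sqrt{\frac{2 \ln(2N m_{\mathcal{H}}(N))}{N}} + \sqrt{\frac{2}{N} \ln \frac{1}{\delta}} + \frac{1}{N}$

[c] Parrondo and Van den Broek: $\epsilon \le \sqrt{\frac{1}{N} \left( 2\epsilon + \ln \frac{6 m_{\mathcal{H}}(2N)}{\delta} \right)}$

[d] Devroye: $\epsilon \le \sqrt{\frac{1}{2N} \left( 4\epsilon(1 + \epsilon) + \ln \frac{4 m_{\mathcal{H}}(N^2)}{\delta} \right)}$

[e] They are all equal.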

Bias and Variance

Consider the case where the target function $f : [-1, 1] \to \mathbb{R}$ is given by $f(x) = \sin(\pi x)$ and the input probability distribution is uniform on $[-1, 1]$. Assume that the training set has only two examples (picked independently), and that the learning algorithm produces the hypothesis that minimizes the mean squared error on the examples.

4. Assume the learning model consists of all hypotheses of the form $h(x) = ax$. What is the expected value, $\bar{g}(x)$, of the hypothesis produced by the learning algorithm (expected value with respect to the data set)? Express your $\bar{g}(x)$ as $\hat{a}x$, and round $\hat{a}$ to two decimal digits only, then match exactly to one of the following answers.

[a] $\bar{g}(x) = 0$

[b] $\bar{g}(x) = 0.79x$

[c] $\bar{g}(x) = 1.07x$

[d] $\bar{g}(x) = 1.58x$

[e] None of the above

5. What is the closest value to the bias in this case?

[a] 0.1

[b] 0.3

[c] 0.5

[d] 0.7

[e] 1.0

6. What is the closest value to the variance in this case?

[a] 0.2

[b] 0.4

[c] 0.6

[d] 0.8

[e] 1.0
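
The average hypothesis, bias, and variance for the model $h(x) = ax$ can be estimated by simulating many two-point data sets. The following is a minimal sketch, assuming the usual definitions from the lectures (bias as the expected squared gap between the average hypothesis and $f$, variance as the expected squared gap between a learned hypothesis and the average hypothesis). The function names, the number of data sets, and the seed are illustrative.

```python
import math
import random


def fit_through_origin(x1, y1, x2, y2):
    """Least-squares slope for h(x) = a*x on two points."""
    return (x1 * y1 + x2 * y2) / (x1 * x1 + x2 * x2)


def bias_variance_ax(num_datasets=20000, seed=0):
    """Monte Carlo estimate of (a_bar, bias, variance) for h(x) = a*x."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(num_datasets):
        x1, x2 = rng.uniform(-1, 1), rng.uniform(-1, 1)
        slopes.append(fit_through_origin(x1, math.sin(math.pi * x1),
                                         x2, math.sin(math.pi * x2)))
    a_bar = sum(slopes) / len(slopes)
    var_a = sum((a - a_bar) ** 2 for a in slopes) / len(slopes)
    # For x uniform on [-1, 1]:
    #   E[x^2] = 1/3, E[x*sin(pi*x)] = 1/pi, E[sin(pi*x)^2] = 1/2,
    # so E_x[(a_bar*x - sin(pi*x))^2] and E_x[((a - a_bar)*x)^2] have closed forms.
    bias = a_bar ** 2 / 3 - 2 * a_bar / math.pi + 0.5
    variance = var_a / 3
    return a_bar, bias, variance


a_bar, bias, variance = bias_variance_ax()
print(f"a_bar = {a_bar:.2f}, bias = {bias:.2f}, variance = {variance:.2f}")
```

The expectations over $x$ are computed analytically rather than by sampling test points, which leaves the sampling of data sets as the only source of noise in the estimate.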

7. Now, let’s change $\mathcal{H}$. Which of the following learning models has the least expected value of out-of-sample error?

[a] Hypotheses of the form $h(x) = b$

[b] Hypotheses of the form $h(x) = ax$

[c] Hypotheses of the form $h(x) = ax + b$

[d] Hypotheses of the form $h(x) = ax^2$

[e] Hypotheses of the form $h(x) = ax^2 + b$
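
The five models can be compared with the same kind of simulation. The sketch below is one possible setup, not a prescribed method: each model is fitted by least squares on two points, written as $h(x) = c_0 + c_1 x + c_2 x^2$, and its out-of-sample error against $f(x) = \sin(\pi x)$ is evaluated in closed form for $x$ uniform on $[-1, 1]$. The model labels and function names are illustrative. The fitted coefficients of the two quadratic models can be very large when their denominators are near zero, so those averages may converge slowly and vary with the seed.

```python
import math
import random


def fit(model, x1, y1, x2, y2):
    """Return (c0, c1, c2) of the least-squares h(x) = c0 + c1*x + c2*x^2."""
    if model == "b":
        return ((y1 + y2) / 2, 0.0, 0.0)
    if model == "ax":
        return (0.0, (x1 * y1 + x2 * y2) / (x1 ** 2 + x2 ** 2), 0.0)
    if model == "ax+b":
        a = (y1 - y2) / (x1 - x2)
        return (y1 - a * x1, a, 0.0)
    if model == "ax^2":
        return (0.0, 0.0, (x1 ** 2 * y1 + x2 ** 2 * y2) / (x1 ** 4 + x2 ** 4))
    if model == "ax^2+b":
        a = (y1 - y2) / (x1 ** 2 - x2 ** 2)
        return (y1 - a * x1 ** 2, 0.0, a)
    raise ValueError(f"unknown model: {model}")


def out_of_sample_error(c0, c1, c2):
    """E_x[(h(x) - sin(pi*x))^2] for x uniform on [-1, 1], in closed form.

    Uses E[x^2] = 1/3, E[x^4] = 1/5, E[x*sin(pi*x)] = 1/pi, E[sin^2] = 1/2;
    the odd moments and E[sin], E[x^2*sin] vanish by symmetry.
    """
    return (c0 ** 2 + c1 ** 2 / 3 + c2 ** 2 / 5 + 2 * c0 * c2 / 3
            - 2 * c1 / math.pi + 0.5)


def expected_e_out(model, num_datasets=20000, seed=0):
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_datasets):
        x1, x2 = rng.uniform(-1, 1), rng.uniform(-1, 1)
        while abs(x1) == abs(x2):  # redraw the (measure-zero) degenerate case
            x1, x2 = rng.uniform(-1, 1), rng.uniform(-1, 1)
        coeffs = fit(model, x1, math.sin(math.pi * x1),
                     x2, math.sin(math.pi * x2))
        total += out_of_sample_error(*coeffs)
    return total / num_datasets


for model in ("b", "ax", "ax+b", "ax^2", "ax^2+b"):
    print(model, expected_e_out(model))
```

Rerunning with a few different seeds is a reasonable way to see which of the estimates are stable and which are dominated by rare data sets.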

VC Dimension

8. Assume $q \ge 1$ is an integer and let $m_{\mathcal{H}}(1) = 2$. What is the VC dimension of a hypothesis set whose growth function satisfies $m_{\mathcal{H}}(N + 1) = 2\, m_{\mathcal{H}}(N) - \binom{N}{q}$? Recall that $\binom{M}{m} = 0$ when $m > M$.

[a] $q - 2$

[b] $q - 1$

[c] $q$

[d] $q + 1$

[e] None of the above
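
The recurrence can be unrolled numerically for small values of $q$ to see where the growth function first falls below $2^N$. This is a minimal sketch, assuming the standard characterization of the VC dimension as the largest $N$ with $m_{\mathcal{H}}(N) = 2^N$. The function names and the choice to stop at $N = q + 2$ are illustrative.

```python
from math import comb


def growth_values(q, n_max):
    """m_H(1..n_max) from m_H(1) = 2 and m_H(N+1) = 2*m_H(N) - C(N, q)."""
    m = {1: 2}
    for n in range(1, n_max):
        m[n + 1] = 2 * m[n] - comb(n, q)  # comb(n, q) is 0 when q > n
    return m


def vc_dimension_from_growth(q, n_max=None):
    """Largest N <= n_max with m_H(N) = 2^N (0 if there is none)."""
    if n_max is None:
        n_max = q + 2  # illustrative cutoff, a little past q
    m = growth_values(q, n_max)
    shattered = [n for n in range(1, n_max + 1) if m[n] == 2 ** n]
    return max(shattered, default=0)


for q in range(1, 6):
    print(q, growth_values(q, q + 2), vc_dimension_from_growth(q))
```

A numerical check like this covers only the values of $q$ that are tried; the general answer still needs an argument from the recurrence itself.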

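9. For hypothesis sets $\mathcal{H}_1, \mathcal{H}_2, \ldots, \mathcal{H}_K$ with finite, positive VC dimensions $d_{vc}(\mathcal{H}_k)$, some of the following bounds are correct and some are not. Which among the correct ones is the tightest bound (the smallest range of values) on the VC dimension of the intersection of the sets: $d_{vc}\left(\bigcap_{k=1}^{K} \mathcal{H}_k\right)$? (The VC dimension of an empty set or a singleton set is taken as zero.)

[a] $0 \le d_{vc}\left(\bigcap_{k=1}^{K} \mathcal{H}_k\right) \le \sum_{k=1}^{K} d_{vc}(\mathcal{H}_k)$

[b] $0 \le d_{vc}\left(\bigcap_{k=1}^{K} \mathcal{H}_k\right) \le \min\{d_{vc}(\mathcal{H}_k)\}_{k=1}^{K}$

[c] $0 \le d_{vc}\left(\bigcap_{k=1}^{K} \mathcal{H}_k\right) \le \max\{d_{vc}(\mathcal{H}_k)\}_{k=1}^{K}$

[d] $\min\{d_{vc}(\mathcal{H}_k)\}_{k=1}^{K} \le d_{vc}\left(\bigcap_{k=1}^{K} \mathcal{H}_k\right) \le \max\{d_{vc}(\mathcal{H}_k)\}_{k=1}^{K}$

[e] $\min\{d_{vc}(\mathcal{H}_k)\}_{k=1}^{K} \le d_{vc}\left(\bigcap_{k=1}^{K} \mathcal{H}_k\right) \le \sum_{k=1}^{K} d_{vc}(\mathcal{H}_k)$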

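10. For hypothesis sets $\mathcal{H}_1, \mathcal{H}_2, \ldots, \mathcal{H}_K$ with finite, positive VC dimensions $d_{vc}(\mathcal{H}_k)$, some of the following bounds are correct and some are not. Which among the correct ones is the tightest bound (the smallest range of values) on the VC dimension of the union of the sets: $d_{vc}\left(\bigcup_{k=1}^{K} \mathcal{H}_k\right)$?

[a] $0 \le d_{vc}\left(\bigcup_{k=1}^{K} \mathcal{H}_k\right) \le \sum_{k=1}^{K} d_{vc}(\mathcal{H}_k)$

[b] $0 \le d_{vc}\left(\bigcup_{k=1}^{K} \mathcal{H}_k\right) \le K - 1 + \sum_{k=1}^{K} d_{vc}(\mathcal{H}_k)$

[c] $\min\{d_{vc}(\mathcal{H}_k)\}_{k=1}^{K} \le d_{vc}\left(\bigcup_{k=1}^{K} \mathcal{H}_k\right) \le \sum_{k=1}^{K} d_{vc}(\mathcal{H}_k)$

[d] $\max\{d_{vc}(\mathcal{H}_k)\}_{k=1}^{K} \le d_{vc}\left(\bigcup_{k=1}^{K} \mathcal{H}_k\right) \le \sum_{k=1}^{K} d_{vc}(\mathcal{H}_k)$

[e] $\max\{d_{vc}(\mathcal{H}_k)\}_{k=1}^{K} \le d_{vc}\left(\bigcup_{k=1}^{K} \mathcal{H}_k\right) \le K - 1 + \sum_{k=1}^{K} d_{vc}(\mathcal{H}_k)$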
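
Candidate bounds for intersections and unions can be sanity-checked on small finite hypothesis sets by brute force. The sketch below is a toy experiment rather than a proof. It assumes a hypothesis is represented by the set of points it labels $+1$ on a three-point domain, and it takes the VC dimension of an empty hypothesis set as zero, as in Problem 9. The function name and the two example sets are illustrative.

```python
from itertools import combinations


def vc_dimension(hypotheses, num_points):
    """Brute-force VC dimension of a finite hypothesis set.

    Each hypothesis is a frozenset of the points (0..num_points-1) it labels +1.
    Returns the largest k such that some k-point subset is shattered, else 0.
    """
    d = 0
    for k in range(1, num_points + 1):
        for subset in combinations(range(num_points), k):
            restricted = frozenset(subset)
            patterns = {h & restricted for h in hypotheses}
            if len(patterns) == 2 ** k:
                d = k
                break
    return d


points = range(3)
all_h = [frozenset(c) for k in range(4) for c in combinations(points, k)]
h1 = {h for h in all_h if len(h) <= 1}  # at most one point labeled +1
h2 = {h for h in all_h if len(h) >= 2}  # at most one point labeled -1

print("d_vc(H1), d_vc(H2):", vc_dimension(h1, 3), vc_dimension(h2, 3))
print("d_vc(intersection):", vc_dimension(h1 & h2, 3))
print("d_vc(union):", vc_dimension(h1 | h2, 3))
```

Comparing the printed values against each candidate bound shows which choices this particular pair of sets is consistent with; other small examples can be built the same way.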
