Name: Homework 6 (Week 11)
SKU: 23805
Price: 24.99 USD
Availability: InStock

Description

5/5 – (2 votes)

In class we derived the generalization-error bound for a C-class problem with C > 2, from the training-set error, based on the growth function _ℋ(2 ). In this problem, you will derive the generalization-error bound for a C-class problem from the test-set error and from a validation-set error with finite M.

Throughout this problem:

let ^‘ denote the (in-sample) unnormalized confusion matrix based on dataset , so that entry ) ^‘ *_#$= [number of data points labelled = that were misclassified as

ℎ= ];

also, let ) _out*_#$ = [(ℎ = ) ( = )] be the ^th entry of the out-of-sample confusion matrix _out .

For a given single hypothesis h for the C-class problem (so ℎ ∈ {1,2, ⋯ , }) tested using dataset that has N data points, give an expression for the total number of points that were misclassified _mis , in terms of the entries ) ^‘ *_#$ . Also give an expression for the error rate on , (ℎ) , in terms of the entries

^{)’ *}_#$^.

For the out-of-sample confusion matrix, give an expression for the total probability of error (ℎ ≠ ) in terms of the entries of _out . Hint: are the ^{events for}⁾out^*_#$ ^and ⁾ out^*_-. ^{mutually exclusive?}

Use these results to give expressions for= [incorrect classification] and = percent misclassified by h on .

Apply Hoeffding Inequality to µ and n.

Write the resulting expression in terms of and _out . Reformulate to give an expression in the following form:

[ _out(ℎ) ≤ (ℎ) + ( )] ≥ 1 − d .

in which you fill in for ( ). Hint: this is similar to what we did in Lecture 7 for the C = 2 case.

Is this a generalization-error bound for test-set error, for a C > 2 class problem?

Comment: As you may have observed in the Midterm Assignment Pr. 1, the generalization-error bound based on a test set can be much tighter than the bound based on a training set and its VC dimension.

1. Extend the result of (a) to a validation-set error on _val, in which the hypothesis set has |ℋ| = , 0 < < ∞ .

Hint: does the same technique applying a union bound that we did for the 2-class problem (Lecture 7) apply?

This problem concerns the generalization error bound in a transfer learning problem, as given in Lecture 13 (v2.1), Eq. (6).

In this problem you will study the effects of varying ₂, ₃, and a on the cross-domain generalization error bound.

Throughout this problem, let ε₄₅ be everything in the cross-domain generalization-error bound (RHS of Lecture 13 (v2.1) Eq. (6)), except omitting _2,3^∗. Note that _2,3^∗ is a constant of the parameters we will be varying.

Also throughout this problem, use the values ₈₉ = 10, δ = 0.1, _ℋ:ℋ = 0.1 . However, leave them as variables until you are ready to plot, or until you are asked for a number.

1. Give the simplified number (to two decimal digits) for ε₄₅, for the following cases:

1. 1. ₃= 1, ₂ = 100, = 0.1, 0.5, 0.9
  2. ₃= 10, ₂ = 1000, = 0.1, 0.5, 0.9
  3. ₃= 100, ₂ = 10000, = 0.1, 0.5, 0.9

1. 1. ₃= 1000, ₂ = 100000, = 0.1, 0.5, 0.9

Tip: put these in a table for easy viewing.

(v) Do any of these sets of numbers assure some degree of generalization (i.e.,

ε₄₅ < 0.5

, assuming

2,3

≈ 0)

If so, which?

∗

Comment: As in the supervised learning case, these bounds can be very loose,

but evidence indicates the functional dependence of

^ε45

on its variables still

generally apply.

₂= 1000

₃=

(b)

For this

part,

let

and plot

^ε45

vs. a

for

10, 100, 1000, 10000

(4 curves on one plot), over

0≤ ≤1

. Answer: what

_, 3

approximate value of a is optimal for each value of

Try to explain the

dependence

of a.

on a for

different values of

₃

and

any

difference in

optimal values

^ε45

(c)

For this part, let ₃ = 100 and plot

ε₄₅ vs. a for ₂ = 10, 100, 1000, 10000

4 curves on one plot), over 0 ≤ ≤ 1. Answer: what approximate value of a is optimal for each value of ₃ ? Try to explain the dependence of ε₄₅ on a for different values of ₂ , and any difference in optimal values of a.

Common default values for a are α = 0.5 and α = β.

1. 1. In terms of minimizing the cross-domain generalization-error bound, which default choice looks better (based on your answers to (b) and (c) above)? Is that choice reasonably consistent with your results of (b) and (c)?
  2. Give algebraic expressions for ε₄₅(α = 0.5) and ε₄₅(α = β). Compare them algebraically: can you draw any conclusions about which is lower?

1. 1. Plot _;<( = 0.5) vs. N for = 0.01, 0.1, 0.5, for 1000 ≤≤ 100000

(3 curves on 1 plot). Repeat for _;<( = ). What conclusions can you draw from the plots?

(a) Is it possible to have a covariate shift while satisfying all of:

₂( | ) = ₃( | ), ₂( ) = ₃( ), ₂( | ) = ₃( | ) ?

If no, prove your answer; if yes, justify your answer.

Is it possible to have a covariate shift while satisfying:

₂( | ) = ₃( | ) ?

If no, prove your answer; if yes, justify your answer.

Is it possible to have a concept shift while satisfying:

₂( ) = ₃( ) and ₂( ) = ₃( ) ?

If no, prove your answer; if yes, justify your answer.

p. 3 of 3

Homework 6 (Week 11)

Share this:

Share this:

Description

Share this:

Related products

Programming Assignment # 1 Dynamic Memory Allocation Solution

Homework 5: Heap ADT using STL Solution

Take Home Exam 4 Solution

Assignment-7 Binary Search Trees II:Solution

Parallel Programming MP2 Solution