Lab Assignment 3 Solution

Instructions: Please submit a report file that includes a short answer, related code, printouts, etc. for each problem (where necessary). Push your answers to GitHub or Canvas. All programming must be in R (or R Markdown).

Problem 1

I rolled a 6-sided die 100 times and observed the following results:

Face:    1   2   3   4   5   6
Count:  18  11   9  25  18  19

Problem 1a

What is the maximum likelihood estimate for the dice roll probabilities $\theta = (\theta_1, \ldots, \theta_6)$?
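For a multinomial model, the maximum likelihood estimate is the vector of observed relative frequencies. A minimal sketch in R, where the variable names `counts` and `theta_mle` are chosen here purely for illustration:

```r
# Observed counts for faces 1..6, as listed in Problem 1
counts <- c(18, 11, 9, 25, 18, 19)
names(counts) <- 1:6

# Multinomial MLE: relative frequency of each face
theta_mle <- counts / sum(counts)
print(theta_mle)
```

The estimates sum to 1 by construction, which is a quick sanity check on the counts.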

Problem 1b

Assume the prior $\theta \sim \mathrm{Diri}(\mathbf{1})$, i.e., assume that the prior over roll probabilities is a uniform Dirichlet with $\alpha = \mathbf{1}$. (The Dirichlet distribution is a multivariate generalization of the beta distribution, with probability distribution function $p(\theta \mid \alpha) = \frac{1}{B(\alpha)} \prod_{i=1}^{K} \theta_i^{\alpha_i - 1}$.) In other words, use a prior assumption that the die is fair, incorporated through six artificial rolls (one on each face).

What is the posterior log-likelihood of the above rolls (in terms of θ)? What is the maximum a posteriori estimate for θ?
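One way to set up the derivation, assuming a multinomial likelihood with face counts $n_i$ (so $N = \sum_i n_i = 100$) and a Dirichlet prior with parameters $\alpha_i$. The constant $C$ collects every term that does not depend on $\theta$, and the second line follows from maximizing under the sum-to-one constraint with a Lagrange multiplier:

```latex
\begin{align*}
\log p(\theta \mid n, \alpha)
  &= \sum_{i=1}^{6} (n_i + \alpha_i - 1)\,\log\theta_i + C,
  \qquad \text{subject to } \sum_{i=1}^{6}\theta_i = 1, \\
\hat{\theta}_i^{\mathrm{MAP}}
  &= \frac{n_i + \alpha_i - 1}{N + \sum_{j=1}^{6}\alpha_j - 6}.
\end{align*}
```

With $\alpha_i = 1$ for every face, the $\alpha_i - 1$ terms vanish and the posterior mode reduces to $n_i / N$. The posterior mean, $(n_i + \alpha_i) / (N + \sum_j \alpha_j)$, is the quantity in which one artificial roll per face appears explicitly, so it is worth stating which of the two the answer reports.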

Problem 1c

Program a Gibbs sampler to draw probable values of θ from the posterior distribution, using the prior from Problem 1b. Recall the MCMC techniques from Lab 3 and Assignment 1, Problem 3. Plot a histogram of the drawn values of θ.

Note: if you try to sample a single $\theta_i$, $i \in \{1, \ldots, 6\}$, from the posterior, you should get the same answer, repeated, due to the condition $\sum_{i=1}^{6} \theta_i = 1$. Therefore, you must draw at least two $\theta_i$'s at a time. The rdirichlet function draws the full $\theta$ vector at once, by drawing each $\theta_i \sim \mathrm{Gamma}(\alpha_i, 1)$, then normalizing the result.
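A possible sketch of a Gibbs sampler that updates two components at a time, in base R. It relies on a standard property of the Dirichlet distribution: with all other components held fixed, the share of one component within a pair follows a Beta distribution. The iteration count, burn-in length, seed, and variable names are assumptions made for illustration, not values from the assignment.

```r
set.seed(1)  # arbitrary seed, for reproducibility only

counts <- c(18, 11, 9, 25, 18, 19)  # observed rolls from Problem 1
alpha  <- rep(1, 6)                 # uniform Dirichlet prior (Problem 1b)
a_post <- counts + alpha            # posterior Dirichlet parameters

n_iter  <- 5000                     # number of Gibbs sweeps (assumed)
burn_in <- 500                      # discarded warm-up sweeps (assumed)
theta   <- rep(1 / 6, 6)            # start at the fair die
draws   <- matrix(NA_real_, nrow = n_iter, ncol = 6)

for (t in seq_len(n_iter)) {
  # One sweep: update each face i jointly with a randomly chosen partner j
  for (i in 1:6) {
    j <- sample(setdiff(1:6, i), 1)
    s <- theta[i] + theta[j]        # mass shared by the pair stays fixed
    b <- rbeta(1, a_post[i], a_post[j])
    theta[i] <- s * b
    theta[j] <- s * (1 - b)
  }
  draws[t, ] <- theta
}

kept      <- draws[-seq_len(burn_in), ]
post_mean <- colMeans(kept)

# Histogram of the retained draws for each face
old_par <- par(mfrow = c(2, 3))
for (i in 1:6) {
  hist(kept[, i], breaks = 40, main = paste("theta", i), xlab = "value")
}
par(old_par)
```

Because each update only redistributes mass within a pair, every stored draw still sums to 1. Comparing `post_mean` with `a_post / sum(a_post)` gives a quick check that the chain is targeting the intended posterior.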

Problem 2

You will further analyze the gradAdmit.csv dataset (from Lab 4 and Assignment 2). As a reminder, this dataset contains a list of students (rows), along with whether or not they were admitted to graduate school (admit), their GRE score (gre), their GPA (gpa), and the prestige of their undergraduate university (rank). You do not need to repeat the parameter tuning from Assignment 2.

Problem 2a

Compute the class balance for both the training set (80% from Assignment 2) and test set (20%). For each dataset, what percentage of students were admitted?

Problem 2b

Using your optimal parameters from Assignment 2, Problem 1c, and the model trained on the full training set (if you did this improperly before, redo it), compute the precision, recall, and specificity on the test dataset. Hint: the confusionMatrix function may be helpful.
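The three metrics can be read directly off a 2x2 confusion matrix. The sketch below uses base R only and treats "admitted" (1) as the positive class; the `truth` and `pred` vectors are made-up illustration data, not values from gradAdmit.csv.

```r
# Hypothetical labels for illustration only (not from gradAdmit.csv)
truth <- factor(c(1, 0, 0, 1, 0, 1, 0, 0, 1, 0), levels = c(0, 1))
pred  <- factor(c(1, 0, 1, 0, 0, 1, 0, 0, 0, 0), levels = c(0, 1))

# Rows: predicted class, columns: true class
cm <- table(Predicted = pred, Actual = truth)

tp <- cm["1", "1"]  # admitted, predicted admitted
fp <- cm["1", "0"]  # not admitted, predicted admitted
fn <- cm["0", "1"]  # admitted, predicted not admitted
tn <- cm["0", "0"]  # not admitted, predicted not admitted

precision   <- tp / (tp + fp)
recall      <- tp / (tp + fn)  # also called sensitivity
specificity <- tn / (tn + fp)
```

If the hint refers to `confusionMatrix` from the caret package (an assumption), note that it treats the first factor level as the positive class unless the `positive` argument is set, so it is worth checking which class the reported values describe.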

Problem 2c

Based on your answer to Problem 2a, what percentage of minority over-sampling would create the most even class balance? Generate that many artificial training samples using the SMOTE algorithm (you may use the SMOTE function). Combine the original training dataset with the generated dataset and confirm that the class balance is as desired.
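The over-sampling percentage follows from simple arithmetic on the class counts. A small sketch with hypothetical counts (the real values come from the training set in Problem 2a); it does not call any SMOTE implementation:

```r
# Hypothetical training-set class counts, for illustration only
n_majority <- 200  # not admitted
n_minority <- 100  # admitted

# Synthetic minority samples needed to match the majority class
n_needed <- n_majority - n_minority

# Over-sampling expressed as a percentage of the original minority class
perc_over <- 100 * n_needed / n_minority

# Minority share after adding the synthetic samples
balanced_share <- (n_minority + n_needed) /
  (n_majority + n_minority + n_needed)
```

If the chosen implementation only accepts certain percentage values, the allowed value closest to `perc_over` gives the most even balance available.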

Problem 2d

Retrain your model on the combined training dataset (using the same parameters). Compute the precision, recall, and specificity on the test dataset. How do they differ from Problem 2b? Note: the test dataset should not be augmented.

Problem 3

Use importance sampling and the Monte Carlo integration method to estimate the integral $\int_{10\pi}^{\infty} e^{-x} \sin x \, dx$. Use $p(x) = e^{-x}$ and

$$g(x) = \begin{cases} \sin x, & x \ge 10\pi \\ 0, & x < 10\pi \end{cases}$$

Note: this problem is similar to Assignment 1, Problem 2.

Problem 3a

What is the probability of drawing a sample $x \ge 10\pi$ from the exponential distribution (with $\lambda = 1$), i.e. $p(x \ge 10\pi)$?
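The tail probability of an exponential distribution has the closed form $e^{-\lambda x}$, which can be cross-checked against R's built-in `pexp`. A minimal sketch (variable names are illustrative):

```r
# Tail probability P(X >= 10*pi) for X ~ Exponential(rate = 1)
p_tail_builtin <- pexp(10 * pi, rate = 1, lower.tail = FALSE)

# Closed form of the survival function, exp(-lambda * x) with lambda = 1
p_tail_closed <- exp(-10 * pi)

print(c(builtin = p_tail_builtin, closed_form = p_tail_closed))
```

The value is on the order of 1e-14, so plain sampling from the exponential distribution would almost never land in the region where the integrand is nonzero.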

Problem 3b

What is the exact solution to the integral, i.e., the result obtained via calculus? You may use the result given in Assignment 1, Problem 2: $\int_{0}^{\infty} e^{-\lambda x} \sin x \, dx = \frac{1}{1+\lambda^2}$.
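A sketch of how the quoted result can be applied, assuming the target integral is $\int_{10\pi}^{\infty} e^{-x} \sin x \, dx$ and the quoted identity reads $\int_0^\infty e^{-\lambda x} \sin x \, dx = 1/(1+\lambda^2)$. Shifting the variable by $10\pi$, a whole number of periods of the sine, moves the lower limit to zero:

```latex
\begin{align*}
\int_{10\pi}^{\infty} e^{-x}\sin x \,dx
  &= \int_{0}^{\infty} e^{-(u+10\pi)}\sin(u+10\pi)\,du
     && (u = x - 10\pi) \\
  &= e^{-10\pi}\int_{0}^{\infty} e^{-u}\sin u \,du
     && (\sin(u+10\pi) = \sin u) \\
  &= e^{-10\pi}\cdot\frac{1}{1+1^2} = \frac{e^{-10\pi}}{2}.
\end{align*}
```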

Problem 3c

Pick a biasing distribution that should work well for this problem. Your goal is to minimize the variance. Explain your choice. Note: writing $p^*(x)$ for the biasing density, choosing $p^*(x) > p(x)$ when $g^2(x)p(x)$ is large and $p^*(x) < p(x)$ when $g^2(x)p(x)$ is small reduces the variance.

Problem 3d

Numerically estimate the integral using the importance sampling method with the biasing distribution from Problem 3c and number of samples $n = 10^6$.
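A minimal sketch of the estimator in R, under stated assumptions: the target is $\int_{10\pi}^{\infty} e^{-x} \sin x \, dx$ with $p(x) = e^{-x}$, the sample size is one million, and the biasing density is an exponential shifted to start at $10\pi$. That shifted exponential is one candidate choice made here for illustration, not necessarily the intended answer to Problem 3c, and the seed and variable names are arbitrary.

```r
set.seed(42)  # arbitrary seed, for reproducibility only

n     <- 1e6
shift <- 10 * pi

# Draw from the assumed biasing density q(x) = exp(-(x - shift)), x >= shift
x <- shift + rexp(n, rate = 1)

# Importance weights p(x) / q(x), with p(x) = exp(-x)
w <- dexp(x, rate = 1) / dexp(x - shift, rate = 1)

# g(x) = sin(x) on x >= shift; every draw already satisfies x >= shift
g <- sin(x)

estimate  <- mean(g * w)
std_error <- sd(g * w) / sqrt(n)
exact     <- exp(-shift) / 2  # closed form under the same assumptions

print(c(estimate = estimate, std_error = std_error, exact = exact))
```

With this choice the weights are constant, so all remaining variance comes from the oscillation of the sine. A biasing density that also follows the shape of $g^2(x)p(x)$ within each period could reduce the variance further.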
