Artificial intelligence Homework 4 Solution

Question 1: Health Behaviours

Consider the following causal graphical model involving three Bernoulli random variables, which is a simple model of health status and behaviours: H (health status), C (cautious behaviour), D (disease).

People’s health status influences whether they adopt cautious behaviour, and their health status, together with their behaviour, influences their probability of disease.
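Assuming the graph contains exactly the three edges described above (H → C, H → D and C → D), the joint distribution factorizes in the standard way for a directed graphical model, with each variable conditioned on its parents:

```latex
P(H, C, D) = P(H)\, P(C \mid H)\, P(D \mid H, C)
```

Because every variable is Bernoulli, this factorization is fully specified by 1 + 2 + 4 = 7 probabilities: one for P(H), two for P(C | H) and four for P(D | H, C).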

Question 3: Bandits

Consider the following 6-armed bandit problem. The initial value estimates of the arms are Q = {1, 2, 2, 1, 0, 3}, and the actions are A = {1, 2, 3, 4, 5, 6}. Suppose we observe that the levers are played in turn (from lever 1 to lever 6, then starting again from lever 1):

$$A_t = \big((t - 1) \bmod 6\big) + 1 \tag{1}$$

We also observe that the rewards seem to fit the following function:

$$R_t = 2 \cos\left[\frac{\pi (t - 1)}{6}\right] \tag{2}$$

So the first two action-reward pairs are $A_1 = 1$, $R_1 = 2$ and $A_2 = 2$, $R_2 = \sqrt{3}$.
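A minimal sketch of the observed trajectory, assuming equation (1) is the cyclic schedule $A_t = ((t - 1) \bmod 6) + 1$ and equation (2) depends on the time step, $R_t = 2\cos[\pi(t - 1)/6]$. Under these assumptions it reproduces the two pairs quoted above ($A_1 = 1, R_1 = 2$ and $A_2 = 2, R_2 = \sqrt{3}$). The function names `action` and `reward` are illustrative only.

```python
import math


def action(t: int) -> int:
    """Lever pulled at time step t (1-indexed), following equation (1)."""
    return ((t - 1) % 6) + 1


def reward(t: int) -> float:
    """Reward observed at time step t, ASSUMING equation (2) reads
    R_t = 2 cos(pi * (t - 1) / 6), i.e. it depends on t."""
    return 2 * math.cos(math.pi * (t - 1) / 6)


if __name__ == "__main__":
    # Print the first 12 (t, A_t, R_t) triples of the trajectory.
    for t in range(1, 13):
        print(t, action(t), round(reward(t), 4))
```

Note that the lever schedule repeats every 6 steps while the reward pattern repeats every 12, so each lever is not guaranteed to return the same reward on every visit.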

  1. Show the estimated Q values for $t = 1$ to $t = 12$ along this trajectory, using the sample average of the observed rewards where available. Do not treat the initial estimates as samples.

  2. It turns out the player was following an ε-greedy strategy, which happened to coincide with the scheme in equation (1) for the first 12 time steps. For each time step $t$ from 1 to 12, state whether it can be concluded with certainty that a random action was selected.

  3. Suppose we continue to visit the levers in turn as in equation (1), and that the observed rewards continue to follow the pattern in equation (2). Is there a limiting expected reward for each action as $t \to \infty$? Justify your answer.
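The first two parts can be checked mechanically. Below is a sketch, not a reference solution, that replays the fixed trajectory with sample-average estimates (an arm with no samples keeps its initial estimate, which is not counted as a sample). It assumes $R_t = 2\cos[\pi(t - 1)/6]$ for equation (2), and that the exploratory branch of ε-greedy picks uniformly among all arms, so only a pick outside the current greedy set is certainly random. Tied arms all count as greedy. The names `replay`, `estimates` and `INITIAL_Q` are hypothetical.

```python
import math

# Initial value estimates from the problem statement (arms 1..6).
INITIAL_Q = [1.0, 2.0, 2.0, 1.0, 0.0, 3.0]


def action(t: int) -> int:
    """Lever pulled at time step t (1-indexed), following equation (1)."""
    return ((t - 1) % 6) + 1


def reward(t: int) -> float:
    """ASSUMED form of equation (2): R_t = 2 cos(pi * (t - 1) / 6)."""
    return 2 * math.cos(math.pi * (t - 1) / 6)


def estimates(sums, counts, initial_q):
    """Sample-average Q values; an unsampled arm keeps its initial estimate,
    which is NOT counted as a sample."""
    return [s / n if n else q0 for s, n, q0 in zip(sums, counts, initial_q)]


def replay(steps: int, initial_q=INITIAL_Q, tol: float = 1e-9):
    """Replay the fixed trajectory for `steps` steps.

    Returns (t, A_t, Q after step t, certainly_random) tuples, where
    certainly_random is True when A_t was not a greedy arm just before step t.
    Arms within `tol` of the maximum (to absorb floating-point noise) all
    count as greedy.
    """
    n_arms = len(initial_q)
    sums = [0.0] * n_arms
    counts = [0] * n_arms
    history = []
    for t in range(1, steps + 1):
        q_before = estimates(sums, counts, initial_q)
        best = max(q_before)
        greedy = {i + 1 for i, q in enumerate(q_before) if q >= best - tol}
        a = action(t)
        # A non-greedy arm can only come from the random branch; a greedy
        # arm could have come from either branch, so it proves nothing.
        certainly_random = a not in greedy
        sums[a - 1] += reward(t)
        counts[a - 1] += 1
        history.append((t, a, estimates(sums, counts, initial_q), certainly_random))
    return history


if __name__ == "__main__":
    for t, a, q, rand in replay(12):
        print(t, a, [round(x, 3) for x in q], rand)
```

Passing a larger `steps` value (for example, a multiple of 12) lets you watch how each arm's running average evolves, which is one way to sanity-check an answer to the third part.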
