Reinforcement Learning Assignment 2 Solution


  • Introduction

The goal of this assignment is to experiment with Monte-Carlo (MC) learning and Temporal-Difference (TD) learning. MC and TD methods learn directly from episodes of experience, without knowledge of the MDP model. The TD method can learn after every step, while the MC method requires a full episode before it can update its value estimates. Your goal is to implement MC and TD methods and test them in the small gridworld.
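The contrast above can be written as two one-line update rules. The sketch below is illustrative only: the step size ALPHA and discount GAMMA are arbitrary choices, and the function names are my own, not prescribed by the assignment.

```python
# Illustrative step size and discount (not prescribed by the assignment).
ALPHA, GAMMA = 0.1, 1.0

def mc_update(V, state, G):
    """MC: move V(s) toward the observed full-episode return G_t.
    G is only available once the episode has finished."""
    V[state] += ALPHA * (G - V[state])

def td0_update(V, state, reward, next_state):
    """TD(0): bootstrap from the current estimate V(s') after every step,
    using the one-step target r + gamma * V(s')."""
    V[state] += ALPHA * (reward + GAMMA * V[next_state] - V[state])
```

The key difference: `mc_update` needs the completed return `G`, while `td0_update` needs only the immediate reward and the next state, so it can run online.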

  • Small Gridworld

Figure 1: Gridworld

As shown in Fig. 1, each grid cell in the gridworld represents a state. Let s_t denote the state at cell t, so the state space is S = {s_t | t ∈ {0, …, 35}}. s_1 and s_35 are terminal states; every other state is non-terminal, and from it the agent can move one cell north, east, south, or west, so the action space is A = {n, e, s, w}. Actions that would lead out of the grid leave the state unchanged. Each movement yields a reward of -1 until a terminal state is reached.
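The dynamics above are simple enough to sketch directly. The following is a minimal environment sketch under the stated rules (row-major numbering of the 6×6 grid and all helper names are my own assumptions):

```python
import random

SIDE = 6                      # 6x6 grid, states numbered 0..35 row-major
TERMINALS = {1, 35}           # terminal states s_1 and s_35
ACTIONS = "nesw"

def step(state, action):
    """Apply one move; off-grid actions leave the state unchanged.
    Every transition from a non-terminal state yields reward -1."""
    if state in TERMINALS:
        return state, 0
    row, col = divmod(state, SIDE)
    if action == "n" and row > 0:
        state -= SIDE
    elif action == "s" and row < SIDE - 1:
        state += SIDE
    elif action == "w" and col > 0:
        state -= 1
    elif action == "e" and col < SIDE - 1:
        state += 1
    return state, -1

def run_episode(start, max_steps=10000):
    """Roll out a uniform-random policy; return (state, action, reward) triples."""
    traj, s = [], start
    for _ in range(max_steps):
        if s in TERMINALS:
            break
        a = random.choice(ACTIONS)
        s2, r = step(s, a)
        traj.append((s, a, r))
        s = s2
    return traj
```

Episodes generated by `run_episode` are exactly the experience that MC and TD(0) consume.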

  • Experiment Requirements

Programming language: python3

You should implement both first-visit and every-visit MC methods and TD(0) to evaluate state values in the small gridworld.
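For concreteness, here is a self-contained sketch of all three estimators on the gridworld described above. It is one possible shape of a solution, not the required one: the environment encoding, function names, and the values of ALPHA and the episode count are my own assumptions.

```python
import random

SIDE, TERMINALS, GAMMA, ALPHA = 6, {1, 35}, 1.0, 0.05
MOVES = {"n": (-1, 0), "s": (1, 0), "w": (0, -1), "e": (0, 1)}

def step(s, a):
    """One move on the 6x6 grid; off-grid moves leave the state unchanged."""
    r, c = divmod(s, SIDE)
    dr, dc = MOVES[a]
    r2, c2 = r + dr, c + dc
    if 0 <= r2 < SIDE and 0 <= c2 < SIDE:
        s = r2 * SIDE + c2
    return s, -1

def episode(rng):
    """Random-policy episode from a random non-terminal start state."""
    s = rng.choice([x for x in range(SIDE * SIDE) if x not in TERMINALS])
    traj = []
    while s not in TERMINALS:
        s2, r = step(s, rng.choice("nesw"))
        traj.append((s, r))
        s = s2
    return traj

def mc_evaluate(n_episodes=2000, first_visit=True, seed=0):
    """MC prediction; first_visit toggles first-visit vs every-visit."""
    rng = random.Random(seed)
    V = [0.0] * (SIDE * SIDE)
    counts = [0] * (SIDE * SIDE)
    for _ in range(n_episodes):
        traj = episode(rng)
        G, returns = 0.0, []
        for s, r in reversed(traj):           # accumulate returns backwards
            G = GAMMA * G + r
            returns.append((s, G))
        returns.reverse()
        seen = set()
        for s, G in returns:
            if first_visit and s in seen:
                continue                       # first-visit: one update per episode
            seen.add(s)
            counts[s] += 1
            V[s] += (G - V[s]) / counts[s]     # incremental sample mean
    return V

def td0_evaluate(n_episodes=2000, seed=0):
    """TD(0) prediction: bootstrap from V(s') after every single step."""
    rng = random.Random(seed)
    V = [0.0] * (SIDE * SIDE)
    for _ in range(n_episodes):
        s = rng.choice([x for x in range(SIDE * SIDE) if x not in TERMINALS])
        while s not in TERMINALS:
            s2, r = step(s, rng.choice("nesw"))
            target = r + GAMMA * (0.0 if s2 in TERMINALS else V[s2])
            V[s] += ALPHA * (target - V[s])
            s = s2
    return V
```

With enough episodes, all three estimators should converge toward the same state values for the uniform-random policy; comparing their convergence speed and variance is the point of the experiment.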

  • Report and Submission

Your report and source files (.py) should be compressed and named after "studentID+name".


The les should be submitted on Canvas before Apr. 10, 2020.

