Implement value iteration


Description

^^^^^^^^^^^

In this exercise you will implement the value iteration algorithm for Markov Decision Processes.

The value iteration algorithm is applied to a 2D grid decision process, where different locations on the grid can contain different rewards. The purpose is to compute the value of each location and the corresponding policy.
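As a refresher, one value-iteration step applies the Bellman optimality backup to a state: its immediate reward plus the discounted best expected value over the available actions. A minimal sketch, assuming a hypothetical MDP interface (the method names `actions`, `transitions`, `reward` and the discount `gamma` are illustrative, not necessarily those in the exercise's `mdp.py`):

```python
def bellman_backup(mdp, values, state, gamma=0.9):
    """One Bellman optimality backup for `state`.

    Assumed (hypothetical) interface, which may differ from mdp.py:
      mdp.actions(s)        -> iterable of actions available in s
      mdp.transitions(s, a) -> iterable of (next_state, probability) pairs
      mdp.reward(s)         -> immediate reward for being in s
    """
    return mdp.reward(state) + gamma * max(
        # Expected value of the successor states under action a.
        sum(p * values[nxt] for nxt, p in mdp.transitions(state, a))
        for a in mdp.actions(state)
    )
```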

Instructions

^^^^^^^^^^^^

1. To prepare for the exercise, make sure you have consulted the lecture slides
and MyCourses material related to Markov Decision Processes, the Bellman
equation, and value iteration.

2. Copy `template-valueiteration.py` to `valueiteration.py`

3. Read and understand all code

– mdp.py :: This file defines an abstract class providing a general interface
for Markov Decision Processes. No need to edit.

– valueiteration.py :: Declares functions related to value iteration.
TASKs 2.x are found here.

– gridmdp.py :: This file defines a grid Markov Decision Process by
inheriting from mdp.py. No need to edit.

– gridactions.py :: Defines actions used by gridmdp.py. No need to edit.

– utils.py :: Defines some utility functions, notably `argmax`, which may come in handy.

4. Implement TASKs 2.1 and 2.2.
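For orientation, an `argmax` utility like the one mentioned for `utils.py` typically returns the element of a sequence that maximizes a key function. This is only a guess at the idea; check the actual signature in `utils.py` before relying on it:

```python
def argmax(items, key):
    """Return the item that maximizes key(item).

    Hypothetical sketch -- the real utils.argmax may take different
    arguments or break ties differently.
    """
    best, best_val = None, float("-inf")
    for item in items:
        val = key(item)
        if val > best_val:
            best, best_val = item, val
    return best
```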

Tasks

^^^^^

– TASK 2.1 :: Implement the `value_of` function.

– TASK 2.2 :: Implement the `value_iteration` function (using `value_of`).
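The two tasks fit together roughly as follows: `value_of` performs one Bellman backup for a single state, and `value_iteration` sweeps that backup over all states until the values stop changing. The sketch below assumes a hypothetical interface (`mdp.states()`, `mdp.actions()`, `mdp.transitions()`, `mdp.reward()`, and the `gamma`/`epsilon` parameters are illustrative); the real signatures are dictated by the template file:

```python
def value_of(mdp, values, state, gamma):
    """One Bellman optimality backup (hypothetical MDP interface)."""
    return mdp.reward(state) + gamma * max(
        sum(p * values[nxt] for nxt, p in mdp.transitions(state, a))
        for a in mdp.actions(state)
    )

def value_iteration(mdp, gamma=0.9, epsilon=1e-6):
    """Sweep Bellman backups over all states until convergence.

    Hypothetical: mdp.states() yields every state of the process.
    """
    values = {s: 0.0 for s in mdp.states()}
    while True:
        new_values = {s: value_of(mdp, values, s, gamma) for s in mdp.states()}
        # Largest change in this sweep; stop once it is negligible.
        delta = max(abs(new_values[s] - values[s]) for s in mdp.states())
        values = new_values
        if delta < epsilon:
            return values
```

A common design choice here is synchronous updates (a fresh `new_values` dict each sweep); in-place (Gauss-Seidel) updates also converge and often faster, but make the result depend on state ordering.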

Testing

^^^^^^^

– `python valueiteration.py` :: Will execute a few basic examples on grids.

– `python test_valueiteration.py` :: Will execute a few unit tests.

Good luck!
