Description
-
Introduction
The goal of this assignment is to experiment with model-free control, including on-policy learning (Sarsa) and off-policy learning (Q-learning). To gain a deep understanding of the principles of these two iterative approaches and the differences between them, you will implement both Sarsa and Q-learning on the Cliff Walking example.
-
Cliff Walking
Figure 1: Cliff Walking
Consider the gridworld shown in Figure 1. This is a standard undiscounted, episodic task, with start state (S), goal state (G), and the usual actions causing movement up, down, right, and left. Reward is -1 on all transitions except those into the region marked "The Cliff". Stepping into this region incurs a reward of -100 and sends the agent instantly back to the start.
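A minimal Python sketch of such an environment follows. The text does not state the grid dimensions, so the 4 x 12 layout with S in the bottom-left corner, G in the bottom-right corner, and the cliff along the bottom row between them is an assumption taken from the common textbook version of this example. The class name `CliffWalkingEnv`, the `reset`/`step` interface, the action encoding, and the rule that a move off the grid leaves the agent in place are also illustrative choices, not part of the assignment.

```python
class CliffWalkingEnv:
    """Illustrative Cliff Walking gridworld; the layout is an assumption."""

    # Hypothetical action encoding: 0 = up, 1 = down, 2 = right, 3 = left
    MOVES = {0: (-1, 0), 1: (1, 0), 2: (0, 1), 3: (0, -1)}

    def __init__(self, rows=4, cols=12):
        self.rows, self.cols = rows, cols
        self.start = (rows - 1, 0)         # assumed: bottom-left corner
        self.goal = (rows - 1, cols - 1)   # assumed: bottom-right corner
        self.state = self.start

    def is_cliff(self, pos):
        """Assumed cliff region: bottom row, strictly between S and G."""
        r, c = pos
        return r == self.rows - 1 and 0 < c < self.cols - 1

    def reset(self):
        """Start a new episode and return the start state."""
        self.state = self.start
        return self.state

    def step(self, action):
        """Apply an action and return (next_state, reward, done)."""
        dr, dc = self.MOVES[action]
        # Assumption: moving off the grid leaves the agent where it is.
        r = min(max(self.state[0] + dr, 0), self.rows - 1)
        c = min(max(self.state[1] + dc, 0), self.cols - 1)
        if self.is_cliff((r, c)):
            # Reward -100 and an instant return to the start; the episode continues.
            self.state = self.start
            return self.state, -100, False
        self.state = (r, c)
        return self.state, -1, self.state == self.goal
```

Calling `env.reset()` and then `env.step(2)` on the default grid steps right from S straight into the cliff, so the agent receives -100 and is placed back at S.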
-
Experiment Requirements
Programming language: python3
You should build the Cliff Walking environment and search for the optimal travel path with Sarsa and Q-learning, respectively.
Different settings for ε lead to different amounts of exploration during policy updates. Try several values of ε (e.g. ε = 0.1 and ε = 0) to investigate their impact on performance.
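One possible way to structure the two learners is sketched below. It assumes an environment object with hypothetical `reset()` and `step(action)` methods, where `step` returns `(next_state, reward, done)`. The function names, the step size `alpha = 0.5`, the episode count, and the `max_steps` safety cap are illustrative assumptions; only `gamma = 1.0` (undiscounted) and the ε values come from the task description.

```python
import random
from collections import defaultdict


def epsilon_greedy(Q, state, n_actions, epsilon):
    """With probability epsilon pick a random action, otherwise a greedy one."""
    if random.random() < epsilon:
        return random.randrange(n_actions)
    values = Q[state]
    best = max(values)
    # Break ties between equally good actions at random.
    return random.choice([a for a, v in enumerate(values) if v == best])


def sarsa_target(Q, reward, next_state, next_action, done, gamma=1.0):
    """On-policy target: bootstrap from the action actually chosen in S'."""
    return reward + (0.0 if done else gamma * Q[next_state][next_action])


def q_learning_target(Q, reward, next_state, done, gamma=1.0):
    """Off-policy target: bootstrap from the greedy action in S'."""
    return reward + (0.0 if done else gamma * max(Q[next_state]))


def train(env, method="sarsa", episodes=500, alpha=0.5, gamma=1.0,
          epsilon=0.1, n_actions=4, max_steps=10_000):
    """Run tabular Sarsa or Q-learning; return (Q, per-episode returns)."""
    Q = defaultdict(lambda: [0.0] * n_actions)
    returns = []
    for _ in range(episodes):
        state = env.reset()
        action = epsilon_greedy(Q, state, n_actions, epsilon)
        total, done, steps = 0, False, 0
        while not done and steps < max_steps:  # max_steps is a safety cap
            next_state, reward, done = env.step(action)
            if method == "sarsa":
                # Choose A' first, then update towards Q(S', A').
                next_action = epsilon_greedy(Q, next_state, n_actions, epsilon)
                target = sarsa_target(Q, reward, next_state, next_action,
                                      done, gamma)
                Q[state][action] += alpha * (target - Q[state][action])
            else:
                # Update towards max_a Q(S', a), then choose the next action.
                target = q_learning_target(Q, reward, next_state, done, gamma)
                Q[state][action] += alpha * (target - Q[state][action])
                next_action = epsilon_greedy(Q, next_state, n_actions, epsilon)
            state, action = next_state, next_action
            total += reward
            steps += 1
        returns.append(total)
    return Q, returns
```

Running `train` once per method and per ε value (for example `epsilon=0.1` and `epsilon=0`) and comparing the returned per-episode returns is one way to carry out the comparison. With `epsilon=0` the agent acts purely greedily, so any exploration comes only from random tie-breaking and the zero-initialised table.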
-
Report and Submission
Your report and source code should be compressed into one archive named "studentID+name".
The files should be submitted on Canvas before Apr. 24, 2020.