Reinforcement Learning Assignment 5 Solution

$30.00 $24.00

Introduction Actor-Critic method reduces the variance in Monte-Carlo policy gradient by directly estimating the action-value function. The goal of this assignment is to do experiment with Asynchronous Advantage Actor-Critic (A3C). Due to the inherent advantage of PG method, AC method is able to tackle the environment with continuous action space. However, a naive application of…

5/5 – (2 votes)

You’ll get a: zip file solution

 

Description

5/5 – (2 votes)
  • Introduction

Actor-Critic method reduces the variance in Monte-Carlo policy gradient by directly estimating the action-value function. The goal of this assignment is to do experiment with Asynchronous Advantage Actor-Critic (A3C). Due to the inherent advantage of PG method, AC method is able to tackle the environment with continuous action space. However, a naive application of AC method with neural network approximation is unstable for challenging problem. In this assignment, you’re required to train the agent with continuous action space and have some fun in some classical RL continuous control scenarios.

  • Actor-Critic Algorithm

In original policy gradient 5 log (st; at)vt, return vt is the unbiased estimation of expected long-term value Q (s; a) following a policy (s) (Actor). However, original policy gradient su ers from high variance. Actor-Critic Algorithm uses Q value function Qw(s; a), named Critic, to estimate Q (s; a).

In A3C, we maintain several instances of local agent and a global agent. Instead of experience replay, we asynchronously execute all the local agents in parallel. The parameter of the global agent is updated by all the local experi-ence.

The pseudo code is listed at the end of the article. For more details of the algorithm, you can refer to the original paper in the following.

Mnih V, Badia A P, Mirza M, et al. Asynchronous methods for deep reinforcement learning[C]//International conference on machine learning. 2016: 1928-1937.

  • Experiment Description

Programming language: python3

You are required to implement A3C algorithm.

You should test your agent in a classical RL control environment{Pendulum. OPENAI gym provides this environment, which is implemented with python (https://github.com/openai/gym/wiki/Pendulum-v0).

  • Report and Submission

Your report and source code should be compressed and named after \stu-dentID+name+assignment5″.

The les should be submitted on Canvas before June 5, 2020.

2

Reinforcement Learning Assignment 5 Solution
$30.00 $24.00