Robot Learning Homework 2 Solution


Name, Surname, ID Number

Problem 2.1 Optimal Control [20 Points]

In this exercise, we consider a finite-horizon, discrete-time, time-varying stochastic Linear Quadratic Regulator (LQR) with Gaussian noise and a time-varying quadratic reward function. Such a system is defined as

s_{t+1} = A_t s_t + B_t a_t + w_t,    (1)

where s_t is the state, a_t is the control signal, w_t ~ N(b_t, Σ_t) is Gaussian additive noise with mean b_t and covariance Σ_t, and t = 0, 1, ..., T is the time horizon. The control signal a_t is computed as

a_t = K_t s_t + k_t    (2)

and the reward function is

reward_t = -(s_t - r_t)^T R_t (s_t - r_t)                         when t = T
reward_t = -(s_t - r_t)^T R_t (s_t - r_t) - a_t^T H_t a_t         when t = 0, 1, ..., T - 1    (3)
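As an illustration of equations (1)-(3), one step of the dynamics and the per-step reward can be written in Python roughly as follows (a minimal sketch assuming NumPy; the helper names lqr_step and reward are illustrative, not part of the assignment):

import numpy as np

def lqr_step(s, a, A, B, b, Sigma, rng):
    """One step of the stochastic dynamics (1): s_{t+1} = A_t s_t + B_t a_t + w_t, with w_t ~ N(b_t, Sigma_t)."""
    w = rng.multivariate_normal(b, Sigma)
    return A @ s + B @ a + w

def reward(t, T, s, a, R, r, H):
    """Time-varying quadratic reward (3); the action penalty is dropped at the final step t = T."""
    err = s - r
    rew = -err @ R @ err
    if t < T:
        rew = rew - a @ H @ a
    return rew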

1. Implementation [8 Points]

Implement the LQR with the following properties

T = 50,    s_0 ~ N(0, I)

A_t = [1  0.1; 0  1],    B_t = [0; 0.1],    b_t = [5; 0],    Σ_t = [0.01  0; 0  0.01]

K_t = [5  0.3],    k_t = 0.3,    H_t = 1

R_t = [100000  0; 0  0.1]    if t = 14 or 40
R_t = [0.01  0; 0  0.1]      otherwise

r_t = [10; 0]    if t = 0, 1, ..., 14
r_t = [20; 0]    if t = 15, 16, ..., T

(Matrices are written row by row, with rows separated by semicolons.)

Execute the system 20 times. Plot the mean and the 95% confidence interval (see the “68–95–99.7 rule” and the matplotlib.pyplot.fill_between function) over the different experiments of the state s_t and of the control signal a_t over time. How does the system behave? Compute and write down the mean and the standard deviation of the cumulative reward over the experiments. Attach a snippet of your code.
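One possible implementation sketch for this task, assuming NumPy and Matplotlib and using the parameter values as listed above (rollout, R_t and r_t are illustrative helper names, not prescribed by the assignment):

import numpy as np
import matplotlib.pyplot as plt

# Parameters of the LQR system (as listed above); the state is 2-D, the action is 1-D.
T = 50
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
b = np.array([5.0, 0.0])
Sigma = 0.01 * np.eye(2)
K = np.array([[5.0, 0.3]])            # feedback gain of a_t = K_t s_t + k_t
k = np.array([0.3])
H = np.array([[1.0]])

def R_t(t):
    return np.diag([1e5, 0.1]) if t in (14, 40) else np.diag([0.01, 0.1])

def r_t(t):
    return np.array([10.0, 0.0]) if t <= 14 else np.array([20.0, 0.0])

def rollout(rng):
    s = rng.multivariate_normal(np.zeros(2), np.eye(2))      # s_0 ~ N(0, I)
    states, actions, ret = [], [], 0.0
    for t in range(T + 1):
        a = K @ s + k if t < T else np.zeros(1)              # no action at the final step
        err = s - r_t(t)
        ret += -err @ R_t(t) @ err - (a @ H @ a if t < T else 0.0)   # reward (3)
        states.append(s.copy()); actions.append(a.copy())
        if t < T:                                            # dynamics step (1)
            s = A @ s + B @ a + rng.multivariate_normal(b, Sigma)
    return np.array(states), np.array(actions), ret

rng = np.random.default_rng(0)
runs = [rollout(rng) for _ in range(20)]
S = np.stack([st for st, _, _ in runs])                      # shape (20, T+1, 2)
returns = np.array([ret for _, _, ret in runs])

ts = np.arange(T + 1)
mean, std = S[:, :, 0].mean(axis=0), S[:, :, 0].std(axis=0)  # first state dimension
plt.fill_between(ts, mean - 2 * std, mean + 2 * std, alpha=0.3)   # ~95% band
plt.plot(ts, mean)
plt.xlabel("t"); plt.ylabel("s[0]")
plt.show()   # the second state dimension and the control signal are plotted analogously
print("cumulative reward: mean = %.1f, std = %.1f" % (returns.mean(), returns.std()))

Shading mean ± 2 standard deviations over the 20 rollouts gives the approximate 95% band suggested by the 68–95–99.7 rule.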


2. LQR as a P controller [4 Points]

The LQR can also be seen as a simple P controller of the form

a_t = K_t (s_t^des - s_t) + k_t,    (4)

which corresponds to the controller used in the canonical LQR system, with the introduction of the target s_t^des. Assume the target

s_t^des = r_t = [10; 0]    if t = 0, 1, ..., 14
s_t^des = r_t = [20; 0]    if t = 15, 16, ..., T    (5)

Use the same LQR system as in the previous exercise and run 20 experiments. Plot in one figure the mean and the 95% confidence interval (see the “68–95–99.7 rule” and the matplotlib.pyplot.fill_between function) of the first dimension of the state, for both s_t^des = r_t and s_t^des = 0.
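A minimal sketch of this comparison, reusing the parameter definitions (A, B, b, Sigma, K, k, r_t, T, rng) from the implementation sketch above; rollout_p and target are illustrative names:

import numpy as np
import matplotlib.pyplot as plt

# Reuses A, B, b, Sigma, K, k, r_t, T and rng from the implementation sketch above.
def rollout_p(target, rng):
    """Roll out the P controller a_t = K_t (s_des_t - s_t) + k_t for a given target function."""
    s = rng.multivariate_normal(np.zeros(2), np.eye(2))      # s_0 ~ N(0, I)
    traj = [s.copy()]
    for t in range(T):
        a = K @ (target(t) - s) + k
        s = A @ s + B @ a + rng.multivariate_normal(b, Sigma)
        traj.append(s.copy())
    return np.array(traj)

ts = np.arange(T + 1)
plt.figure()
for label, target in [("s_des = r_t", r_t), ("s_des = 0", lambda t: np.zeros(2))]:
    S = np.stack([rollout_p(target, rng)[:, 0] for _ in range(20)])   # first state dimension
    mean, std = S.mean(axis=0), S.std(axis=0)
    plt.fill_between(ts, mean - 2 * std, mean + 2 * std, alpha=0.3)   # ~95% confidence band
    plt.plot(ts, mean, label=label)
plt.xlabel("t"); plt.ylabel("s[0]"); plt.legend(); plt.show()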


3. Optimal LQR [8 Points]

To compute the optimal gains K_t and k_t, which maximize the cumulative reward, we can use an analytic optimal solution. This controller recursively computes the optimal action as

a_t = -(H_t + B_t^T V_{t+1} B_t)^{-1} B_t^T (V_{t+1} (A_t s_t + b_t) - v_{t+1}),    (6)

which can be decomposed into a_t = K_t s_t + k_t with

K_t = -(H_t + B_t^T V_{t+1} B_t)^{-1} B_t^T V_{t+1} A_t,
k_t = -(H_t + B_t^T V_{t+1} B_t)^{-1} B_t^T (V_{t+1} b_t - v_{t+1}).
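One way to obtain V_t and v_t is a backward pass over the horizon. The sketch below assumes a quadratic value function of the form V*_t(s) = -s^T V_t s + 2 s^T v_t + const, with terminal conditions V_T = R_T and v_T = R_T r_T; the recursion shown is derived from equations (3) and (6) under that assumption, and it reuses A, B, b, H, T, R_t and r_t from the implementation sketch above (optimal_gains is an illustrative name):

import numpy as np

# Reuses A, B, b, H, T, R_t and r_t from the implementation sketch above.
def optimal_gains():
    """Backward pass: returns lists Ks, ks such that a_t = Ks[t] @ s_t + ks[t] (eq. 6)."""
    V = R_t(T)                                     # terminal quadratic term, V_T = R_T
    v = R_t(T) @ r_t(T)                            # terminal linear term, v_T = R_T r_T
    Ks, ks = [None] * T, [None] * T
    for t in reversed(range(T)):
        M = np.linalg.inv(H + B.T @ V @ B)         # (H_t + B_t^T V_{t+1} B_t)^{-1}
        Ks[t] = -M @ B.T @ V @ A                   # state-feedback part of eq. (6)
        ks[t] = -M @ B.T @ (V @ b - v)             # feed-forward part of eq. (6)
        AK = A + B @ Ks[t]                         # closed-loop dynamics A_t + B_t K_t
        # Quadratic and linear value-function terms for step t (computed from the t+1 terms).
        V, v = R_t(t) + A.T @ V @ AK, R_t(t) @ r_t(t) + AK.T @ (v - V @ b)
    return Ks, ks

Rolling out the system with a_t = Ks[t] @ s_t + ks[t] in place of the fixed gains from the first task then gives the optimal LQR behaviour to compare against the previous parts.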
