Name: Robot Learning Homework 2 Solution
SKU: 28236
Price: 30.00 USD
Availability: InStock

Description

Name, Surname, ID Number

Problem 2.1 Optimal Control [20 Points]

In this exercise, we consider a finite-horizon discrete time-varying Stochastic Linear Quadratic Regulator with Gaussian noise and a time-varying quadratic reward function. Such a system is defined as

$$s_{t+1} = A_t s_t + B_t a_t + w_t, \tag{1}$$

where $s_t$ is the state, $a_t$ is the control signal, $w_t \sim \mathcal{N}(b_t, \Sigma_t)$ is Gaussian additive noise with mean $b_t$ and covariance $\Sigma_t$, and $t = 0, 1, \dots, T$ is the time horizon. The control signal $a_t$ is computed as

$$a_t = -K_t s_t + k_t \tag{2}$$

and the reward function is

$$\mathrm{reward}_t = \begin{cases} -(s_t - r_t)^T R_t (s_t - r_t) & \text{when } t = T \\ -(s_t - r_t)^T R_t (s_t - r_t) - a_t^T H_t a_t & \text{when } t = 0, 1, \dots, T-1 \end{cases} \tag{3}$$
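As a minimal sketch of Eqs. (1) and (3), the transition and the reward can each be written as a small NumPy function. The names `step` and `reward` are hypothetical, and the sign convention (reward as a negative quadratic cost) is an assumption about the intended form of Eq. (3).

```python
import numpy as np


def step(s, a, A, B, b, Sigma, rng):
    """One transition of Eq. (1): s_{t+1} = A s + B a + w, with w ~ N(b, Sigma)."""
    w = rng.multivariate_normal(b, Sigma)
    return A @ s + B @ a + w


def reward(s, a, r, R, H, terminal):
    """Eq. (3), assumed to be a negative quadratic cost.

    The action penalty a^T H a is dropped at the final step t = T.
    """
    err = s - r
    value = -(err @ R @ err)
    if not terminal:
        value -= a @ H @ a
    return float(value)
```

Here `s` and `r` are 1-D state vectors, `a` is a 1-D action vector, and `rng` is a `numpy.random.Generator`.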

Implementation [8 Points]

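Implement the LQR with the following properties:

$$s_0 \sim \mathcal{N}(0, I), \qquad T = 50$$

$$A_t = \begin{bmatrix} 1 & 0.1 \\ 0 & 1 \end{bmatrix}, \quad B_t = \begin{bmatrix} 0 \\ 0.1 \end{bmatrix}, \quad b_t = \begin{bmatrix} 5 \\ 0 \end{bmatrix}, \quad \Sigma_t = \begin{bmatrix} 0.01 & 0 \\ 0 & 0.01 \end{bmatrix}$$

$$K_t = \begin{bmatrix} 5 & 0.3 \end{bmatrix}, \quad k_t = 0.3, \quad H_t = 1$$

$$R_t = \begin{cases} \begin{bmatrix} 100000 & 0 \\ 0 & 0.1 \end{bmatrix} & \text{if } t = 14 \text{ or } 40 \\[2ex] \begin{bmatrix} 0.01 & 0 \\ 0 & 0.1 \end{bmatrix} & \text{otherwise} \end{cases}$$

$$r_t = \begin{cases} \begin{bmatrix} 10 \\ 0 \end{bmatrix} & \text{if } t = 0, 1, \dots, 14 \\[2ex] \begin{bmatrix} 20 \\ 0 \end{bmatrix} & \text{if } t = 15, 16, \dots, T \end{cases}$$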

Execute the system 20 times. Plot the mean and 95% confidence interval (see the “68–95–99.7 rule” and the matplotlib.pyplot.fill_between function) over the different experiments of the state $s_t$ and of the control signal $a_t$ over time. How does the system behave? Compute and write down the mean and the standard deviation of the cumulative reward over the experiments. Attach a snippet of your code.
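The experiment described above could be sketched as follows. This is an illustration under stated assumptions, not a reference solution: the parameter values are one reading of the problem statement, the control law is taken to be $a_t = -K_t s_t + k_t$, the reward is taken to be a negative quadratic cost, and the 95% band is approximated by mean ± 2 standard deviations per the 68–95–99.7 rule. The names `rollout`, `run_experiments`, `band`, and `plot_band` are hypothetical.

```python
import numpy as np


def rollout(rng, T=50):
    """Simulate one episode; returns states (T+1, 2), actions (T,), cumulative reward."""
    # Parameter values below are assumptions read from the problem statement.
    A = np.array([[1.0, 0.1], [0.0, 1.0]])
    B = np.array([[0.0], [0.1]])
    b = np.array([5.0, 0.0])
    Sigma = 0.01 * np.eye(2)
    K = np.array([[5.0, 0.3]])
    k = np.array([0.3])
    H = np.array([[1.0]])

    s = rng.multivariate_normal(np.zeros(2), np.eye(2))  # s_0 ~ N(0, I)
    states, actions, total = [s], [], 0.0
    for t in range(T + 1):
        r = np.array([10.0, 0.0]) if t <= 14 else np.array([20.0, 0.0])
        R = np.diag([1e5, 0.1]) if t in (14, 40) else np.diag([0.01, 0.1])
        err = s - r
        total -= err @ R @ err          # state cost, every step including t = T
        if t == T:
            break
        a = -K @ s + k                  # assumed control law
        total -= a @ H @ a              # action cost, only for t < T
        s = A @ s + B @ a + rng.multivariate_normal(b, Sigma)
        states.append(s)
        actions.append(a[0])
    return np.array(states), np.array(actions), float(total)


def run_experiments(n_runs=20, seed=0):
    """Stack n_runs independent rollouts along the first axis."""
    rng = np.random.default_rng(seed)
    runs = [rollout(rng) for _ in range(n_runs)]
    S = np.stack([run[0] for run in runs])   # (n_runs, T+1, 2)
    U = np.stack([run[1] for run in runs])   # (n_runs, T)
    G = np.array([run[2] for run in runs])   # (n_runs,)
    return S, U, G


def band(X):
    """Mean and approximate 95% band (mean +/- 2 std) across runs (axis 0)."""
    mean, std = X.mean(axis=0), X.std(axis=0)
    return mean, mean - 2.0 * std, mean + 2.0 * std


def plot_band(ax, X, label):
    """Draw the mean curve and shaded band on an existing matplotlib Axes."""
    mean, lo, hi = band(X)
    t = np.arange(len(mean))
    ax.plot(t, mean, label=label)
    ax.fill_between(t, lo, hi, alpha=0.3)
```

With matplotlib available, an Axes from `matplotlib.pyplot.subplots()` can be passed to `plot_band(ax, S[:, :, 0], "position")`, and `G.mean()` and `G.std()` give the requested statistics of the cumulative reward.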

LQR as a P controller [4 Points]

The LQR can also be seen as a simple P controller of the form

$$a_t = K_t \left(s^{des}_t - s_t\right) + k_t, \tag{4}$$

which corresponds to the controller used in the canonical LQR system with the introduction of the target $s^{des}_t$. Assume as target

$$s^{des}_t = r_t = \begin{cases} \begin{bmatrix} 10 \\ 0 \end{bmatrix} & \text{if } t = 0, 1, \dots, 14 \\[2ex] \begin{bmatrix} 20 \\ 0 \end{bmatrix} & \text{if } t = 15, 16, \dots, T \end{cases} \tag{5}$$

Use the same LQR system as in the previous exercise and run 20 experiments. Plot in one figure the mean and 95% confidence interval (see the “68–95–99.7 rule” and the matplotlib.pyplot.fill_between function) of the first dimension of the state, for both $s^{des}_t = r_t$ and $s^{des}_t = 0$.
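A minimal sketch of the P-controller form, assuming Eq. (4) reads $a_t = K_t (s^{des}_t - s_t) + k_t$ and that the target switches from $[10, 0]^T$ to $[20, 0]^T$ at $t = 15$. The function names `p_control` and `target` are hypothetical.

```python
import numpy as np


def p_control(s, s_des, K, k):
    """P-controller form of Eq. (4): a_t = K_t (s_des_t - s_t) + k_t."""
    return K @ (s_des - s) + k


def target(t, use_reference=True):
    """Target s_des_t: the reference r_t of Eq. (5), or zero if use_reference is False."""
    if not use_reference:
        return np.zeros(2)
    # Assumed reading of Eq. (5): position 10 up to t = 14, position 20 afterwards.
    return np.array([10.0, 0.0]) if t <= 14 else np.array([20.0, 0.0])
```

With `use_reference=False` the law reduces to $-K_t s_t + k_t$, so the two curves requested in the figure differ only in the target passed to `p_control`.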

Optimal LQR [8 Points]

To compute the optimal gains $K_t$ and $k_t$, which maximize the cumulative reward, we can use an analytic optimal solution. This controller recursively computes the optimal action by

$$a_t = -\left(H_t + B_t^T V_{t+1} B_t\right)^{-1} B_t^T \left(V_{t+1}(A_t s_t + b_t) - v_{t+1}\right), \tag{6}$$

which can be decomposed into
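One way to sketch such a decomposition in code, assuming Eq. (6) carries a leading minus sign and a matrix inverse, and that the result should match the form $a_t = -K_t s_t + k_t$. The function name `gains_from_value` is hypothetical, and $V_{t+1}$ and $v_{t+1}$ are treated as given inputs rather than computed here.

```python
import numpy as np


def gains_from_value(A, B, b, H, V_next, v_next):
    """Split the action of Eq. (6) into a feedback gain K and an offset k.

    Assumes a = -M^{-1} B^T (V_next (A s + b) - v_next) with M = H + B^T V_next B,
    so that a = -K s + k.
    """
    M = H + B.T @ V_next @ B
    K = np.linalg.solve(M, B.T @ V_next @ A)               # state-dependent part
    k = -np.linalg.solve(M, B.T @ (V_next @ b - v_next))   # state-independent part
    return K, k
```

Using `np.linalg.solve` instead of forming the inverse explicitly is a design choice for numerical stability; for a scalar action, `M` is 1×1 and the solve reduces to a division.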
