Homework 3 (Solutions)

Linear Quadratic Regulator

1. Continuous-Time LQR

In this problem, you will derive the equations for the continuous-time Linear Quadratic Regulator (LQR). The steps will be similar to those covered for the discrete-time case in lecture; however, instead of using the Bellman equation, you will apply the Hamilton-Jacobi-Bellman (HJB) equation to the continuous-time setting.

The continuous-time LQR problem is defined below.

\[\begin{split} \begin{align*} \min_{u(\cdot)} & \quad x(T)^TQ_Tx(T) + \int_{0}^{T} \left( x(t)^TQ(t)x(t) + u(t)^TR(t)u(t) \right) dt\\ \text{subject to} & \quad \dot{x}(t) = A(t)x(t) + B(t)u(t), \:\: t\in[0,T]\\ & \quad x(0) = x_\text{current} \end{align*} \end{split}\]

The HJB equation is

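\[ \frac{\partial V}{\partial t} + \min_{u(t)\in \mathcal{U}} \left[ g(x(t), u(t), t) + \nabla V(x(t), t)^T f(x(t), u(t), t) \right] = 0 \]

where $V$ is the value function, $g$ is the running (stage) cost, and $f$ describes the dynamics $\dot{x} = f(x, u, t)$.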

(a)

Let us begin with the time-invariant case for the continuous-time LQR problem.

Based on the LQR formulation and the Hamilton-Jacobi-Bellman (HJB) equation:

Explicitly write out the functions $g(x, u)$, $f(x, u)$, and the terminal condition for the value function $V(x(T), T)$.
Then, using these expressions, write the HJB equation specialized to the continuous-time, finite-horizon, time-invariant LQR problem.

Solution

For the time-invariant continuous-time finite-horizon LQR problem,

\[\dot{x} = Ax + Bu\]

the functions in the HJB equation are:

\[\begin{split}\begin{align*} g(x,u) = x^T Q x + u^T R u \\ f(x,u) = Ax + Bu \end{align*}\end{split}\]

The terminal condition is:

\[V(x(T),T) = x(T)^TQ_Tx(T)\]

Therefore, the HJB equation becomes

\[\frac{\partial V}{\partial t} + \min_{u} ( x^TQx + u^TRu + \nabla V(x,t)^T(Ax+Bu) ) = 0\]

with terminal condition

\[V(x,T)=x^TQ_Tx\]

(b)

Next, suppose the value function is quadratic at all times; specifically, let $V(x(t), t) = x(t)^T P(t) x(t)$. Using this form, explicitly derive expressions for both the partial derivative of the value function with respect to time, $\frac{\partial V}{\partial t}$, and for the gradient of the value function with respect to the state, $\nabla V(x(t), t)$.

Solution

Assume the value function has the quadratic form:

\[V(x(t),t) = x(t)^T P(t) x(t)\]

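where $P(t)$ is symmetric. This is no loss of generality: by part (e), only the symmetric part of $P(t)$ contributes to $x^TP(t)x$.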

First, the partial derivative with respect to time is

\[\frac{\partial V}{\partial t} = x(t)^T \dot{P}(t) x(t)\]

In the HJB equation, $V(x,t)$ is treated as a function of two independent variables, $x$ and $t$.

Therefore, when computing the partial derivative with respect to $t$, the state $x$ is held fixed. The effect of the state changing with time is accounted for separately by the term $\nabla V(x,t)^T f(x,u)$ in the HJB equation.

Next, the gradient of $V$ with respect to the state is

\[ \nabla V(x(t),t) = \frac{\partial}{\partial x} [ x(t)^T P(t) x(t) ]\]

Using the identity $\frac{\partial}{\partial x}(x^TMx) = (M+M^T)x$, which holds for any square matrix $M$,

\[ \nabla V(x(t),t) = (P(t)+P(t)^T)x(t)\]

Since $P(t)=P(t)^T$, this becomes

\[\nabla V(x(t),t) = 2P(t)x(t)\]
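The sketch below is a quick sanity check of part (b). It compares $\nabla V = 2Px$ against a central finite-difference gradient at a fixed time $t$. It assumes NumPy, and the random symmetric $P$ and state $x$ are illustrative values, not data from the problem.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3

# Illustrative symmetric P(t) at a fixed time t, and a state x
M = rng.standard_normal((n, n))
P = (M + M.T) / 2
x = rng.standard_normal(n)


def V(x):
    """Value function V(x, t) = x^T P(t) x, with t held fixed."""
    return x @ P @ x


# Central finite differences along each coordinate direction e_i
eps = 1e-6
grad_fd = np.array([(V(x + eps * e) - V(x - eps * e)) / (2 * eps) for e in np.eye(n)])

# Analytical gradient from part (b)
grad_exact = 2 * P @ x

print(np.max(np.abs(grad_fd - grad_exact)))
```

Because $V$ is quadratic, the central difference is exact up to floating-point rounding. The two gradients should therefore agree to many digits.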

(c)

After substituting your expressions from (b) into the HJB equation, show that the optimal policy $\pi^\star(x,t)$ is given by:

\[ \pi^\star(x,t) = -R^{-1}B^TP(t)x(t) \]

Solution

Substituting the results from part (b) into the HJB equation gives

\[x^T\dot{P}x + \min_u [ x^TQx + u^TRu + (2Px)^T(Ax+Bu) ] =0\]

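To find the optimal control, differentiate the $u$-dependent terms $u^TRu + (2Px)^TBu = u^TRu + 2x^TPBu$ with respect to $u$ and set the result equal to zero. Since $R \succ 0$, these terms are strictly convex in $u$, so the stationary point is the unique minimizer: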

\[\frac{\partial}{\partial u} [ u^TRu + 2x^TPBu ] = 2Ru + 2B^TPx =0\]

Solving for $u$,

\[ Ru = -B^TPx \]

\[u^\star = -R^{-1}B^TPx\]

Therefore, the optimal policy is

\[\pi^\star(x,t) = -R^{-1}B^TP(t)x(t)\]
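The sketch below checks part (c) numerically. It evaluates the $u$-dependent part of the HJB minimand, $u^TRu + 2x^TPBu$, at $u^\star = -R^{-1}B^TPx$ and at randomly perturbed inputs. It assumes NumPy, and the random $P \succeq 0$, $B$, $R \succ 0$, and $x$ are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 3, 2

# Illustrative problem data: symmetric P, R positive definite
S = rng.standard_normal((n, n))
P = S @ S.T
B = rng.standard_normal((n, m))
L = rng.standard_normal((m, m))
R = L @ L.T + m * np.eye(m)
x = rng.standard_normal(n)


def u_terms(u):
    """u-dependent part of the HJB minimand: u^T R u + 2 x^T P B u."""
    return u @ R @ u + 2 * x @ P @ B @ u


# Optimal control from part (c); solve() avoids forming R^{-1} explicitly
u_star = -np.linalg.solve(R, B.T @ P @ x)

# Random perturbations of u_star should never decrease the objective
perturbed = [u_terms(u_star + 0.1 * rng.standard_normal(m)) for _ in range(1000)]
print(min(perturbed) >= u_terms(u_star))
```

The gradient vanishes at $u^\star$, so the objective at $u^\star + \delta$ exceeds its value at $u^\star$ by exactly $\delta^TR\delta > 0$. That is why every perturbation comes out worse.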

(d)

With the optimal policy $\pi^\star(x,t)$, show that the HJB equation reduces to

\[ 0 = x(t)^T(\dot{P} + Q + 2P(t)A - P(t)BR^{-1}B^TP(t))x(t) \]

Will the matrix in the above quadratic form always be symmetric?

Solution

Substitute the optimal policy

\[u^\star = -R^{-1}B^TPx\]

into the HJB equation:

\[x^T\dot{P}x + x^TQx + (u^\star)^TRu^\star + (2Px)^T(Ax+Bu^\star) =0\]

Now compute each term involving $u^\star$.

First,

\[(u^\star)^TRu^\star = x^T PBR^{-1}B^T Px\]

Next,

\[(2Px)^TBu^\star = 2x^T PB(-R^{-1}B^T Px) = -2x^T PBR^{-1}B^T Px\]

Therefore,

\[(u^\star)^TRu^\star + (2Px)^T Bu^\star = -x^T PBR^{-1}B^T Px\]

Also,

\[(2Px)^T Ax = 2x^T PAx\]

Thus, the HJB equation becomes

\[0 = x^T\dot{P}x + x^T Q x + 2x^T P Ax - x^T PBR^{-1}B^T Px\]

or equivalently,

\[0 = x^T ( \dot{P} + Q + 2PA - PBR^{-1}B^TP ) x\]

Let’s check whether $\dot{P}+Q+2PA-PBR^{-1}B^TP$ is symmetric by examining its transpose:

\[(\dot{P}+Q+2PA-PBR^{-1}B^TP)^T\]

Since $P(t)$ is symmetric for all $t$, differentiating $P(t)^T = P(t)$ gives $\dot{P}(t)^T = \dot{P}(t)$, so $\dot{P}$ is symmetric.

In LQR, $Q$ is symmetric by assumption.

Next,

\[(PBR^{-1}B^TP)^T = P^TB(R^{-1})^TB^TP^T\]

then since $P^T = P$ and $R = R^T$, we have $(R^{-1})^T = R^{-1}$, so

\[(PBR^{-1}B^TP)^T = PBR^{-1}B^TP\]

is symmetric.

The only remaining term is $2PA$. Its transpose is $(2PA)^T = 2A^TP$, which in general differs from $2PA$. Therefore, $\dot{P}+Q+2PA-PBR^{-1}B^TP$ is not necessarily symmetric.
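The following is a small numerical illustration of the symmetry argument in part (d). It assumes NumPy and uses random illustrative matrices. $PA$ generally fails a symmetry test, while $PBR^{-1}B^TP$ passes it.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 3, 1
S = rng.standard_normal((n, n))
P = S @ S.T                      # symmetric P
A = rng.standard_normal((n, n))  # generic, non-symmetric A
B = rng.standard_normal((n, m))
R = np.array([[2.0]])            # R > 0


def is_symmetric(M):
    """True if M equals its transpose up to floating-point tolerance."""
    return np.allclose(M, M.T)


PA = P @ A
PBRinvBtP = P @ B @ np.linalg.solve(R, B.T) @ P  # P B R^{-1} B^T P

print(is_symmetric(PA))
print(is_symmetric(PBRinvBtP))
```

For a generic $A$, the first check fails and the second passes, which matches the argument above.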

(e)

Recall that any matrix $M$ can be decomposed in terms of its symmetric and skew-symmetric components:

\[ M = \underbrace{\frac{M + M^T}{2}}_{\text{symmetric}} + \underbrace{\frac{M-M^T}{2}}_{\text{skew-symmetric}} \]

Prove that, for any vector $x$ of appropriate size, $x^TMx = x^T (\frac{M + M^T}{2})x$; that is, $x^T (\frac{M-M^T}{2})x = 0$.

Solution

Let

\[M_s = \frac{M+M^T}{2}, \qquad M_k = \frac{M-M^T}{2}\]

where $M_s$ is symmetric and $M_k$ is skew-symmetric. Then

\[M = M_s + M_k\]

and therefore

\[x^TMx = x^T(M_s+M_k)x = x^TM_sx + x^TM_kx\]

Now consider the skew-symmetric term

\[x^TM_kx\]

Since this is a scalar, it is equal to its transpose:

\[x^TM_kx = (x^TM_kx)^T\]

Using the transpose rule,

\[(x^TM_kx)^T = x^TM_k^Tx\]

Because $M_k$ is skew-symmetric,

\[M_k^T = -M_k\]

Thus,

\[x^TM_kx = x^T(-M_k)x = -x^TM_kx\]

Therefore,

\[2x^TM_kx = 0\]

so

\[x^TM_kx = 0\]

Hence,

\[x^TMx = x^TM_sx\]

or equivalently,

\[x^TMx = x^T (\frac{M+M^T}{2})x\]

and,

\[x^T(\frac{M-M^T}{2})x = 0\]
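The identity from part (e) can be spot-checked numerically. This sketch assumes NumPy and uses an arbitrary random $M$ and $x$.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4
M = rng.standard_normal((n, n))  # arbitrary, generally non-symmetric
M_s = (M + M.T) / 2              # symmetric part
M_k = (M - M.T) / 2              # skew-symmetric part
x = rng.standard_normal(n)

print(x @ M_k @ x)                          # zero up to floating-point rounding
print(np.isclose(x @ M @ x, x @ M_s @ x))
```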

(f)

Given the results from (e), finally show that the value function for the continuous-time LQR problem is the solution to the following ODE:

\[ -\dot{P}(t) = Q + P(t)A + A^TP(t) -P(t)BR^{-1}B^TP(t) \]

with boundary condition $P(T) = Q_T$. This is the Riccati differential equation.

Solution

From part (d), after substituting the optimal policy into the HJB equation, we obtained

\[ 0 = x^T ( \dot{P} + Q + 2PA - PBR^{-1}B^TP ) x\]

However, the matrix inside the quadratic form is not necessarily symmetric because $PA$ is not necessarily symmetric. From part (e), for any matrix $M$,

\[x^TMx = x^T(\frac{M+M^T}{2})x\]

Therefore, the term $2PA$ in the quadratic form can be replaced by its symmetric part:

\[x^T(2PA)x = x^T(\frac{2PA+(2PA)^T}{2})x\]

Since

\[(2PA)^T = 2A^TP\]

we have

\[x^T(2PA)x = x^T(PA+A^TP)x\]

Thus, the HJB equation becomes

\[0 = x^T ( \dot{P} + Q + PA + A^TP - PBR^{-1}B^TP ) x\]

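Since the matrix inside the quadratic form is symmetric and the equality holds for all $x$, the matrix must be zero. (For a symmetric $M$ with $x^TMx = 0$ for all $x$, choosing $x = e_i$ gives $M_{ii} = 0$, and choosing $x = e_i + e_j$ then gives $2M_{ij} = 0$.)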

\[\dot{P} + Q + PA + A^TP - PBR^{-1}B^TP = 0\]

Rearranging gives the Riccati differential equation:

\[-\dot{P}(t) = Q + P(t)A + A^TP(t) - P(t)BR^{-1}B^TP(t)\]

The terminal condition comes from the terminal cost:

\[V(x(T),T)=x(T)^TQ_Tx(T)\]

Since

\[V(x(T),T)=x(T)^TP(T)x(T)\]

we must have

\[P(T)=Q_T\]

Therefore, the value function is determined by solving

\[-\dot{P}(t) = Q + P(t)A + A^TP(t) - P(t)BR^{-1}B^TP(t), \qquad P(T)=Q_T\]
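In practice the Riccati differential equation is integrated numerically, backward from $P(T) = Q_T$. The following is a minimal sketch that assumes NumPy and a fixed-step fourth-order Runge-Kutta scheme. RK4 is an illustrative choice; any ODE solver run in reverse time would do. The helper names `riccati_rhs` and `solve_riccati_backward` are hypothetical.

```python
import numpy as np


def riccati_rhs(P, A, B, Q, R):
    """Right-hand side dP/dt of -dP/dt = Q + PA + A^T P - P B R^{-1} B^T P."""
    return -(Q + P @ A + A.T @ P - P @ B @ np.linalg.solve(R, B.T) @ P)


def solve_riccati_backward(A, B, Q, R, Q_T, T, steps):
    """Integrate P(t) backward from P(T) = Q_T to t = 0 with fixed-step RK4.

    Returns an ascending time grid and the matching list [P(0), ..., P(T)].
    """
    h = T / steps
    P = Q_T.copy()
    Ps = [P]
    for _ in range(steps):
        # Classical RK4 with step -h (time runs backward)
        k1 = riccati_rhs(P, A, B, Q, R)
        k2 = riccati_rhs(P - 0.5 * h * k1, A, B, Q, R)
        k3 = riccati_rhs(P - 0.5 * h * k2, A, B, Q, R)
        k4 = riccati_rhs(P - h * k3, A, B, Q, R)
        P = P - (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
        P = 0.5 * (P + P.T)  # remove round-off asymmetry
        Ps.append(P)
    ts = np.linspace(0.0, T, steps + 1)
    return ts, Ps[::-1]


# Scalar check: A = 0, B = 1, Q = 0, R = 1 gives dp/dt = p^2.
# With p(T) = 1 and T = 1, the exact solution is p(t) = 1 / (1 + T - t), so p(0) = 0.5.
A = np.zeros((1, 1))
B = np.ones((1, 1))
Q = np.zeros((1, 1))
R = np.ones((1, 1))
Q_T = np.ones((1, 1))
ts, Ps = solve_riccati_backward(A, B, Q, R, Q_T, T=1.0, steps=1000)
print(Ps[0][0, 0])
```

The scalar case is used because it has a closed-form solution, so the numerical $P(0)$ can be compared directly against $0.5$.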

(g)

Consider the infinite-horizon case. How does the above ODE change? The resulting equation is referred to as the continuous algebraic Riccati equation (CARE).

Solution

For the infinite-horizon LQR problem, there is no finite terminal time and no terminal condition $P(T)=Q_T$. We look for a steady-state value function

\[P(t) = P, \qquad V(x) = x^T P x\]

where $P$ is constant in time. Therefore,

\[\dot{P}(t)=0\]

Starting from the Riccati differential equation,

\[-\dot{P}(t) = Q + P(t)A + A^TP(t) - P(t)BR^{-1}B^TP(t)\]

setting $\dot{P}(t)=0$ gives

\[0 = Q + PA + A^TP - PBR^{-1}B^TP\]

This is the continuous algebraic Riccati equation (CARE):

\[A^TP + PA - PBR^{-1}B^TP + Q = 0\]

The corresponding infinite-horizon optimal policy is

\[u^\star(t) = -R^{-1}B^T Px(t)\]
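For a concrete case, the sketch below solves the CARE for a double integrator with $Q = I$ and $R = 1$, using SciPy's `scipy.linalg.solve_continuous_are`, and checks the residual. The system is an illustrative choice, not part of the problem. Solving the CARE entrywise by hand for this system gives $P = \begin{bmatrix}\sqrt{3} & 1\\ 1 & \sqrt{3}\end{bmatrix}$ and $K = [1,\ \sqrt{3}]$, which the numerical result should match.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Double integrator (illustrative): state x = [position, velocity]
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])

P = solve_continuous_are(A, B, Q, R)

# CARE residual: A^T P + P A - P B R^{-1} B^T P + Q should be ~0
residual = A.T @ P + P @ A - P @ B @ np.linalg.solve(R, B.T) @ P + Q

# Infinite-horizon optimal gain K = R^{-1} B^T P
K = np.linalg.solve(R, B.T @ P)

print(P)
print(np.abs(residual).max())
print(K)
```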

2. Time-varying case

Now consider the time-varying (and finite horizon) case for both continuous-time and discrete-time systems. Clearly state the Riccati equations that govern the evolution of the matrix $P(t)$ and $P_k$ in each setting. Provide the forms of the time-varying Riccati differential equation for the continuous-time case, and the Riccati recursion for the discrete-time case.

Solution

The continuous-time finite-horizon time-varying Riccati differential equation is

\[ -\dot{P}(t) = Q(t) + A(t)^TP(t) + P(t)A(t) - P(t)B(t)R(t)^{-1}B(t)^TP(t), \]

with terminal condition

\[ P(T)=Q_T. \]

The corresponding optimal control law is

\[ u^\star(t) = -R(t)^{-1}B(t)^TP(t)x(t). \]

For the discrete-time finite-horizon time-varying LQR problem, the Riccati recursion is

\[ P_k = Q_k + A_k^TP_{k+1}A_k - A_k^TP_{k+1}B_k (R_k+B_k^TP_{k+1}B_k)^{-1} B_k^TP_{k+1}A_k, \]

with terminal condition

\[ P_N=Q_N. \]

The corresponding optimal control law is

\[ u_k^\star=-K_kx_k, \]

where

\[ K_k = (R_k+B_k^TP_{k+1}B_k)^{-1} B_k^TP_{k+1}A_k. \]

Both Riccati equations are solved backward in time from their terminal conditions.
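The discrete-time recursion is straightforward to implement. The sketch below assumes NumPy; `riccati_recursion` is a hypothetical helper name. It runs the backward pass and then checks the dynamic-programming identity: the closed-loop cost of $u_k = -K_kx_k$ starting from $x_0$ should equal $x_0^TP_0x_0$. The random time-varying system is illustrative.

```python
import numpy as np


def riccati_recursion(As, Bs, Qs, Rs, Q_N):
    """Backward Riccati recursion for finite-horizon, time-varying discrete LQR.

    As, Bs, Qs, Rs are length-N lists indexed by k = 0, ..., N-1.
    Returns the cost-to-go matrices P_0, ..., P_N and gains K_0, ..., K_{N-1}.
    """
    N = len(As)
    P = [None] * (N + 1)
    K = [None] * N
    P[N] = Q_N  # terminal condition P_N = Q_N
    for k in range(N - 1, -1, -1):
        A, B, Q, R, P_next = As[k], Bs[k], Qs[k], Rs[k], P[k + 1]
        # K_k = (R_k + B_k^T P_{k+1} B_k)^{-1} B_k^T P_{k+1} A_k
        K[k] = np.linalg.solve(R + B.T @ P_next @ B, B.T @ P_next @ A)
        # P_k = Q_k + A_k^T P_{k+1} A_k - A_k^T P_{k+1} B_k K_k
        P[k] = Q + A.T @ P_next @ A - A.T @ P_next @ B @ K[k]
    return P, K


# Illustrative random time-varying system
rng = np.random.default_rng(4)
n, m, N = 3, 2, 20
As = [np.eye(n) + 0.1 * rng.standard_normal((n, n)) for _ in range(N)]
Bs = [rng.standard_normal((n, m)) for _ in range(N)]
Qs = [(1.0 + 0.5 * np.sin(k)) * np.eye(n) for k in range(N)]
Rs = [np.eye(m) for _ in range(N)]
Q_N = 10.0 * np.eye(n)

P, K = riccati_recursion(As, Bs, Qs, Rs, Q_N)

# Simulate u_k = -K_k x_k and accumulate the stage and terminal costs
x0 = rng.standard_normal(n)
x, cost = x0.copy(), 0.0
for k in range(N):
    u = -K[k] @ x
    cost += x @ Qs[k] @ x + u @ Rs[k] @ u
    x = As[k] @ x + Bs[k] @ u
cost += x @ Q_N @ x

print(cost, x0 @ P[0] @ x0)
```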

3. LQR stability

Consider the continuous-time infinite-horizon LQR problem. Show that the closed-loop system $\dot{x} = (A - BK)x$, where $K = R^{-1} B^T P$ is the optimal LQR gain, is asymptotically stable. Specifically, prove that the optimal value function $V(x) = x^T P x$ serves as a valid Lyapunov function for the closed-loop system, thereby establishing asymptotic stability of the origin.

You may assume that, under the standard LQR assumptions, the solution $P$ of the CARE is positive definite.

Solution

For the continuous-time infinite-horizon LQR problem, the optimal control is

\[u^\star = -Kx\]

where

\[K = R^{-1}B^TP\]

Thus, the closed-loop system is

\[\dot{x} = (A-BK)x\]

We want to show that

\[V(x)=x^TPx\]

is a Lyapunov function for the closed-loop system.

Since $P$ is positive definite, we have

\[V(x)=x^TPx > 0, \qquad x\neq 0, \qquad V(0)=0 \]

Therefore, $V(x)$ is positive definite. Now compute the time derivative of $V$ along the closed-loop trajectory:

\[ \dot{V}(x) = \dot{x}^TPx + x^TP\dot{x} \]

Substituting

\[ \dot{x}=(A-BK)x \]

gives

\[ \dot{V}(x) = x^T(A-BK)^TPx + x^TP(A-BK)x \]

Therefore,

\[ \dot{V}(x) = x^T [ (A-BK)^TP + P(A-BK) ] x \]

To establish asymptotic stability, we would like to show that $\dot{V}(x)<0$ for all $x \neq 0$. From the expression above, this holds if the matrix in the quadratic form is negative definite:

\[ (A-BK)^T P+P(A-BK) \prec 0 \]

Expanding the matrix term,

\[ (A-BK)^TP + P(A-BK) = A^TP + PA - K^TB^TP - PBK \]

From the CARE in part (g),

\[ A^TP + PA - PBR^{-1}B^TP + Q = 0 \]

and using

\[ K = R^{-1}B^TP, \qquad \text{so that} \qquad PBR^{-1}B^TP = PBK, \]

we have

\[ A^TP + PA - PBK + Q = 0, \quad \Rightarrow \quad A^TP + PA - PBK = -Q \]

Substituting this into our $\dot{V}$ expression,

\[ A^TP + PA - K^TB^TP - PBK = - K^TB^TP - Q \]

Since $B^TP = RR^{-1}B^TP = RK$, the expression $- K^TB^TP - Q$ simplifies to

\[ - K^TB^TP - Q = - K^TRR^{-1}B^TP - Q = - K^TRK - Q = - (K^TRK + Q) \]

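Since $R \succ 0$, we have $x^TK^TRKx = (Kx)^TR(Kx) \ge 0$ for every $x$, so $K^TRK \succeq 0$. In general it is only positive semidefinite, not positive definite, because $K$ has as many rows as there are inputs and can have a nontrivial null space. Combined with $Q \succeq 0$, this gives

\[ \dot{V}(x) = x^T [(A-BK)^TP + P(A-BK)]x = - x^T(K^TRK + Q)x \le 0 \]

If $Q \succ 0$, then $K^TRK + Q \succ 0$, so $\dot{V}(x) < 0$ for all $x \neq 0$ and $V$ is a strict Lyapunov function. If only $Q \succeq 0$, the inequality $\dot{V} \le 0$ gives Lyapunov stability. Asymptotic stability then follows from LaSalle's invariance principle under the standard LQR assumption that $(A, Q^{1/2})$ is detectable. Along any trajectory on which $\dot{V} \equiv 0$, we have $Q^{1/2}x \equiv 0$ and $Kx \equiv 0$, so $\dot{x} = Ax$ with zero output. Detectability then forces $x(t) \to 0$. Since $V$ is constant along this trajectory and $P \succ 0$, the trajectory must be $x \equiv 0$.

Hence, the origin of the closed-loop system is asymptotically stable, and $V(x)=x^TPx$ is a valid Lyapunov function.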
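The sketch below checks the Lyapunov argument numerically. It assumes NumPy and uses a double integrator with $Q = I$ and $R = 1$ as an illustrative example. For this system, solving the CARE entrywise by hand gives $P = \begin{bmatrix}\sqrt{3} & 1\\ 1 & \sqrt{3}\end{bmatrix}$, and the code confirms the CARE residual vanishes before using it.

```python
import numpy as np

# Double integrator with Q = I, R = 1 (illustrative)
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])

# CARE solution for this system, derived by hand
s3 = np.sqrt(3.0)
P = np.array([[s3, 1.0], [1.0, s3]])
care_residual = A.T @ P + P @ A - P @ B @ np.linalg.solve(R, B.T) @ P + Q

K = np.linalg.solve(R, B.T @ P)  # K = R^{-1} B^T P
A_cl = A - B @ K                 # closed-loop matrix

# Lyapunov identity from the derivation: A_cl^T P + P A_cl = -(Q + K^T R K)
lhs = A_cl.T @ P + P @ A_cl
rhs = -(Q + K.T @ R @ K)

print(np.abs(care_residual).max())
print(np.allclose(lhs, rhs))
print(np.linalg.eigvalsh(P))    # positive: V(x) = x^T P x is positive definite
print(np.linalg.eigvals(A_cl))  # negative real parts: asymptotically stable
```

Here $Q = I \succ 0$, so $Q + K^TRK$ is positive definite and $\dot{V} < 0$ for every $x \neq 0$.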

4. Discounted LQR

Consider the discrete-time LQR problem but with a discount term on the stage cost: $\sum_{k=0}^{N-1} \gamma^k (x_k^TQ_kx_k + u_k^TR_ku_k)$. How does the optimal gain and resulting Riccati recursion equation differ?

For the infinite horizon setting and without discount, to solve for $P_\infty$, we can use the dare(A,B,Q,R) function (in Matlab or Python) given matrices $A,B,Q,R$. Is it still possible to use the same function to solve for $P_\infty$ for the discounted setting? Provide a brief explanation.

Solution

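For the finite-horizon case, the discount simply makes the stage weights time-varying. The problem is a standard discrete-time finite-horizon time-varying LQR problem with $\tilde{Q}_k = \gamma^kQ_k$ and $\tilde{R}_k = \gamma^kR_k$. The Riccati recursion from Problem 2 therefore applies with $Q_k, R_k$ replaced by $\tilde{Q}_k, \tilde{R}_k$, and the gain becomes

\[ K_k = (\gamma^kR_k + B_k^TP_{k+1}B_k)^{-1}B_k^TP_{k+1}A_k. \]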

Now, consider the discounted discrete-time infinite-horizon LQR problem

\[J = \sum_{k=0}^{\infty} \gamma^k ( x_k^TQx_k + u_k^TRu_k ), \qquad 0<\gamma<1\]

subject to

\[x_{k+1}=Ax_k+Bu_k\]

Assume the value function has the form

\[V(x_k)=x_k^TPx_k\]

The Bellman equation is

\[V(x_k) = \min_{u_k} [ x_k^TQx_k + u_k^TRu_k + \gamma V(x_{k+1}) ]\]

Substituting $x_{k+1}=Ax_k+Bu_k$,

\[V(x_k) = \min_{u_k} [ x_k^TQx_k + u_k^TRu_k + \gamma (Ax_k+Bu_k)^TP(Ax_k+Bu_k) ]\]

Taking the derivative with respect to $u_k$ and setting it equal to zero gives

\[2Ru_k + 2\gamma B^TP(Ax_k+Bu_k) = 0\]

Therefore,

\[(R+\gamma B^TPB)u_k = -\gamma B^TPAx_k\]

so the optimal control is

\[u_k^\star=-Kx_k\]

where

\[K = (R+\gamma B^TPB)^{-1} \gamma B^TPA\]

Substituting the optimal control back into the Bellman equation gives the discounted algebraic Riccati equation

\[P = Q + \gamma A^TPA - \gamma^2 A^TPB (R+\gamma B^TPB)^{-1} B^TPA\]

We can still use the standard DARE solver by rewriting the discounted problem as an undiscounted LQR problem with scaled dynamics:

\[\tilde{A}=\sqrt{\gamma}A, \qquad \tilde{B}=\sqrt{\gamma}B\]

Then the discounted Riccati equation becomes

\[P = Q + \tilde{A}^TP\tilde{A} - \tilde{A}^TP\tilde{B} ( R+\tilde{B}^TP\tilde{B} )^{-1} \tilde{B}^TP\tilde{A}\]

Therefore, $P_\infty$ can be computed using

\[P_\infty = \texttt{dare}(\sqrt{\gamma}A,\sqrt{\gamma}B,Q,R)\]

After obtaining $P_\infty$, the discounted optimal gain is

\[K = (R+\gamma B^TP_\infty B)^{-1} \gamma B^TP_\infty A\]
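In Python, one option is SciPy's `scipy.linalg.solve_discrete_are(a, b, q, r)`, which solves the standard (undiscounted) DARE. The sketch below applies it to the scaled pair $(\sqrt{\gamma}A, \sqrt{\gamma}B)$ and checks the residual of the discounted Riccati equation. The system, the weights, and $\gamma = 0.9$ are illustrative assumptions.

```python
import numpy as np
from scipy.linalg import solve_discrete_are

# Illustrative discrete-time double integrator (time step 0.1) and weights
gamma = 0.9
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.005], [0.1]])
Q = np.eye(2)
R = np.array([[0.1]])

# Standard (undiscounted) DARE applied to the scaled pair (sqrt(gamma) A, sqrt(gamma) B)
sg = np.sqrt(gamma)
A_t, B_t = sg * A, sg * B
P_inf = solve_discrete_are(A_t, B_t, Q, R)

# Discounted optimal gain K = (R + gamma B^T P B)^{-1} gamma B^T P A
K = np.linalg.solve(R + gamma * B.T @ P_inf @ B, gamma * B.T @ P_inf @ A)

# Gain of the scaled undiscounted problem, for comparison
K_t = np.linalg.solve(R + B_t.T @ P_inf @ B_t, B_t.T @ P_inf @ A_t)

# Residual of the discounted Riccati equation (should be ~0)
G = np.linalg.solve(R + gamma * B.T @ P_inf @ B, B.T @ P_inf @ A)
residual = Q + gamma * A.T @ P_inf @ A - gamma**2 * A.T @ P_inf @ B @ G - P_inf

print(np.abs(residual).max())
print(np.allclose(K, K_t))
```

The gain of the scaled undiscounted problem, $(R + \tilde{B}^TP\tilde{B})^{-1}\tilde{B}^TP\tilde{A}$, reduces to the same $K$, because the two factors of $\sqrt{\gamma}$ combine into $\gamma$. So the gain computed for the scaled problem is already the discounted gain.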

5. Trajectory tracking

Run the tracking_LQR.ipynb demo notebook. Please read the comments in the notebook to get a good sense of what the notebook is doing.

After reading and running the notebook, answer the following questions.

(a)

Why does there exist some “steady state error” at the end of the trajectory?

Solution

The tracking LQR controller has no integral action. It minimizes a quadratic cost function that balances tracking performance and control effort. Since control effort is penalized in the cost function, the optimal solution may tolerate a small tracking error if eliminating that error would require additional control input.

As a result, the controller does not necessarily drive the tracking error exactly to zero at the end of the trajectory. Instead, it finds the state and control trajectories that minimize the overall cost, which can lead to a small residual error.
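Here is a toy illustration of the missing integral action. It is a sketch that assumes NumPy, and the scalar plant, the constant disturbance $d$, and the weights are made up; they are not the notebook's quadrotor model. Plain LQR regulation settles at a nonzero offset. Augmenting the state with an integrator $z_{k+1} = z_k + x_k$ drives the offset to zero. The gain is computed by iterating the Riccati recursion to a fixed point rather than by calling a DARE solver, and `dlqr_gain` is a hypothetical helper.

```python
import numpy as np


def dlqr_gain(A, B, Q, R, iters=2000):
    """Infinite-horizon discrete LQR gain by iterating the Riccati recursion to a fixed point."""
    P = Q.copy()
    for _ in range(iters):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ A - A.T @ P @ B @ K
    return K


# Toy scalar plant x_{k+1} = a x_k + b u_k + d with an unmodeled constant disturbance d
a, b, d = 1.0, 0.1, 0.05

# (1) Plain LQR state feedback u = -K x
K = dlqr_gain(np.array([[a]]), np.array([[b]]), np.eye(1), np.eye(1))
x = 1.0
for _ in range(500):
    x = a * x + b * (-K[0, 0] * x) + d
x_ss_predicted = d / (1.0 - a + b * K[0, 0])  # fixed point of the closed loop
print(x, x_ss_predicted)

# (2) LQR on the state augmented with an integrator z_{k+1} = z_k + x_k
A_aug = np.array([[a, 0.0], [1.0, 1.0]])
B_aug = np.array([[b], [0.0]])
K_aug = dlqr_gain(A_aug, B_aug, np.diag([1.0, 0.1]), np.eye(1))
s = np.array([1.0, 0.0])  # s = [x, z]
for _ in range(2000):
    u = -(K_aug @ s)[0]
    s = A_aug @ s + B_aug[:, 0] * u + np.array([d, 0.0])
print(s[0])
```

In case (2), any steady state must satisfy $z = z + x$, so $x = 0$. The integrator state absorbs the input needed to cancel $d$.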

(b)

If we found ourselves running up against control limits, what could we change in (i) the tracking LQR formulation, or (ii) the computation of the nominal trajectory, to make this less likely to happen?

Solution

The controller may run into control limits because the nominal control plus the feedback correction exceeds the actuator bounds. In the tracking LQR formulation, this can be reduced by increasing the control cost $R$, which penalizes large feedback inputs and makes the controller less aggressive. We can also decrease the state-tracking weights in $Q$ if the controller is trying too hard to remove small errors.

In the nominal trajectory computation, we can design the nominal control sequence to stay away from the actuator limits. For example, we can impose tighter thrust constraints or add a margin around the actuator bounds. This gives the feedback controller additional authority to reject disturbances without saturating.
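To illustrate the effect of increasing $R$, the sketch below computes continuous-time LQR gains for a double integrator with $Q = I$ at several scalar values of $R$, using SciPy's `solve_continuous_are`. The double integrator is an illustrative system, not the notebook's model. Larger $R$ yields a smaller gain, that is, a less aggressive feedback correction.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Illustrative double integrator with Q = I
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)

r_values = [0.1, 1.0, 10.0, 100.0]
gain_norms = []
for r in r_values:
    R = np.array([[r]])
    P = solve_continuous_are(A, B, Q, R)
    K = np.linalg.solve(R, B.T @ P)  # K = R^{-1} B^T P
    gain_norms.append(np.linalg.norm(K))
    print(f"R = {r:6.1f}   ||K|| = {gain_norms[-1]:.3f}")
```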

(c)

Even with closed-loop control, we see that the red “safety bubble” surrounding the quad intersects the obstacle over a short time interval. What could we do to avoid this?

Solution

Even with LQR feedback, the quadcopter does not follow the nominal trajectory exactly because of disturbances, modeling error, and control limits. Therefore, if the nominal trajectory passes too close to the obstacle, the safety bubble can still intersect the obstacle.

To avoid this, we should modify the nominal trajectory optimization to include a larger obstacle clearance margin. This can be done by inflating the obstacle radius or adding an additional safety buffer in the obstacle avoidance constraint. As a result, the nominal trajectory stays farther away from the obstacle, leaving room for tracking errors caused by disturbances, modeling inaccuracies, and actuator saturation.
