# Homework 3 (Solutions)
## Linear Quadratic Regulator


### 1. Continuous-Time LQR

In this problem, you will derive the equations for the continuous-time Linear Quadratic Regulator (LQR). The steps will be similar to those covered for the discrete-time case in lecture; however, instead of using the Bellman equation, you will apply the Hamilton-Jacobi-Bellman (HJB) equation to the continuous-time setting.

The continuous-time LQR problem is define below.

$$
\begin{align*}
\min_{u(\cdot)} & \quad x(T)^TQ_Tx(T) + \int_{0}^{T} x(t)^TQ(t)x(t) + u(t)^TR(t)u(t) dt\\
\text{subject to} & \quad \dot{x} = A(t)x(t) + B(t)u(t), \:\: t\in[0,T]\\
& \quad x(0) = x_\text{current}
\end{align*}
$$

The HJB equation is

$$
\frac{\partial V}{\partial t} + \min_{u(t)\in \mathcal{U}} g(x(t), u(t), t) + \nabla V(x(t), t)^T f(x(t), u(t), t) = 0
$$

where $V$ is the value function.


#### (a)
Let us begin with the *time-invariant* case for the continuous-time LQR problem.

Based on the LQR formulation and the Hamilton-Jacobi-Bellman (HJB) equation:
- Explicitly write out the functions $g(x, u)$, $f(x, u)$, and the terminal condition for the value function $V(x(T), T)$.
- Then, using these expressions, write the HJB equation specialized to the continuous-time, finite horizon, time-invariant LQR problem.

```{admonition} Solution
:class: dropdown


For the time-invariant continuous-time finite-horizon LQR problem,

$$

\dot{x} = Ax + Bu

$$

the functions in the HJB equation are:

$$

\begin{align*}
    g(x,u) = x^T Q x + u^T R u \\
    f(x,u) = Ax + Bu 
\end{align*}

$$

The terminal condition is:

$$

V(x(T),T) = x(T)^TQ_Tx(T)

$$

Therefore, the HJB equation becomes

$$

\frac{\partial V}{\partial t} + \min_{u}
(
x^TQx
+
u^TRu
+
\nabla V(x,t)^T(Ax+Bu)
)
= 0

$$

with terminal condition

$$

V(x,T)=x^TQ_Tx

$$
```


#### (b)

Next, suppose the value function is quadratic at all times; specifically, let $V(x(t), t) = x(t)^T P(t) x(t)$. Using this form, explicitly derive expressions for both the partial derivative of the value function with respect to time, $\frac{\partial V}{\partial t}$, and for the gradient of the value function with respect to the state, $\nabla V(x(t), t)$.


```{admonition} Solution
:class: dropdown

Assume the value function has the quadratic form:

$$

V(x(t),t) = x(t)^T P(t) x(t)

$$

where $P(t)$ is symmetric.

First, the partial derivative with respect to time is

$$

\frac{\partial V}{\partial t}
=
x(t)^T \dot{P}(t) x(t)

$$

In the HJB equation, $V(x,t)$ is treated as a function of two independent variables, $x$ and $t$.

Therefore, when computing the partial derivative with respect to $t$, the state $x$ is held fixed. The effect of the state changing with time is accounted for separately by the term $\nabla V(x,t)^T f(x,u)$ in the HJB equation.

Next, the gradient of $V$ with respect to the state is

$$


\nabla V(x(t),t)
=
\frac{\partial}{\partial x}
[
x(t)^T P(t) x(t)
]


$$

Using the quadratic derivative rule,

$$


\nabla V(x(t),t)
=
(P(t)+P(t)^T)x(t)

$$

Since $P(t)=P(t)^T$, this becomes

$$

\nabla V(x(t),t)
=
2P(t)x(t)


$$

```


#### (c) 
After substituting your expressions from (b) into the HJB equation, show that the optimal policy $\pi^\star(x,t)$ is given by:

$$
\pi^\star(x,t) = -R^{-1}B^TP(t)x(t)
$$


```{admonition} Solution
:class: dropdown

Substituting the results from part (b) into the HJB equation gives

$$

x^T\dot{P}x
+
\min_u
[
x^TQx
+
u^TRu
+
(2Px)^T(Ax+Bu)
]
=0

$$

To find the optimal control, differentiate with respect to $u$ and set the result equal to zero:

$$

\frac{\partial}{\partial u}
[
u^TRu + 2x^TPBu
]
=
2Ru + 2B^TPx
=0

$$

Solving for $u$,

$$
Ru = -B^TPx
$$

$$

u^\star = -R^{-1}B^TPx

$$

Therefore, the optimal policy is

$$

\pi^\star(x,t)
=
-R^{-1}B^TP(t)x(t)

$$

```


#### (d) 
With the optimal policy $\pi^\star(x,t)$, show that the HJB equation reduces to

$$
0 = x(t)^T(\dot{P} + Q + 2P(t)A - P(t)BR^{-1}B^TP(t))x(t)
$$

Will the matrix in the above quadratic form be always symmetric? 


```{admonition} Solution
:class: dropdown

Substitute the optimal policy

$$

u^\star = -R^{-1}B^TPx

$$

into the HJB equation:

$$

x^T\dot{P}x
+
x^TQx
+
(u^\star)^TRu^\star
+
(2Px)^T(Ax+Bu^\star)
=0

$$

Now compute each term involving $u^\star$.

First,

$$

(u^\star)^TRu^\star
=
x^T PBR^{-1}B^T Px

$$

Next,

$$

(2Px)^TBu^\star
=
2x^T PB(-R^{-1}B^T Px)
=
-2x^T PBR^{-1}B^T Px

$$

Therefore,


$$ 

(u^\star)^TRu^\star
+
(2Px)^T Bu^\star
=
-x^T PBR^{-1}B^T Px

$$

Also,

$$

(2Px)^T Ax
=
2x^T PAx

$$

Thus, the HJB equation becomes

$$

0
=
x^T\dot{P}x
+
x^T Q x
+
2x^T P Ax
-
x^T PBR^{-1}B^T Px

$$

or equivalently,

$$

0
=
x^T
(
\dot{P}
+
Q
+
2PA
-
PBR^{-1}B^TP
)
x

$$


Let's check whether $\dot{P}+Q+2PA-PBR^{-1}B^TP$ is symmetric by checking it transpose:

$$

(\dot{P}+Q+2PA-PBR^{-1}B^TP)^T

$$

Since $P(t)$ is symmetric for all t, taking the derivative preserves symmetry. So $\dot{P}$ is symmetric.

In LQR, Q is symmetric.

Next, 

$$

(PBR^{-1}B^TP)^T = P^TB(R^{-1})^TB^TP^T

$$

then since $P^T = P$ and $R = R^T$, we have $(R^{-1})^T = R^{-1}$, so

$$

(PBR^{-1}B^TP)^T = PBR^{-1}B^TP

$$

is symmetric. 

Only $PA$ is not necessarily symmetric. Therefore, $\dot{P}+Q+2PA-PBR^{-1}B^TP$ is not necessarily symmetric. 

```


#### (e)

Recall that any matrix $M$ can be decomposed in terms of its symmetric and skew-symmetric components:

$$
M = \underbrace{\frac{M + M^T}{2}}_{\text{symmetric}} + \underbrace{\frac{M-M^T}{2}}_{\text{skew-symmetric}}
$$

Prove that for any vector $x$ of appropriate size, that $x^TMx = x^T (\frac{M + M^T}{2})x$. That is, $x^T (\frac{M-M^T}{2})x = 0$.


```{admonition} Solution
:class: dropdown

Let

$$

M_s = \frac{M+M^T}{2},
\qquad
M_k = \frac{M-M^T}{2}

$$

where $M_s$ is symmetric and $M_k$ is skew-symmetric. Then

$$

M = M_s + M_k

$$

and therefore

$$

x^TMx
=
x^T(M_s+M_k)x
=
x^TM_sx + x^TM_kx

$$

Now consider the skew-symmetric term

$$

x^TM_kx

$$

Since this is a scalar, it is equal to its transpose:

$$

x^TM_kx
=
(x^TM_kx)^T

$$

Using the transpose rule,

$$

(x^TM_kx)^T
=
x^TM_k^Tx

$$

Because $M_k$ is skew-symmetric,

$$

M_k^T = -M_k

$$

Thus,

$$

x^TM_kx
=
x^T(-M_k)x
=
-x^TM_kx

$$

Therefore,

$$

2x^TM_kx = 0

$$

so

$$

x^TM_kx = 0

$$

Hence,

$$

x^TMx
=
x^TM_sx

$$

or equivalently,

$$

x^TMx
=
x^T (\frac{M+M^T}{2})x

$$

and,

$$

x^T(\frac{M-M^T}{2})x = 0

$$

```


#### (f)
Given the results from (e), finally show that the value function for the continuous-time LQR problem is the solution to the following ODE:

$$
-\dot{P}(t) = Q + P(t)A + A^TP(t) -P(t)BR^{-1}B^TP(t)
$$

With boundary condition $P(T) = Q_T$. 
This is the Riccati differential equation.

```{admonition} Solution
:class: dropdown

From part (d), after substituting the optimal policy into the HJB equation, we obtained

$$
0 =
x^T
(
\dot{P}
+
Q
+
2PA
-
PBR^{-1}B^TP
)
x

$$

However, the matrix inside the quadratic form is not necessarily symmetric because
$PA$ is not necessarily symmetric. From part (e), for any matrix $M$,

$$

x^TMx
=
x^T(\frac{M+M^T}{2})x

$$

Therefore, the term $2PA$ in the quadratic form can be replaced by its
symmetric part:

$$

x^T(2PA)x
=
x^T(\frac{2PA+(2PA)^T}{2})x

$$

Since

$$

(2PA)^T = 2A^TP

$$

we have

$$

x^T(2PA)x
=
x^T(PA+A^TP)x

$$

Thus, the HJB equation becomes

$$

0 =
x^T
(
\dot{P}
+
Q
+
PA
+
A^TP
-
PBR^{-1}B^TP
)
x

$$

Since the matrix inside the quadratic form is symmetric, and the equality holds for all \(x\), the matrix must be zero:

$$

\dot{P}
+
Q
+
PA
+
A^TP
-
PBR^{-1}B^TP
=
0

$$

Rearranging gives the Riccati differential equation:

$$

-\dot{P}(t)
=
Q
+
P(t)A
+
A^TP(t)
-
P(t)BR^{-1}B^TP(t)

$$

The terminal condition comes from the terminal cost:

$$

V(x(T),T)=x(T)^TQ_Tx(T)

$$

Since

$$

V(x(T),T)=x(T)^TP(T)x(T)

$$

we must have

$$

P(T)=Q_T

$$

Therefore, the value function is determined by solving

$$

-\dot{P}(t)
=
Q
+
P(t)A
+
A^TP(t)
-
P(t)BR^{-1}B^TP(t),
\qquad
P(T)=Q_T

$$


```


#### (g)
Consider the infinite-horizon case. How does the above ODE change? The resulting equation is referred to as the *continuous algebraic Riccati equation (CARE)*.

```{admonition} Solution
:class: dropdown

For the infinite-horizon LQR problem, there is no finite terminal time and no terminal condition $P(T)=Q_T$. We look for a steady-state value function

$$

P(t) = P, \qquad V(x) = x^T P x

$$

where $P$ is constant in time. Therefore,

$$

\dot{P}(t)=0

$$

Starting from the Riccati differential equation,

$$

-\dot{P}(t)
=
Q
+
P(t)A
+
A^TP(t)
-
P(t)BR^{-1}B^TP(t)

$$

setting $\dot{P}(t)=0$ gives

$$

0
=
Q
+
PA
+
A^TP
-
PBR^{-1}B^TP

$$

This is the continuous algebraic Riccati equation (CARE):

$$

A^TP + PA - PBR^{-1}B^TP + Q = 0

$$

The corresponding infinite-horizon optimal policy is

$$

u^\star(t)
=
-R^{-1}B^T Px(t)

$$

```


### 2. Time-varying case
Now consider the time-varying (and finite horizon) case for both continuous-time and discrete-time systems. Clearly state the Riccati equations that govern the evolution of the matrix $P(t)$ and $P_k$ in each setting. Provide the forms of the time-varying Riccati differential equation for the continuous-time case, and the Riccati recursion for the discrete-time case.


```{admonition} Solution
:class: dropdown

The continuous-time finite-horizon time-varying Riccati differential equation is

$$
-\dot{P}(t)
=
Q(t)
+
A(t)^TP(t)
+
P(t)A(t)
-
P(t)B(t)R(t)^{-1}B(t)^TP(t),
$$

with terminal condition

$$
P(T)=Q_T.
$$

The corresponding optimal control law is

$$
u^\star(t)
=
-R(t)^{-1}B(t)^TP(t)x(t).
$$

For the discrete-time finite-horizon time-varying LQR problem, the Riccati recursion is

$$
P_k
=
Q_k
+
A_k^TP_{k+1}A_k
-
A_k^TP_{k+1}B_k
(R_k+B_k^TP_{k+1}B_k)^{-1}
B_k^TP_{k+1}A_k,
$$

with terminal condition

$$
P_N=Q_N.
$$

The corresponding optimal control law is

$$
u_k^\star=-K_kx_k,
$$

where

$$
K_k
=
(R_k+B_k^TP_{k+1}B_k)^{-1}
B_k^TP_{k+1}A_k.
$$

Both Riccati equations are solved backward in time from their terminal conditions.

```


### 3. LQR stability
Consider the continuous-time infinite-horizon LQR problem. Show that the closed-loop system $\dot{x} = (A - BK)x$, where $K = R^{-1} B^T P$ is the optimal LQR gain, is asymptotically stable. Specifically, prove that the optimal value function $V(x) = x^T P x$ serves as a valid Lyapunov function for the closed-loop system, thereby establishing asymptotic stability of the origin.

You may assume that under the LQR assumption, the solution to CARE $P$ is positive definite.


```{admonition} Solution
:class: dropdown


For the continuous-time infinite-horizon LQR problem, the optimal control is

$$

u^\star = -Kx

$$

where

$$

K = R^{-1}B^TP

$$

Thus, the closed-loop system is

$$

\dot{x} = (A-BK)x

$$

We want to show that

$$

V(x)=x^TPx

$$

is a Lyapunov function for the closed-loop system.

Since $P$ is positive definite, we have

$$

V(x)=x^TPx > 0, \qquad x\neq 0, \qquad V(0)=0
$$

Therefore, $V(x)$ is positive definite.
Now compute the time derivative of $V$ along the closed-loop trajectory:

$$
\dot{V}(x) = \dot{x}^TPx + x^TP\dot{x}
$$

Substituting

$$
\dot{x}=(A-BK)x
$$

gives

$$
\dot{V}(x) = x^T(A-BK)^TPx + x^TP(A-BK)x
$$

Therefore,

$$
\dot{V}(x) = x^T [ (A-BK)^TP + P(A-BK) ] x
$$

To establish asymptotic stability, we need to show that $\dot{V}(x)<0$. 

Since 

$$
\dot{V}(x)=x^T (A-BK)^T P + P(A-BK)x

$$ 

$\dot{V}(x)<0$ can be proved by showing 

$$
(A-BK)^T P+P(A-BK) \prec 0
$$

Expanding the matrix term,

$$
(A-BK)^TP + P(A-BK) = A^TP + PA - K^TB^TP - PBK 
$$


From the CARE in part (g),

$$
A^TP + PA - PBR^{-1}B^TP + Q = 0
$$

and use

$$
K = R^{-1}B^TP
$$

we have

$$
A^TP + PA - PBK + Q = 0, \quad \Rightarrow \quad A^TP + PA - PBK = -Q
$$


Substituting this into our $\dot{V}$ expression,

$$
A^TP + PA - K^TB^TP - PBK =  - K^TB^TP - Q 
$$


Breaking down $- K^TB^TP - Q $ further,

$$
- K^TB^TP - Q  = - K^TRR^{-1}B^TP - Q =  - K^TRK - Q = - (K^TRK + Q)
$$


Since in LQR we assume $Q\succeq 0$ and $R\succ 0$, we also have $K^TRK \succ 0$. This means 

$$
- (K^TRK + Q) \prec 0
$$

As such, this means that we have
$$
\dot{V}(x) = x^T [(A-BK)^TP + P(A-BK)]x = - x^T[(K^TRK + Q)]x < 0 
$$

Hence, the origin of the closed-loop system is asymptotically stable, and $V(x)=x^TPx$ is a valid Lyapunov function.

```


### 4. Discounted LQR
Consider the discrete-time LQR problem but with a discount term on the stage cost: $\sum_{k=0}^{N-1} \gamma^k (x_k^TQ_kx_k + u_k^TR_ku_k)$. How does the optimal gain and resulting Riccati recursion equation differ? 

For the infinite horizon setting and without discount, to solve for $P_\infty$, we can use the `dare(A,B,Q,R)` function (in Matlab or Python) given matrices $A,B,Q,R$. Is it still possible to use the same function to solve for $P_\infty$ for the discounted setting? Provide a brief explanation.


```{admonition} Solution
:class: dropdown

For the finite horizon case, with the discount, it remains a discrete-time finite-horizon time-varying problem with $\tilde{Q}_k = \gamma^kQ_k$ and $\tilde{R}_k = \gamma^kR_k$

Now, consider the discounted discrete-time infinite-horizon LQR problem

$$

J
=
\sum_{k=0}^{\infty}
\gamma^k
(
x_k^TQx_k + u_k^TRu_k
),
\qquad
0<\gamma<1

$$


subject to


$$

x_{k+1}=Ax_k+Bu_k

$$


Assume the value function has the form


$$

V(x_k)=x_k^TPx_k

$$


The Bellman equation is


$$

V(x_k)
=
\min_{u_k}
[
x_k^TQx_k
+
u_k^TRu_k
+
\gamma V(x_{k+1})
]

$$


Substituting $x_{k+1}=Ax_k+Bu_k$,


$$

V(x_k)
=
\min_{u_k}
[
x_k^TQx_k
+
u_k^TRu_k
+
\gamma (Ax_k+Bu_k)^TP(Ax_k+Bu_k)
]

$$


Taking the derivative with respect to \(u_k\) and setting it equal to zero gives


$$

2Ru_k
+
2\gamma B^TP(Ax_k+Bu_k)
=
0

$$


Therefore,


$$

(R+\gamma B^TPB)u_k
=
-\gamma B^TPAx_k

$$


so the optimal control is


$$

u_k^\star=-Kx_k

$$


where


$$

K
=
(R+\gamma B^TPB)^{-1}
\gamma B^TPA

$$


Substituting the optimal control back into the Bellman equation gives the
discounted algebraic Riccati equation


$$

P
=
Q
+
\gamma A^TPA
-
\gamma^2 A^TPB
(R+\gamma B^TPB)^{-1}
B^TPA

$$


We can still use the standard DARE solver by
rewriting the discounted problem as an undiscounted LQR problem with scaled
dynamics:


$$

\tilde{A}=\sqrt{\gamma}A,
\qquad
\tilde{B}=\sqrt{\gamma}B

$$


Then the discounted Riccati equation becomes


$$

P
=
Q
+
\tilde{A}^TP\tilde{A}
-
\tilde{A}^TP\tilde{B}
(
R+\tilde{B}^TP\tilde{B}
)^{-1}
\tilde{B}^TP\tilde{A}

$$


Therefore, $P_\infty$ can be computed using


$$

P_\infty
=
\texttt{dare}(\sqrt{\gamma}A,\sqrt{\gamma}B,Q,R)

$$


After obtaining $P_\infty$, the discounted optimal gain is


$$

K
=
(R+\gamma B^TP_\infty B)^{-1}
\gamma B^TP_\infty A

$$

```


### 5. Trajectory tracking

Run the `tracking_LQR.ipynb` demo notebook. Please read the comments in the notebook to get a good sense of what the notebook is doing.

After reading and running the notebook, answer the following questions.


#### (a) 
Why does there exist some "steady state error" at the end of the trajectory?

```{admonition} Solution
:class: dropdown

There is no integral action. The LQR controller minimizes a quadratic cost function that balances tracking performance and control effort. Since control effort is penalized in the cost function, the optimal solution may allow a small tracking error if eliminating that error would require additional control input.

As a result, the controller does not necessarily drive the tracking error exactly to zero at the end of the trajectory. Instead, it finds the state and control trajectories that minimize the overall cost, which can lead to a small residual error.

```

#### (b) 
If we found ourselves running up against control limits, what could we change in (i) the tracking LQR formulation, or (ii) the computation of the nominal trajectory, to make this less likely to happen?


```{admonition} Solution
:class: dropdown

The controller may run into control limits because the nominal control plus the feedback correction exceeds the actuator bounds. In the tracking LQR formulation, this can be reduced by increasing the control cost $R$, which penalizes large feedback inputs and makes the controller less aggressive. We can also decrease the state-tracking weights in $Q$ if the controller is trying too hard to remove small errors.

In the nominal trajectory computation, we can design the nominal control sequence to stay away from the actuator limits. For example, we can impose tighter thrust constraints or add a margin around the actuator bounds. This gives the feedback controller additional authority to reject disturbances without saturating.

```

#### (c)
Even with closed-loop control, we see that the red "safety bubble" surrounding the quad intersects the obstacle over a short time interval. What could we do to avoid this?

```{admonition} Solution
:class: dropdown

Even with LQR feedback, the quadcopter does not follow the nominal trajectory exactly because of disturbances, modeling error, and control limits. Therefore, if the nominal trajectory passes too close to the obstacle, the safety bubble can still intersect the obstacle.

To avoid this, we should modify the nominal trajectory optimization to include a larger obstacle clearance margin. This can be done by inflating the obstacle radius or adding an additional safety buffer in the obstacle avoidance constraint. As a result, the nominal trajectory stays farther away from the obstacle, leaving room for tracking errors caused by disturbances, modeling inaccuracies, and actuator saturation.