Homework 3 (Solutions)#
Linear Quadratic Regulator#
1. Continuous-Time LQR#
In this problem, you will derive the equations for the continuous-time Linear Quadratic Regulator (LQR). The steps will be similar to those covered for the discrete-time case in lecture; however, instead of using the Bellman equation, you will apply the Hamilton-Jacobi-Bellman (HJB) equation to the continuous-time setting.
The continuous-time LQR problem is define below.
The HJB equation is
where \(V\) is the value function.
(a)#
Let us begin with the time-invariant case for the continuous-time LQR problem.
Based on the LQR formulation and the Hamilton-Jacobi-Bellman (HJB) equation:
Explicitly write out the functions \(g(x, u)\), \(f(x, u)\), and the terminal condition for the value function \(V(x(T), T)\).
Then, using these expressions, write the HJB equation specialized to the continuous-time, finite horizon, time-invariant LQR problem.
Solution
For the time-invariant continuous-time finite-horizon LQR problem,
the functions in the HJB equation are:
The terminal condition is:
Therefore, the HJB equation becomes
with terminal condition
(b)#
Next, suppose the value function is quadratic at all times; specifically, let \(V(x(t), t) = x(t)^T P(t) x(t)\). Using this form, explicitly derive expressions for both the partial derivative of the value function with respect to time, \(\frac{\partial V}{\partial t}\), and for the gradient of the value function with respect to the state, \(\nabla V(x(t), t)\).
Solution
Assume the value function has the quadratic form:
where \(P(t)\) is symmetric.
First, the partial derivative with respect to time is
In the HJB equation, \(V(x,t)\) is treated as a function of two independent variables, \(x\) and \(t\).
Therefore, when computing the partial derivative with respect to \(t\), the state \(x\) is held fixed. The effect of the state changing with time is accounted for separately by the term \(\nabla V(x,t)^T f(x,u)\) in the HJB equation.
Next, the gradient of \(V\) with respect to the state is
Using the quadratic derivative rule,
Since \(P(t)=P(t)^T\), this becomes
(c)#
After substituting your expressions from (b) into the HJB equation, show that the optimal policy \(\pi^\star(x,t)\) is given by:
Solution
Substituting the results from part (b) into the HJB equation gives
To find the optimal control, differentiate with respect to \(u\) and set the result equal to zero:
Solving for \(u\),
Therefore, the optimal policy is
(d)#
With the optimal policy \(\pi^\star(x,t)\), show that the HJB equation reduces to
Will the matrix in the above quadratic form be always symmetric?
Solution
Substitute the optimal policy
into the HJB equation:
Now compute each term involving \(u^\star\).
First,
Next,
Therefore,
Also,
Thus, the HJB equation becomes
or equivalently,
Let’s check whether \(\dot{P}+Q+2PA-PBR^{-1}B^TP\) is symmetric by checking it transpose:
Since \(P(t)\) is symmetric for all t, taking the derivative preserves symmetry. So \(\dot{P}\) is symmetric.
In LQR, Q is symmetric.
Next,
then since \(P^T = P\) and \(R = R^T\), we have \((R^{-1})^T = R^{-1}\), so
is symmetric.
Only \(PA\) is not necessarily symmetric. Therefore, \(\dot{P}+Q+2PA-PBR^{-1}B^TP\) is not necessarily symmetric.
(e)#
Recall that any matrix \(M\) can be decomposed in terms of its symmetric and skew-symmetric components:
Prove that for any vector \(x\) of appropriate size, that \(x^TMx = x^T (\frac{M + M^T}{2})x\). That is, \(x^T (\frac{M-M^T}{2})x = 0\).
Solution
Let
where \(M_s\) is symmetric and \(M_k\) is skew-symmetric. Then
and therefore
Now consider the skew-symmetric term
Since this is a scalar, it is equal to its transpose:
Using the transpose rule,
Because \(M_k\) is skew-symmetric,
Thus,
Therefore,
so
Hence,
or equivalently,
and,
(f)#
Given the results from (e), finally show that the value function for the continuous-time LQR problem is the solution to the following ODE:
With boundary condition \(P(T) = Q_T\). This is the Riccati differential equation.
Solution
From part (d), after substituting the optimal policy into the HJB equation, we obtained
However, the matrix inside the quadratic form is not necessarily symmetric because \(PA\) is not necessarily symmetric. From part (e), for any matrix \(M\),
Therefore, the term \(2PA\) in the quadratic form can be replaced by its symmetric part:
Since
we have
Thus, the HJB equation becomes
Since the matrix inside the quadratic form is symmetric, and the equality holds for all (x), the matrix must be zero:
Rearranging gives the Riccati differential equation:
The terminal condition comes from the terminal cost:
Since
we must have
Therefore, the value function is determined by solving
(g)#
Consider the infinite-horizon case. How does the above ODE change? The resulting equation is referred to as the continuous algebraic Riccati equation (CARE).
Solution
For the infinite-horizon LQR problem, there is no finite terminal time and no terminal condition \(P(T)=Q_T\). We look for a steady-state value function
where \(P\) is constant in time. Therefore,
Starting from the Riccati differential equation,
setting \(\dot{P}(t)=0\) gives
This is the continuous algebraic Riccati equation (CARE):
The corresponding infinite-horizon optimal policy is
2. Time-varying case#
Now consider the time-varying (and finite horizon) case for both continuous-time and discrete-time systems. Clearly state the Riccati equations that govern the evolution of the matrix \(P(t)\) and \(P_k\) in each setting. Provide the forms of the time-varying Riccati differential equation for the continuous-time case, and the Riccati recursion for the discrete-time case.
Solution
The continuous-time finite-horizon time-varying Riccati differential equation is
with terminal condition
The corresponding optimal control law is
For the discrete-time finite-horizon time-varying LQR problem, the Riccati recursion is
with terminal condition
The corresponding optimal control law is
where
Both Riccati equations are solved backward in time from their terminal conditions.
3. LQR stability#
Consider the continuous-time infinite-horizon LQR problem. Show that the closed-loop system \(\dot{x} = (A - BK)x\), where \(K = R^{-1} B^T P\) is the optimal LQR gain, is asymptotically stable. Specifically, prove that the optimal value function \(V(x) = x^T P x\) serves as a valid Lyapunov function for the closed-loop system, thereby establishing asymptotic stability of the origin.
You may assume that under the LQR assumption, the solution to CARE \(P\) is positive definite.
Solution
For the continuous-time infinite-horizon LQR problem, the optimal control is
where
Thus, the closed-loop system is
We want to show that
is a Lyapunov function for the closed-loop system.
Since \(P\) is positive definite, we have
Therefore, \(V(x)\) is positive definite. Now compute the time derivative of \(V\) along the closed-loop trajectory:
Substituting
gives
Therefore,
To establish asymptotic stability, we need to show that \(\dot{V}(x)<0\).
Since
\(\dot{V}(x)<0\) can be proved by showing
Expanding the matrix term,
From the CARE in part (g),
and use
we have
Substituting this into our \(\dot{V}\) expression,
Breaking down \(- K^TB^TP - Q \) further,
Since in LQR we assume \(Q\succeq 0\) and \(R\succ 0\), we also have \(K^TRK \succ 0\). This means
As such, this means that we have $\( \dot{V}(x) = x^T [(A-BK)^TP + P(A-BK)]x = - x^T[(K^TRK + Q)]x < 0 \)$
Hence, the origin of the closed-loop system is asymptotically stable, and \(V(x)=x^TPx\) is a valid Lyapunov function.
4. Discounted LQR#
Consider the discrete-time LQR problem but with a discount term on the stage cost: \(\sum_{k=0}^{N-1} \gamma^k (x_k^TQ_kx_k + u_k^TR_ku_k)\). How does the optimal gain and resulting Riccati recursion equation differ?
For the infinite horizon setting and without discount, to solve for \(P_\infty\), we can use the dare(A,B,Q,R) function (in Matlab or Python) given matrices \(A,B,Q,R\). Is it still possible to use the same function to solve for \(P_\infty\) for the discounted setting? Provide a brief explanation.
Solution
For the finite horizon case, with the discount, it remains a discrete-time finite-horizon time-varying problem with \(\tilde{Q}_k = \gamma^kQ_k\) and \(\tilde{R}_k = \gamma^kR_k\)
Now, consider the discounted discrete-time infinite-horizon LQR problem
subject to
Assume the value function has the form
The Bellman equation is
Substituting \(x_{k+1}=Ax_k+Bu_k\),
Taking the derivative with respect to (u_k) and setting it equal to zero gives
Therefore,
so the optimal control is
where
Substituting the optimal control back into the Bellman equation gives the discounted algebraic Riccati equation
We can still use the standard DARE solver by rewriting the discounted problem as an undiscounted LQR problem with scaled dynamics:
Then the discounted Riccati equation becomes
Therefore, \(P_\infty\) can be computed using
After obtaining \(P_\infty\), the discounted optimal gain is
5. Trajectory tracking#
Run the tracking_LQR.ipynb demo notebook. Please read the comments in the notebook to get a good sense of what the notebook is doing.
After reading and running the notebook, answer the following questions.
(a)#
Why does there exist some “steady state error” at the end of the trajectory?
Solution
There is no integral action. The LQR controller minimizes a quadratic cost function that balances tracking performance and control effort. Since control effort is penalized in the cost function, the optimal solution may allow a small tracking error if eliminating that error would require additional control input.
As a result, the controller does not necessarily drive the tracking error exactly to zero at the end of the trajectory. Instead, it finds the state and control trajectories that minimize the overall cost, which can lead to a small residual error.
(b)#
If we found ourselves running up against control limits, what could we change in (i) the tracking LQR formulation, or (ii) the computation of the nominal trajectory, to make this less likely to happen?
Solution
The controller may run into control limits because the nominal control plus the feedback correction exceeds the actuator bounds. In the tracking LQR formulation, this can be reduced by increasing the control cost \(R\), which penalizes large feedback inputs and makes the controller less aggressive. We can also decrease the state-tracking weights in \(Q\) if the controller is trying too hard to remove small errors.
In the nominal trajectory computation, we can design the nominal control sequence to stay away from the actuator limits. For example, we can impose tighter thrust constraints or add a margin around the actuator bounds. This gives the feedback controller additional authority to reject disturbances without saturating.
(c)#
Even with closed-loop control, we see that the red “safety bubble” surrounding the quad intersects the obstacle over a short time interval. What could we do to avoid this?
Solution
Even with LQR feedback, the quadcopter does not follow the nominal trajectory exactly because of disturbances, modeling error, and control limits. Therefore, if the nominal trajectory passes too close to the obstacle, the safety bubble can still intersect the obstacle.
To avoid this, we should modify the nominal trajectory optimization to include a larger obstacle clearance margin. This can be done by inflating the obstacle radius or adding an additional safety buffer in the obstacle avoidance constraint. As a result, the nominal trajectory stays farther away from the obstacle, leaving room for tracking errors caused by disturbances, modeling inaccuracies, and actuator saturation.