Homework 3
Linear Quadratic Regulator
1. Continuous-Time LQR
In this problem, you will derive the equations for the continuous-time Linear Quadratic Regulator (LQR). The steps will be similar to those covered for the discrete-time case in lecture; however, instead of using the Bellman equation, you will apply the Hamilton-Jacobi-Bellman (HJB) equation to the continuous-time setting.
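The continuous-time LQR problem is defined below.
\[
\min_{u(\cdot)} \; \int_0^T \left( x(t)^T Q\, x(t) + u(t)^T R\, u(t) \right) dt + x(T)^T Q_T\, x(T)
\quad \text{subject to} \quad \dot{x}(t) = A x(t) + B u(t).
\]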
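The HJB equation is
\[
0 = \frac{\partial V}{\partial t}(x, t) + \min_{u} \left[ g(x, u) + \nabla V(x, t)^T f(x, u) \right],
\]
where \(V\) is the value function.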
(a)
Let us begin with the time-invariant case for the continuous-time LQR problem.
Based on the LQR formulation and the Hamilton-Jacobi-Bellman (HJB) equation:
Explicitly write out the functions \(g(x, u)\), \(f(x, u)\), and the terminal condition for the value function \(V(x(T), T)\).
Then, using these expressions, write the HJB equation specialized to the continuous-time, finite horizon, time-invariant LQR problem.
(b)
Next, suppose the value function is quadratic at all times; specifically, let \(V(x(t), t) = x(t)^T P(t) x(t)\). Using this form, explicitly derive expressions for both the partial derivative of the value function with respect to time, \(\frac{\partial V}{\partial t}\), and for the gradient of the value function with respect to the state, \(\nabla V(x(t), t)\).
(c)
After substituting your expressions from (b) into the HJB equation, show that the optimal policy \(\pi^\star(x,t)\) is given by:
\[
\pi^\star(x, t) = -R^{-1} B^T P(t)\, x.
\]
(d)
With the optimal policy \(\pi^\star(x,t)\), show that the HJB equation reduces to
\[
0 = x^T \left( \dot{P}(t) + Q + 2 P(t) A - P(t) B R^{-1} B^T P(t) \right) x.
\]
Will the matrix in the above quadratic form always be symmetric?
(e)
Recall that any square matrix \(M\) can be decomposed into its symmetric and skew-symmetric components:
\[
M = \frac{M + M^T}{2} + \frac{M - M^T}{2}.
\]
Prove that, for any vector \(x\) of appropriate size, \(x^T M x = x^T \left(\frac{M + M^T}{2}\right) x\). Equivalently, show that \(x^T \left(\frac{M - M^T}{2}\right) x = 0\).
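Before writing the proof, it can help to check the claim numerically. The snippet below is a minimal sketch, not part of the required solution: the matrix size, the random seed, and all variable names are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)      # fixed seed, chosen arbitrarily
n = 4                               # illustrative dimension
M = rng.standard_normal((n, n))     # generic (non-symmetric) square matrix
x = rng.standard_normal(n)          # generic vector

M_sym = 0.5 * (M + M.T)             # symmetric part of M
M_skew = 0.5 * (M - M.T)            # skew-symmetric part of M

full_form = x @ M @ x               # x^T M x
sym_form = x @ M_sym @ x            # x^T ((M + M^T)/2) x
skew_form = x @ M_skew @ x          # x^T ((M - M^T)/2) x

print("x^T M x and symmetric-part form agree:", np.isclose(full_form, sym_form))
print("skew-symmetric form is (numerically) zero:", np.isclose(skew_form, 0.0))
```

A numerical check on one random sample does not replace the proof; it only confirms that the statement behaves as expected before you argue it in general.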
(f)
Given the results from (e), finally show that the matrix \(P(t)\) in the value function for the continuous-time LQR problem is the solution to the following ODE:
\[
-\dot{P}(t) = Q + P(t) A + A^T P(t) - P(t) B R^{-1} B^T P(t),
\]
with boundary condition \(P(T) = Q_T\). This is the Riccati differential equation.
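To build intuition for the Riccati differential equation, one option is to integrate it numerically backward in time from the boundary condition. The sketch below assumes the ODE has the form \(-\dot{P} = Q + PA + A^T P - P B R^{-1} B^T P\) with \(P(T) = Q_T\). The double-integrator system, the weights, and the horizon are illustrative assumptions, not values taken from this homework.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Illustrative double integrator (assumed for this example only).
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)               # state cost weight (assumed)
R = np.array([[1.0]])       # control cost weight (assumed)
Q_T = np.eye(2)             # terminal cost weight (assumed)
T = 10.0                    # horizon length (assumed)

def riccati_rhs(t, p_flat):
    """Right-hand side dP/dt of the Riccati ODE, with P flattened to a vector."""
    P = p_flat.reshape(2, 2)
    P_dot = -(Q + P @ A + A.T @ P - P @ B @ np.linalg.solve(R, B.T) @ P)
    return P_dot.ravel()

# Integrate backward in time: from t = T (where P is known) down to t = 0.
sol = solve_ivp(riccati_rhs, (T, 0.0), Q_T.ravel(), rtol=1e-8, atol=1e-10)

P0 = sol.y[:, -1].reshape(2, 2)         # P at t = 0
K0 = np.linalg.solve(R, B.T @ P0)       # feedback gain R^{-1} B^T P at t = 0

print("P(0) =\n", P0)
print("K(0) =", K0)
```

Rerunning with a larger `T` and comparing the resulting `P0` is a useful experiment to try before answering the next part.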
(g)
Consider the infinite-horizon case. How does the above ODE change? The resulting equation is referred to as the continuous algebraic Riccati equation (CARE).
2. Time-varying case
Now consider the time-varying (and finite-horizon) case for both continuous-time and discrete-time systems. Clearly state the Riccati equations that govern the evolution of the matrices \(P(t)\) and \(P_k\) in each setting: the time-varying Riccati differential equation for the continuous-time case, and the Riccati recursion for the discrete-time case.
3. LQR stability
Consider the continuous-time infinite-horizon LQR problem. Show that the closed-loop system \(\dot{x} = (A - BK)x\), where \(K = R^{-1} B^T P\) is the optimal LQR gain, is asymptotically stable. Specifically, prove that the optimal value function \(V(x) = x^T P x\) serves as a valid Lyapunov function for the closed-loop system, thereby establishing asymptotic stability of the origin.
You may assume that, under the LQR assumptions, the solution \(P\) to the CARE is positive definite.
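A quick numerical experiment can show what the proof should establish. The sketch below uses `scipy.linalg.solve_continuous_are` to obtain \(P\) for an illustrative double integrator (the system and weights are assumptions made for this example), forms \(K = R^{-1} B^T P\), and inspects the eigenvalues of \(P\) and of \(A - BK\).

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Illustrative double integrator (assumed for this example only).
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)               # state cost weight (assumed)
R = np.array([[1.0]])       # control cost weight (assumed)

P = solve_continuous_are(A, B, Q, R)    # solution of the CARE
K = np.linalg.solve(R, B.T @ P)         # K = R^{-1} B^T P
A_cl = A - B @ K                        # closed-loop dynamics matrix

P_eigs = np.linalg.eigvalsh(P)          # all positive if P is positive definite
cl_eigs = np.linalg.eigvals(A_cl)       # all real parts negative if stable

print("eig(P):", P_eigs)
print("eig(A - BK):", cl_eigs)
```

This is a sanity check for one system, not a substitute for the Lyapunov argument the problem asks for.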
4. Discounted LQR
Consider the discrete-time LQR problem, but with a discount factor \(\gamma \in (0,1)\). How do the optimal gain and the resulting Riccati recursion differ from the undiscounted case?
In the undiscounted infinite-horizon setting, we can solve for \(P_\infty\) using the dare(A,B,Q,R) function (in Matlab or Python), given matrices \(A,B,Q,R\). Is it still possible to use the same function to solve for \(P_\infty\) in the discounted setting? Provide a brief explanation.
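For reference, here is one way to call a DARE solver in Python for the undiscounted case. The sketch uses `scipy.linalg.solve_discrete_are`; python-control's `control.dare` plays a similar role. The discretized double integrator, time step, and weights are illustrative assumptions. The last lines check that the returned matrix is a fixed point of the standard undiscounted Riccati recursion.

```python
import numpy as np
from scipy.linalg import solve_discrete_are

# Illustrative discretized double integrator (assumed for this example only).
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.5 * dt**2], [dt]])
Q = np.eye(2)               # state cost weight (assumed)
R = np.array([[1.0]])       # control cost weight (assumed)

# Undiscounted infinite-horizon solution.
P_inf = solve_discrete_are(A, B, Q, R)

# Corresponding steady-state gain K = (R + B^T P B)^{-1} B^T P A.
K_inf = np.linalg.solve(R + B.T @ P_inf @ B, B.T @ P_inf @ A)

# One step of the undiscounted Riccati recursion should return P_inf itself.
P_next = Q + A.T @ P_inf @ A - A.T @ P_inf @ B @ K_inf
residual = np.max(np.abs(P_next - P_inf))

print("P_inf =\n", P_inf)
print("K_inf =", K_inf)
print("fixed-point residual:", residual)
```

The example covers only the undiscounted problem; deciding whether, and with which inputs, the same call can handle the discounted problem is the question being asked.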
5. Trajectory tracking
Run the tracking_LQR.ipynb demo notebook. Please read the comments in the notebook to get a good sense of what the notebook is doing.
After reading and running the notebook, answer the following questions.
(a)
Why does there exist some “steady state error” at the end of the trajectory?
(b)
If we found ourselves running up against control limits, what could we change in (i) the tracking LQR formulation, or (ii) the computation of the nominal trajectory, to make this less likely to happen?
(c)
Even with closed-loop control, we see that the red “safety bubble” surrounding the quad intersects the obstacle over a short time interval. What could we do to avoid this?