Introduction to optimization#
In this chapter, we give a brief overview of mathematical optimization and highlight ways we can go about solving optimization problems.
At a high level, optimization is the process of finding the best solution to a problem by minimizing (or maximizing) an objective function while satisfying constraints. Humans are constantly solving some form of optimization problem in everyday activities. For example, to get to class on time, you may want to minimize the distance traveled while avoiding high-traffic areas and stairs. In aerospace engineering, you want to minimize the weight and cost of an aircraft design (e.g., wing shape, fuselage size, engine placement) while ensuring the aircraft is stable, exhibits desired aerodynamic properties, and can withstand the forces experienced during operation. In the context of controls, you want to pick the best sequence of control inputs that minimizes fuel usage over time while reaching the goal, avoiding obstacles, and obeying the system dynamics.
While it may be simple to describe an optimization problem in words, ultimately we need to represent the problem mathematically, and in such a way that it can be tractably solved.
NOTE: We won’t be diving deep into optimization theory and solvers in this course. Instead, we will learn how to frame control synthesis problems as optimization problems in such a way that they are tractable (i.e., convex) to solve using off-the-shelf solvers/packages (e.g., cvxpy).
Unconstrained optimization#
Mathematical description#
First, let’s consider an unconstrained optimization problem. An (unconstrained) optimization problem consists of two main elements: \(x\in\mathbb{R}^n\), the decision variable, a vector of \(n\) variables whose values are to be determined, and \(f: \mathbb{R}^n \rightarrow \mathbb{R}\), an objective function that measures how desirable \(x\) is. Supposing that \(f\) describes the cost, i.e., how expensive \(x\) is, then naturally we want to choose \(x\) that minimizes \(f\). As such, the mathematical optimization problem becomes:

\[
\min_{x\in\mathbb{R}^n} f(x),
\]

\[
x^\star = \operatorname*{arg\,min}_{x\in\mathbb{R}^n} f(x),
\]

where the first equation describes the minimum value of \(f\) over the variable \(x\), while the second describes the value of the argument that minimizes \(f\). Typically the optimal decision variable, i.e., the minimizer, is denoted with a superscript \(\star\).
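To see the distinction with a simple illustrative function (made up here, not from any particular application): for \(f(x) = (x-3)^2 + 1\), the minimum value is \(\min_x f(x) = 1\), while the minimizer is \(x^\star = \operatorname*{arg\,min}_x f(x) = 3\). The two equations report different things: the best achievable cost versus the decision that achieves it.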
Solving unconstrained optimization problems#
Depending on the properties of \(f\), finding the optimal solution can either be very straightforward, or very hard and expensive. If \(f\) is “nice” and differentiable, then we can simply apply gradient descent to find stationary points. If \(f\) is not differentiable, or evaluating \(f\) is an expensive process (e.g., running expensive simulations such as computational fluid dynamics), then we would likely need to use derivative-free techniques, and computing the optimal solution can be very difficult.
Gradient descent#
One of the simplest and most popular approaches is gradient descent. If \(f\) is a nice, smooth, differentiable function for which it is possible, and hopefully cheap, to compute \(\nabla f(x)\), the gradient of \(f\), then we can simply update our value of \(x\) by moving in the direction of steepest descent. Mathematically, if \(\alpha\) is a step size and \(x_k\) is the current guess for the optimal solution, then we can update our guess by applying the update rule,

\[
x_{k+1} = x_k - \alpha \nabla f(x_k).
\]
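To make this concrete, here is a minimal sketch of gradient descent in Python. The objective, step size, and iteration count are all illustrative choices, not prescriptions.

```python
import numpy as np

def gradient_descent(grad_f, x0, alpha=0.1, num_steps=100):
    """Minimize a function by repeatedly stepping opposite its gradient."""
    x = x0
    for _ in range(num_steps):
        x = x - alpha * grad_f(x)  # x_{k+1} = x_k - alpha * grad f(x_k)
    return x

# Hypothetical objective: f(x) = ||x - b||^2, whose gradient is 2(x - b).
b = np.array([1.0, -2.0])
grad_f = lambda x: 2 * (x - b)

x_star = gradient_descent(grad_f, x0=np.zeros(2))
print(x_star)  # converges toward b = [1, -2]
```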
Pause and think
What are some practical considerations when performing gradient descent?
Practical considerations
We need to be careful about how we pick \(\alpha\): if it is too small, then you may need to take many steps, but if it is too big, you may struggle to converge to a (local) optimum. We can perform a line search, where we search along the descent direction to find a suitable step size. There are more advanced techniques that use momentum and can adaptively change \(\alpha\) based on the geometry of \(f\). We won’t discuss these further here, but Algorithms for Optimization touches on this and has some useful references.
Just because the objective is differentiable does not mean you will always find the globally optimal solution. The objective could have many local optima, meaning that even if you converge to a value \(x_1\) where \(\nabla f(x_1) = 0\), there could be another value \(x_2\) where \(f(x_2) < f(x_1)\).
However, if \(f\) is convex (loosely speaking, “bowl-shaped”), then any local minimum found is also the global minimum. We won’t go into too much detail on convex optimization, but we will use the fact later on that quadratic and linear functions are convex.
If your problem is convex, then generally (in most cases) it is tractable to solve, and there are many well-supported tools and solvers for convex optimization problems. In this course, we will use cvxpy.
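As a taste of what this looks like, here is a minimal cvxpy sketch for a small convex least-squares problem; the problem data `A` and `b` are made up for illustration.

```python
import cvxpy as cp
import numpy as np

# Made-up problem data for illustration.
A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
b = np.array([1.0, 2.0, 3.0])

# Decision variable x in R^2 and a convex (quadratic) objective.
x = cp.Variable(2)
objective = cp.Minimize(cp.sum_squares(A @ x - b))

problem = cp.Problem(objective)
problem.solve()

print(problem.value)  # minimum value of the objective
print(x.value)        # the minimizer x*
```

Notice how the code mirrors the math: a variable, an objective, and a call to solve.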
Derivative-free#
What if you can’t take gradients, or you don’t want to because doing so is very expensive or difficult? Well, there are many methods that don’t rely on gradient information at all. A common approach is a sampling-based, or population-based, approach where you sample your search space to get a sense of the objective landscape, and from that, resample or refine your search toward more promising areas. Popular methods include the cross-entropy method, simulated annealing, and genetic algorithms.
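As one concrete instance, here is a rough sketch of the cross-entropy method on a toy objective; the population size, elite fraction, objective, and initial distribution are all illustrative assumptions.

```python
import numpy as np

def cross_entropy_method(f, mu, sigma, num_iters=50, pop_size=100, num_elite=10):
    """Minimize f by iteratively refitting a Gaussian to the best samples."""
    for _ in range(num_iters):
        # Sample candidate solutions from the current Gaussian.
        samples = np.random.randn(pop_size, mu.size) * sigma + mu
        costs = np.array([f(s) for s in samples])
        # Keep the lowest-cost ("elite") samples and refit the distribution.
        elite = samples[np.argsort(costs)[:num_elite]]
        mu, sigma = elite.mean(axis=0), elite.std(axis=0)
    return mu

# Toy objective: no gradients needed, just function evaluations.
f = lambda x: np.sum((x - np.array([2.0, -1.0])) ** 2)
print(cross_entropy_method(f, mu=np.zeros(2), sigma=np.ones(2)))
```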
Constrained optimization#
Suppose now that there are some values of \(x\) that are not allowed. For example, a negative value may be physically impossible or undesirable, such as a negative weight. Or some entries of \(x\) must take a certain value. More generally, there is a function \(g:\mathbb{R}^n \rightarrow \mathbb{R}^{m_\leq}\) and another function \(h:\mathbb{R}^n \rightarrow \mathbb{R}^{m_=}\) such that we require \(g(x) \leq 0\) and \(h(x) = 0\). Here, \(m_\leq\) and \(m_=\) denote the number of inequality and equality constraints, respectively.
With these constraints, we can also consider a constrained optimization problem,

\[
\begin{aligned}
\min_{x\in\mathbb{R}^n} \quad & f(x) \\
\text{subject to} \quad & g(x) \leq 0, \\
& h(x) = 0.
\end{aligned}
\]

This is generally a harder problem to solve, and naively applying gradient descent will not work because a descent step may take you into a region where the constraints are violated.
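For a feel of how constraints are expressed in practice, here is a hedged cvxpy sketch that adds constraints of the form \(g(x) \leq 0\) and \(h(x) = 0\) to the earlier least-squares objective; the specific constraints are illustrative, not canonical.

```python
import cvxpy as cp
import numpy as np

# Made-up data, as before.
A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
b = np.array([1.0, 2.0, 3.0])

x = cp.Variable(2)
objective = cp.Minimize(cp.sum_squares(A @ x - b))

# Illustrative constraints: x must be nonnegative (g(x) = -x <= 0)
# and its entries must sum to one (h(x) = sum(x) - 1 = 0).
constraints = [x >= 0, cp.sum(x) == 1]

problem = cp.Problem(objective, constraints)
problem.solve()
print(x.value)  # minimizer within the feasible set
```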
Pause and think
What are some ways you could go about handling these constraints?
Handling constraints
There are a few ways to go about handling these constraints. Again, we won’t be going into depth here, but we briefly mention these approaches in case you want to read more about them.
Projected gradient descent: Take a gradient descent step as normal, and then project the result back to the closest point in the feasible set. This projection may involve solving another optimization problem(!), but for certain geometries of the feasible set, the projection can be found in closed form (see the sketch after this list).
Use the Lagrangian: A special way to convert a constrained optimization problem into an unconstrained one. This is performed in a specific way where Lagrange multipliers are introduced, and the necessary conditions for optimality are given by the Karush–Kuhn–Tucker (KKT) conditions. There are duality connections between the Lagrangian (dual) problem and the original (primal) problem, where solving the dual problem gives you insight into the primal problem.
Treat constraints as (big) penalties in the objective: We can add the constraint functions to the objective function so that there is a high cost when the constraints are violated. This turns the constrained problem into an unconstrained one, but then you have to carefully tune the weightings on the constraints, and there is no guarantee that the constraints will be perfectly satisfied.
Log-barrier: A version of the approach above, except that a log function is applied to the constraint so that the cost approaches infinity as \(x\) approaches the infeasible region. See homework 1.
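To illustrate the first of these approaches, here is a minimal sketch of projected gradient descent where the feasible set is a box, so the projection is available in closed form (a simple clip); the objective and bounds are made up for illustration.

```python
import numpy as np

def projected_gradient_descent(grad_f, x0, lo, hi, alpha=0.1, num_steps=100):
    """Gradient step, then project back onto the box {x : lo <= x <= hi}."""
    x = x0
    for _ in range(num_steps):
        x = x - alpha * grad_f(x)  # ordinary gradient step
        x = np.clip(x, lo, hi)     # closed-form projection onto the box
    return x

# Hypothetical objective f(x) = ||x - b||^2 whose unconstrained minimizer
# b = [2, -2] lies outside the feasible box [0, 1]^2.
b = np.array([2.0, -2.0])
grad_f = lambda x: 2 * (x - b)

x_star = projected_gradient_descent(grad_f, np.zeros(2), lo=0.0, hi=1.0)
print(x_star)  # lands on the boundary of the box, at [1, 0]
```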
As a reminder, in this course we will focus on framing control problems as optimization problems, and on formulating the optimization problem such that we can directly use off-the-shelf optimizers.
Optimization solvers#
There are many optimization solvers out there. Note that some packages are modeling languages, that is, nice intuitive wrappers that let users define an optimization problem in a natural way that follows the math, rather than in the restrictive standard form required by solvers; the user can then select from different backend solvers. Other packages are the actual solvers, and they come with their own interfaces. Others still are simply functions you can use as part of a toolbox.
In this class, we will mainly be using cvxpy, as it is designed to be very intuitive and simple to use. Though for your project, you are welcome to use whatever solver/language/function you wish.
Here’s a list of optimization solvers and modeling languages, including links and short descriptions for each.
🔹 Optimization solvers#
Free & open-source solvers#
CVXOPT – Convex optimization library in Python, supports linear and quadratic programming.
SciPy Optimize – General-purpose optimization module in SciPy (Python).
OSQP – Quadratic programming solver designed for fast embedded optimization.
GLPK (GNU Linear Programming Kit) – Solves large-scale linear programming (LP) and mixed-integer programming (MILP) problems.
Ipopt – Interior-point solver for nonlinear programming (NLP).
CBC (Coin-or Branch and Cut) – Open-source MILP solver.
Clp (Coin-or Linear Programming) – Open-source LP solver.
CasADi – Symbolic framework for nonlinear optimization and optimal control.
GEKKO – Python package for solving LP, NLP, and MILP problems.
Commercial solvers (often more powerful, but some have free academic licenses)#
Gurobi – Industry-leading solver for LP, QP, and MILP. Free academic licenses available.
CPLEX (IBM) – Powerful solver for LP, MILP, and NLP. Academic licenses available.
MOSEK – Optimized for large-scale conic and linear problems.
Knitro – High-performance nonlinear optimization solver.
MATLAB Optimization Toolbox – Various solvers for LP, QP, and NLP problems.
🔹 Optimization modeling languages#
CVXPY – Python-based modeling framework for convex optimization.
PuLP – Python library for linear programming modeling.
Pyomo – Python-based modeling for LP, NLP, and MILP.
JuMP – High-performance optimization modeling in Julia.
AMPL – A powerful modeling language for mathematical programming.
GAMS – General Algebraic Modeling System for LP, NLP, and MILP.
YALMIP – Optimization modeling toolbox for MATLAB.
CVX (MATLAB) – Convex optimization modeling in MATLAB.
🔹 Optimization functions within toolboxes#
Here’s a list of optimization functions within toolboxes for popular programming environments like MATLAB, SciPy, and Julia.
🔹 Optimization functions in MATLAB (from Optimization Toolbox and other toolboxes)#
fmincon – Solves constrained nonlinear optimization problems.
linprog – Linear programming solver for minimizing linear objectives subject to linear constraints.
quadprog – Solves quadratic programming (QP) problems.
fminunc – Unconstrained nonlinear optimization.
lsqnonlin – Solves nonlinear least-squares problems.
ga – Genetic algorithm solver for global optimization.
particleswarm – Particle swarm optimization (PSO) for global optimization.
simulannealbnd – Simulated annealing for constrained or unconstrained problems.
🔹 Optimization functions in SciPy (Python) (from the scipy.optimize module)#
scipy.optimize.minimize – General function for unconstrained/constrained minimization with multiple algorithms (BFGS, Nelder-Mead, SLSQP, etc.); see the example after this list.
scipy.optimize.linprog – Linear programming solver.
scipy.optimize.least_squares – Nonlinear least-squares solver.
scipy.optimize.curve_fit – Curve fitting using nonlinear least squares.
scipy.optimize.differential_evolution – Global optimization using evolutionary algorithms.
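As a quick example of the first of these, here is a small scipy.optimize.minimize call on a toy quadratic; the objective, starting point, and choice of BFGS are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

# Toy objective: f(x) = ||x - b||^2 with a made-up target b.
b = np.array([1.0, 2.0])
f = lambda x: np.sum((x - b) ** 2)

result = minimize(f, x0=np.zeros(2), method="BFGS")
print(result.x)    # approximate minimizer, close to b
print(result.fun)  # objective value at the solution
```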
🔹 Optimization functions in Julia (from the JuMP and Optim.jl packages)#
JuMP.optimize! – Defines and solves optimization models.
Optim.optimize – General-purpose unconstrained optimization.
NLopt.optimize – Interfaces nonlinear optimization methods in Julia.
BlackBoxOptim.optimize – Black-box optimization algorithms like genetic algorithms.