31.2.1. Full-state feedback#
Optimal control can be recast as a constrained optimization problem: an extremum - the optimum - of an objective function \(J\) must be found, subject to constraints that include the equations of motion. Some constraints may be incorporated into an augmented objective function \(\widetilde{J}\) with the method of Lagrange multipliers.
Different models - governing equations
Generic ODE
\[\mathbf{M} \dot{\mathbf{x}} = \mathbf{f}(\mathbf{x}, \mathbf{u})\]Linear ODE for deterministic signals
\[\begin{split}\begin{cases} \dot{\mathbf{x}} = \mathbf{A} \mathbf{x} + \mathbf{B}_u \mathbf{u} + \mathbf{B}_d \mathbf{d} \\ \mathbf{y} = \mathbf{C} \mathbf{x} + \mathbf{D}_u \mathbf{u} + \mathbf{D}_d \mathbf{d} + \mathbf{D}_r \mathbf{r} \end{cases}\end{split}\]Linear SDE for stochastic signals. Exogenous inputs like disturbances \(\mathbf{d}\) and measurement noise \(\mathbf{r}\) can be treated as stochastic processes; the governing equations then become stochastic differential equations. As an example, if these noises are white, they can be interpreted as the time derivatives of Wiener processes, and the SDE can be written as
\[ d \mathbf{x} = \mathbf{A} \mathbf{x} \, dt + \mathbf{B}_u \mathbf{u} \, dt + \mathbf{B}_d d \mathbf{w}_d \ ,\]with \(d \mathbf{w}_d = \mathbf{d} \, dt\). While this may look like a useless trick, it helps us recall that \(| d \mathbf{w}_d |^2 \sim dt\) when evaluating covariances.
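As a numerical sketch of this SDE (for an assumed scalar example with \(a = -1\), \(b_u = 1\), \(b_d = 0.5\); the names and values are illustrative, not from the notes), the Euler-Maruyama scheme draws each Wiener increment \(d\mathbf{w}_d\) over a step \(dt\) as a Gaussian with variance \(dt\), which is exactly the \(|d\mathbf{w}_d|^2 \sim dt\) property:

```python
import numpy as np

# Euler-Maruyama simulation of dx = a*x dt + b_u*u dt + b_d dw_d
# for an assumed scalar system; the Wiener increment over a step dt
# is drawn as N(0, dt).
rng = np.random.default_rng(0)
a, b_u, b_d = -1.0, 1.0, 0.5
dt, n_steps = 1e-3, 10_000
x, u = 1.0, 0.0               # initial state; no control input here
for _ in range(n_steps):
    dw = rng.normal(scale=np.sqrt(dt))   # dw_d ~ N(0, dt)
    x += a * x * dt + b_u * u * dt + b_d * dw
```

With \(a < 0\) the process is mean-reverting, so the trajectory stays bounded around zero.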
Different approaches to the solution
Different approaches can be used to find the solution, and each helps build a detailed comprehension of the topic.
Variational approach to constrained optimization,
\[J(\mathbf{x},\mathbf{u}) = \int_{\tau=t}^{T} C(\mathbf{x}(\tau), \mathbf{u}(\tau)) d \tau + D(\mathbf{x}(T)) - \int_{\tau=t}^{T} \boldsymbol\lambda^T \left( \dot{\mathbf{x}} - \mathbf{f}(\mathbf{x},\mathbf{u}) \right) d \tau \ ,\]where the equations of motion are constraints inserted in the objective function with the method of Lagrange multipliers.
Hamilton-Jacobi-Bellman equation
\[V(\mathbf{x}_t, t; \mathbf{u}) = \int_{\tau=t}^{T} C(\mathbf{x}(\tau), \mathbf{u}(\tau)) d \tau + D(\mathbf{x}(T)) \ ,\]with the dynamics of the system subject to the governing equation \(\dot{\mathbf{x}} = \mathbf{f}(\mathbf{x},\mathbf{u})\), and the initial condition for the value function - the tail cost function - \(\mathbf{x}(t) = \mathbf{x}_t\). These constraints can be explicitly applied if an expression of the solution of the dynamical equation exists, or they can be added with the method of Lagrange multipliers.
…
A common choice of the running cost \(C(\mathbf{x},\mathbf{u})\) and the final cost \(D(\mathbf{x}(T))\) is
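For instance, a quadratic choice, consistent with the weight matrices \(\mathbf{Q}\), \(\mathbf{S}\), \(\mathbf{R}\), \(\mathbf{Q}_T\) used later in this section, reads

\[ C(\mathbf{x},\mathbf{u}) = \frac{1}{2} \left( \mathbf{x}^T \mathbf{Q} \mathbf{x} + 2 \, \mathbf{x}^T \mathbf{S} \mathbf{u} + \mathbf{u}^T \mathbf{R} \mathbf{u} \right) \ , \qquad D(\mathbf{x}(T)) = \frac{1}{2} \mathbf{x}^T(T) \mathbf{Q}_T \mathbf{x}(T) \ ,\]

consistent with the gradient \(\nabla_{\mathbf{u}} J = \mathbf{S}^T \mathbf{x} + \mathbf{R} \mathbf{u} + \partial_{\mathbf{u}} \mathbf{f}^T \boldsymbol\lambda\) and the final condition \(\boldsymbol\lambda(T) = \mathbf{Q}_T \mathbf{x}(T)\) appearing below.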
Finite time vs. Infinite time horizon.
…
31.2.1.1. Generic ODE without exogenous inputs#
with initial condition \(\mathbf{x}(0) = \mathbf{x}_0\).
The objective function combines (weights) the error on a desired performance and the control input, so as to achieve the desired behavior with a feasible control effort (one that can be provided by the actuators without saturation, avoiding unnecessarily high power input and excessively sharp behavior, …).
As an example, if the goal of the control \(\mathbf{u}\) is to keep the system around \(\mathbf{x} = \mathbf{0}\), the cost function to be minimized can be designed as
with \(\mathbf{Q} \ge 0\), and \(\mathbf{R} > 0\) and symmetric.
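As a numerical sketch of this infinite-horizon case with decoupled weights (assuming a double-integrator example; the matrices are illustrative, not from the notes), the algebraic Riccati equation gives \(\mathbf{P}\), and the optimal full-state feedback is \(\mathbf{u} = -\mathbf{G}\mathbf{x}\) with \(\mathbf{G} = \mathbf{R}^{-1}\mathbf{B}^T\mathbf{P}\):

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Assumed example: double integrator with quadratic weights Q >= 0, R > 0.
A = np.array([[0.0, 1.0],
              [0.0, 0.0]])
B = np.array([[0.0],
              [1.0]])
Q = np.diag([1.0, 0.1])       # state weight, symmetric, Q >= 0
R = np.array([[1.0]])         # control weight, symmetric, R > 0

P = solve_continuous_are(A, B, Q, R)    # algebraic Riccati equation
G = np.linalg.solve(R, B.T @ P)         # feedback gain, u = -G x
poles = np.linalg.eigvals(A - B @ G)    # closed-loop poles, all stable
```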
Constrained optimization
So that
The equations can be recast after the definition of the Hamiltonian of the system \(H(\mathbf{x},\mathbf{u},\boldsymbol\lambda) = C(\mathbf{x},\mathbf{u}) + \boldsymbol\lambda^T \mathbf{f}(\mathbf{x},\mathbf{u})\) as
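With this definition, the stationarity conditions of \(\widetilde{J}\) take the standard two-point boundary value form (a sketch, stated for generic \(C\) and \(D\)):

\[\begin{split}\begin{cases} \dot{\mathbf{x}} = \mathbf{f}(\mathbf{x},\mathbf{u}) \\ \dot{\boldsymbol\lambda} = - \partial_{\mathbf{x}} C^T - \partial_{\mathbf{x}} \mathbf{f}^T \boldsymbol\lambda \\ \mathbf{0} = \partial_{\mathbf{u}} C^T + \partial_{\mathbf{u}} \mathbf{f}^T \boldsymbol\lambda \end{cases}\end{split}\]

with boundary conditions \(\mathbf{x}(0) = \mathbf{x}_0\) and \(\boldsymbol\lambda(T) = \nabla_{\mathbf{x}} D(\mathbf{x}(T))\).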
Gradient descent
Start from a control law \(\mathbf{u}^{(0)}(t)\),
the state equation is integrated forward in time to get \(\mathbf{x}^{(0)}(t)\),
Lagrange multiplier equation is integrated backward in time to get \(\boldsymbol\lambda^{(0)}(t)\),
The control law is updated with an increment \(\delta \mathbf{u}\) that’s proportional to the gradient of the cost function (with “opposite direction” to get \(\delta \widetilde{J} < 0\)), i.e. \(\mathbf{u}^{(1)}(t) = \mathbf{u}^{(0)}(t) + \delta \mathbf{u}^{(0)}(t)\) with \(\delta \mathbf{u}^{(0)}(t) = - c \nabla_{\mathbf{u}(t)} J^{(0)}\), with a positive step \(c > 0\) and \(\nabla_{\mathbf{u}} J = \mathbf{S}^T \mathbf{x} + \mathbf{R} \mathbf{u} + \partial_{\mathbf{u}} \mathbf{f}^T \boldsymbol\lambda\).
Repeat previous steps with the updated control law, until convergence.
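The steps above can be sketched numerically for an assumed scalar LQ example (\(\dot{x} = a x + b u\), \(C = (q x^2 + r u^2)/2\), \(D = 0\), so \(\mathbf{S} = 0\) and \(\nabla_u J = r u + b \lambda\)); all values here are illustrative:

```python
import numpy as np

# Gradient-descent solution of a scalar LQ problem:
#   xdot = a*x + b*u,  lamdot = -(q*x + a*lam),  lam(T) = 0,
#   grad_u J = r*u + b*lam.
a, b, q, r = -0.5, 1.0, 1.0, 0.1
T, n = 2.0, 400
dt = T / n
x0 = 1.0
c = 0.1                        # positive gradient step

u = np.zeros(n + 1)            # u^(0): start from zero control
for _ in range(600):
    # 1. integrate the state equation forward in time (explicit Euler)
    x = np.empty(n + 1); x[0] = x0
    for k in range(n):
        x[k + 1] = x[k] + dt * (a * x[k] + b * u[k])
    # 2. integrate the costate equation backward in time
    lam = np.empty(n + 1); lam[-1] = 0.0
    for k in range(n, 0, -1):
        lam[k - 1] = lam[k] + dt * (q * x[k] + a * lam[k])
    # 3. steepest-descent update: du = -c * grad_u J
    grad = r * u + b * lam
    u = u - c * grad
```

At convergence the gradient vanishes, i.e. \(r u + b \lambda \approx 0\) along the whole trajectory.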
HJB equation - Evaluation equation
Without dynamical equations as a constraint introduced with Lagrange multipliers
subject to the dynamical equations of motion. The value function depends on two arguments, \(\mathbf{x}_t\) and \(t\), and the first argument is itself a function of \(t\). Thus the total derivative w.r.t. time \(t\) reads
or, using the definition and the differentiation rule for integrals,
Comparing the two expressions of the time derivative, the Hamilton-Jacobi-Bellman equation for evaluating a given control \(\mathbf{u}\) follows
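For a prescribed control \(\mathbf{u}(\mathbf{x}, t)\), this evaluation equation has the standard form

\[ - \partial_t V(\mathbf{x}, t) = C(\mathbf{x}, \mathbf{u}) + \nabla_{\mathbf{x}} V^T \, \mathbf{f}(\mathbf{x},\mathbf{u}) \ , \qquad V(\mathbf{x}, T) = D(\mathbf{x}) \ ,\]

a linear first-order PDE to be integrated backward in time from the final condition.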
With dynamical equations as a constraint introduced with Lagrange multipliers
so that the variation (where is the variation w.r.t. \(t\)? The integration extreme is not prescribed) reads
All the variations but \(\delta \mathbf{x}_t\) are identically zero if the equations (31.7) are satisfied. The variation w.r.t. \(\delta \mathbf{x}_t\) then shows that the sensitivity of the value function to the initial state \(\mathbf{x}_t\) equals the value of the Lagrange multiplier \(\boldsymbol\lambda(t)\),
with \(\boldsymbol\lambda(t)\) part of the solution of the equations (31.7). Thus the Lagrange multiplier \(\boldsymbol\lambda(t)\) provides information (first-order, a sensitivity) about the change in the value function \(V(\mathbf{x}_t, t)\) for small changes of the initial state \(\mathbf{x}_t\), i.e.
todo
useful to exploit adjoint info for sensitivity
HJB equation - Optimality equation
subject to the dynamical equations of motion. The value function depends on two arguments, \(\mathbf{x}_t\) and \(t\), and the first argument is itself a function of \(t\). Thus the total derivative w.r.t. time \(t\) reads
or, using the definition and the differentiation rule for integrals,
Comparing the two expressions of the time derivative, and minimizing over the admissible controls, the Hamilton-Jacobi-Bellman optimality equation follows
or
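Using the Hamiltonian \(H\) defined above with \(\boldsymbol\lambda = \nabla_{\mathbf{x}} V\), the optimality equation can be written in the standard form

\[ - \partial_t V = \min_{\mathbf{u}} \left[ C(\mathbf{x},\mathbf{u}) + \nabla_{\mathbf{x}} V^T \mathbf{f}(\mathbf{x},\mathbf{u}) \right] = \min_{\mathbf{u}} H\left(\mathbf{x}, \mathbf{u}, \nabla_{\mathbf{x}} V\right) \ , \qquad V(\mathbf{x},T) = D(\mathbf{x}) \ .\]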
31.2.1.2. Linear system#
Let the dynamical equation of a system be
with measurement output \(\mathbf{y}\) and performance output \(\mathbf{z}\), and with running and final cost
The equations (31.7) become
The weight on the control is positive definite, \(\widetilde{\mathbf{R}} > 0\), and thus invertible. The control can be written as a function of the state and the co-state as
Now the closed loop system reads
with initial condition \(\mathbf{x}(0) = \mathbf{x}_0\) and final condition \(\boldsymbol\lambda(T) = \mathbf{Q}_T \mathbf{x}(T)\). The solution can be written as
so that the state in \(T\) reads
so that
and thus
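This two-point boundary value problem can also be solved numerically by integrating the differential Riccati equation \(-\dot{\mathbf{P}} = \mathbf{A}^T \mathbf{P} + \mathbf{P} \mathbf{A} - \mathbf{P} \mathbf{B} \mathbf{R}^{-1} \mathbf{B}^T \mathbf{P} + \mathbf{Q}\) backward in time from \(\mathbf{P}(T) = \mathbf{Q}_T\), so that \(\boldsymbol\lambda(t) = \mathbf{P}(t) \mathbf{x}(t)\). A minimal sketch (assuming \(\mathbf{S} = 0\) and an illustrative double-integrator example):

```python
import numpy as np
from scipy.integrate import solve_ivp

# Backward integration of the differential Riccati equation
#   -dP/dt = A^T P + P A - P B R^{-1} B^T P + Q,   P(T) = Q_T,
# for an assumed double-integrator example.
A = np.array([[0.0, 1.0],
              [0.0, 0.0]])
B = np.array([[0.0],
              [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
Q_T = np.zeros((2, 2))
T = 5.0

def riccati_rhs(t, p_flat):
    P = p_flat.reshape(2, 2)
    dP = -(A.T @ P + P @ A - P @ B @ np.linalg.solve(R, B.T @ P) + Q)
    return dP.ravel()

# integrate from t = T down to t = 0 (solve_ivp accepts a decreasing span)
sol = solve_ivp(riccati_rhs, (T, 0.0), Q_T.ravel(), rtol=1e-8, atol=1e-10)
P0 = sol.y[:, -1].reshape(2, 2)   # P(0)
```

For a horizon much longer than the closed-loop time constants, \(\mathbf{P}(0)\) approaches the steady-state (algebraic Riccati) solution.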
Value function and relation \(\ \boldsymbol\lambda = \mathbf{P} \mathbf{x}\)
Let the value function be
subject to the equations of motion as constraints \(\dot{\mathbf{x}}_\tau = \mathbf{A}_\tau \mathbf{x}_\tau + \mathbf{B}_\tau \mathbf{u}_\tau\), and the initial condition \(\mathbf{x}(t) = \mathbf{x}_t\). This constraint can be introduced either 1) expressing the state \(\mathbf{x}_\tau\) as a function of the initial state and the input
or 2) with the methods of Lagrange multipliers. A co-state \(\boldsymbol\lambda_t\) - corresponding to the Lagrange multiplier - can be evaluated as
Method 1.
and thus
From optimization, \(\mathbf{u}_\tau = - \widetilde{\mathbf{R}}^{-1}_\tau \left( \mathbf{B}_\tau^T \boldsymbol\lambda_\tau + \mathbf{S}^T_\tau \mathbf{x}_\tau \right)\),
Method 2.
Optimal control
HJB optimality equation
and thus, from the arbitrariness of \(\delta \mathbf{u}_\tau\), and since \(\widetilde{\mathbf{R}}\) is required to be invertible
…
If (todo why?) \(\boldsymbol\lambda = \mathbf{P} \mathbf{x}\),
Using HJB equation.
Proportional control. This should not be an assumption, but a result of the problem !!!
If \(\mathbf{u} = - \mathbf{G} \mathbf{x}\), the solution of the closed-loop system
reads \(\mathbf{x}(\tau) = \boldsymbol\Psi_c(\tau,t) \mathbf{x}(t)\). The value function becomes
Thus, the costate reads \(\boldsymbol\lambda = \nabla_{\mathbf{x}_t} V = \mathbf{P} \mathbf{x}\).
Matrix \(\mathbf{P}\) satisfies a Lyapunov equation,
as it can be easily found by direct computation of the time derivative of \(\mathbf{P}\).
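A quick numerical check of this Lyapunov relation (assuming \(\mathbf{S} = 0\) and an illustrative stable example): for a fixed gain \(\mathbf{u} = -\mathbf{G}\mathbf{x}\), the matrix \(\mathbf{P}\) of the value function satisfies \(\mathbf{A}_c^T \mathbf{P} + \mathbf{P} \mathbf{A}_c + \mathbf{Q} + \mathbf{G}^T \mathbf{R} \mathbf{G} = \mathbf{0}\) with \(\mathbf{A}_c = \mathbf{A} - \mathbf{B}\mathbf{G}\):

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

# Assumed example: a stable closed loop with a fixed (not necessarily
# optimal) gain G; P solves Ac^T P + P Ac + Q + G^T R G = 0.
A = np.array([[0.0, 1.0],
              [-2.0, -3.0]])
B = np.array([[0.0],
              [1.0]])
G = np.array([[1.0, 1.0]])    # some stabilizing gain
Q = np.eye(2)
R = np.array([[1.0]])

Ac = A - B @ G                # closed-loop dynamics
P = solve_continuous_lyapunov(Ac.T, -(Q + G.T @ R @ G))
```

Since \(\mathbf{A}_c\) is stable and the right-hand side is positive definite, \(\mathbf{P}\) is symmetric positive definite.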
Decoupled weights
Let \(\mathbf{S} = \mathbf{0}\), then \(\mathbf{u} = - \widetilde{\mathbf{R}}^{-1} \mathbf{B}^T \boldsymbol\lambda\), and thus the control is a linear combination of the co-state.
The transformation \(\mathbf{v} := \mathbf{u} + \widetilde{\mathbf{R}}^{-1} \mathbf{S}^T \mathbf{x}\) turns a coupled objective function for the original system into a decoupled objective function for a modified system.
todo
Are \(\Psi\) matrices invertible?
Properties of the Hamilton matrix…
What’s a conjugate point?
…
Coupled state-input weights
Decoupled state-input weights
The problem can be decoupled with a transformation
so that
The off-diagonal terms are identically zero if \(\mathbf{T} = - \mathbf{R}^{-1} \mathbf{S}^T\). The coordinate transformation becomes
and the running cost reads
The linear system
becomes
Decoupled system
Optimization gives
The modified optimal control reads
and thus
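The equivalence of the coupled and decoupled formulations can be checked numerically (assumed illustrative example): the cross-weighted problem \((\mathbf{Q}, \mathbf{R}, \mathbf{S})\) and the decoupled one obtained with \(\mathbf{v} = \mathbf{u} + \mathbf{R}^{-1}\mathbf{S}^T\mathbf{x}\), \(\mathbf{A}' = \mathbf{A} - \mathbf{B}\mathbf{R}^{-1}\mathbf{S}^T\), \(\mathbf{Q}' = \mathbf{Q} - \mathbf{S}\mathbf{R}^{-1}\mathbf{S}^T\), must yield the same Riccati solution and optimal feedback:

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Assumed example: double integrator with a small cross weight S,
# chosen so that Q' = Q - S R^{-1} S^T stays positive semidefinite.
A = np.array([[0.0, 1.0],
              [0.0, 0.0]])
B = np.array([[0.0],
              [1.0]])
Q = np.diag([2.0, 1.0])
R = np.array([[1.0]])
S = np.array([[0.1],
              [0.2]])

# coupled problem: SciPy's ARE solver accepts the cross term via `s`
P1 = solve_continuous_are(A, B, Q, R, s=S)
G1 = np.linalg.solve(R, B.T @ P1 + S.T)          # u = -G1 x

# equivalent decoupled problem in the modified variables
Ad = A - B @ np.linalg.solve(R, S.T)
Qd = Q - S @ np.linalg.solve(R, S.T)
P2 = solve_continuous_are(Ad, B, Qd, R)
G2 = np.linalg.solve(R, B.T @ P2) + np.linalg.solve(R, S.T)  # u = -(Gv + R^{-1}S^T) x
```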