Krotov’s Method

Optimization functional

Quantum optimal control methods formalize the problem of finding “control fields” that achieve some physical objective, using the time evolution of a quantum system, including possible constraints. The most direct example is a state-to-state transition, that is, for a known quantum state at time zero to evolve to a specific target state at final time \(T\), controlling, e.g. a chemical reaction [1]. Another example is the realization of quantum gates, the building blocks of a quantum computer. In this case, the states forming a computational basis must transform according to a unitary transformation [2]. The control fields might be the amplitudes of a laser pulse, for the control of a molecular system, RF fields for nuclear magnetic resonance, or microwave fields for superconducting circuits. There may be multiple independent controls involved in the dynamics, such as different color lasers used in the excitation of a Rydberg atom, or different polarization components of an electric field.

The quantum control methods build on a rich field of classical control theory [3][4]. This includes Krotov’s method [5], which was originally formulated to optimize the soft landing of a spacecraft from orbit to the surface of a planet [6], before being applied to quantum mechanical problems [7]. Fundamentally, they rely on the variational principle, that is, the minimization of a functional \(J[\{\ket{\phi_k^{(i)}(t)}\}, \{\epsilon_l^{(i)}(t)\}]\) that includes any required constraints via Lagrange multipliers. The condition for minimizing \(J\) is then \(\nabla_{\phi_k, \epsilon_l} J = 0\). In rare cases, the variational calculus can be solved in closed form, based on Pontryagin’s maximum principle [4]. Numerical methods are required in any other case. These start from an initial guess control (or set of guess controls, if there are multiple controls), and calculate an update to these controls that will decrease the value of the functional. The updated controls then become the guess for the next iteration of the algorithm, until the value of the functional is sufficiently small, or convergence is reached.

Mathematically, Krotov’s method, when applied to quantum systems [8][9], minimizes a functional of the most general form

(1)\[ J[\{\ket{\phi_k^{(i)}(t)}\}, \{\epsilon_l^{(i)}(t)\}] = J_T(\{\ket{\phi_k^{(i)}(T)}\}) + \sum_l \int_0^T g_a(\epsilon_l^{(i)}(t)) \dd t + \int_0^T g_b(\{\phi^{(i)}_k(t)\}) \dd t\,,\]

where the \(\{\ket{\phi_k^{(i)}(T)}\}\) are the time-evolved initial states \(\{\ket{\phi_k}\}\) under the (guess) controls \(\{\epsilon^{(i)}_l(t)\}\) of the \(i\)’th iteration. In the simplest case of a single state-to-state transition, the index \(k\) vanishes. For the example of a two-qubit quantum gate, \(\{\ket{\phi_k}\}\) would be the logical basis states \(\ket{00}\), \(\ket{01}\), \(\ket{10}\), and \(\ket{11}\). The sum over \(l\) vanishes if there is only a single control. For open system dynamics, the states \(\{\ket{\phi_k}\}\) may be density matrices.

The functional consists of three parts:

  • A final-time functional \(J_T\). This is the “main” part of the functional, and we can usually think of \(J\) as being an auxiliary functional in the optimization of \(J_T\).

  • A running cost on the control fields, \(g_a\). The most commonly used expression (and the only one currently supported by the krotov package) is [10]

    (2)\[g_a(\epsilon_l^{(i+1)}(t)) = \frac{\lambda_{a,l}}{S_l(t)} \Delta\epsilon_l^2(t)\,, \quad \Delta\epsilon_l(t) \equiv \epsilon_l^{(i+1)}(t) - \epsilon_l^{(i)}(t)\,.\]

    This introduces two parameters for each control, the (inverse) Krotov “step width” \(\lambda_{a,l}\) and the update-shape function \(S_l(t) \in [0, 1]\). \(\Delta\epsilon_l(t)\) is the update of the control in a single iteration of the optimization algorithm. As we will see below, \(\lambda_{a,l}\) determines the overall magnitude of \(\Delta\epsilon_l(t)\), and \(S_l(t)\) can be used to ensure boundary conditions on \(\epsilon^{(i+1)}_l(t)\).

  • An optional state-dependent running cost, \(g_b\), can be employed, e.g., for penalizing population in a subspace [11]. This is rarely used, as there are other methods to achieve the same effect, like using a non-Hermitian Hamiltonian to remove population from the forbidden subspace during the time evolution. Currently, the krotov package only supports \(g_b \equiv 0\).

The most commonly used final-time functionals (cf. krotov.functionals) optimize for a set of initial states \(\{\ket{\phi_k}\}\) to evolve to a set of target states \(\{\ket{\phi_k^\tgt}\}\). The functionals can then be expressed in terms of the complex overlaps of the target states with the final-time states under the given control. Thus,

(3)\[ \tau_k = \Braket{\phi_k^\tgt}{\phi_k(T)}\]

in Hilbert space, or

\[\tau_k = \langle\!\langle \Op{\rho}^{\tgt} \vert \Op{\rho}_k(T) \rangle\!\rangle \equiv \tr\left[\Op{\rho}_k^{\tgt\,\dagger} \Op{\rho}_k(T) \right]\]

in Liouville space.

The following functionals \(J_T\) can be formed from these complex overlaps, taking into account that any optimization functional \(J_T\) must be real. They differ by the way they treat the phases \(\varphi_k\) in the physical optimization goal \(\ket{\phi_k(T)} \overset{!}{=} e^{i\varphi_k}\ket{\phi_k^{\tgt}}\) [10]:

  • Optimize for simultaneous state-to-state transitions, with completely arbitrary phases \(\varphi_k\),

    (4)\[J_{T,\text{ss}} = 1- \frac{1}{N} \sum_{k=1}^{N} \Abs{\tau_k}^2\,,\]

    cf. J_T_ss().

  • Optimize for simultaneous state-to-state transitions, with an arbitrary global phase, i.e., \(\varphi_k = \varphi_{\text{global}}\) for all \(k\) with arbitrary \(\varphi_{\text{global}}\),

    (5)\[J_{T,\text{sm}} = 1- \frac{1}{N^2} \Abs{\sum_{k=1}^{N} \tau_k}^2 = 1- \frac{1}{N^2} \sum_{k=1}^{N} \sum_{k'=1}^{N} \tau_{k'}^* \tau_{k}\,,\]

    cf. J_T_sm().

  • Optimize for simultaneous state-to-state transitions, with a global phase of zero, i.e., \(\varphi_k = 0\) for all \(k\),

    (6)\[J_{T,\text{re}} = 1-\frac{1}{N} \Re \left[\, \sum_{k=1}^{N} \tau_k \,\right]\,,\]

    cf. J_T_re().

Update equation

Krotov’s method is based on a rigorous examination of the conditions for calculating the updated fields \(\epsilon_l^{(i+1)}(t)\) such that \(J(\{\ket{\phi_k^{(i+1)}(t)}\}, \{\epsilon_l^{(i+1)}\}) \leq J(\{\ket{\phi_k^{(i)}(t)}\}, \{\epsilon_l^{(i)}\})\) is true by construction [5][6][10][9][7]. It achieves this by adding a vanishing quantity to the functional that disentangles the implicit dependence of \(\{\ket{\phi_k}\}\) and \(\{\epsilon_l(t)\}\) in the variational calculus. Specifically, the derivation formulates an auxiliary functional \(L[\{\ket{\phi_k^{(i)}(t)}\}, \{\epsilon_l^{(i)}(t)\}, \Phi]\) that is equivalent to \(J[\{\ket{\phi_k^{(i)}(t)}\}, \{\epsilon_l^{(i)}(t)\}]\), but includes an arbitrary scalar potential \(\Phi\). The freedom in this scalar potential is then used to formulate a condition to ensure monotonic convergence,

(7)\[\begin{split}\left.\frac{\partial g_a}{\partial \epsilon}\right\vert_{\epsilon^{(i+1)}(t)} = 2 \Im \sum_{k=1}^{N} \Bigg\langle \chi_k^{(i)}(t) \Bigg\vert \Bigg( \left.\frac{\partial \Op{H}}{\partial \epsilon}\right\vert_{{\scriptsize \begin{matrix}\phi^{(i+1)}(t)\\\epsilon^{(i+1)}(t)\end{matrix}}} \Bigg) \Bigg\vert \phi_k^{(i+1)}(t) \Bigg\rangle\,.\end{split}\]

For \(g_a\) as in Eq. (2), this condition becomes the Krotov update equation [8][10][7],

(8)\[\begin{split}\Delta\epsilon(t) = \frac{S(t)}{\lambda_a} \Im \left[ \sum_{k=1}^{N} \Bigg\langle \chi_k^{(i)}(t) \Bigg\vert \Bigg( \left.\frac{\partial \Op{H}}{\partial \epsilon}\right\vert_{{\scriptsize \begin{matrix}\phi^{(i+1)}(t)\\\epsilon^{(i+1)}(t)\end{matrix}}} \Bigg) \Bigg\vert \phi_k^{(i+1)}(t) \Bigg\rangle \right]\,,\end{split}\]

with the equation of motion for the forward propagation of \(\ket{\phi_k}\) under the optimized controls \(\epsilon^{(i+1)}(t)\) of the iteration \((i)\),

(9)\[\frac{\partial}{\partial t} \Ket{\phi_k^{(i+1)}(t)} = -\frac{\mathrm{i}}{\hbar} \Op{H}^{(i+1)} \Ket{\phi_k^{(i+1)}(t)}\,.\]

For the moment, we have assumed unitary dynamics; the generalization to open system dynamics will be discussed later in this section. The co-states \(\ket{\chi_k^{(i)}(t)}\) are propagated backwards in time under the guess controls of iteration \((i)\), i.e., the optimized controls from the previous iteration, as

(10)\[\frac{\partial}{\partial t} \Ket{\chi_k^{(i)}(t)} = -\frac{\mathrm{i}}{\hbar} \Op{H}^{\dagger\,(i)} \Ket{\chi_k^{(i)}(t)} + \left.\frac{\partial g_b}{\partial \Bra{\phi_k}}\right\vert_{\phi^{(i)}(t)}\,,\]

with the boundary condition

(11)\[\Ket{\chi_k^{(i)}(T)} = - \left.\frac{\partial J_T}{\partial \Bra{\phi_k}} \right\vert_{\phi^{(i)}(T)}\,.\]

Here, and in the following, we have dropped the index \(l\) of the controls and the associated \(\lambda_{a,l}\) and \(S_l(t)\); all equations are valid for each individual control.

Frequently, the control field \(\epsilon(t)\) is required to be zero at \(t=0\) and \(t=T\) in order to smoothly switch on and off. To ensure that the update maintains this behavior, \(S(t) \in [0,1]\) is chosen as a function with those same conditions. A typical example is a flattop() function

\[\begin{split}S(t) = \begin{cases} B(t; t_0=0, t_1=2 t_{\text{on}}) & \text{for} \quad 0 < t < t_{\text{on}} \\ 1 & \text{for} \quad t_{\text{on}} \le t \le T - t_{\text{off}} \\ B(t; t_0=T-2 t_{\text{off}}, t_1=T) & \text{for} \quad T - t_{\text{on}} < t < T\,, \end{cases}\end{split}\]

with the blackman() shape \(B(t; t_0, t_1)\), which is similar to a Gaussian, but exactly zero at \(t = t_0, t_1\).

The scaling factor \(\lambda_a\) controls the overall magnitude of the pulse update, thereby taking the role of an (inverse) “step size”. Values that are too large will change \(\epsilon^{(i)}(t)\) by only a small amount in every iteration, causing slow convergence. Values that are too small will result in numerical instability, see Choice of λₐ.

The coupled equations (8)-(11) can be generalized to open system dynamics by replacing Hilbert space states with density matrices, \(\Op{H}\) with \(i \Liouville\), and brakets with Hilbert-Schmidt products, \(\langle \cdot \vert \cdot \rangle \rightarrow \langle\!\langle \cdot \vert \cdot \rangle\!\rangle\). In full generality, \(\Op{H}\) in Eq. (8) is the operator \(H\) on the right-hand side of whatever the equation of motion for the forward propagation of the states is, written in the form \(i \hbar \dot\phi = H \phi\), cf. Eq. (9), see krotov.mu. Note also that the backward propagation Eq. (10) uses the adjoint operator, which is relevant both for a dissipative Liouvillian [12][13][14] and a non-Hermitian Hamiltonian [15][16].

Optimization of non-linear problems or non-convex functionals

The condition (7) and the update Eq. (8) are based on a first-order expansion of the auxiliary potential \(\Phi\) with respect to the states. This first order is sufficient if the equation of motion is linear (\(\Op{H}\) does not depend on the states \(\ket{\phi_k(t)}\)), the functional \(J_T\) is convex (all the “standard” functionals for quantum control are convex), and no state-dependent constraints are used (\(g_b\equiv 0\)). When these conditions are not fulfilled, it is still possible to derive an optimization algorithm with monotonic convergence via a second term in Eq. (7) [6][9],

(12)\[\begin{split}\begin{split} \left.\frac{\partial g_a}{\partial \epsilon}\right\vert_{\epsilon^{(i+1)}(t)} & = 2 \Im \left[ \sum_{k=1}^{N} \Bigg\langle \chi_k^{(i)}(t) \Bigg\vert \Bigg( \left.\frac{\partial \Op{H}}{\partial \epsilon}\right\vert_{{\scriptsize \begin{matrix}\phi^{(i+1)}(t)\\\epsilon^{(i+1)}(t)\end{matrix}}} \Bigg) \Bigg\vert \phi_k^{(i+1)}(t) \Bigg\rangle \right. \\ & \qquad \left. + \frac{1}{2} \sigma(t) \Bigg\langle \Delta\phi_k(t) \Bigg\vert \Bigg( \left.\frac{\partial \Op{H}}{\partial \epsilon}\right\vert_{{\scriptsize \begin{matrix}\phi^{(i+1)}(t)\\\epsilon^{(i+1)}(t)\end{matrix}}} \Bigg) \Bigg\vert \phi_k^{(i+1)}(t) \Bigg\rangle \right]\,, \end{split}\end{split}\]

with

\[\ket{\Delta \phi_k(t)} \equiv \ket{\phi_k^{(i+1)}(t)} - \ket{\phi_k^{(i)}(t)}\,.\]

This second term is the “non-linear” or “second order” contribution. The corresponding update quation is, assuming Eq. (2),

(13)\[\begin{split}\begin{split} \Delta\epsilon(t) & = \frac{S(t)}{\lambda_a} \Im \left[ \sum_{k=1}^{N} \Bigg\langle \chi_k^{(i)}(t) \Bigg\vert \Bigg( \left.\frac{\partial \Op{H}}{\partial \epsilon}\right\vert_{{\scriptsize \begin{matrix}\phi^{(i+1)}(t)\\\epsilon^{(i+1)}(t)\end{matrix}}} \Bigg) \Bigg\vert \phi_k^{(i+1)}(t) \Bigg\rangle \right. \\ & \qquad \qquad \quad \left. + \frac{1}{2} \sigma(t) \Bigg\langle \Delta\phi_k(t) \Bigg\vert \Bigg( \left.\frac{\partial \Op{H}}{\partial \epsilon}\right\vert_{{\scriptsize \begin{matrix}\phi^{(i+1)}(t)\\\epsilon^{(i+1)}(t)\end{matrix}}} \Bigg) \Bigg\vert \phi_k^{(i+1)}(t) \Bigg\rangle \right]\,. \end{split}\end{split}\]

The prefactor \(\sigma(t)\) to the second order update is a scalar function that must be chosen appropriately to ensure monotonic convergence.

As shown in Ref. [9], it is possible to numerically approximate \(\sigma(t)\). In Refs [17][18], non-convex final-time functionals that depend higher than quadratically on the states are considered, for a standard equation of motion given by a linear Schrödinger equation. In this case,

\[\sigma(t) \equiv -\max\left(\varepsilon_A,2A+\varepsilon_A\right)\,, \label{eq:sigma_A}\]

where \(\varepsilon_A\) is a small non-negative number that can be used to enforce strict inequality in the second order optimality condition. The optimal value for \(A\) in each iteration can be approximated numerically as [9]

\[A = \frac{\sum_{k=1}^{N} 2 \Re\left[ \langle \chi_k(T) \vert \Delta\phi_k(T) \rangle \right] + \Delta J_T} {\sum_{k=1}^{N} \Abs{\Delta\phi_k(T)}^2} \,,\]

cf. krotov.second_order.numerical_estimate_A(), with

\[\Delta J_T \equiv J_T(\{\phi_k^{(i+1)}(T)\}) -J_T(\{\phi_k^{(i)}(T)\})\,.\]

See the Optimization towards a Perfect Entangler for an example.

Note

Even when the second order update equation is mathematically required to guarantee monotonic convergence, very often an optimization with the first-order update equation (8) will give converging results. Since the second order update requires more numerical resources (calculation and storage of the states \(\ket{\Delta\phi_k(t)}\)), you should always try the optimization with the first-order update equation first.

Time discretization

Sequential update scheme in Krotov’s method on a time grid.

Fig. 1 Sequential update scheme in Krotov’s method on a time grid.

The derivation of Krotov’s method assumes time-continuous control fields. Only in this case, monotonic convergence is mathematically guaranteed. However, for practical numerical applications, we have to consider controls on a discrete time grid with \(nt\) points running from \(t=0\) to \(t=T\), with a time step \(\dd t\). The states are defined on the points of the time grid, while the controls are assumed to be constant on the intervals of the time grid. See the notebook Time Discretization in Quantum Optimal Control for details. This discretization yields the numerical scheme shown in Fig. 1. It proceeds as follows [10]:

  1. Construct the states \(\ket{\chi^{(i)}_k(T)}\) according to Eq. (11). These typically depend on the states \(\{\ket{\phi^{(i)}_k(T)}\}\) forward-propagated under the optimized pulse from the previous iteration, that is, the guess pulse in the current iteration.

  2. Perform a backward-propagation using Eq. (10) as the equation of motion over the entire time grid. The resulting state at each point in the time grid must be stored in memory.

  3. Starting from the known initial states \(\ket{\phi_k} = \ket{\phi_k(t=0)}\), calculate the pulse update for the first time step according to Eq. (8), with \(t=\dd t/2\) on the left-hand side (representing the first interval in the time grid, on which the control pulse is defined), and \(t=0\) on the right-hand side (representing the first point on the time grid). This approximation of \(t \approx t + \dd t /2\) is what constitutes the “time discretization” mathematically. It resolves the seeming contradiction in the time-continuous Eq. (8) that the calculation of \(\epsilon^{(i+1)}(t)\) requires knowledge of the states \(\ket{\phi_k^{(i+1)}(t)}\) obtained from a propagation under \(\epsilon^{(i+1)}(t)\).

  4. Use the updated field \(\epsilon^{(i+1)}(\dd t/2)\) for the first interval to propagate \(\ket{\phi_k(t=0)}\) for a single time step to \(\ket{\phi_k^{(i+1)}(t=\dd t)}\), with Eq. (9) as the equation of motion. The updates then proceed sequentially, until the final forward-propagated state \(\ket{\phi^{(i+1)}_k(T)}\) is reached.

  5. The updated control field becomes the guess control for the next iteration of the algorithm, starting again at step 1. The optimization continues until the value of the functional \(J_T\) falls below some predefined threshold, or convergence is reached, i.e., \(\Delta J_T\) approaches zero so that no further significant improvement of \(J_T\) is to be expected.

For multiple objectives, the scheme can run in parallel, and each objective contributes a term to the update. Summation of these terms yields the sum in Eq. (8). See krotov.parallelization for details. For a second-order update, the forward propagated states from step 4, both for the current iteration and the previous iteration, must be stored in memory over the entire time grid.

Choice of λₐ

The monotonic convergence of Krotov’s method is only guaranteed in the continuous limit; a coarse time step must be compensated by larger values of the inverse step size \(\lambda_a\), slowing down convergence. Generally, choosing \(\lambda_a\) too small will lead to numerical instabilities and unphysical features in the optimized pulse. A lower limit for \(\lambda_a\) can be determined from the requirement that the change \(\Delta\epsilon(t)\) should be at most of the same order of magnitude as the guess pulse \(\epsilon^{(i)}(t)\) for that iteration. The Cauchy-Schwarz inequality applied to the update equation yields

\[\Norm{\Delta \epsilon(t)}_{\infty} \le \frac{\Norm{S(t)}}{\lambda_a} \sum_{k} \Norm{\ket{\chi_k (t)}}_{\infty} \Norm{\ket{\phi_k (t)}}_{\infty} \Norm{\frac{\partial \Op{H}}{\partial \epsilon}}_{\infty} \stackrel{!}{\le} \Norm{\epsilon^{(i)}(t)}_{\infty}\,,\]

where \(\norm{\partial \Op{H}/\partial \epsilon}_{\infty}\) denotes the supremum norm (with respect to time) of the operator norms of the operators \(\partial \Op{H}/\partial \epsilon\) obtained at time \(t\). Since \(S(t) \in [0,1]\) and \(\ket{\phi_k}\) is normalized, the condition for \(\lambda_a\) becomes

\[\lambda_a \ge \frac{1}{\Norm{\epsilon^{(i)}(t)}_{\infty}} \left[ \sum_{k} \Norm{\ket{\chi_k(t)}}_{\infty} \right] \Norm{\frac{\partial \Op{H}}{\partial \epsilon}}_{\infty}\,.\]

From a practical point of view, the best strategy is to start the optimization with a comparatively large value of \(\lambda_a\), and after a few iterations lower \(\lambda_a\) as far as possible without introducing numerical instabilities. The value of \(\lambda_a\) may be adjusted dynamically with respect to the rate of convergence. Generally, the optimal choice of \(\lambda_a\) requires some trial and error.

Rotating wave approximation

When using the rotating wave approximation (RWA), it is important to remember that the target states are usually defined in the lab frame, not in the rotating frame. This is relevant for the construction of \(\ket{\chi_k(T)}\). When doing a simple optimization, such as a state-to-state or a gate optimization, the easiest approach is to transform the target states to the rotating frame before calculating \(\ket{\chi_k(T)}\). This is both straightforward and numerically efficient.

Another solution would be to transform the result of the forward propagation \(\ket{\phi_k(T)}\) from the rotating frame to the lab frame, then constructing \(\ket{\chi_k(T)}\), and finally to transform \(\ket{\chi_k(T)}\) back to the rotating frame, before starting the backward propagation.

When the RWA is used the control fields are complex-valued. In this case the Krotov update equation is valid for both the real and the imaginary part independently. The most straightforward implementation of the method is for real controls only, requiring that any complex control Hamiltonian is rewritten as two independent control Hamiltonians, one for the real part and one for the imaginary part of the control field. For example,

\[\epsilon^*(t) \Op{a} + \epsilon(t) \Op{a}^\dagger = \epsilon_{\text{re}}(t) (\Op{a} + \Op{a}^\dagger) + \epsilon_{\text{im}}(t) (i \Op{a}^\dagger - i \Op{a})\]

with two independent control fields \(\epsilon_{\text{re}}(t)= \Re[\epsilon(t)]\) and \(\epsilon_{\text{im}}(t) = \Im[\epsilon(t)]\).

See the Optimization of a State-to-State Transfer in a Lambda System in the RWA for an example.

Optimization in Liouville space

The control equations have been written in the notation of Hilbert space. However, they are equally valid for a gate optimization in Liouville space, by replacing Hilbert space states with density matrices, \(\Op{H}\) with \(i \Liouville\) (cf. krotov.mu), and inner products with Hilbert-Schmidt products, \(\langle \cdot \vert \cdot \rangle \rightarrow \langle\!\langle \cdot \vert \cdot \rangle\!\rangle\), cf., e.g., Ref. [14].

See the Optimization of Dissipative Qubit Reset for an example.