Controllability , not chaos , key criterion for ocean state estimation

The Lagrange multiplier method for combining observations and models (i.e., the adjoint method or “4D-VAR”) has been avoided or approximated when the numerical model is highly nonlinear or chaotic. This approach has been adopted primarily due to difficulties in the initialization of low-dimensional chaotic models, where the search for optimal initial conditions by gradient descent algorithms is hampered by multiple local minima. Although initialization is an important task for numeri5 cal weather prediction, ocean state estimation usually demands an additional task – solution of the time-dependent surface boundary conditions that result from atmosphere-ocean interaction. Here, we apply the Lagrange multiplier method to an analogous boundary control problem, tracking the trajectory of the forced chaotic pendulum. Contrary to previous assertions, it is demonstrated that the Lagrange multiplier method can track multiple chaotic transitions through time, so long as the boundary conditions render the system controllable. Thus, the nonlinear timescale poses no limit to the time interval for successful 10 Lagrange multiplier-based estimation. That the key criterion is controllability, not a pure measure of dynamical stability or chaos, illustrates the similarities between the Lagrange multiplier method and other state estimation methods. The results with the chaotic pendulum suggest that there is no fundamental obstacle to ocean state estimation with eddy-resolving, highlynonlinear models, especially when using an improved first-guess trajectory.


Introduction
The most complicated, and probably most realistic, numerical models of the ocean circulation are eddy-resolving ocean general circulation models (e.g., Arbic et al., 2010;Maltrud et al., 2010;Griffies et al., 2015).Such models are a natural choice in ocean state estimation, the combination of models and observations to reconstruct our best estimate of what the ocean has actually done (e.g., Stammer et al., 2002a).Here, we restrict our focus to state estimation as the transient reconstruction of the ocean state over a finite time interval where observations have been collected, following the convention of Wunsch et al. (2009).In order to unambiguously diagnose physical mechanisms of interest, the ocean state must be dynamically consistent: a solution to the dynamical equations of motion without unphysical sources and sinks.The Lagrange multiplier method (e.g., Thacker and Long, 1988;Wunsch, 2010), sometimes called the adjoint method (e.g., Hall et al., 1982;Tziperman and Thacker, 1989), "4D-VAR" (e.g., Courtier et al., 1994;Ferron and Marotzke, 2003), or variational data assimilation (e.g., LeDimet and Talagrand, 1986;Bonekamp et al., 2001;Bennett, 2002), is a method that satisfies these criteria, unlike the Kalman filter (e.g., Fukumori and Malanotte-Rizzoli, 1995) or nudging techniques (e.g., Malanotte-Rizzoli and Tziperman, 1996).
For the Lagrange multiplier method to be successful in state-of-the-art ocean models, two major issues need to be addressed: (1) the high dimensionality of the forward model and estimation problem, and (2) the nonlinearity of ocean models at increasingly fine resolution.Research conducted by the ECCO (Estimating the Circulation and Climate of the Ocean) Consortium (Stammer et al., 2002b(Stammer et al., , 2004) ) has demonstrated that (1) the dimensionality of many million Published by Copernicus Publications on behalf of the European Geosciences Union & the American Geophysical Union.
state variables presents a challenge, but it can be overcome insofar as a solution can be found that fits the ocean data (e.g., Wunsch and Heimbach, 2007).One caveat is that the convergence of the optimization process may be slower than hoped, but this is primarily an issue of computational efficiency.Regarding nonlinearity (2), the adjoint model has the same stability characteristics as the forward model, as the eigenvalues of linearized state transition matrix are the same as the transpose of the matrix (Palmer, 1996).Therefore, nonlinearity in the forward model may be accompanied by an unstable adjoint model and Lagrange multipliers that grow exponentially with time.When the Lagrange multiplier method is used to enforce a nonlinear constraint such as a chaotic model, the search for a solution becomes iterative and the Lagrange multipliers provide gradient information that is used to minimize an objective function that describes the model fit to observations (e.g., Marotzke et al., 1999).For a bounded objective function with growing gradients, multiple local minima are present that complicate the search for a global minimum (e.g., McShane, 1989).Even sophisticated gradient-descent algorithms such as the variablestorage quasi-Newton method (Nocedal, 1980;Gilbert and Lemaréchal, 1989) can become stalled in a local minimum and are not guaranteed to fit the observations adequately.For example, Lea et al. (2000) used the Lorenz (1963) model to conclude that the "adjoint does not tend to useful sensitivity values", echoing previous concerns with simple, chaotic models (e.g., Gauthier, 1992;Miller et al., 1994a;Tanguay et al., 1995).
Due in part to the concerns raised about nonlinearity in simple models, the method of Lagrange multipliers has rarely been applied to realistic models over time windows longer than the eddy scale.For example, some studies restricted the time windows to be short enough that unstable modes would not grow too large (e.g., Schröter et al., 1993;Cong et al., 1998).The Southern Ocean State Estimate was produced with an approximate version of the method of Lagrange multipliers, where the Lagrange multipliers are calculated by an adjoint model with artificially large diffusivities that stabilize the model (Mazloff et al., 2010).Such an approach is not guaranteed to work, as the Lagrange multipliers of the stabilized model have no simple relation with those of the original eddy-resolving model.The iterative search technique could then be led in the opposite direction as the truth, as was shown to occur in a quasi-geostrophic ocean model (Köhl and Willebrand, 2002).We are aware of only one case where the unmodified method of Lagrange multipliers was applied to an eddy-permitting ocean GCM over a timescale longer than the eddy scale of a few months (Gebbie et al., 2006).Contrary to expectation given by the simple chaotic models, an acceptable fit was found to oceanographic observations over a 1-year interval in the northeast Atlantic Ocean (Gebbie, 2007).No clear explanation for these disparate results has been put forward.
In this research, we wish to re-examine (2) the influence of nonlinear models on the method of Lagrange multipliers and ocean state estimation.Is the adjoint method useless with a highly nonlinear or chaotic system, as studies with low-dimensional chaotic models suggest?Here we posit that the initialization problem that has informed much of the current thinking about the Lagrange multiplier method is not the relevant analogy for ocean state estimation.As has been documented in textbooks (e.g., Bennett, 1992;Wunsch, 1996), the ocean state estimation problem is better described as a time-variable boundary value problem because synoptic atmospheric variability acts as an external forcing on the ocean.Given our relatively uncertain knowledge regarding air-sea fluxes, the ocean state estimation is rightfully considered a time-variable boundary value problem where both the initial conditions and boundary conditions must be found.For example, Bennett (2002) described an estimation method for the external forcing, initial and boundary conditions that solves the Euler-Lagrange equations for a linear model.In the typical implementation of ocean state estimation with a general circulation model (e.g., Köhl and Stammer, 2008), the surface forcing is defined to be part of the control vector.Because the effect of nonlinearity is seen as the major roadblock for application of the Lagrange multipler method, we isolate this effect by choosing a model that is highly nonlinear but low-dimensional: the forced, chaotic pendulum (Sect.2).Toy models are worth revisiting because the dynamics is comparatively simple to understand, the nonlinear coupling to periodic forcing has been shown to be important in atmosphere-ocean dynamics (e.g., Tziperman et al., 1994), and these models have strongly influenced when the Lagrange multiplier method has been deployed to realistic ocean problems.We will show that previous toy models have sometimes been misinterpreted.
Rather than developing a new state-of-the-art data assimilation technique, we proceed by taking the existing Lagrange multipler method and developing diagnostics regarding when and why it succeeds or fails, as evaluated by the ability to fit observations.Relative to the initialization problem, the prospects for a successful state estimate are shown to be improved in the boundary control problem, even if one uses a highly nonlinear model such as the forced, chaotic pendulum.If the chaotic nature of the model is not a roadblock, what is the relevant criterion for success with the Lagrange multiplier method?Our results with the chaotic pendulum suggest that "controllability", defined as the ability to move from one arbitrary state to another by adjustments on the control variables (e.g., external forcing), is the relevant diagnostic.The control variables of the pendulum are analogous to adjustments of the atmospheric boundary forcing in an ocean model.Therefore, there is a wide variety of situations where the Lagrange multipiers of an ocean general circulation model (GCM) are useful, and that previous GCM results can be explained in this context.

Pendulum model and synthetic data
The fixed, single pendulum can be modeled as a nonlinear or linear set of equations, and it can also be easily modified to be stable or unstable.In many ways, the pendulum is a more flexible and easily interpreted physical system than the often-used Lorenz (1963) equations that approximate atmospheric convection.The relevance of the pendulum to the ocean is obviously indirect, but much of the community's knowledge of state estimation has been formed through the intuition of simple models.The motion of the forced pendulum is described by the deterministic equation (Baker and Gollub, 1990): where θ is the displacement angle from vertical, q is a damping coefficient, g is gravitational acceleration, l is the pendulum length, and f (t) is an external forcing term.Later, the external forcing will be broken into a first guess and a perturbation, f (t) = f 0 + δf (t), where the first guess is set to periodic forcing, f 0 (t) = b cos(ω d t).With parameters q = 100 s, g/ l = 1.0 s −2 , b = 1.5 rad s −2 , and ω d = 2/3 s −1 , the pendulum is chaotic (here defined as extreme sensitivity to initial conditions).Following the numerical implementation in Appendix A, the state vector is defined, where T is the vector transpose and the state variables are related by ω = dθ/dt.Matrices and vectors are indicated in boldface.The state has dimension M = 2 and the forcing vector has dimension 1.The evolution of the state is succinctly written as where the model state is stepped from time t to t + t, and L is the discretized, nonlinear operator that represents Eq. (1).In the ocean model case, the state would correspond to velocities and property fields, and the external forcing would include air-sea momentum, heat, and freshwater fluxes.
We consider an "identical twin" experiment where the true solution is known (solid line, Fig. 1), and we observe the pendulum angle episodically through time with normally distributed random errors of standard deviation, σ θ = 0.5 rad.In most oceanographically relevant cases, observations have already been collected over some fixed time interval (0 ≤ t ≤ T ).Here, observations, y(t), are taken at a set of N y evenly spaced times with an time interval of t y = T /(N y − 1).

Cost function
We proceed by defining a least-squares cost function to be minimized.The data-based contribution to the cost function, The rapid divergence of pendulum trajectories is indicated by the path density of trajectories (background shading), and the evolution of three sample trajectories: the "truth" or reference trajectory (solid line), a "first-guess" trajectory with incorrect initial angular velocity that diverges within 5 s (dashed line), and a firstguess trajectory with incorrect initial angle, θ , which diverges after 30 s (other dashed line).The path density of trajectories is computed with 10 000 forward integrations with normally distributed perturbations about the truth (standard deviation: σ ω = 1 rad s −1 , σ θ = 0.5 rad).
J d , measures the squared misfit between the model and observations: where the penalty is weighted by the number of observations, N y , and their standard error, σ θ , such that the expected value of J d is near 1.As we have imposed Gaussian error statistics, minimizing this least-squares cost function also leads to the maximum likelihood solution (e.g., Jazwinski, 1970).In matrix-vector notation, Eq. (3) becomes where y(t) is a scalar, E is the observational matrix that samples the observable part of the state and has dimension 1 × M, and W is a weight.Comparison of the first term in Eqs.(4) to (3) shows that W = N y σ 2 θ .While it is unconventional to transpose the scalar data-model misfit in Eq. (4), we retain this notation so that the equations are applicable to cases where multiple observations are available at each time.
A second contribution to the cost function includes two terms that constrain the difference between our posterior and prior estimates of the initial conditions and forcing, G. Gebbie and T.-L.Hsieh: Controllability, not chaos where x 0 (0) is the first-guess initial conditions, there are N t model time steps, f 0 (t) is the first-guess forcing, and S x and S f are weighted by 5 rad and 10 rad s −2 , respectively, to penalize deviations.Note that this cost function is also normalized by the number of observations, N y , to be consistent in the posterior tests later in this work.Here we seek values of x(t) and f (t) that minimize the sum, J = J d + J 0 , but the stationary point found by individually minimizing the values dJ /dx(t) and dJ /df (t) will almost certainly violate the model constraint in Eq. ( 2).We enforce the model constraint by appending a Lagrange multiplier term to the combined cost function, where µ(t) is a Lagrange multiplier, and the scaling with "2" is helpful in later derivations and does not change the numerical value of J because the quantity inside curly brackets vanishes.Now the cost function can be minimized by independently setting the partial derivatives of J with respect to the state, the forcing, and the Lagrange multipliers to zero.This problem will be solved using a gradient-descent method (detailed later) that is excellent at finding the nearest minimum.If the first guess is good, then the closest minimum may actually be the global minimum (e.g., Pires et al., 1996), and therefore we design an improved first guess next.

First-guess trajectory
Minimizing J requires a first-guess of the full model trajectory, x 0 (t).A sensible and common approach is to use the observation at initial time, y(0), to inform the initial conditions for the state, x 0 (0).Then, the first-guess forcing, f 0 (t) = b cos(ω d t), is used to drive the model forward in time.In this case, the state at any time, τ , can be computed directly from the initial state, where L k indicates the nonlinear model operator at time step k, K = τ/ t is the number of time steps between t = 0 and t = τ , and the state transition matrix, R(m, n), defines the aggregate, nonlinear model step to time m from n.In the following, we refer to this trajectory as the "standard" firstguess state.For a nonlinear system, and a chaotic system in particular, this first-guess trajectory usually diverges from the alreadycollected observations at some point, and thus can be ruled out as a possible solution a priori.When the pendulum initial conditions are imperfectly known, the range of possible pendulum trajectories expands greatly with time, even if the forcing evolution is perfectly known (background shading, Fig. 1).Normally distributed initial perturbations to the truth with standard deviation of 1 rad s −1 in angular velocity and 0.5 rad in the initial angle lead to a divergence of roughly 200 rad between extreme trajectories (background shading, Fig. 1).The angle is not renormalized when the angle is greater or less than π , and thus the angle records a history of how many times the pendulum has rotated.If no information about the initial angular velocity is available, a reasonable assumption is that ω = 0 with some large error, but the pendulum trajectory with this initial velocity and the correct initial angle diverges from truth in less than 5 s (first dashed line, Fig. 1).In the case where the initial velocity is known perfectly but the initial angle is observed with an initial error of 0.5 rad (second dashed line, Fig. 1), the trajectory follows truth for 30 s before eventually diverging.As the time interval of interest increases, any uncertainty in the initial conditions will ultimately lead to a divergence between truth and this first-guess model trajectory.While these sample model trajectories may seem overly naive, the first-guess trajectory used for ocean state estimation usually has similar characteristics: usage of an observation at the initial time, some prior knowledge of the forcing, and a freely running forward model.

An improved first guess
The aforementioned standard approach does not use the observational information already in hand that could inform the time evolution of the forcing.There are many methods that are available to update the forcing, such as the Kalman filter (e.g., Keppenne et al., 2005), but these methods rival or exceed the method of Lagrange multipliers in computational cost because of the explicit representation of the solution covariance matrix (Fukumori, 2002).One remedy is to solve the Kalman filter equation in a reduced space with the covariance represented by an ensemble rather than being explicitly represented.Instead, we design a whole-domain method that is computationally efficient and provides a good first guess for the boundary control problem.
Here we seek an update to the initial conditions and the forcing (i.e., ω 1 (0 , which takes the observations into account.For small perturbations, we derive a linearized equation for the change to the state at the time of the first observation, t = t y where x 1 is the "improved" first-guess, K = t y / t is the number of model time steps from t = 0 to t = t y , is constant in time, and is the error due to linearization.We define the column vector of perturbations in Eq. ( 8) to be the control vector, u, so that the equation becomes where C is the controllability (or reachability) matrix (e.g., Dahleh and Diaz-Bobillo, 1999;Wunsch, 2010).The observation, y( t y ), and the combination of Eqs. ( 7) and ( 9) provides one constraint: where the controllability matrix can be calculated given the trajectory, x 0 (t), and n is the misfit.Here we minimize the squared misfit, where Q is a block diagonal matrix with S x and S f on the diagonal.In the case of an underdetermined problem, we solve for u with the least-squares formula, but note the nonlinearity due to R( t y , 0).To handle this quantity, we update the state transition and controllability matrices iteratively, which is identical to the method of total inversion (Tarantola and Valette, 1982).In cases where it saves computations, we employ the overdetermined least-squares formula instead of Eq. ( 12).The full nonlinear model is run with the updated controls to produce the improved first-guess trajectory, x 1 (t), for the first segment (0 ≤ t ≤ t y ).The algorithm proceeds sequentially N y − 1 times, where the terminal state from one segment becomes the initial condition for the next.

Solution for Lagrange multipliers
We obtain the sensitivity of J to the initial conditions by taking the partial derivative, where the improved first guess, x 1 (t), is used.Taking the derivative with respect to the other set of unknowns, we find With knowledge of these gradients, we could improve the initial conditions and forcing, but both Eqs. ( 13) and ( 14) depend upon the Lagrange multipliers, µ(t), that we must solve for first.
Extending the partial derivative of J with respect to x(t) and µ(t) at all times, we recover the Euler-Lagrange (or "normal") equations.The Lagrange multipliers are determined by time stepping backward in time: where the last term on the right-hand side only appears if an observation, y(t), is available.Initial conditions are given by Equations ( 15) and ( 16) are collectively known as the adjoint model (e.g., Bugnion et al., 2006), where the modelobservation misfit is part of the adjoint model forcing, . Now the result of Eq. ( 15) can be substituted into Eqs.( 13) and ( 14) to solve for the gradients.
In summary, the method of Lagrange multipliers is implemented with the following steps.Starting with a guess for the initial conditions and forcing, x 0 (0) and f 0 (t), we improve the first guess and solve for the full trajectory, x 1 (t), where the method of total inversion attempts to account for nonlinearities by re-linearizing about each successive update (Tarantola and Valette, 1982).The adjoint model is then run backward in time to solve for the sensitivity of J with respect to the two types of unknowns, x(0) and f (t).Here we use a quasi-Newton gradient-descent optimization (Nocedal, 1980) to update these uncertain control parameters.Because the model is nonlinear, the tangent-linear model, A(t), will depend upon the nonlinear model trajectory, and we re-run the full nonlinear model to get an updated trajectory that will replace x 1 (t).Forward-adjoint model integrations are repeated until J has an acceptable value by a χ 2 statistical test.Here we consider the state estimation method successful if any solution that acceptably fits the data is found.We acknowledge that many of the cases presented here are underdetermined, and thus we expect those solutions to not be unique.We emphasize, however, that finding any solution would be a breakthrough, as this test has been difficult to satisfy with chaotic models and the Lagrange multiplier method.

Tracking chaotic transitions
Synthetic observations of the pendulum angle are generated every 2.5 s over a 50 s time interval, where a random error of 0.5 rad is added to every observation.We first illustrate the futility of a brute force search for the optimal initial conditions by re-running the forward model with combinations of the initial angle and angular velocity in the neighborhood of the truth.The data contribution to the cost function (Eq. 3) is then evaluated for each forward model trajectory, giving rise to a complex topology where the global minimum is not immediately visible (Fig. 2).That the topology of the cost function is not conducive for gradient descent search was previously documented in models of convection, quasi-geostrophic flow, and the oceanic double gyre model (e.g., Miller et al., 1994b;Köhl and Willebrand, 2003;Lea et al., 2006).When the initial angle and angular velocity are slightly perturbed from the truth, the cost-function values can become extremely large due to the divergence of trajectories.Furthermore, the cost function varies irregularly with many local extrema at locations other than the true solution.The basin of attraction of the true solution, defined in analogy to a drainage basin on a topographic map, is much smaller than the observational uncertainty, and thus, it is likely that the iterations of the adjoint method will converge to a local, non-global, minimum.
We start the application of the methods of Sect. 2 by determining the first-guess trajectory.Here we implement a model time step of 0.01 s over a 50 s integration time and thus the control vector has 5000 forcing variables and two initial condition variables.The first-guess initial conditions, forcing, and trajectory are calculated according to Sect.2.3.Despite the assumption of linearity in the calculation of the first guess, the improved first guess has a trajectory that is nearly consistent with the error bars of the observations (gray line, top panel, Fig. 3).The first guess also tracks the rapid transitions in the interval, 12 s < t < 20 s, where revolutions of the pendulum occur due to chaotic dynamics.These results contrast with a seemingly reasonable first-guess trajectory that is determined by a model simulation initialized with the first observation (θ (0) = y(0)) and zero angular velocity that goes off track in less than 5 s (see "standard first guess", top panel, Fig. 3).Similar first-guesses are common in ocean models, where the circulation field is started from rest with the assumption that geostrophic balance will equilibrate the velocity field rapidly.Our more sophisticated, but still linear, method of deriving an improved first guess makes the Lagrange multiplier method more likely to succeed.
The improved first-guess trajectory is a better fit to the data in large part due to the updated initial angular velocity (middle panel, Fig. 3).Starting with an angular velocity of about 2 rad s −1 , this trajectory has frequent changes in the sign of angular velocity consistent with reversals in pendulum rotation.Conversely, the standard trajectory has a long period of strictly positive angular velocity (10 s < t < 4 0 s) that is inconsistent with the observations.Another important factor is the position of the pendulum around 10 s after the start of the integration, when small differences in the state become greatly magnified.The true pendulum trajectory then enters a period where several revolutions occur successively.The inaccurate initial velocity causes errors at this critical time of instability and thus the trajectories diverge.
For θ and ω, the difference between the improved-firstguess and final (Lagrange multiplier method) trajectories is smaller than the changes brought about by the first-guess improvement itself.The method of Lagrange multipliers acts similarly to a combined filter-smoother that simultaneously takes into account past and future observations, leading to an angular velocity evolution with somewhat smaller range while still fitting the observations.Consequently, the evolution of the pendulum angle is also smoother, with fewer variations at the observational sampling frequency of 1/2.5 s.The full impact of the Lagrange multiplier method only becomes clear when considering the external forcing in the following section.

Reconstruction of the forcing
The improved first-guess trajectory better fits the observations than a standard first-guess, but there are tradeoffs in the estimated forcing (bottom panel, Fig. 3).In order to track the chaotic transitions of the pendulum, the improved firstguess trajectory makes forcing adjustments that are sometimes strong and abrupt in order to compensate for previous errors in the trajectory.In other words, these adjustments take the forcing evolution farther away from the truth that was used in the standard first-guess trajectory.This is reminiscent of the small-scale features that are added to the surface forcing of ocean models in order to fit observations (e.g., Stammer et al., 2002b), although our case is not a compensation for inaccurate model dynamics because we are operating under a perfect model assumption.This tradeoff is prob- ably unacceptable for those wishing to physically interpret the forcing field, and indicates that the improved first-guess estimate is not a good final solution despite fitting the observations.The power of the Lagrange multiplier method is now clear; not only is the final estimate smoother than the first guess, but also the final forcing estimate accurately reproduces the amplitude and frequency of the true forcing: f (t) = 1.5cos(2t/3).
Our identical twin experiment permits a comparison with the truth to diagnose actual errors even at times without observations.While the improved first guess appeared to fit the data well, the misfit to the truth displays considerable structure, including a large deviation around t = 45 s to values that are inconsistent with the observations (top panel, Fig. 4).Such a deviation reflects an inaccurate interpolation between data points during the construction of the improved first-guess trajectory.The first guess also appears to overfit the observations, as this estimate deviates from the truth in the neighborhood of observations with large error (e.g., t = 32.5, 42.5 s).The final estimate, on the other hand, hovers near 0 for the entire time interval, with a standard deviation of 0.46 rad, very near to the actual observational uncertainty of 0.50 rad.The final estimate reproduces 72 % of the variance in the observational error, computed by comparison of the estimated to true observational error.Visually, the final estimate is closer to the truth than the observations over the majority of the time interval, indicating that the Lagrange multiplier method filters out the observational noise even in this chaotic system.
For the angular velocity and forcing (middle and bottom panels, Fig. 4), the Lagrange multiplier method reproduces the truth despite the imperfect, sparse observations and the chaotic model dynamics.The suppression of the abrupt and large changes in forcing is not simply a smoothing or averaging of the forcing, but instead is seen to reflect the true forcing, as evidenced by the deviation from true forcing being small.Strictly speaking, a χ 2 posterior statistical test, discussed later in Sect.3.4, is needed to assess what is meant by "small".Under this condition, the method of Lagrange multipliers is superior to the first-guess estimate because all components of the solution, both the state and forcing, can be physically interpreted.

Influence of the data stream
In the case where only two observations are available (θ (0) = −2 ± 0.5, θ (50) = 10 ± 0.5), the time between observations is greater than both the fundamental period and the nonlinear timescale of the pendulum.The nonlinear timescale is defined in this work as the time interval that the tangent-linear model well approximates the nonlinear dynamics, which depends upon the size of initial perturbation.Despite the long time between observations, the estimated trajectory fits both observations via the Lagrange multiplier method (top left panel, Fig. 5).Thus, there appears to be no lower limit on the number of observations necessary in order to produce an acceptable state estimate with this model.Of the two steps in our method, it is the first-guess calculation that is responsible for fitting the data within their errors, and the optimization with Lagrange multipliers does not substantially change or improve this estimate.When comparing the estimate to the true trajectory that was withheld from the reconstruction method, differences larger than 10 rad exist (bottom left panel, Fig. 5).Thus, reconstruction of the full, partially unobserved trajectory without any intervening observations is a challenging task, as expected.Here, we emphasize that the first goal in state estimation is to find any model trajectory that fits the observations, and that in realistic cases we will not know whether the model interpolates between the observations in the correct way.We recognize that a chaotic model usually has many trajectories that satisfy the initial and final times, and thus, any one trajectory is unlikely to reconstruct the truth at all intervening times.
In the case that many (N y = 20) observations are taken but the standard error is large (5 rad), all observations are again fit within their 1σ error bars.Strictly speaking, the data appear to be overfit, as 32 % of the points are expected to reside outside the one standard error level but none do.This fit has a lower standard deviation, 3.3 rad, than that expected by the observational error of 5.0 rad, suggesting that the numerical model is adding information that is complementary to the observations.Only in short time intervals does the estimate differ from the truth by more than 5 rad, such as near the observation of the 2σ outlier at t = 25 s.Unlike the case where 20 observations were taken with a smaller standard error of 0.5 rad in the previous section, this estimate is only partially successful at filtering noise out of the observations.The correlation coefficient between the estimated and actual observational error is r = 0.43, indicating that some noise remains.Both case studies in this section indicate that neither the quality nor the quantity of the data stream affect whether the Lagrange multiplier method can be successful with a chaotic model.

Influence of the number of controls
The previous section addresses cases where the forcing is adjusted at every time step, leading to 5002 control variables.After application of the Lagrange multiplier method, the resulting value of the cost function (Eq. 3) depends upon both the number of control parameters and the number of observations (Fig. 6).The value of J is always below 1 when 5002 control variables are defined, consistent with the previously reported results.Here we test the null hypothesis that the model is consistent with the observations, and we perform a χ 2 statistical test with N y degrees of freedom as is appropriate for our cost function.Our one-sided test statistic is the value of J where 5 % of cases are expected to have larger values by chance.For 5002 controls, we find that all values are small enough that the null hypothesis cannot be rejected at the 5 % insignificance level (area above black line, Fig. 6).In this one-sided test, the Lagrange multiplier method is expected to acceptably fit the observations if enough controls are available.In the ocean state estimation problem, all air-sea fluxes are uncertain and temporally variable; therefore, a large number of controls can usually be defined.For some cases where only 5 or 10 observations are available, the cost function is small enough that overfitting may be occurring.In these cases, we find that the control perturbations necessary to fit the data are very small, and this impacts the size of the cost function through the J 0 term in Eq. ( 6).This effect has been documented in chaotic systems by the control engineering literature (e.g., Ott et al., 1990).
We investigate the effect of a decrease in the number of controls by redefining the external forcing control perturbation.For N u forcing controls, we define where (t) is a row vector that performs linear interpolation in time, and δf (t) is only defined at N u control times.This formulation enforces some temporal correlation in the external forcing.Alternatively, this could be accomplished using a non-diagonal weighting matrix, S f .In this case, the number of degrees of freedom is reduced relative to the total number of controls.
For a given number of observations, a decrease in the number of controls leads to a decrease in the likelihood of a successful fit to the data.The initialization problem is equivalent to the case with two control variables, and Fig. 6 suggests that the Lagrange multiplier method will not produce a good fit to data, as documented by previous works.The criterion of a good fit also depends upon the number of observations, where more observations decrease the likelihood of success.
To understand why the data can or cannot be fit, consider that each observation gives a constraint of the type documented in Eq. ( 10).If all of these constraints are enforced simultaneously, the problem is formally underdetermined when the number of controls exceeds the number of observations, and it is generally likely that a solution exists.The simple interpretation that the number of controls must exceed the number of observations does not strictly hold due to the logarithmic scale in Fig. 6.Even when the problem is formally underdetermined, a singular value decomposition analysis of the controllability matrix, C, reveals that not all controls are independent and that the data cannot always be fit perfectly.Here, we identify formally underdetermined cases with a J value that is unacceptably high (Fig. 6); thus, the solvability condition is sometimes violated.Such a result indicates that the controllability matrix has an effective rank less than the number of observations, showing the importance of this quantity as a diagnostic measure of the conditioning of the estimation problem.In practice, the singular values need not be strictly zero, as a large discrepancy between the magnitudes of singular values can give ill conditioning.
We also find cases where the gradient-descent method is capable of navigating the complex cost-function topology with Lagrange multiplier sensitivity information.A slice of J along ω(0) and θ (0) is focused on a region of phase space that appears to contain a local minimum, although this is not the true solution (left panel, Fig. 7).In the initial control problem, the optimization would proceed in these two dimensions and be trapped by the local minimum.Taking a two-dimensional (2-D) slice of the cost function in the dimension of the initial angular velocity and forcing, δf (0), however, the same location may no longer be a local minimum in the expanded phase space (right panel, Fig. 7).In our example, the cost function can be further minimized by decreasing δf (0), and the optimization process may eventually get out of the trap in ω(0)/θ(0) space.Thus, additional dimensions in the optimization space can sometimes alleviate problems with the gradient-descent algorithm.

Influence of prior forcing information
The previous examples in Sect. 3 proceed with prior information that the forcing is periodic with an accurate magnitude and phase.A good analogy is the regular forcing of solar insolation on the ocean surface.Here, we test the performance of the Lagrange multiplier method with inaccurate prior information about the forcing, as is a more realistic analogy to the uncertainty of air-sea fluxes.In particular, our first guess of the forcing, f 0 (t), is systematically biased by decreasing b from 1.5 to 0.75 rad s −2 .The trajectory driven by inaccurate forcing is no worse than the previous cases with accurate forcing due to the dominance of the chaotic dynamics of system (Fig. 8).Using the same observations as shown in Fig. 3, we find that the chaotic pendulum trajectory is tracked over multiple nonlinear timescales despite this more stringent test.In this case, however, the forcing estimate still contains errors relative to the true forcing calculated with b = 1.5 rad s −2 , and some high-frequency structures remain in f (t) (see "improved first guess" in bottom panel, Fig. 8).If instead the Lagrange multiplier method is started from the standard first guess, a smoother and more accurate estimate of the forcing is obtained at the expense of not fitting the data as well (see "final estimate" in bottom panel).Any remaining irregular structures can be handled by imposing temporal correlations as was done in Sect.3.4.If such measures are not taken, the investigator must take care to decide what elements of the forcing represent true variability and which are compensating for model error.In our simple system of equations, model errors and forcing errors are mathematically equivalent.In state estimates with eddy-resolving GCMs, however, smallscale forcing variability is found near oceanic fronts and the investigator must determine on a case-by-case basis to what extent it reflects real variability.

Relation to Kalman filter/smoother
Our suggestion that controllability is a key criterion brings our understanding of the Lagrange multiplier method into closer consistency with the Kalman filter/smoother (i.e., the combined usage of the Kalman filter and smoother).Both methods solve the same least-squares problem, and the solution of a linear problem should not depend upon the chosen method (e.g., Wunsch, 2010).Fukumori et al. (1993) found that the problem must be controllable for the Kalman filter/smoother to be successful.In addition, the chaotic Lorenz (1963) model was tracked with the Kalman filter/smoother over time windows much longer than the nonlinear timescale when the system was completely controllable (i.e., all estimated quantities are uncertain and are treated as control variables) (Evensen, 1997).Our results suggest that the equivalence of the Kalman filter/smoother and Lagrange multiplier method may be extended to nonlinear problems, thus explaining why the chaotic estimation problem may be solved by the Lagrange multiplier method.
To recover the true trajectory of a system, observability is also important, as the estimation problem is the dual of the control problem (Fukumori et al., 1993;Marchal, 2014).For the linear problem, Cohn and Dee (1988) showed that complete observability implies asymptotic stability of the Kalman filter/smoother.Defining observability and controllability conditions for nonlinear state estimation problems is difficult (Casti, 1985).In practice, the important criterion is ability to solve Eq. ( 10).Strictly speaking, the solution criteria will therefore depend upon both the controllability matrix, C, and the observational matrix, E, which combines the issues of observability and controllability.Here, we suggest the operational definition that a system is effectively controllable when the solution to Eq. ( 10), generalized to multiple observations, exists.
Related to the idea of observability, Wunsch (1996) stated that "problems owing to the multiple minima in the cost function can always be overcome by having enough observations to keep the estimates close to the true state."To evaluate this statement, we emphasize that there are two levels of successful reconstruction: (1) one that accurately fits the data, and (2) one that accurately fits the truth at all times and locations.Criterion (1) has been our metric for success in this work, as in real-world problems, criterion (2) cannot be tested.Here we have shown that only controllability is necessary for (1) even with a nonlinear system.In addition, we show that the data can still be fit even if very few observations are available, as an off-track estimate can be righted by precise adjustments to the forcing (recall Fig. 3).That short-lived forcing adjustments can put the estimate on track is likely a consequence of the nonlinear dynamics of our particular problem, although we believe that an eddy-resolving ocean general circulation model could behave the same way.Our interpretation is consistent with work in the control of chaotic systems.Engineers have described the control of a chaotic system as being "easier" than control of other systems because the necessary control adjustments are very small (e.g., Ott et al., 1990).

Comparison of controllability and stability metrics
In this section, we compare criteria for the success of the Lagrange multiplier method.Previously suggested criteria include Lyapunov exponents or other stability metrics of the www.nonlin-processes-geophys.net/24/351/2017/ Nonlin.Processes Geophys., 24, 351-366, 2017 tangent-linear model (e.g., Lea et al., 2000).The tangentlinear matrix has eigenvalues with absolute value greater than one when linearized about a state in the upper-half plane (π/2 < θ < 3π/2), reflecting the divergence of neighboring nonlinear trajectories when a pendulum perturbed towards the horizontal is more rapidly accelerated downwards.Conversely, the lower-half plane is locally linearly stable.The unforced pendulum with initial conditions in the upper-half plane is episodically unstable, until damping brings the pendulum permanently into a stable configuration in the lowerhalf plane.
Here we investigate the influence of stability versus that of nonlinearity.The pendulum is a useful system because it is easily modified to have four distinct dynamical states: (1) nonlinear, unstable, (2) nonlinear, stable, (3) linear, stable, and (4) linear, unstable.( 1) is the original dynamical equation for the pendulum (Eq.1).By restricting the phase space to the lower-half plane, the pendulum is locally linearly stable at all times, although it is still nonlinear (Case 2).When the pendulum is linearized by the smallangle approximation with the linear term of the Taylor series expansion (sin θ ≈ θ , see Appendix A2), we obtain a linear, stable model (Case 3).If instead the pendulum is linearized around its apex, the sign of θ in the linearized equation is reversed, rendering the system linear but unstable (Case 4).
We revisit the problem of estimating the initial angle when θ is observed at the final time.A synthetic observation is generated by running the model with initial displacement of −π/6 rad and zero velocity.Assuming a perfect model and observation, the shape of the cost function is generated by changing the initial conditions and evaluating J .In the two nonlinear cases, a slice of the cost function contains many local minima that emerge when the time window is extended from 5 to 50 s (Fig. 9, lower panels).The cost function in the nonlinear, stable case deviates from a parabola because the state transition matrix is non-self-adjoint and non-normal growth occurs (Farrell, 1989;Farrell and Ioannou, 1993).Thus, even nonlinear models that are stable are subject to local minima, and linear stability is not always a good metric to determine whether a gradient-descent search will successfully find the global minimum.
Conversely, the linear, unstable case does yield a parabolic cost function (Fig. 9, upper panels), implying that instability does not impede the search for the minimum.Again, local linear stability does not appear to be a good metric for determining the presence of local minima, because an unstable system may yield a well-behaved function.This example reinforces the counterintuitive relationship between stability and local minima, where a linearly stable system does not have a paraboloidal cost function but an unstable system does.While this reversed relationship does not always hold, linear stability metrics are not reliable.We suggest that controllability is a better metric, but note that controllability and stability are not unrelated, as a system with a growing unstable mode could lead to a controllability matrix that effectively drops rank.

Relevance to ocean state estimation
In the Introduction, we remarked on the only ocean state estimate known to the authors that successfully implemented the Lagrange multiplier method in an eddy-permitting ocean GCM without any modification to the adjoint model (Gebbie et al., 2006).In light of the results of this work, a combination of factors appears to have been responsible for that success.While the ocean model had 1/6 • × 1/6 • horizontal resolution and contained mesoscale eddies (Fig. 10), the resolution was not adequate to fully resolve the eddies.Also, the model domain of the northeast Atlantic Ocean was a relatively quiescent one.Both factors likely led to the ocean model being more linear than other studies.In addition, the adjoint model of a coarse-resolution twin was used to form an improved first guess, which would improve the likelihood of success with gradient descent much as our method did here.Perhaps most importantly, the ocean state estimate included air-sea control fields that were updated every 10 days, leading to a total of 5.5 × 10 6 control variables.Given the rapid adjustment of the ocean to barotropic waves, it is likely that the system passed the controllability criterion derived in this work.Controllability could be numerically evaluated in a GCM by a series of impulse functions: a dynamical equivalent to the passive response recorded by transit time distributions (e.g., Delhez et al., 1999;Haine and Hall, 2002).Open questions include whether the deep ocean is completely controllable by surface boundary conditions, and whether ocean data require variability at timescales shorter than 10 days to be introduced through the surface forcing.

Conclusions
Nonlinearity is not a fundamental obstacle to constraining a model to observations using the Lagrange multiplier method.
On the basis of research primarily with toy models, chaotic systems were thought to represent such an obstacle if the estimation time window was too long.Here we find that the trajectories of the nonlinear pendulum can be tracked over multiple rapid transitions that are due to chaotic dynamics.The Lagrange multiplier method is successful under the condition that enough boundary controls are available through time, and that the system passes a test of controllability.In the case of the pendulum, the rank of the controllability matrix is a better metric to predict a success of state estimation rather than a measure of dynamical stability.The ocean state estimation problem is analogous to the problem posed here; uncertain air-sea fluxes contain large errors that require control adjustments through time.
Our implementation of the Lagrange multiplier method includes a step to construct a good first guess that helps the iterative gradient-descent search.The first-guess method has been developed with implementation in an ocean GCM in mind.Specifically, sub-problems are defined over the interval between observations and thus require less memory than a whole-domain approach.In addition, we suggest that the particular first-guess method of this work is not the only way to produce a good first guess, and that other methods would bring the first-guess state close enough to the truth to increase the likelihood of success.A good example is the Green's function method (e.g., Stammer and Wunsch, 1996;Menemenlis et al., 2004) that selects a subset of the full control variables and makes some linearity assumptions.Following up the Green's function optimization with a gradientdescent search with the Lagrange multiplier method is therefore a worthwhile research goal.The results of this work suggest that ocean state estimation should continue with the Lagrange multiplier method and models that resolve higher and higher resolution physics.
Figure1.The rapid divergence of pendulum trajectories is indicated by the path density of trajectories (background shading), and the evolution of three sample trajectories: the "truth" or reference trajectory (solid line), a "first-guess" trajectory with incorrect initial angular velocity that diverges within 5 s (dashed line), and a firstguess trajectory with incorrect initial angle, θ , which diverges after 30 s (other dashed line).The path density of trajectories is computed with 10 000 forward integrations with normally distributed perturbations about the truth (standard deviation: σ ω = 1 rad s −1 , σ θ = 0.5 rad).

Figure 2 .
Figure 2. Cost-function values as a function of initial angle, θ , and angular velocity, ω, for an observational time window of 50 s.The local minimum found by the optimization (yellow dot) is not the absolute minimum value of the cost function.

Figure 3 .
Figure 3.Control of the chaotic pendulum with an improved-firstguess and the Lagrange multiplier method.(a) Observations are taken every 2.5 s with standard error of 0.5 rad (circles with 1σ error bars).The trajectory of the pendulum angle (θ(t), a), angular velocity (ω(t), b), and the forcing (f (t), c) are given for a standard first-guess (dashed line), the improved first-guess (gray line), and the final Lagrange multiplier-based estimate (black line).The standard first-guess forcing is not shown due to its similarity to the final estimate.

Figure 4 .
Figure 4. Comparison of the reconstructed pendulum and truth.(a) Difference between the observed (circles), standard first guess (dashed), improved first guess (solid gray line), and final estimate (solid, black line) of pendulum angle relative to the truth.The standard first-guess is off scale for much of the panel.(b) Similar but for angular velocity, ω.(c) Same but for forcing, f .The standard first-guess forcing is suppressed because it is identically zero.

Figure 5 .
Figure 5. State estimation of the chaotic pendulum with a reduced set of two observations with standard error 0.5 rad (a, c) and a set of 20 observations with standard error 5 rad.(a, b) Comparison of the observations (circles with 1σ error bars), the standard first-guess (dashed), the improved first-guess (gray solid line), and the final state estimate (solid black line), as in Fig. 3. (c, d) Similar to the top row, except all quantities are referenced to the truth, θ true , as in Fig. 4. Again the standard first guess is off scale for much of the time window.The improved first guess is nearly identical (and obscured) to the final estimate in all panels.

Figure 6 .
Figure6.Influence of the number of observations and controls on the ability to track the chaotic pendulum.The base-10 logarithm of the cost function, log 10 (J ), is calculated as a function of the number of evenly spaced observations over a 50 s window (the abscissa), and the number of effective degrees of freedom in the control perturbations to the forcing (ordinate).A χ 2 statistical test determines the limit where 95 % of realizations are expected to have smaller J values (thick, black line); therefore, cases below this threshold represent an unacceptable fit to the data at the 5 % insignificance level.

Figure 7 .
Figure 7. Escaping an apparent local minimum.(a) Cost-function values as a function of θ (0) and ω(0), in a region of phase space where a local minimum is present in this slice (yellow dot).The same cost function, but oriented along a slice with constant θ (0) = 0 in the dimensions of δf (0) and ω(0).The local minimum in the first two dimensions is no longer an extremum in the other two dimensions or the combined three-dimensional (3-D) space.This case used T = 10 s for illustration.

Figure 8 .
Figure 8.Control of the chaotic pendulum with an inaccurate first guess of the external forcing.(a) Observations are taken every 2.5 s with standard error of 0.5 rad (circles with 1σ error bars).The trajectory of the pendulum angle (θ(t), a), angular velocity (ω(t), b), and the forcing (f (t), c) are given for a standard first-guess (dashed line), the improved first-guess (gray line), and the final Lagrange multiplier-based estimate (black line).In the forcing panel, we also include the true forcing (dash-dot line).

Figure 9 .
Figure 9. Cost function with respect to the initial pendulum angle.A synthetic observation was made from a model run with initial angle, θ = −π/6.The time between the initial state and the cost-function evaluation is 0.5, 5, or 50 s.(a) Linear, stable pendulum.(b) Linear, unstable pendulum.(c) Nonlinear, stable pendulum.(d) Nonlinear, unstable pendulum.Notice the wider scale for θ in the lower, right panel.The pendulum's dynamical regimes are further explained in the text.Reproduced with permission from Gebbie (2004).

Figure 10 .
Figure10.Nested view of the 1/6 • regional state estimate ofGebbie et al. (2006) inside the 2 • state estimate ofStammer et al. (2002b).Potential temperature at 310 m depth, with a contour interval of 1 • C, is shown.The boundary between the two estimates (thick black line) is discontinuous in temperature because of the open-boundary control adjustments.Reproduced with permission fromGebbie (2004).