4DVAR by ensemble Kalman smoother

We propose to use the ensemble Kalman smoother (EnKS) as linear least squares solver in the Gauss-Newton method for the large nonlinear least squares in incremental 4DVAR. The ensemble approach is naturally parallel over the ensemble members and no tangent or adjoint operators are needed. Further, adding a regularization term results in replacing the Gauss-Newton method, which may diverge, by^M the Levenberg-Marquardt method, which is known to be convergent. The regularization is implemented efficiently as an additional observation in the EnKS.


Introduction
4DVAR is a dominant data assimilation method used in weather forecasting centers worldwide. 4DVAR attempts to reconcile model and data variationally, by solving a very large weighted nonlinear least squares problem. The unknown is a vector of system states over discrete points in time, when the data are given. The objective function minimized is the sum of the squares of the differences of the initial state from a known background state at the initial time and the differences of the values of observation operator and the data at every given time point. In the weak-constraint 4DVAR (Trémolet 2007), considered here, the model error is accounted for by allowing the ending and starting state of the model at every given time point to be different, and adding to the objective function also the sums of the squares of those differences. The sums of the squares are weighted by the inverses of the appropriate error covariance matrices, and much of the work in the applications of 4DVAR goes into modeling those covariance matrices.
In the incremental approach (Courtier et al. 1994), the nonlinear least squares problem is solved iteratively by using a succession of linear least square solutions. The major cost in 4DVAR iterations is in evaluating the model, tangent and adjoint operators, and solving large linear least squares. A significant software development effort is needed for the additional code to implement the tangent and adjoint operators to the model and the observation operators. Straightforward linearization, called the incremental approach (Courtier et al. 1994), leads to the Gauss-Newton method for nonlinear least squares (Bell 1994;Tshimanga et al. 2008). However, Gauss-Newton iterations may not converge, not even locally. Finally, while the evaluation of the model operator is typically parallelized on modern computer architectures, there is a need to further parallelize the 4DVAR process itself.
The Kalman filter is a sequential Bayesian estimation of the gaussian state of a linear system at a sequence of discrete time points. At each of the time points, the use of the Bayes theorem results in an update of the state, represented by its mean and covariance. The Kalman smoother simply considers all states at all time points from the beginning to be a large composite state. Consequently, the Kalman smoother is obtained from the Kalman filter by simply applying the same update as in the filter to the past states as well. However, historically, the focus was on efficient short recursions (Rauch et al. 1965;Strang and Borre 1997), similar the sequential Kalman filter.
It is well known that weak constraint 4DVAR is equivalent to the Kalman smoother in the linear case. To apply the Kalman smoother in the nonlinear case, the problem needs to be linearized, leading to variants of the extended Kalman filter and the Gauss-Newton method. Use of the Kalman smoother to solve the linear least squares in the Gauss-Newton method is known as the iterated Kalman smoother, and considerable improvements can be obtained against running the Kalman smoother only once (Bell 1994;Fisher et al. 2005).
The Kalman filter and smoother require maintaining the covariance of the state, which is not feasible for large systems, such as in numerical weather prediction. Hence, the ensemble Kalman filter (EnKF) and ensemble Kalman smoother (EnKS) (Evensen 2009) use a Monte-Carlo approach for large systems, representing the state by an ensemble of simulations, and estimating the state covariance from the ensemble. The implementation of the EnKS in Stroud et al. (2010) uses the adjoint model with the short recursions as in the KS. However, the implementations in Khare et al. (2008); Evensen (2009) do not depend on the adjoint model and simply apply EnKF algorithms to the composite state over multiple time points. We use the latter approach here.
The EnKF has become a competitive method for data assimilation. Consequently, combinations of ensemble and variational approaches have become of considerable recent interest. Estimating the background covariance for 4DVAR from an ensemble was one of the first connections (Hamill and Snyder 2000), and it is now standard and became operational (Wang 2010). Gradient methods in the span of the ensemble for one analysis cycle (i.e., 3DVAR) include Zupanski (2005), Sakov et al. (2012) (with square root EnKF as a linear solver in Newton method), and Bocquet and Sakov (2012), who added regularization and use LETKF-like approach to minimize the nonlinear cost function over linear combinations of the ensemble. Liu et al. (2008Liu et al. ( , 2009) combine ensembles with (strong constraint) 4DVAR and minimize in the observation space. Their method, called Ens4DVAR, does not need tangent or adjoint operators also. Zhang et al. (2009) use a twoway connection between EnKF and 4DVAR, to obtain the covariance for 4DVAR, and 4DVAR to feed the mean analysis into EnKF. However, ensemble methods for the solution of the 4DVAR nonlinear least squares problem itself or for the weak constraint 4DVAR do not seem to have been developed before.
In this paper, we propose to use the ensemble Kalman smoother (EnKS) as linear least squares solver in 4DVAR. The ensemble approach is naturally parallel over the ensemble members. The rest of the computational work is relatively cheap compared to the ensemble of simulations, and parallel dense linear algebra libraries can be used. The proposed approach uses finite differences from the ensemble, and no tangent or adjoint operators are needed. To stabilize the method and assure convergence, a Tikhonov regularization term is added to the linear least squares, and the Gauss-Newton method becomes the Levelberg-Marquardt method. The Tiknonov regularization is implemented within EnKS as a computationally cheap additional observation (Johns and Mandel 2008). We call the resulting method EnKS-4DVAR.
The paper is organized as follows. In Section 2, we review the formulation of 4DVAR. The EnKS for the incremental linearized squares problem is reviewed in Section 3. The new method without tangent operators is introduced in Section 4. The modifications for the regularization and the Levenberg-Marquardt method are presented in Section 5. Section 6 contains the results of our computational experiments, and Section 7 is the conclusion.

Incremental 4DVAR and the Gauss-Newton method
We want to determine x 0 , . . . , x k , where x i is the state at time i, from the background state,

and the observations
where M i is the model operator, and H i is the observation operator. Quantifying the uncertainty by covariances, with , we get the nonlinear least squares problem called weak-constraint 4DVAR (Trémolet 2007). Originally in 4DVAR, The least squares problem (1) is solved iteratively by linearization, For k vectors u i , i = 1 . . . k, denote the composite vector In each iteration x 0:k ← x 0:k + δx 0:k , one solves the auxiliary linear least squares problem for the increments δx 0:k , This is the Gauss-Newton method (Bell 1994;Tshimanga et al. 2008) for nonlinear squares, known in 4DVAR as the incremental approach (Courtier et al. 1994). Denote and write the auxiliary linear least squares problem (2) as The function minimized in (4) is exactly the same as the one minimized in the Kalman smoother. The Gauss-Newton method with the Kalman smoother as the linear least squares solver is known as the iterated Kalman smoother, and considerable improvements can be obtained against running the Kalman smoother, applied to the linearized problem, only once (Bell 1994;Fisher et al. 2005).

Ensemble Kalman Filter and Smoother
We present the EnKF and EnKS algorithms, essentially following Evensen (2009), in a form needed to state our theorems. We start with a formulation of the EnKF in a notation suitable for extension to EnKS. The notation independently of anything else. The ensemble of states of the linearized model at time i, conditioned on data up to time j (that is, with the data up to time j already ingested), is denoted by where the ensemble member index ℓ always runs over ℓ = 1, . . . , N , and similarly for other ensembles.
Algorithm 1 (EnKF) 1. Initialize 2. For i = 1, . . . , k, advance in time followed by the analysis step where is the sample mean, and 1 is the vector of all ones size N × 1. (8),

Remark 1 From
and Hence, the analysis ensemble Z N i|i consists of linear combination of the forecast ensemble, which can be written as multiplying the ensemble by a transformation matrix T N i , where Remark 2 The matrix formula in the analysis step (7) is not efficient when the dimension of the data space is large. Using (10) and the Sherman-Morrisson-Woodbury formula (Hager 1989), we transform the inverse in (7) into Using (15) requires only the solution of systems with the data error covariance matrix R i (which is typically easy, and often R i is diagonal) and of a system of the size N , the number of ensemble members. See Mandel et al. (2009) for details and operation counts.
The EnKS is obtained by applying the same analysis step (7) as in the EnKF to the composite state Z 0:i|i−1 from time 0 to i, conditioned on data up to time i − 1, Algorithm 2 (EnKS) Given z b , 1. Initialize 2. For i = 1, . . . , k, advance in time, followed by the analysis step where H 0:i = [0, . . . , H i ], and P N 0:i,0:i is the sample covariance matrix of Z N 0:i|i−1 .
The following theorem allows a straightforward implementation of the EnKS from the EnKF -the same transformation matrix is applied to the composite state from times 0 to i, not just the last time i. Also, one can use a transformation matrix from another version of EnKF, such as the square root filter, e.g., LETKF ); Fertig et al. (2007) assume such relation a-priori for a related method based on LETKF.

Theorem 1 The EnKS satisfies
where T N i is the transformation matrix (11) from the EnKF. Proof. We have with the blocks The terms in (7) become in (19) , and, similarly as in (9), The result now follows by the comparison of (16)-(21) with (7)-(11). ✷ When the original, nonlinear operators instead of the linearizations are used, we get the nonlinear EnKS method, which is common and useful in practice, even if it may not be justified theoretically. This method is obtained from the linear EnKS by replacing (18) and (14) by their original, nonlinear versions. It operates on the original ensemble of the states X N = x ℓ N ℓ=1 rather than on the increments z ℓ = δx ℓ .
Using Theorem 1, it is easy to see that Algorithm 3 coincides with Algorithm 2 in the linear case, i.e., when M i and H i are affine operators.

Nonlinear EnKS-4DVAR method
So far, the algorithm was relying on the linearized (i.e., tangent) model operators M i and H i and their adjoints.
The linearized model M i = M ′ i (x i−1 ) occurs only in advancing the time as action on the ensemble members δx ℓ = z ℓ , Approximating by finite differences based at x i−1 with step τ > 0, we get Thus, advancing the linarized model in time requires N + 1 evaluations of M i , at x i−1 and x i−1 + τ δx n i−1 . The observation matrix H i occurs only in the action on the ensemble, Approximating by finite differences based at x i , with step τ > 0, we have Thus, evaluating the action of the linarized observation requires N + 1 evaluations of H i , at x i−1 and x i−1 + τ z ℓ i−1 .
Here is the overall method. First, initialize if not given already. One iteration (2) of the incremental 4DVAR is then implemented as follows.
Algorithm 4 (EnKS-4DVAR) Given x 0 , . . . , x k : 1. Initialize z ℓ 0|0 ∼ N (z b , B) following (5), with z b = 0. 2. For i = 1, . . . , k, advance z ℓ in time following (18) with the linearized operator approximated from (25), , followed by the analysis step (20) with the transformation matrix T N i computed from (12) and with the matrix-vector products H i z i approximated from (26). 3. Update Note that for small τ , the resulting method is asymptotically equivalent to the method with the derivatives. Amazingly, it turns out that in the case when τ = 1, we recover the standard EnKS applied directly to the nonlinear problems, that is, with the linearized advance in time (6) replaced by application of the original, nonlinear operator M i . In particular, the incremental 4DVAR does not converge unless it is already at a stationary point, because each iteration delivers the same result, up to the randomness of the EnKS.

Theorem 2 If τ = 1, then one step of EnKS-4DVAR (Algorithm 4) is exactly the nonlinear EnKS (Algorithm 3).
In particular, the result of the step does not depend on the previous iterate.

Proof. Indeed, (27) becomes
which is exactly the same as advancing the ensemble member ℓ following (22) which is exactly the same as (24) with x ℓ i|i−1 = x i + z ℓ i|i−1 . Finally, (13) becomes using (3), which is exactly the same as in (23). ✷

Tikhonov regularization and the Levenberg-Marquardt method
The Gauss-Newton method may diverge, but convergence to a stationary point of (1) can be recovered by a control of the step δx. Adding a constraint of the form δx i ≤ ε leads to globally convergent trust region methods (Gratton et al. 2013). Here, we add δx i in a Tikhonov regularization term of the form γ δx i 2 S −1 i , which controls the step size as well as rotates the step direction towards the steepest descent, and obtain the Levenberg-Marquardt method x 0:k ← x 0:k + δx 0:k , where Under suitable technical assumptions, the Levenberg-Marquardt method is guaranteed to converge globally if the regularization parameter γ ≥ 0 is large enough (Gill and Murray 1978;Osborne 1976). Estimates for the convergence of the Levenberg-Marquardt method in the case when the linear system is solved only approximately exist (Wright and Holt 1985).
Similarly as in Johns and Mandel (2008), we interpret the regularization terms γ δx i 2 S −1 i in (33) as arising from additional independent observations δx i ∼ N 0, γ −1 S i Because the additional regularization observations δx i ≈ 0 are independent of the other observations and the state, separately, resulting in the mathematically equivalent but often more efficient two-stage method -simply run the EnKF analysis (7) twice, and apply both transformation matrices in turn following (20). With the choice of S i as identity or, more generally a diagonal matrix, the implementation following (15) is efficient; see Mandel et al. (2009) for operation counts.
Note that unlike in Johns and Mandel (2008), where the regularization was applied to a nonlinear problem and thus the sequential data assimilation was only approximate, here the EnKS is run on the auxiliar linearized problem (33), so all distributions are gaussian and the equivalence of solving (33) at once and assimilating the observations sequentially is statistically exact.

Lorenz 63 model
We first show an example without model error, where convergence is achieved with γ = 0(algo 4).
We consider the Lorenz 64 equations (Lorenz 1963), a simple dynamical model with chaotic behaviour. The Lorenz equations are given by the nonlinear system where x = x(t), y = y(t), z = z(t) and σ, ρ, β are parameters, which in these experiments are chosen to have the values 10, 28 and 8/3 respectively. The system is discretized using the fourth-order Runge-Kutta method. In (1), we choose In the experiments below, we assume perfect model, therefore Q i = εI, ε ≈ 0. We put σ b = 1, σ r = 1, and ǫ = 0.0001. As can be seen in Figures 1 and 2, the Lorenz attractor is fully nonlinear. It has two lobes connected near the origin, and the trajectories of the system in this saddle region are particularly sensitive to perturbations. Hence, slight perturbations can alter the subsequent path from one lobe to the other. In Figure 2, we keep x(t) and y(t) constant and we vary just z(t), then we compute the state at time t + 1, this figure shows the non linear dependence between the different components of the state at time t + 1 and the third component of the state at time t, To evaluate the performance of the method, we use the twin experiment technique. That is, an integration of the model is chosen as the true state. We then obtain the data y i by applying the observation operator H i to the   truth and then adding a gaussian perturbation N (0, R i ). Similarly, the background x b is sampled from the gaussian distribution with the mean equal to the initial conditions and the covariance B. Then we try to recover the truth using the observations y i and the background x b . Figure 3 reports simulation results for assimilating observations over 50 assimilation cycles, using the Hybrid 4DVAR and nonlinear EnKS method. Cycles are separated by a time interval of dt = 0.1. Figure 4 and shows the root mean square error (RMSE) between the sample posterior mean and the true state of the system.
As can be seen from Table 1, five iterations were enough for the method to converge. Note that the error does not converge to zero, because of the approximation and variability inherent in the ensemble approach.

Lorenz 96 model
We now consider the effect of the model error in the algorithm 4. The Lorenz 96 model (Lorenz 2006) is defined by the system of differential equations This model behaves chaotically in the case of external forcing F = 8. The first term of right-hand side simulates advection, and this model can be regarded as the time evolution of a one-dimensional quantity on a constant latitude circle, that is, the subscript corresponds to longitude. The PDE is discretized using a fourth-order Runge-Kutta method. For the results below κ = 1, F = 8 and dt = 0.01. Figure 5 shows the chaotic dynamics of the Lorenz 96 system. For the tests we took the parameters B = σ 2 b diag(1, ..., 1 i 2 , ...), R i = σ 2 r I, H i (X i ) = X t + X 2 t , M i is the Lorenz 96 model, Q i = σ q C, where C i,j = exp(−|i − j| dt L ), L = dt 5 , σ b = 1, σ r = 1, σ q = 0.1, and the ensemble size N = 40. We use again the twin experiments technique, the true state is equal to an integration of the model plus a gaussian perturbation N (0, Q i ). Figures 6 and 7 illustrate EnKS-4DVAR on this problem.

Example where Gauss-Newton does not converge
We now show that the algorithm 4 may not be convergent and that Tikhonov regularization may be needed in some circumstances. The following academic example illustrates this fact.
The Gauss Newton method for nonlinear least squares is not globally convergent, but convergence to a stationary point of any least square problem can be recovered by using the Levenberg-Marquart control. Consider the following example, which requires Levenberg-Marquart regularization to converge. The objective function to minimize is J(x 0 , x 1 ) = (x 0 − 2) 2 + (3 + x 3 1 ) 2 + This problem could be seen as a 4DVAR problem where the state at time 0 is x 0 , the background state is x b = 2, the background covariance B = I, there is one time step, the state at time t = 1 is x 1 , the model M 1 = I, and the model is perfect. Q 1 = 0.000001 ≈ 0, observation operator H 1 (x) = −x 3 and observation error covariance is R 1 = I. Figure 8 shows the iterations of the EnKS-4DVAR method applied to the problem 34 seen as 4Dvar problem, in two cases: when γ = 0 ( Gauss Newton iteration) the method does not converge, and for γ = 200 ( Levenberg-Marquart iteration), the method seems to converge to the local minimum (x ⋆ 0 , x ⋆ 1 ) = (0.419, 0.419).

Conclusion
The EnKS-4DVAR method was formulated and shown to be capable of handling strongly nonlinear problems, and it converges to the nonlinear least squares solution in a small number of iterations. Its performance on large and realistic problems will be studied elsewhere.