Data assimilation experiments using diffusive back-and-forth nudging for the NEMO ocean model

The diffusive back-and-forth nudging (DBFN) is an easy-to-implement iterative data assimilation method based on the well-known nudging method. It consists of a sequence of forward and backward model integrations, within a given time window, both of them using a feedback term to the observations. Therefore, in the DBFN, the nudging asymptotic behaviour is translated into an infinite number of iterations within a bounded time domain. In this method, the backward integration is carried out thanks to what is called the backward model, which is basically the forward model with a reversed time step sign. To maintain numerical stability, the diffusion terms also have their sign reversed, giving a diffusive character to the algorithm. In this article the performance of the DBFN in controlling a primitive equation ocean model is investigated. In this kind of model, non-resolved scales are modelled by diffusion operators which dissipate energy that cascades from large to small scales. Thus, in this article, the DBFN approximations and their consequences for the data assimilation system set-up are analysed. Our main result is that the DBFN may provide results which are comparable to those produced by a 4Dvar implementation, with a much simpler implementation and a shorter CPU time for convergence. The conducted sensitivity tests show that the 4Dvar profits from long assimilation windows to propagate surface information downwards, and that for the DBFN it is worth using short assimilation windows to reduce the impact of diffusion-induced errors. Moreover, the DBFN is less sensitive to the first guess than the 4Dvar.


Introduction
In data assimilation, an interesting tool is the Kalman-Bucy filter (Kalman and Bucy, 1961), where a non-linear differential equation of the Riccati type was derived for the covariance matrix of the optimal filtering error, the solution of which completely specifies the optimal filter for linear quadratic problems. A few years later, Luenberger (1966, 1971) defined an observer for reconstructing the state of an observable deterministic linear system from exact measurements of the output. This Luenberger observer has been called an "asymptotic estimator", since under linearity and observability hypotheses the estimator error converges to zero for time tending to infinity (Gelb et al., 1974; Bonnans and Rouchon, 2005). Its advantage compared to Kalman filtering is that it does not require any information on the various covariance matrices but, as was pointed out in Luenberger (1966), the Kalman-Bucy filter appears as a particular Luenberger observer which is the optimal least mean square estimate of the state in the case of noisy measurements. The stochastic observer unifies the concepts of deterministic Luenberger observer theory and stochastic Kalman filtering theory, as is explained in Gelb's book (Gelb et al., 1974) for instance. Both are useful in practice. It should be mentioned that the concept of a Luenberger observer has been extended to non-linear systems, for example in Zeitz (1987). This Luenberger observer has been rediscovered in the geophysical literature for atmospheric models under the term

Published by Copernicus Publications on behalf of the European Geosciences Union & the American Geophysical Union.
of nudging (Anthes, 1974; Hoke and Anthes, 1976; Stauffer and Seaman, 1990), which consists in adding a forcing term on the right-hand side of a given model in order to gently push (nudge) the solution toward a prescribed value. It is quite interesting to note that there is no mention of the link between nudging and the Luenberger observer in the geophysical literature until the work of Auroux and Blum (2008). More recently, comprehensive studies on the nudging method and its variants were produced by Blum et al. (2008) and Lakshmivarahan and Lewis (2012).
The first successful application of nudging to ocean data assimilation (DA) appeared in 1992, in a work that assimilated sea surface height derived from satellite measurements into a quasi-geostrophic layered model (Verron, 1992). Since then, the method has been successfully applied to several oceanographic numerical problems, such as the estimation of boundary conditions (Marchesiello et al., 2001; Chen et al., 2013), downscaling (Li et al., 2012), and other DA problems (Verron, 1992; Haines et al., 1993; Blayo et al., 1994; Lewis et al., 1998; Killworth et al., 2001; Thompson et al., 2006). Concerning applications to DA problems, the weights given to the model and the observations are generally not based on any optimality condition, but are rather scalars or Gaussian-like functions constructed based on physical assumptions or empirical considerations. The appeal of this method lies in its simplicity of implementation in complex numerical models, the low computational power required and the time smoothness of the solution.
The increasing availability of computing power has allowed the use of more advanced data assimilation methods. In general, these methods use information on the model and observation error statistics to weight the model-observation combination. Two of these methods that are widely used by prediction centres are the ensemble Kalman filter, EnKF (Evensen, 1994), and its variations (Pham, 2001; Hunt et al., 2007), and the four-dimensional variational method, 4Dvar (Le Dimet and Talagrand, 1986; Courtier et al., 1994). For the first, the numerical costs are due to the propagation of the ensemble, usually formed by tens of members, to calculate the forecast. For the second, the costs are due to the need to minimise a cost function in a very large state space (of the order of 10^8 variables). This requires several iterations of the minimisation algorithm, each of which involves integrations of the direct and adjoint models.
However, even with the growing interest in these complex techniques built on solid theoretical arguments, nudging has not been left aside. Recent works have used nudging along with more advanced methods such as optimal interpolation (Clifford et al., 1997; Wang et al., 2013), EnKF (Ballabrera-Poy et al., 2009; Bergemann and Reich, 2010; Lei et al., 2012; Luo and Hoteit, 2012), 4Dvar (Zou et al., 1992; Stauffer and Bao, 1993; Vidard et al., 2003; Abarbanel et al., 2010) or particle filters (Luo and Hoteit, 2013; Lingala et al., 2013) to extract the best of each method. In the particular case of the hybridisation with the EnKF proposed by Lei et al. (2012), the resulting algorithm has the advantage of the dynamical propagation of the covariance matrix from the EnKF and uses nudging to mitigate problems related to the intermittence of the sequential approach, which among other things entails the possible discarding of some observations.
In 2005, Auroux and Blum (2005) revisited the nudging method and proposed a new observer called back-and-forth nudging (BFN), because a Luenberger observer is an asymptotic observer and, as data assimilation is performed over a finite time, convergence to the real state is not achieved within the limited horizon. The BFN consists of a sequence of forward and backward model integrations, both of them using a feedback term to the observations, as in the direct nudging. The BFN integrates the direct model backwards in time, avoiding the construction of the adjoint and/or tangent linear models needed by 4Dvar. Therefore, it uses only the fully non-linear model to propagate information forward and backward in time. The nudging gain, which has an opposite sign with respect to the forward case, has a double role: to push the model toward observations and to stabilise the backward integration, which is especially important when the model is not reversible. Back-and-forth algorithms have already been used in the past for initialisation and four-dimensional data assimilation (Morel et al., 1971; Talagrand, 1981), but without nudging. In these papers, the authors simply replaced, at each observation time, the values predicted by the model for the observed parameters with the observed values; this method requires the considered system to be reversible, which is not the case if there is irreversible dissipation in the model.
The BFN convergence was proved by Auroux and Blum (2005) for linear systems of ordinary differential equations and full observations, by Ramdani et al. (2010) for reversible linear partial differential equations (wave and Schrödinger equations), and by Donovan et al. (2010) and Leghtas et al. (2011) for the reconstruction of quantum states, and was studied by Auroux and Nodet (2012) for non-linear transport equations. The BFN performance in numerical applications using a variety of models, including non-reversible models such as a shallow water (SW) model (Auroux, 2009) and a multi-layer quasi-geostrophic (LQG) model (Auroux and Blum, 2008), is very encouraging. Moreover, by using a simple scalar gain, it produced results comparable to those obtained with 4Dvar, but with lower computational requirements (Auroux, 2009). In this article we present for the first time a BFN application to control a primitive equation ocean model. The numerical model used is NEMO (Madec, 2008), currently used by the French operational centre, Mercator Océan (http://www.mercator-ocean.fr/fre), to produce and deliver ocean forecasts. The well-known idealised double gyre configuration at eddy-permitting resolution is used. This configuration has the advantage of being simple from the geometry and forcing points of view; at the same time, it reproduces most of the features found in a middle latitude ocean basin. The BFN application to control a primitive equation ocean model represents a new challenge due to the increased model complexity. Among the differences between NEMO and the simplified oceanic models used by Auroux and Blum (2008) and Auroux (2009), two stand out: the more complex relationships between the variables in the former, since no filtering technique is used in the derivation of the physical model (except the Boussinesq approximation, which is also made by the SW and LQG models), and the inclusion of an equation for the conservation of the thermodynamical properties.
The latter requires the use of a non-linear state equation to couple dynamical and thermodynamical variables.
Furthermore, the vertical ocean structure represented by NEMO is more complex than that represented by the SW and LQG models used by Auroux and Blum (2008) and Auroux (2009). This is because the SW model has no vertical levels and the LQG was implemented with only 3 layers, while in this article NEMO is configured with 11 vertical layers. In addition, NEMO considers vertical diffusion processes, which are mostly ignored by the LQG model. Vertical diffusion plays an important role in maintaining the ocean stratification and the meridional overturning circulation, which is directly related to the transport of heat in the ocean. Moreover, from the practical point of view, the diffusion/viscosity required to keep the NEMO simulations stable is far greater than for the SW or LQG at the same resolution.
These issues call into question the validity of the approximations made by the BFN under realistic conditions. Thus, our primary objective is to study the possibility of applying the BFN in realistic models and to evaluate its performance compared to the 4Dvar. This appears to be the next logical step before using the BFN to assimilate real data.
This article is organised as follows. In Sect. 2 the BFN and the 4Dvar are described. Section 3 describes the model physics and the model set-up. Section 4 discusses some practical aspects of the backward integration. Section 5 presents the BFN and 4Dvar set-ups and the designed data assimilation experiments. Finally, the data assimilation results are presented in Sect. 6, in which we discuss the impact of the length of the data assimilation window on the performance of the methods, as well as the sensitivity of each method to the observation network and the initial condition.

Data assimilation methods
In this section the back-and-forth nudging (BFN) is introduced and the 4Dvar used to assess the BFN performance is briefly described.

The back-and-forth nudging
The conventional nudging algorithm consists in adding a forcing term (feedback term) to the model equations, proportional to the difference between the data and the model at a given time. More generally, given a model described by a set of ordinary equations (or discretised partial differential equations), nudging consists in adding to them the forcing term K(x_obs − H(x)):

dx/dt = F(x) + K(x_obs − H(x)),  (1)

where x represents the state vector, F is the model operator, H is the observation operator allowing one to compare the observations x_obs(t) to the corresponding system state H(x), and K is the nudging gain matrix. In this algorithm the model appears as a weak constraint. The feedback term changes the dynamical equations and acts as a penalty term that forces the state variables to get closer to the observations. In the linear case, i.e. when F and H may be written as matrices F and H, and in the absence of noise in the system, nudging is nothing else than the Luenberger observer (Luenberger, 1966). In this case, and assuming that observability of the pair (F, H) holds, there is a class of possible matrices K that, thanks to the pole-shifting theorem, guarantees the convergence of the estimator when t → ∞ (Gelb et al., 1974; Bonnans and Rouchon, 2005). This is one possible explanation for why nudging usually works quite well and the converged state is not strongly affected by the choice of K. However, when constructing K (whose unit is s^−1), the aim is to obtain an estimator response faster than the timescale of the studied processes.
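The role of the feedback term can be made concrete with a minimal scalar sketch (all parameter values here are illustrative and not taken from this article), where H is the identity and the gain is a constant:

```python
import numpy as np

def nudge_forward(x0, x_obs, k_gain, f=-0.1, dt=0.01, n_steps=2000):
    """Explicit Euler integration of dx/dt = f*x + k_gain*(x_obs - x),
    a scalar analogue of dx/dt = F(x) + K(x_obs - H(x)) with H = identity."""
    x = x0
    for _ in range(n_steps):
        x += dt * (f * x + k_gain * (x_obs - x))
    return x

# With the feedback switched on, the estimate forgets the (wrong) initial
# value and relaxes toward the observation on the timescale ~1/k_gain.
x_hat = nudge_forward(x0=5.0, x_obs=1.0, k_gain=2.0)
```

Note that the steady state is k/(k − f) · x_obs rather than exactly x_obs, illustrating the compromise between the model dynamics and the observations.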
The BFN is an iterative algorithm which sequentially solves the forward model equations with a feedback term to the observations (Eq. 1) and the backward model equations with an opposite sign for the feedback term. The initial condition of the backward integration is the final state obtained after integration of the forward nudging equation. At the end of each iteration, a new estimation of the system's initial state is obtained. The iterations are carried out until convergence is reached.
The difference of the BFN with respect to conventional nudging is the model integration backward in time. This allows one to recover initial conditions, as well as to use the same observation set more than once. Consequently, the BFN may be seen as a sub-optimal iterative smoother.
Under the hypothesis of a linear model, a variational interpretation is possible. In this case, if we choose K = kH^T R^{-1}, where R is the observation error covariance matrix and k is a scalar, the solution of the forward nudging is a compromise between the minimisation of the system's energy and the minimisation of the distance between the data and the model (Auroux and Blum, 2008).
However, the backward integration is problematic when the model is diffusive or simply not reversible. In the case of ocean models, there are two main aspects requiring the inclusion of diffusion: (i) the control of numerical noise, and (ii) the modelling of sub-grid-scale processes, i.e. to parametrise the energy transfer from explicitly resolved to non-resolved scales. Indeed, diffusion naturally represents a source of uncertainty in ocean forecasts, even for the purely forward model, and has been investigated from the point of view of optimal control theory in Leredde et al. (1999).
To address the problem of the backward model instability, in this article the diffusive back-and-forth nudging, DBFN (Auroux et al., 2011), is used. In this algorithm the sign of the diffusion term remains physically consistent, and only the reversible part of the model equations is really solved backward. Practical consequences of this assumption are analysed in Sect. 4. A similar solution was proposed by Pu et al. (1997) and Kalnay et al. (2000) to stabilise their quasi-inverse linear model.
To describe the DBFN algorithm, let us assume that the time continuous model satisfies dynamical equations of the form

dx/dt = F(x) + νΔx, 0 < t < T,  (2)

with an initial condition x(0) = x_0, where F denotes the non-linear model operator without diffusive terms, ν is a diffusion coefficient and Δ represents a diffusion operator. If nudging is applied to the forward system (Eq. 2), it gives

dx_k/dt = F(x_k) + νΔx_k + K(x_obs − H(x_k)), with x_k(0) = x̃_{k−1}(0),  (3)

where k ≥ 1 stands for the iterations and x_0(0) is a given initial guess. Nudging applied to the backward system with the reversed diffusion sign gives

dx̃_k/dt = F(x̃_k) − νΔx̃_k − K′(x_obs − H(x̃_k)), with x̃_k(T) = x_k(T),  (4)

which is integrated backward in time from t = T to t = 0. The system composed of Eqs. (3) and (4) is the basis of the DBFN algorithm. They are iterated until convergence. Therefore, one important aspect of the DBFN algorithm is the convergence criterion. Ideally, at convergence the nudging term should be null or small compared to the other equation terms. Otherwise, when the nudging is switched off, which is the case in the forecast phase, the system may return to a state close to the background state or to a state which is not consistent with the one at convergence. The convergence is measured by the relative change of the estimated initial state between two successive iterations,

‖x_k(0) − x_{k−1}(0)‖ / ‖x_{k−1}(0)‖ < ε,  (5)

where ‖·‖ is the L2 norm, and the choice of ε = 0.005 is based on sensitivity tests (not presented in this article).

Data assimilation is the ensemble of techniques combining the mathematical information provided by the equations of the model and the physical information given by the observations in order to retrieve the state of a flow. In order to show that the DBFN algorithm achieves this double objective, let us give a formal explanation of the way the DBFN proceeds. If K = K′ and the forward and backward limit trajectories are equal, i.e. x_∞ = x̃_∞, then taking the sum of Eqs. (3) and (4) shows that x_∞ satisfies the model equations without diffusion:

dx_∞/dt = F(x_∞),  (6)

while taking the difference between Eqs. (3) and (4) shows that x_∞ satisfies the Poisson equation

νΔx_∞ + K(x_obs − H(x_∞)) = 0,  (7)

which represents a smoothing process on the observations for which the degree of smoothness is given by the ratio ν/K (Auroux et al., 2011).
Equation (7) corresponds, in the case where H is a matrix H and K = kH^T R^{-1}, to the Euler equation of the minimisation of the following cost function:

J(x) = (k/2) (x_obs − Hx)^T R^{-1} (x_obs − Hx) + (ν/2) ∫_Ω ‖∇x‖² dΩ,  (8)

where the first term represents the quadratic difference from the observations and the second one is a first-order Tikhonov regularisation term over the domain of resolution Ω. The vector x_∞, a solution of Eq. (7), is the point where the minimum of this cost function is reached. It is shown in Sect. 6.1 that at convergence the forward and backward trajectories are very close, which supports this qualitative interpretation of the algorithm.
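The forward and backward sweeps, with the diffusion sign handled as described above, can be sketched on a 1-D periodic advection-diffusion toy problem. This is an illustrative stand-in for the ocean model, not the article's configuration: grid size, gain, coefficients and the use of the analytic solution as continuous "observations" are all arbitrary choices.

```python
import numpy as np

def dbfn_1d(n_iter=20, nx=64, nu=0.02, c=1.0, K=1.0, T=1.0, nt=200):
    """Toy DBFN for u_t = -c u_x + nu u_xx on a periodic domain.
    Forward sweep:  u += dt*(-c u_x + nu u_xx + K*(obs - u)).
    Backward sweep: stepping with a negative dt and the diffusion sign
    reversed gives u += dt*(+c u_x + nu u_xx + K*(obs - u)),
    so diffusion stays stabilising in both directions (the DBFN idea)."""
    dt = T / nt
    x = np.linspace(0.0, 2*np.pi, nx, endpoint=False)
    dx = x[1] - x[0]

    def rhs_space(u, sign_c):
        ux = (np.roll(u, -1) - np.roll(u, 1)) / (2*dx)        # centred d/dx
        uxx = (np.roll(u, -1) - 2*u + np.roll(u, 1)) / dx**2  # centred d2/dx2
        return sign_c * c * ux + nu * uxx

    def truth(t):  # analytic solution for u0 = sin(x), used as observations
        return np.sin(x - c*t) * np.exp(-nu*t)

    u0 = np.zeros(nx)                      # poor first guess
    for _ in range(n_iter):
        u = u0.copy()
        for i in range(nt):                # forward sweep with nudging
            u += dt * (rhs_space(u, -1.0) + K*(truth(i*dt) - u))
        for i in range(nt, 0, -1):         # backward sweep with nudging
            u += dt * (rhs_space(u, +1.0) + K*(truth(i*dt) - u))
        u0 = u                             # new initial-state estimate
    return u0, truth(0.0)

u0_est, u0_true = dbfn_1d()
err = np.linalg.norm(u0_est - u0_true) / np.linalg.norm(u0_true)
```

Consistent with the smoothing interpretation above, the recovered initial state is a slightly smoothed version of the truth, with a residual error of the order of ν/K.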
The description of the used K matrix is given in Sect. 5.1.

Four-dimensional variational method -4Dvar
Variational methods minimise a cost function that measures the distance between the estimated state and the available observations. Let us assume that observations are available at instants (t_i)_{1≤i≤N}. Given a first guess x_b of the initial state, the 4Dvar algorithm will find an optimal initial condition that minimises the distance between the model trajectory and the observations in a given assimilation window. This optimal state is found by minimising the following cost function:

J(x_0) = (1/2) (x_0 − x_b)^T B^{-1} (x_0 − x_b) + (1/2) Σ_{i=1}^{N} [H_i(M_{0,i}(x_0)) − y_i]^T R_i^{-1} [H_i(M_{0,i}(x_0)) − y_i],  (9)

where B is the background error covariance matrix and M_{0,i} represents the model integration from time t_0 to time t_i. R_i, H_i and y_i are the observation error covariance matrix, the observation operator and the available observations at time t_i, respectively. The optimal initial state is found by solving

x_0^a = argmin J(x_0),  (10)

which requires the gradient of the cost function to vanish:

∇J(x_0^a) = 0.  (11)

The calculation of this gradient is done using the adjoint method proposed by Lions (1971) and brought to the meteorological context by Le Dimet and Talagrand (1986). Since for ocean applications M and H are non-linear, we used the incremental approach proposed by Courtier et al. (1994), which consists in solving a sequence of linearised quadratic problems, expecting this sequence to converge to the solution of the problem given by Eqs. (9) and (10). In this case, the cost function is not minimised with respect to the initial state, but with respect to an increment δx_0 defined by

δx_0 = x_0 − x_b,  (12)

and the new cost function is given by

J(δx_0) = (1/2) δx_0^T B^{-1} δx_0 + (1/2) Σ_{i=1}^{N} [H_i M_{0,i} δx_0 − d_i]^T R_i^{-1} [H_i M_{0,i} δx_0 − d_i],  (13)

where H_i and M_{0,i} now denote the linearised observation operator and model, and d_i = y_i − H_i(M_{0,i}(x_b)) is called the innovation vector. To take advantage of the non-linearities, it is a common practice to re-linearise H and M around a new model trajectory after some iterations of the minimiser. This new model trajectory is computed by integrating the non-linear model forward in time using x_0^k = x_b + δx_0^k as the initial condition, where k refers to the new run of the non-linear model and δx_0^k is the increment previously obtained through the minimisation of Eq. (13). This gives rise to what are called the inner-loop and outer-loop iterations. The algorithm implemented in NEMO, called NEMOVAR (Mogensen et al., 2009), uses this technique. It can be summarised as follows: in the inner loop, search for the increment δx_0^{a,k} that minimises the linearised cost function (Eq. 13); in the outer loop, integrate the non-linear model from x_0^k = x_b + δx_0^{a,k} and re-linearise H and M around the new trajectory. The description of matrices B and R is given in Sect. 5.2.
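The adjoint-based gradient computation that underlies this minimisation can be sketched with a small linear toy system. The matrices, dimensions and plain gradient descent below are invented for illustration and stand in for the operational model, adjoint and minimiser:

```python
import numpy as np

rng = np.random.default_rng(0)
n, N = 4, 5                                          # state size, obs times
M = np.eye(n) + 0.05 * rng.standard_normal((n, n))   # toy linear model step
H = np.eye(n)                                        # observe the full state
B_inv = np.eye(n)                                    # inverse background cov
R_inv = np.eye(n)                                    # inverse observation cov

x_true = rng.standard_normal(n)
x_b = np.zeros(n)                                    # first guess

obs = []                                             # perfect observations
state = x_true.copy()
for _ in range(N):
    state = M @ state
    obs.append(H @ state)

def cost_and_grad(x0):
    """4Dvar-style cost: background term plus sum of observation misfits
    along the trajectory; the gradient of the observation term is
    accumulated by the adjoint (here simply the transpose) model,
    swept backward in time."""
    J = 0.5 * (x0 - x_b) @ B_inv @ (x0 - x_b)
    misfits, xi = [], x0.copy()
    for i in range(N):                               # forward sweep
        xi = M @ xi
        d = H @ xi - obs[i]
        J += 0.5 * d @ R_inv @ d
        misfits.append(d)
    adj = np.zeros(n)
    for i in reversed(range(N)):                     # backward adjoint sweep
        adj = M.T @ (adj + H.T @ R_inv @ misfits[i])
    return J, B_inv @ (x0 - x_b) + adj

x0 = x_b.copy()            # plain gradient descent stands in for the
for _ in range(2000):      # inner-loop minimiser of the real system
    J, g = cost_and_grad(x0)
    x0 -= 0.02 * g
```

The gradient can be checked against a finite-difference approximation, which is a standard sanity test for hand-written adjoints.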

Ocean model and experimental set-up
The ocean model used in this study is the ocean component of NEMO (Nucleus for European Modelling of the Ocean; Madec, 1996). This model is able to represent a wide range of ocean motions, from the basin scale down to the regional scale. It is currently used in operational mode by the French Mercator Océan group (http://www.mercator-ocean.fr) and by the European Centre for Medium-Range Weather Forecasts (ECMWF). The model solves a system of five prognostic equations, namely the momentum balance for the horizontal velocities, an equation describing the evolution of the free surface, and the heat and salt conservation equations. A non-linear equation of state couples the two tracers to the fluid fields. In this study, a linear free surface formulation is used, along with the approach developed by Roullet and Madec (2000) to filter out the external gravity waves.
Equations are discretised using spherical coordinates on an Arakawa C grid. The model advances in time using a leap-frog scheme for all terms except the vertical diffusive terms, which are treated implicitly. At every time step the model applies a Robert-Asselin (RA) temporal filter to damp the computational mode. The leap-frog scheme followed by the RA filter leads to a first-order temporal scheme (Williams, 2009). Spatial discretisation uses a centred second-order formulation for both the advective and diffusive terms.
The double gyre configuration, extensively used to study jet instabilities (Chassignet and Gent, 1991;Primeau, 1998;Chang et al., 2001), meso-and submeso-scale dynamics (Levy et al., 2010) and data assimilation methods (Molcard et al., 2004;Krysta et al., 2011;Cosme et al., 2010), is used for the present study. The double gyre configuration simulates the ocean middle latitude dynamics and has the advantage of being simple, when compared to real applications, but still considering full dynamics and thermodynamics.
In our experiments we use a homogeneous horizontal grid with a 25 km resolution and a vertical resolution ranging from 100 m near the upper surface to 500 m near the bottom. The bottom topography is flat and the lateral boundaries are closed and frictionless. The only forcing term considered is a constant wind stress of the form τ = (τ_0 cos(2π(y − y_0)/L), 0), where y is the latitude coordinate, with y_0 = 24°, y_0 ≤ y ≤ 44°, L = 20° and τ_0 = −0.1 N m^−2. Horizontal diffusion/viscosity is modelled by a bi-Laplacian operator, while a Laplacian operator is used in the vertical. All coefficients are constant in time and space: ν_h^{u,v} = −8 × 10^10 m^4 s^−1 and ν_v^{u,v} = 1.2 × 10^−4 m^2 s^−1 for the momentum equations, and ν_h^{t,s} = −4 × 10^11 m^4 s^−1 and ν_v^{t,s} = 1.2 × 10^−5 m^2 s^−1 for temperature and salinity. The initial condition is similar to that used by Chassignet and Gent (1991) and consists of a homogeneous salinity field of 35 psu and a temperature field created to provide a stratification with a first baroclinic deformation radius of 44.7 km. Velocity and sea surface height (SSH) fields are initially set to zero. This double gyre configuration is currently used as the NEMO data assimilation demonstrator and as the experimentation and training platform for data assimilation activities (Bouttier et al., 2012). For the present work, the model was integrated for 70 years in order to reach a statistical steady state. Afterwards, 10 years of free model run were performed and used to calculate the regression models from which the nudging matrix K is computed (see Sect. 5.1), and then 2 additional years were completed to be used as the truth from which the observations were extracted.
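The shape of the double-gyre wind forcing can be reproduced in a few lines (parameter values as quoted above; the latitude grid is an arbitrary choice for plotting):

```python
import numpy as np

# Zonal wind stress of the double-gyre forcing:
# tau_x = tau0 * cos(2*pi*(y - y0)/L), tau_y = 0.
tau0, y0, L = -0.1, 24.0, 20.0          # N m^-2, degrees, degrees
y = np.linspace(y0, y0 + L, 81)         # 24N .. 44N
tau_x = tau0 * np.cos(2*np.pi*(y - y0)/L)
# One full cosine period over the basin: the sign change of the wind-stress
# curl at mid-basin is what drives the two counter-rotating gyres.
```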

The backward integration without nudging: practical aspects
The backward model uses exactly the same numerical scheme as the forward model. Since most of the model is solved using centred finite differences, the inverse version of the discretised model is similar to the discrete version of the inverse continuous model. The only distinction between the forward and backward models is the change in the sign of the diffusive terms when stepping backwards, which makes the backward integration stable. If this is not taken into account, the model blows up after a few days.
Reversing the diffusion sign in the backward model is a numerical artifact and, being so, its effects should be carefully analysed. In this section, the accuracy of the backward integration is studied, as well as its sensitivity with respect to the choice of the diffusion coefficient. The errors are analysed by calculating the L2 error norm at the end of one forward-backward integration relative to a typical 1 day model variation:

R_err = ‖x_fb(0) − x_t(0)‖ / ⟨‖x_t(t + Δt) − x_t(t)‖⟩,  (14)

where x_fb(0) is the state at the end of the forward-backward integration, x_t is the reference (true) state, Δt = 1 day and the brackets represent the empirical mean. Figure 1 shows the global error, R_err, for different window sizes. The errors grow linearly with the window size for all variables. Temperature is the most affected variable, followed by sea level and velocities. Temperature errors exceed 18 times a typical 1 day variation for the 30 day experiment and 1.2 times for the 2 day one. The use of reduced diffusion/viscosity coefficients reduces the errors to 6.8 and 0.16 times the 1 day variation for the 30 and 2 day experiments, respectively. Velocity errors were reduced by 50 % for 30 days and 85 % for 2 days, while SSH errors were reduced by 60 and 88 % for 30 and 2 days, respectively.
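This diagnostic can be sketched as follows (the exact normalisation used in the article may differ; this follows the description of a typical 1 day variation, and the function signature is an assumption):

```python
import numpy as np

def relative_backward_error(x_fb, x_true, daily_states):
    """L2 error of the forward-backward state relative to a typical
    1 day model variation.
    x_fb, x_true : model states (arrays of the same shape)
    daily_states : array (n_days, ...) of consecutive daily model states."""
    num = np.linalg.norm(x_fb - x_true)
    # L2 norm of each day-to-day increment, then the empirical mean
    daily_var = np.linalg.norm(np.diff(daily_states, axis=0),
                               axis=tuple(range(1, daily_states.ndim)))
    return num / daily_var.mean()
```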
As shown in Fig. 2, velocity and temperature errors are depth-dependent. Whereas for velocity they are larger at the surface and decrease with depth, for temperature they are larger in the thermocline. In the cases for which the forward-backward integrations use the same diffusion/viscosity coefficients as the reference simulation, the temperature errors at thermocline depths exceed 3 times the typical 1 day variation for the 5 day experiments and reach 15 times for the 20 day experiments. Considering the velocities, errors amount to about four 1 day variations for the 5 day experiment and to about eight 1 day variations for the 20 day experiments. For time windows of 10, 20 and 30 days, velocities at the thermocline depths start to be influenced by temperature errors.
Reduction of the diffusion/viscosity coefficients greatly reduced the errors especially in the thermocline for the temperature and at the surface for the velocity. It can be noted that when the diffusion coefficient is decreased the errors converge to a limit. This limit changes with respect to the window length and should be related to the diffusion required to stabilise the numerical method, which is of second order in our case, and hence oscillatory. Therefore, there is a compromise between the errors induced by the extra diffusion and errors due to spurious oscillations.
Numerical errors were assessed by changing the model time step from 900 to 90 s. The resulting errors (not shown) do not change, suggesting that the errors induced by the diffusion are dominant. On the one hand, this is important because the complete rewriting of the model's code can be difficult, similar to the adjoint model programming used by the 4Dvar; on the other hand, if the assimilation cannot control the diffusion errors, this may represent a fundamental problem of the method when it is applied to non-reversible geophysical systems such as the ocean.

Figure 3 shows the spatial structures of the sea level error for the 10 day experiment. The errors are highly variable in space, being larger along the main jet axis. This is probably due to the fact that the backward integration smooths the gradients, and so the largest errors are found near the fronts. Therefore, the error structures may be highly variable in space and time, since they are state dependent.

Figure 4 shows the surface kinetic energy spectrum calculated from the experiment employing the reference diffusion coefficient and a reduced diffusion coefficient. (Figure 4 caption: kinetic energy mean power spectra calculated using the first-layer velocity fields. Black curves represent the "true" initial condition power spectra; red curves, the power spectra calculated after one forward-backward iteration without the nudging term and employing the reference diffusion coefficient; magenta curves, the same but employing a reduced diffusion coefficient. Top left: 5 day assimilation window. Top right: 10 day assimilation window. Bottom: 20 day assimilation window. The bottom abscissa gives the longitudinal wave number (rad m^−1); the top abscissa gives the corresponding wavelengths in km.) The backward integration introduces an extra diffusion, coarsening the effective model resolution, which is defined as the portion of the spectrum for which there is a change in the spectrum slope. In the reference simulation the effective model resolution is estimated to be 190 km, which is coherent with the ≈ 7Δx estimation of Skamarock (2004).
The longer the time window, the greater the portion of the spectra affected. For the experiment employing the reference diffusion coefficient, the divergence between the true spectra and the spectra obtained from the backward integration is observed at 126, 314 and 627 km for 5, 10 and 20 day experiments, while for the experiments considering a reduced diffusion coefficient there is almost no difference for the 5 day experiment, and the divergence is observed at 126 and 314 km for the 10 and 20 day experiments. If on the one hand using the reduced diffusion helps to keep the energy distribution coherent with the true distribution, on the other hand it creates noise in the range of 126 to 25 km. This confirms that there is a trade-off between the errors due to the excessive smoothing and the errors due to high-frequency numerical modes.
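The kind of kinetic energy spectrum discussed here can be computed along the zonal direction with an FFT. The following is a simplified sketch; windowing and detrending, which a careful spectral estimate would include, are omitted:

```python
import numpy as np

def ke_power_spectrum(u, v, dx):
    """Mean 1-D (zonal) kinetic-energy power spectrum of a surface velocity
    field, averaged over latitude rows.
    u, v : 2-D arrays (ny, nx); dx : grid spacing in metres."""
    nx = u.shape[1]
    def spec(f):
        F = np.fft.rfft(f, axis=1) / nx      # FFT along the zonal direction
        return (np.abs(F)**2).mean(axis=0)   # average spectra over rows
    k = 2*np.pi*np.fft.rfftfreq(nx, d=dx)    # wavenumber in rad m^-1
    return k, 0.5*(spec(u) + spec(v))
```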
In this section we have seen that there are large backward errors induced by over-diffusion. Therefore, short time windows with reduced diffusion coefficients would be preferable for use in DA experiments. Two regions have to be cautiously analysed: the surface and the thermocline. The surface currents are prone to feature errors due to their role in the wind energy dissipation, while at the thermocline, strong density gradients contribute to high diffusion rates.

Prescription of the DBFN gain
In this study the increments corresponding to the term K(x_obs − H(x)) are calculated in two operations: first the increments of the observed variables are calculated using a prescribed weight, and subsequently the increments of the other state variables are calculated using linear regression. More precisely, defining y = H(x) as the observed part of the state vector, the first step may be written as

δy = W (y_obs − y^b),  (15)

where the superscript "b" denotes the background field, i.e. the model field available from the last time step. The prescribed weight is given by

W = σ_m² / (σ_m² + γ σ_o²),  (16)

where σ_m² is the mean spatial value of the SSH variance calculated from the free model run, σ_o² is the observation error variance and γ is a parameter used to adjust the variance of the observation errors. When γ = 1, the weight in Eq. (16) has the same form as the scalar Kalman gain (Gelb et al., 1974). For values greater than one, γ is an inflation factor, i.e. it increases the variance of the observation errors, resulting in more weight being given to the model in Eq. (15).
The use of the inflation factor is theoretically justified in the linear Kalman filtering context. In this case, it is well known that the Kalman filter provides the best linear unbiased estimator, so there is no need to use the observations more than once. Consequently, when the Kalman filter is iterated, the inflation parameter should be used to avoid overfitting and the introduction of correlated errors into the system (Kalnay and Yang, 2010). In this study, γ = 18, which means that theoretically the best solution would be reached in nine iterations. However, since in this study the Kalman filter equations are not fully used and the system is not linear, the γ parameter serves rather as a guide to how strongly the model is nudged toward the observations; indeed, the iterations are not limited to nine. The values used for the other parameters are σ_m = 0.017 m and σ_o = 0.03 m, consistently with the perturbations added to the observations (see Sect. 5.4).
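As a quick illustration, the prescribed weight of Eq. (16) can be evaluated for the parameter values quoted above. This is only a sketch of the scalar formula, not of the assimilation code itself.

```python
def prescribed_weight(sigma_m, sigma_o, gamma=1.0):
    """Scalar weight K = sigma_m^2 / (sigma_m^2 + gamma * sigma_o^2), Eq. (16)."""
    return sigma_m**2 / (sigma_m**2 + gamma * sigma_o**2)

# Parameter values quoted in the text: sigma_m = 0.017 m, sigma_o = 0.03 m
k_kalman = prescribed_weight(0.017, 0.03, gamma=1.0)    # scalar Kalman-gain form
k_inflated = prescribed_weight(0.017, 0.03, gamma=18.0)  # inflated observation error
```

With γ = 18 the weight drops from about 0.24 to about 0.018, so each iteration pulls the model only gently toward the observations, consistent with reusing the same observations over many iterations.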
Then, the increments of the non-observed variables, δx, are calculated using a regression equation of the form

δx = B̂_PLS δy, (17)

where B̂_PLS is the matrix of partial least squares (PLS) regression coefficients described below. It is worth noting that in Sect. 6 we also apply this update scheme to an ordinary direct nudging experiment (ONDG). In that case there is no backward integration, there is no iteration and each observation is used only once; accordingly, the parameter γ is equal to 1. The PLS can be seen as an improvement over ordinary least squares (OLS) regression. The most important difference between OLS and PLS is that the latter assumes that the maximum information about the non-observed variables lies in those directions of the observed space which simultaneously have the highest variance and the highest correlation with the non-observed variables.
In the PLS formulation (Tenenhaus, 1998), Y ∈ R^{n×M} contains the observed (predictor) variables and X ∈ R^{n×N} the non-observed (response) variables. In our notation n is the sample size and M and N are, respectively, the dimensions of the state spaces of Y and X. Besides, Y and X are centred and have the same units. The PLS regression consists of two steps: first, a dimension reduction step in which the predictors in matrix Y are summarised by a small number of linear combinations called "PLS components"; then, those components are used as predictors in an ordinary least squares regression.
The PLS, like principal component regression, can be seen as a method to construct a matrix of p mutually orthogonal components t as linear combinations of Y:

T = Y W, (18)

where T ∈ R^{n×p} is the matrix of new components t_i = (t_1i, ..., t_ni)^T, for i = 1, ..., p, and W ∈ R^{M×p} is a weight matrix satisfying a particular optimality criterion. The columns w_1, ..., w_p of W are calculated according to the optimisation problem

w_i = argmax_w w^T Y^T X X^T Y w, (19)

subject to w_i^T w_i = 1 and w_i^T Y^T Y w_j = 0 for j = 1, ..., i − 1.
The PLS estimator B̂_PLS is given by

B̂_PLS = W (W^T Y^T Y W)^{-1} W^T Y^T X. (20)

An immediate consequence of Eq. (20) is that when W = I the ordinary least squares solution is obtained. The number of components p is chosen by cross-validation. This method involves testing the model on objects that were not used to build it: the data set is divided into two contiguous blocks, one of them used for training and the other for validating the model. Then the number of components giving the best results in terms of mean residual error and estimator variance is sought.
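A minimal numerical sketch of this regression step is given below. It is not the scheme used in the article: for simplicity the weight matrix W is taken as the leading left singular vectors of the cross-covariance Y^T X, a simplified stand-in for the exact PLS weights. It does, however, reproduce the key property noted above: with the full set of components (W square and orthogonal) the estimator collapses to ordinary least squares.

```python
import numpy as np

def pls_like_fit(Y, X, p):
    """Sketch of an Eq.-(20)-style estimator: components T = Yc W, then OLS on T.

    Y: (n, M) observed/predictor samples; X: (n, N) non-observed/response
    samples; p: number of retained components. W is built here from the SVD of
    the cross-covariance, a simplification of the true PLS weights.
    """
    Yc = Y - Y.mean(axis=0)
    Xc = X - X.mean(axis=0)
    U, _, _ = np.linalg.svd(Yc.T @ Xc, full_matrices=True)
    W = U[:, :p]                                   # (M, p), orthonormal columns
    T = Yc @ W                                     # (n, p) component scores
    B_T, *_ = np.linalg.lstsq(T, Xc, rcond=None)   # OLS regression on components
    return W @ B_T                                 # (M, N), analogue of Eq. (20)
```

Increments of the non-observed variables would then be obtained as δx = δy B̂ (row-vector convention); truncating p < M discards the low-covariance directions that inflate the variance of the OLS estimator.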
The weight and the regression model B̂_PLS are kept constant over the assimilation cycles, and the correction steps (Eqs. 15 and 16) are applied at the end of the time loop. Thus, our updating scheme can be seen as a rough approximation of the two-step update for the EnKF presented by Anderson (2003).

The 4Dvar background term configuration
The 4Dvar considers a background term of the form J_b = ½ (δx_0^k)^T B^{-1} (δx_0^k), where B is the background error covariance matrix. This term is also known as a regularisation term in the sense of Tikhonov. It is especially important when there are not enough observations to make the problem well determined.
The B matrix is supposed to model the spatial covariance of the background errors of a given variable as well as the cross-covariances between the errors of different variables. In order to define B, Derber and Bouttier (1999) proposed to decompose it into a product of simply defined matrices. This is accomplished by splitting each variable into a balanced and an unbalanced component; one variable must be kept undecomposed, so that the balanced and unbalanced components of the other variables can be defined with respect to it. We used the decomposition proposed by Weaver et al. (2005), for which temperature is the "seed" variable; then, thanks to physical constraints such as geostrophic balance, hydrostatic balance and the principle of water mass conservation, all other state variables may be split into a balanced (B) component and an unbalanced (U) component. Thus, each model variable other than temperature (temp), namely salinity (salt), sea surface height (η), zonal velocity (u) and meridional velocity (v), may be written as

x = x_B + x_U, for x ∈ {salt, η, u, v},

where the balanced parts are diagnosed from the temperature via the density ρ and the pressure p. Then, since a covariance matrix may be written as the product of variances and correlations, B may be expressed as B = Σ C Σ, where Σ is a diagonal matrix of error standard deviations, whose entries are the climatological standard deviations, and C is a univariate correlation matrix modelled using the generalised diffusion equation (Weaver and Courtier, 2001; Weaver et al., 2005). In this method the user must choose typical decorrelation lengths. In this study the horizontal decorrelation length is set to 400 km and the vertical decorrelation length to 1500 m. In addition, the 4Dvar is configured to perform one outer loop and a maximum of thirty inner loops per assimilation cycle.
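The structure B = Σ C Σ can be illustrated on a small one-dimensional column. The Gaussian correlation below is only a convenient stand-in for the diffusion-equation-based correlation operator of Weaver and Courtier (2001), and the standard deviations are arbitrary illustrative values, not the climatological ones used in the article.

```python
import numpy as np

def toy_background_covariance(z, sigma, length):
    """Build B = Sigma C Sigma for one variable on a 1-D vertical grid.

    z: (n,) level depths in metres; sigma: (n,) error standard deviations;
    length: decorrelation length in metres. C is a Gaussian correlation
    matrix, used here as a surrogate for diffusion-operator correlations.
    """
    dz = z[:, None] - z[None, :]
    C = np.exp(-0.5 * (dz / length) ** 2)   # univariate correlation matrix
    Sigma = np.diag(sigma)
    return Sigma @ C @ Sigma

z = np.linspace(0.0, 5000.0, 11)            # 11 levels, as in the experiments
sigma = np.full(11, 0.5)                    # illustrative standard deviations
B = toy_background_covariance(z, sigma, 1500.0)  # 1500 m vertical scale
```

The resulting matrix is symmetric and positive semi-definite, with variances σ² on the diagonal; the decorrelation length controls how far an observation increment at one level spreads to its neighbours.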

Assimilation cycle
One assimilation cycle is defined as the process of identifying an initial condition through the iterative process, followed by a forecast spanning the assimilation window, which provides the background for the next assimilation cycle.
The objective of cycling is to provide a background state for the next assimilation window that is closer to the true state than the very first background field. This usually reduces the number of iterations needed by the algorithms to reach convergence.
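The forward and backward sweeps that make up one DBFN-style iteration can be illustrated on a deliberately trivial scalar model (dx/dt = λx, fully observed with exact observations). This toy is ours, not the article's, and it omits the diffusion handling that distinguishes the DBFN from the plain BFN; it only shows how repeated sweeps drag a wrong first guess toward the observed trajectory.

```python
import numpy as np

# Toy back-and-forth nudging on dx/dt = lam * x, fully observed.
lam, K, dt, nsteps = 0.5, 5.0, 0.01, 100

# "True" trajectory generated with the same Euler scheme, so it is (nearly)
# a fixed point of the nudged sweeps when observations are exact.
x0_true = 1.0
obs = x0_true * (1.0 + lam * dt) ** np.arange(nsteps + 1)

x0 = 2.0                                  # first guess, deliberately wrong
for _ in range(10):                       # back-and-forth iterations
    x = x0
    for n in range(nsteps):               # forward sweep, nudging term +K(y - x)
        x = x + dt * (lam * x + K * (obs[n] - x))
    for n in range(nsteps, 0, -1):        # backward sweep in reversed time
        x = x + dt * (-lam * x + K * (obs[n] - x))
    x0 = x                                # backward end point = new initial state
```

After a few sweeps x0 settles within a small discretisation-induced offset of x0_true, which is the behaviour the cycling exploits: each cycle starts from a background much closer to the truth than the original first guess.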
The length of the data assimilation window (DAw) used in the reference experiments (Sect. 6.1) is 10 days. For the sensitivity experiments presented in Sect. 6.2, the lengths of the assimilation window are 5 days and 30 days.

Observation network
In this article, an observation network simulating the spatial sampling density of the Jason-1 satellite is available every 4 days. The data are perturbed with white Gaussian noise with a standard deviation of 3 cm. With this observation network, a new set of 5000 observations is available every 4 days.
The data assimilation problem we propose to solve is to recover the full model state at the beginning of the assimilation window. The model state space is composed of five variables: sea surface height (η), meridional and zonal velocities (u and v), temperature and salinity (temp and salt). Since we have a horizontal mesh of size 81 × 121 and 11 vertical layers, the total size of the state space is 441 045. Therefore, the problem is underdetermined, since the observations represent only 1.1 % of the total state space. This means that the background term, and accordingly the B matrix for the 4Dvar and the regression model B̂_PLS for the DBFN, have a strong impact on the methods' performance, since they project the increments of the observed variables onto the numerous non-observed variables.
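The bookkeeping behind these figures is simple: η is a two-dimensional field while the other four variables span the 11 vertical layers.

```python
# State-space size for the 81 x 121 x 11 double-gyre grid
nx, ny, nz = 81, 121, 11
n_surface = nx * ny               # eta is a 2-D field
n_3d = nx * ny * nz               # temp, salt, u, v are 3-D fields
state_size = 4 * n_3d + n_surface # -> 441045

obs_fraction = 5000 / state_size  # ~0.011, i.e. ~1.1 % of the state observed
```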
To study to what extent the results depend on the number of assimilated observations and on the first guess, in Sect. 6.2.2 two additional experiments assimilating complete daily fields of SSH are conducted: one using the same first guess as the experiments of Sect. 6.1, and another using a perturbed initial condition. Despite the fact that the problem remains underdetermined, in this case the SSH analysis is no longer dependent on the SSH spatial covariance, and the unstable modes associated with the SSH dynamics are certainly observed. The analysis produced for the other state vector variables remains dependent on the matrices B for the 4Dvar case and B̂_PLS for the DBFN case.

Reference experiment
In this section the results produced by the DBFN, the 4Dvar method, the ordinary nudging (ONDG) and the control experiment are presented. All assimilation methods include the five prognostic variables in the state vector, which is possible thanks to the PLS regression method in the case of the DBFN. First, the minimisation performance of the 4Dvar implementation is analysed. Figure 5 shows the reduction of the cost function gradient for the 4Dvar and the reduction of the relative error of the zonal velocity for the DBFN, both for the first assimilation cycle. The 4Dvar takes 26 iterations to approximately achieve the optimality condition ∇J = 0. This represents 3 times the number of iterations required by the DBFN to converge, after which the errors cease to decrease. Moreover, the 4Dvar numerical cost is more than 3 times the DBFN cost, since one execution of the adjoint model costs 4 times one execution of the direct model in terms of CPU time.
We note that the minimum error for the DBFN is reached after nine iterations. This is quite consistent with our choice γ = 18, since it theoretically allows the same set of observations to be used 18 times (each iteration comprises one forward and one backward sweep).
It is worth noting that the trajectories of the forward and backward nudging are very close to each other at convergence, which justifies the qualitative explanation of the DBFN algorithm given by Eqs. (6) and (7). This is illustrated in Fig. 6, which shows the forward and backward evolution of the spatially averaged (in black) and single grid point (at 34° N and 52.6° W, in red) surface zonal velocity during the last DBFN iteration. The corresponding grid point is located in the region of the unstable jet, where the velocity varies strongly through time. Figure 7 shows the root mean square (rms) error for the control experiment (without assimilation), the experiment using direct nudging with PLS regression (ONDG), the DBFN and the 4Dvar. Each method is cycled according to the description given in Sect. 5.3 and the curves represent the forecast rms error; i.e. they are calculated from the forecasts that were initialised with the analysed fields. The DBFN errors for the velocity and SSH converge to their asymptotic values after the first assimilation cycle, while for the ONDG and the 4Dvar the errors stop decreasing after 100 and 200 days, respectively. This is a benefit of the iterations performed by the DBFN when the model and data are quite different. Among the experiments conducted, the DBFN produced the smallest errors for all variables, except for the zonal velocity, for which the 4Dvar has slightly smaller errors. The ONDG also showed good performance, but with errors larger than the DBFN and 4Dvar errors. With respect to the vertical error (Fig. 8), the DBFN and the ONDG performed better than the 4Dvar for the upper ocean. The PLS also corrects the deep ocean velocity, but less accurately than the 4Dvar. The first error mode is the barotropic one, i.e. it has the same sign over all depths, and accounts for 97 % of the error variability for the 4Dvar, and 96 and 93 % for the DBFN and ONDG, respectively.
Although the first mode is the barotropic one for all methods, the 4Dvar barotropic mode of error is out of phase with respect to the PLS barotropic mode. This reflects the better performance of the 4Dvar for the deep ocean and the better performance of the DBFN and ONDG for the upper ocean.
The second mode, which accounts for almost all the remaining variability, has a sign inversion with depth and is found especially over the main axis of the jet. In this region the deep ocean velocities are overestimated due to spurious covariances between the SSH and the deep ocean velocities.
The way both methods correct the model depends on the B matrix in the 4Dvar algorithm and on the regression model B̂_PLS in the DBFN. This means that the results may differ if another approximation of B and another regression model are used. Perhaps the main conclusion of this comparison is that the DBFN, which is easier to implement and cheaper to execute, can produce results similar to the 4Dvar. It is also shown that iteration is an important aspect of the method. Iterations use the information contained in the observations and in the model equations to reduce the uncertainty on the initial state. Moreover, the iterations allow us to put information from the observations into the model without causing initialisation problems, since the nudging gain can be taken smaller than the one used for direct nudging, thanks to the possibility of using the same set of observations more than once.

Length of the DAw
Sensitivity tests with respect to the length of the DAw are presented here. As shown in Sect. 4, the accuracy of the backward model is inversely proportional to the length of the DAw. Therefore, in this section we present experiments using DAw of 5 and 30 days. The experiment configuration is similar to that of the previous section, and the methods are cycled according to the description given in Sect. 5.3. Figure 9 shows the evolution of the initial condition rms errors for the zonal velocity and temperature during the DBFN iterations over the first assimilation cycle, for three DAw (including the 10 day window used previously). When considering only one iteration, the best results were obtained with the 30 day window experiment. This is a consequence of the asymptotic character of the nudging method: the longer the assimilation window, the more observations are accounted for and the smaller the error. This changes when several iterations are considered. The divergence observed for the 30 day window is due to the over-diffusion errors, which produce large increments that, by their nature, are not modelled by the ensemble of model states used to construct the regression model. Figure 10 shows the forecast rms error for the DBFN and 4Dvar experiments for three DAw: 5, 10 and 30 days. The methods exhibited comparable performances depending on the length of the DAw. For the DBFN, the 5 and 10 day DAw provided better results than the 30 day window, while for the 4Dvar the 30 day window provided the best estimation in terms of rms error. The DBFN and 4Dvar experiments using the 30 and 5 day DAw, respectively, failed to identify the initial conditions, since their SSH rms errors are greater than the observation error standard deviation. Figure 11 shows the time evolution of vertical profiles of horizontally averaged forecast rms errors of the zonal velocity for the DBFN and 4Dvar experiments.
The 4Dvar profits from the longer DAw to spread the observation information to the three-dimensional variables. This is done by the iterations of the direct model and by the B matrix. For the DBFN experiments, after 1 year of data assimilation the errors in the deep ocean start to grow. This is due to the high variance of the PLS estimator for the deep layers. The problem becomes more evident in the second year, because at this stage the observations are farther from the model states used to construct the regression coefficients. This behaviour is therefore not intrinsic to the DBFN algorithm and its diffusive aspects, but is due to our implementation: ideally, the regression model should evolve in time, similarly to the Kalman filter scheme. The 4Dvar has good performance in the deep ocean thanks to the use of a vertical localisation with a length scale of 1500 m.
Next we investigate which scales are better represented by each assimilation method. This is done by comparing the surface kinetic energy spectrum and the deep ocean kinetic energy spectrum produced by each method. Figure 12 shows that the effective resolution of the model is not affected by the diffusive character of the DBFN algorithm. There is a reduction of the energy for the scales close to the grid scale, but the energy contained in scales greater than 7 Δx is not affected. This means that the diffusion-induced errors presented in Sect. 4 are "controlled" by the assimilation of sea surface height observations.
There is no great difference between the DBFN and 4Dvar surface spectra for the assimilation windows shorter than 30 days, which once more shows the reliability of the DBFN for the assimilation of oceanic observations. The energy spectra for the deep ocean velocities produced by the DBFN contain more energy than the true spectrum, independently of the DAw used. This confirms that the deep ocean velocity errors are due to the high variance of the PLS regression model.

Figure 12. Kinetic energy mean power spectra calculated using the first layer (top) and a layer at 2660 m (bottom), over the 650 days of the assimilation experiments, for the DBFN (left) and the 4Dvar (right). Blue curves represent the "true" power spectra; green, red and black curves represent the power spectra for the 5, 10 and 30 day DAw, respectively. The bottom abscissa gives the longitudinal wavenumber (rad m−1), while the top abscissa gives the corresponding wavelengths in km.
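The kind of spectral diagnostic discussed here can be sketched as follows. This is a generic one-dimensional periodogram along the zonal direction, not the exact estimator used for Fig. 12.

```python
import numpy as np

def zonal_power_spectrum(field, dx):
    """Mean 1-D power spectrum of a 2-D field along its zonal (second) axis.

    field: (n_lat, n_lon) array, e.g. a surface velocity component;
    dx: zonal grid spacing in metres. Returns wavenumbers (rad/m) and the
    meridionally averaged power at each wavenumber.
    """
    anomalies = field - field.mean(axis=1, keepdims=True)
    coeffs = np.fft.rfft(anomalies, axis=1)
    power = (np.abs(coeffs) ** 2).mean(axis=0)
    k = 2.0 * np.pi * np.fft.rfftfreq(field.shape[1], d=dx)
    return k, power
```

Comparing such spectra from the analysed fields against the true run reveals the wavelength at which the two curves start to diverge, as discussed in Sect. 4.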

Observation density and first guess
Finally, two new experiments similar to the one presented in Sect. 6.1, but assimilating complete daily fields of SSH, are presented. The first one uses the same initial condition as the previously presented experiments, and its goal is to investigate the role of the number of assimilated observations in the results. Despite the fact that the problem remains underdetermined, in this case the SSH analysis is no longer dependent on the SSH spatial covariance, and the unstable modes associated with the SSH dynamics are certainly observed. The analysis produced for the other state vector variables remains dependent on the matrices B for the 4Dvar case and B̂_PLS for the DBFN case. Figure 13 shows the forecast rms error for the SSH and zonal velocity. The results are quite similar to those presented in Sect. 6.1, with a lower rms error for all variables for both methods. Figure 14 shows the initial condition error for the zonal velocity produced by both methods for the satellite-like observations and the complete observations experiments. The figure reveals that while in some places the inclusion of more observations helps to reduce the error, in other places it increases the error. This means that although much more information could be extracted from the new set of observations, which decreases the global rms errors, the solution remains dependent on the covariance structures contained in B and B̂_PLS. The second experiment is initialised with an initial condition that is 20 days apart from the one used previously and is closer, in terms of rms error, to the observations. We call this the perturbed experiment. In this case, the objective is to analyse the sensitivity of the solution to the choice of the first guess; thus, only one assimilation cycle is performed. Figure 15 shows the initial condition error for the SSH produced by both methods for the perturbed and non-perturbed experiments.
Since the perturbed initial condition is not much different from the unperturbed one, the analysis errors have the same structure in both cases, but they differ from one method to another. The DBFN produced smaller differences between the perturbed and non-perturbed experiments than the 4Dvar over the entire domain. A remarkable difference between the errors produced by the 4Dvar and the DBFN is the error structure at the western boundary produced by the DBFN, which is positive northward of 34° N and negative southward of 34° N. The presence of this structure is related to the fact that the DBFN analysis is the final condition produced by the backward model. The same pattern was also observed in Fig. 3, which shows the backward error for the SSH variable. The rms error of the identified trajectory for the perturbed experiment may be seen in Fig. 13 as the green (4Dvar) and black (DBFN) dashed curves. The results clearly show that, for the configured experiments, the DBFN is much less sensitive to the first guess than the 4Dvar.
The small sensitivity of the DBFN to the first guess is in accordance with the theoretical result about the BFN presented by Auroux and Blum (2005), which states that for a linear system under complete observation the identified trajectory is independent of the first guess. To what extent this theoretical result may be extended to non-linear systems assimilating an incomplete set of observations, such as the one studied in this article, is unknown. The results presented here suggest that the use of the DBFN may be advantageous in situations in which the system undergoes strong changes, resulting in a background (first guess) that is far from the true state.

Conclusions and perspectives
This study used the NEMO general circulation model in a double gyre configuration to investigate the diffusive back-and-forth nudging performance under different configurations of the data assimilation window, observation network and initial conditions, and to compare it with the 4Dvar.
It has been shown that the reliability of the backward integration should be carefully examined when the BFN/DBFN is applied to non-reversible systems. This should support the choice of the assimilation window and identify whether the available observations are sufficient to control the errors induced by the non-reversible terms of the model equations. In this article we have shown that the DBFN might be used for the assimilation of realistically distributed ocean observations, despite the limited accuracy of the backward integration. Improving the backward integration would further improve the DBFN performance and make possible the use of longer assimilation windows.
Our results show that the DBFN can produce results comparable to the 4Dvar at a lower computational cost. This is because the DBFN requires fewer iterations to converge, and because one iteration of the 4Dvar comprises one integration of the tangent linear model and one integration of the adjoint model, which costs 4 times more than one standard model integration, plus the cost of minimising the cost function, whereas one DBFN iteration costs twice the integration of the non-linear model.
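A back-of-envelope tally makes this comparison concrete. The iteration counts come from the reference experiment (26 4Dvar iterations versus 9 DBFN iterations); taking the tangent linear model to cost about one direct-model integration is our assumption, and the minimisation overhead is ignored.

```python
# Rough cost comparison in units of one non-linear model integration.
ADJOINT_COST = 4.0   # adjoint ~ 4x the direct model (stated in the text)
TLM_COST = 1.0       # assumption: tangent linear ~ 1x the direct model

iters_4dvar, iters_dbfn = 26, 9
cost_4dvar = iters_4dvar * (TLM_COST + ADJOINT_COST)  # per-iteration TLM + adjoint
cost_dbfn = iters_dbfn * 2.0                          # forward + backward sweep

ratio = cost_4dvar / cost_dbfn  # comfortably above the factor 3 quoted in Sect. 6.1
```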
The sensitivity tests show that for the 4Dvar, long assimilation windows should be preferably used, because they favour the propagation of the SSH information to the deep layers. For the DBFN, short windows are preferable because they reduce the effect of the diffusion-induced errors. In future works it would be beneficial to account for these errors when constructing the nudging gain.
Moreover, the results show that for systems assimilating a number of observations that is much smaller than the size of the state space, as is usually the case in ocean data assimilation, the set-up of the covariance matrix is a key step, since this matrix propagates the information from the observed variables to the non-observed variables. In addition, although in this study the two methods were configured with different covariance models, the results show that the DBFN is less sensitive to the background field than the 4Dvar.
Finally, it appears that the DBFN algorithm is worth exploring further, in both theoretical and practical aspects, especially those related to the optimisation of the matrix K and applications to a more realistic configuration.