Ever since its inception, the ensemble Kalman filter (EnKF) has elicited many heuristic approaches that sought to improve it. One such method is covariance localization, which alleviates spurious correlations due to finite ensemble sizes by using relevant spatial correlation information. Adaptive localization techniques account for how correlations change in time and space in order to obtain improved covariance estimates. This work develops a Bayesian approach to adaptive Schur-product localization for the deterministic ensemble Kalman filter (DEnKF) and extends it to support multiple radii of influence. We test the proposed adaptive localization using the toy Lorenz'96 problem and a more realistic 1.5-layer quasi-geostrophic model. Results with the toy problem show that, under the multivariate approach, strongly observed variables can tolerate larger localization radii. The univariate approach leads to markedly improved filter performance for the realistic geophysical model, reducing error by as much as 33 %.

Data assimilation

The EnKF is an important family of data assimilation techniques that propagate both the mean and the covariance of the state uncertainty through the model using a Monte Carlo approach.
While large dynamical systems of interest have a large number of modes along which errors can grow, the number of ensemble members used to characterize uncertainty remains relatively small due to computational costs.
As a result, inaccurate correlation estimates obtained through Monte Carlo sampling can profoundly affect the filter results. Techniques such as covariance localization and inflation have been developed to alleviate these problems.

Localization techniques take advantage of the fundamental property of geophysical systems that correlations between variables decrease with spatial distance

The performance of the EnKF algorithms critically depends on the correct choice of localization radii (also known as the decorrelation distances), since values that are too large fail to correct for spurious correlations, while values that are too small throw away important correlation information.
However, the physical values of the spatial decorrelation scales are not known a priori, and they change with the temporal and spatial location. At the very least the decorrelation scales depend on the current atmospheric flow. In atmospheric chemistry systems, because of the drastic difference in reactivity, each chemical species has its own individual localization radius

Adaptive localization schemes seek to estimate decorrelation distances from the data, so as to optimize the filter performance according to some criteria. One approach to adaptive localization utilizes an ensemble of ensembles to detect and mitigate spurious correlations

This work develops a Bayesian framework to dynamically learn the parameters of Schur-product-based localization from the ensemble of model states and the observations during data assimilation in geophysical systems. Specifically, the localization radii are considered random variables described by parameterized distributions and are retrieved as part of the assimilation step together with the analysis states. One of the primary goals of this paper is to develop ways in which such an approach can be extended to both multivariate and time-dependent, 4-D-like cases. We demonstrate the approach's empirical validity through a type of idealized variance minimization that has access to the true solution (which we call an oracle). We then show that the approach provides a more stable result with a much larger initial radius guess. We explore the idea using several test problems: the Lorenz'96 problem, a multivariate variant of which we introduce specifically for this paper, and a more realistic quasi-geostrophic model that showcases the applicability of the method to scenarios more in line with operational ones.

The paper is organized as follows. Section

We consider a computational model that approximates the evolution of a physical dynamical system such as the atmosphere:

The initial state of the model is also not precisely known; to model this uncertainty, we consider it a random variable drawn from a specified probability distribution:

Consider an ensemble of

The mean and the covariance are propagated first through the forecast step. Specifically, each ensemble member is advanced to the current time using the model (

The mean and covariance are then propagated through the analysis step, which fuses information from the forecast mean and covariance and from observations (Eq.
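
A minimal sketch of a DEnKF analysis step, without localization or inflation, illustrates how the mean and the anomalies are updated; the variable names and shapes are our own illustrative choices:

```python
import numpy as np

def denkf_analysis(X, H, R, y):
    """One DEnKF analysis step (Sakov & Oke): the mean is updated with the
    full Kalman gain, while the ensemble anomalies are updated with half
    the gain. X is the n x N forecast ensemble."""
    xb = X.mean(axis=1, keepdims=True)            # forecast mean
    A = X - xb                                    # forecast anomalies
    Pf = A @ A.T / (X.shape[1] - 1)               # sample forecast covariance
    S = H @ Pf @ H.T + R                          # innovation covariance
    K = np.linalg.solve(S, H @ Pf).T              # Kalman gain Pf H^T S^{-1}
    xa = xb + K @ (y.reshape(-1, 1) - H @ xb)     # mean update, full gain
    Aa = A - 0.5 * K @ (H @ A)                    # anomaly update, half gain
    return xa + Aa                                # analysis ensemble
```

The half-gain anomaly update is the distinguishing feature of the DEnKF and is what keeps the anomaly update decoupled from the mean update.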

Covariance localization involves the Schur (element-wise) product between a symmetric positive semi-definite matrix

We seek to generate the entries of the localization matrix

If the spatial discretization is time-invariant, and

A common localization function used in production software is due to Gaspari and Cohn
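
For reference, the Gaspari–Cohn fifth-order piecewise rational function, and the Schur-product localization it enables, can be sketched as follows; the ring-distance demo and the half-width c = 2 are illustrative choices, not the paper's configuration:

```python
import numpy as np

def gaspari_cohn(d, c):
    """Gaspari-Cohn fifth-order piecewise rational correlation function
    with compact support: identically zero for |d| >= 2c."""
    z = np.abs(np.asarray(d, dtype=float)) / c
    rho = np.zeros_like(z)
    m1 = z <= 1.0
    m2 = (z > 1.0) & (z < 2.0)
    z1, z2 = z[m1], z[m2]
    rho[m1] = (((-0.25 * z1 + 0.5) * z1 + 0.625) * z1 - 5.0 / 3.0) * z1**2 + 1.0
    rho[m2] = ((((z2 / 12.0 - 0.5) * z2 + 0.625) * z2 + 5.0 / 3.0) * z2
               - 5.0) * z2 + 4.0 - 2.0 / (3.0 * z2)
    return rho

# Illustrative use on a small periodic (ring) domain
n = 8
i = np.arange(n)
d = np.abs(i[:, None] - i[None, :])
d = np.minimum(d, n - d)                      # cyclic distances
rho = gaspari_cohn(d, 2.0)                    # localization matrix
P = np.cov(np.random.default_rng(0).standard_normal((n, 5)))
P_loc = rho * P                               # Schur (element-wise) product
```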

It is intuitively clear that different physical effects propagate spatially at different rates, leading to different correlation distances. Consequently, different state-space variables should be analyzed using different radii of influence. This raises the additional question of how to localize the covariance of two variables when each of them is characterized by a different radius of influence.
One approach

We define the mapping operator

To this end, we introduce a commutative, idempotent, binary operation,

One common criticism of distance-based assumptions about the correlations of geophysical systems is that two variables in close proximity to each other might nevertheless be only weakly correlated.
For example, in a model that tracks both the temperature and the concentration of stationary cars at any given location, the two distinct types of information might not be correlated with each other at all.
The physical distance between the two, however, is 0, and thus any single correlation function will take the value 1 and will not remove any spurious correlations.
One can mitigate this problem by considering univariate localization functions for each pair of components
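
One way to realize this is to give each pair of variables its own localization function built from their individual radii. In the sketch below, the two radii are combined with the arithmetic mean (one commutative, idempotent choice) and a Gaussian correlation shape is used; both are purely illustrative and not necessarily the paper's exact construction:

```python
import numpy as np

def combine_radii(ri, rj):
    """A commutative and idempotent way of combining two radii of
    influence: their arithmetic mean (combine(r, r) == r)."""
    return 0.5 * (ri + rj)

def multivariate_localization(dist, radii):
    """Build a localization matrix in which entry (i, j) uses a
    correlation length combined from the radii of variables i and j.
    The Gaussian shape here is for illustration only."""
    c = combine_radii(radii[:, None], radii[None, :])
    return np.exp(-dist**2 / (2.0 * c**2))
```

Because the combined length is symmetric in its arguments, the resulting localization matrix remains symmetric, as required for a Schur product with a covariance matrix.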

We denote the analysis step of the filter by

In the Bayesian framework, we consider the localization parameters to be random variables with an associated prior probability distribution. Specifically, we assume that each of the radii

The assimilation algorithm computes a posterior (analysis) probability density over the state space considering the probabilities of observations and parameters. We start with Bayes' identity:

The negative log likelihood of the posterior probability (Eq.
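
Dropping additive constants, a Gamma(shape k, scale θ) prior contributes (1 − k) ln r + r/θ to the negative log posterior. A toy sketch of the resulting maximum a posteriori estimation, with a hypothetical quadratic data-misfit term `j_data` standing in for the actual DEnKF-based cost and illustrative prior parameters:

```python
import numpy as np

def gamma_neglogprior(r, k, theta):
    """Negative log of the Gamma(shape k, scale theta) density, up to an
    additive constant: (1 - k) ln r + r / theta."""
    return (1.0 - k) * np.log(r) + r / theta

def j_data(r, r_best=6.0):
    """Hypothetical quadratic stand-in for the data-misfit part of the
    cost; in the paper this is the DEnKF-based 3D-Var-like term
    evaluated at localization radius r."""
    return 0.5 * (r - r_best) ** 2

# grid search for the MAP radius over a range of candidate radii
radii = np.linspace(0.5, 20.0, 2000)
J = j_data(radii) + gamma_neglogprior(radii, k=4.0, theta=2.0)
r_map = radii[np.argmin(J)]
```

In practice one would minimize the full cost with a gradient-based optimizer rather than a grid search; the grid keeps the sketch dependency-free.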

Under the assumptions that the analysis function (

One will note that the form of our cost function (

The choice of the DEnKF is somewhat arbitrary. From the above, however, it is evident that a method that decouples the anomaly updates from the mean updates is likely to be more advantageous. A perturbed-observation EnKF does not have this property and would thus incur significantly more computational effort in optimizing the cost function. Extending this idea to a square-root filter, such as the ETKF, would require significant algebraic manipulation and heuristics that are outside the scope of this paper.

We now seek to extend the 3D-Var-like cost function (

Various 4-D-type approximation strategies are also applicable to this cost function extension, though they are outside of the scope of this paper.

In practice, instead of dealing with the Gamma distribution parameters of
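
For reference, if a radius prior is Gamma with shape k and scale θ, then its mean is m = kθ and its variance is v = kθ², so a desired mean and variance convert back via θ = v/m and k = m²/v:

```python
def gamma_from_mean_var(m, v):
    """Convert a desired prior mean m and variance v into Gamma
    shape-scale parameters using mean = k*theta, variance = k*theta**2."""
    theta = v / m
    k = m * m / v
    return k, theta
```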

In order to validate our methodology, we carry out twin experiments under the assumption of identical perfect dynamical systems for both the truth and the model. The analysis accuracy is measured by the spatio-temporally averaged root mean square error:
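
One common realization of such a metric takes the spatial RMSE of each assimilation step and averages it over time; the paper's exact formula is not reproduced here, and conventions differ in where the square root is taken:

```python
import numpy as np

def spatio_temporal_rmse(Xa, Xt):
    """Spatio-temporally averaged RMSE between analysis and truth
    trajectories of shape (n_steps, n_state): the spatial RMSE of each
    step, averaged over all steps."""
    per_step = np.sqrt(np.mean((Xa - Xt) ** 2, axis=1))
    return per_step.mean()
```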

For each of the models we repeat the experiments with different values of the inflation constant

All initial value problems used were independently implemented

We will make use of oracles to empirically evaluate the performance of the multivariate approach to Schur-product localization. An oracle is an idealized procedure that produces close to optimal results by making use of all the available information, some of which is unknown to the data assimilation system. In our case the oracle minimizes cost functions involving the true model state.
Specifically, in an ideal filtering scenario one seeks to minimize the error of the analysis with respect to the truth, i.e., the cost function,
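
A toy sketch of the oracle idea follows; `analysis_for_radius` is a purely hypothetical stand-in for a full localized DEnKF analysis, constructed so that the error vanishes at a radius of 4, which the oracle (but no real filter) can discover by comparing against the truth:

```python
import numpy as np

def analysis_for_radius(r, x_truth):
    """Hypothetical stand-in mapping a localization radius to an analysis
    state; error is smallest near a radius unknown to the filter (4.0)."""
    rng = np.random.default_rng(0)
    noise = rng.standard_normal(x_truth.size)
    return x_truth + 0.1 * abs(r - 4.0) * noise

def oracle_radius(candidates, x_truth):
    """Oracle: pick the radius minimizing squared error with respect to
    the truth, information a real assimilation system never has."""
    errs = [np.sum((analysis_for_radius(r, x_truth) - x_truth) ** 2)
            for r in candidates]
    return candidates[int(np.argmin(errs))]
```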

The 40-variable Lorenz model

The Lorenz'96 model equations,
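
The canonical equations, dx_i/dt = (x_{i+1} - x_{i-2}) x_{i-1} - x_i + F with cyclic indices and the standard forcing F = 8, can be integrated with an ordinary fourth-order Runge–Kutta step:

```python
import numpy as np

def lorenz96(x, F=8.0):
    """Lorenz'96 tendencies dx_i/dt = (x_{i+1} - x_{i-2}) x_{i-1} - x_i + F
    with cyclic indices, vectorized via np.roll."""
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + F

def rk4_step(f, x, dt):
    """Classical fourth-order Runge-Kutta step for dx/dt = f(x)."""
    k1 = f(x)
    k2 = f(x + 0.5 * dt * k1)
    k3 = f(x + 0.5 * dt * k2)
    k4 = f(x + dt * k3)
    return x + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
```

The uniform state x_i = F is an (unstable) equilibrium, which makes a convenient sanity check for any implementation.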

The initial conditions used for the experiments are obtained by starting with

The physical distance between

For the numerical experiments, we consider a perfect model and noisy observations.
We take a 6 h assimilation window (corresponding to

Figure

Generalized mean oracles for Lorenz'96. Comparison of the various

We also test both the validity of arbitrarily grouping the radii and the validity of using a time-distributed cost function (Eq.

Time-dependent 4-D oracle for Lorenz'96. Comparison of the RMSE for a radius oracle that is both multivariate and time-dependent. The

Adaptive localization results for Lorenz'96 are shown in Fig.

Lorenz'96 adaptive localization results. Comparison of the best univariate localization radius results with their corresponding adaptive localization counterparts.

The canonical Lorenz'96 model is ill-suited for multivariate adaptive localization, as each variable in the problem behaves identically to all the others. This means that, for any univariate localization scheme, a constant radius is close to optimal.

We modify the problem in such a way that the average behavior remains very similar to that of the original model, but the instantaneous behavior requires different localization radii. To accomplish this, we use a time-dependent forcing function that differs for each variable:

Calculated optimal single-time and time-averaged covariance matrices for the multivariate Lorenz'96 model. Comparison of ensemble covariance matrices for the multivariate Lorenz'96 equations for a single time step

For each individual variable the forcing value cycles between 4 and 12, with an average value of 8, just like in the canonical Lorenz'96 formulation. If taken to be constant, a forcing factor of 4 makes the equation lightly chaotic, with only one positive Lyapunov exponent, whilst a constant value of 12 gives dynamics with about 15 positive Lyapunov exponents. Our modified system still has the same average behavior, with 13 positive Lyapunov exponents. The mean doubling times of the two problems are also extremely similar, at around 0.36. This is exactly the desired behavior. Figure

Error measurements will be carried out over the interval

Figure

Time-dependent multivariate Lorenz'96 4-D adaptive localization. The constant radius case shows the minimal error when the localization radius is varied between set predefined values. The adaptive localization case has four radii groupings,

As before, the constant-radius results use the optimal value for each given inflation factor, while the adaptive results were obtained through a search over possible means and variances around that value. The largest reduction in error is only about 8 %; however, this is a significant improvement over the behavior of the univariate Lorenz'96.
In the canonical Lorenz'96 any choice of grouping is necessarily arbitrary, but in this case
the groups were chosen such that all related variables have the same forcing from Eq. (

Figure

ML96 sample radii. Radii for the configuration in Fig.

This gives us insight into a potential way of choosing multivariate localization groups: under some measure of the observability of any given state-space variable, similarly “observable” state-space variables should be assigned similar radii.

Tightly coupled models like the multivariate Lorenz'96 have rapidly diverging solutions, and constraining them requires more information about the underlying dynamics. Incorporating future observations and adding degrees of freedom to the cost function increase the performance of our analysis. In the limiting case of one radius per variable and general information from the future, one approaches a variant of 4DEnVar, which is in principle superior to any pure filtering method.

The 1.5-layer quasi-geostrophic model of Sakov and Oke

A second-order central finite difference spatial discretization of the Laplacian operator
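
A minimal sketch of such a discretization is the five-point stencil below; the untouched (zero) boundary treatment here is an assumption for illustration, and the model's actual boundary conditions may differ:

```python
import numpy as np

def laplacian_5pt(psi, dx):
    """Second-order central five-point Laplacian on the interior of a
    uniform 2-D grid with spacing dx; boundary rows/columns are left
    at zero."""
    lap = np.zeros_like(psi)
    lap[1:-1, 1:-1] = (psi[2:, 1:-1] + psi[:-2, 1:-1]
                       + psi[1:-1, 2:] + psi[1:-1, :-2]
                       - 4.0 * psi[1:-1, 1:-1]) / dx**2
    return lap
```

Because the stencil is exact for quadratics, ψ = x² + y² (whose Laplacian is 4 everywhere) is a convenient verification case.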

The time between consecutive observations is 5 time units, and the model is run for 3300 such cycles. The first 300 cycles, corresponding to the filter spinup, are discarded, and therefore the assimilation interval is

Quasi-geostrophic model. A typical model state of the 1.5-layer quasi-geostrophic model.

Our rough estimate of the number of positive Lyapunov exponents of this model is 1451, with a Kaplan–Yorke dimension estimate of 6573.4; thus, we take a conservative ensemble of 25 members, whose initial states are derived from a random sampling of a long run of the model.

This model has been tested extensively with both the DEnKF and various localization techniques

The adaptive localization results for the quasi-geostrophic problem are shown in Fig.

Quasi-geostrophic model adaptive localization. The inflation factor is kept constant, and

Quasi-geostrophic model adaptive localization raw RMSE. The green line represents the same optimal constant localization radius as in Fig.

A better way to assess how well the adaptive localization scheme works is to examine its consistency. The empirical utility of the adaptive localization technique is further analyzed in Fig.

QGSO sample radii. For the configuration in Fig.

Figure

This paper proposes a novel Bayesian approach to adaptive Schur-product-based localization. A multivariate approach is developed, where multiple radii corresponding to different types of variables are taken into account. The Bayesian problem is solved by constructing 3-D and 4-D cost functions and minimizing them to obtain the maximum a posteriori estimates of the localization radii. We show that in the case of the DEnKF these cost functions and their gradients are computationally inexpensive to evaluate and can be relatively easily implemented within existing frameworks. We also provide a new way of assessing the performance of adaptive localization schemes through the use of restricted cost-function oracles.

The adaptive localization approach is tested using the Lorenz'96 and quasi-geostrophic models. Somewhat surprisingly, the adaptivity produces better results for the larger quasi-geostrophic problem. This may be due to the ensemble analysis anomaly independence assumption made in Sect.

We believe that the algorithm presented herein has a strong potential to improve existing geophysical data assimilation systems that use ensemble-based filters such as the DEnKF. In order to avoid filter divergence in the long term, these systems often use a conservative localization radius and a liberal inflation factor. The QG model results indicate that, in such cases, our adaptive method outperforms the approach based on a constant localization. The approach leads to a reduction of as much as 33 % in error. For a severely undersampled ensemble, the approach appears to improve the quality of the analysis substantially, potentially because the need for localization is significantly greater than for a small toy problem like L96. The new adaptive methodology can replace the existing approach with a relatively modest implementation effort.

Future work to extend the methodology includes finding good approximations of the probability distribution of the localization parameters, perhaps through a machine learning approach, and reducing the need for the assumption that the ensemble members are independent and identically distributed random variables.
A future direction of interest is applying this methodology to a larger operational model, e.g., the Weather Research and Forecasting Model (WRF)

No data sets were used in this article.

All the authors contributed equally to this work.

The authors declare that they have no conflict of interest.

The authors thank the Computational Science Laboratory at Virginia Tech. The authors also thank the anonymous referees and the editor for their constructive input that significantly helped with the quality of the paper.

This research has been supported by the Air Force Office of Scientific Research (grant no. DDDAS FA9550-17-1-0015), the National Science Foundation, Division of Computing and Communication Foundations (grant no. CCF-1613905), and the National Science Foundation, Division of Advanced Cyberinfrastructure (grant no. ACI-17097276).

This paper was edited by Zoltan Toth and reviewed by two anonymous referees.