Multivariate localization methods for ensemble Kalman filtering

In ensemble Kalman filtering (EnKF), the small number of ensemble members that is feasible to use in a practical data assimilation application leads to sampling variability of the estimates of the background error covariances. The standard approach to reducing the effects of this sampling variability, which has also been found to be highly efficient in improving the performance of EnKF, is the localization of the estimates of the covariances. One family of localization techniques is based on taking the Schur (element-wise) product of the ensemble-based sample covariance matrix and a correlation matrix whose entries are obtained by the discretization of a distance-dependent correlation function. While the proper definition of the localization function for a single state variable has been extensively investigated, a rigorous definition of the localization function for multiple state variables that exist at the same locations has been seldom considered. This paper introduces two strategies for the construction of localization functions for multiple state variables. The proposed localization functions are tested by assimilating simulated observations experiments into the bivariate Lorenz 95 model with their help.


Introduction
The components of the finite-dimensional state vector of a numerical model of the atmosphere are defined by the spatial discretization of the state variables considered in the model. An ensemble-based Kalman filter (EnKF) data assimilation scheme treats the finite-dimensional state vector as a multivariate random variable and estimates its Introduction of a background (prior) distribution based on the latest observations. The background distribution is represented by an ensemble of short-term forecasts from the previous analysis time. This ensemble is called the background ensemble. Because the number of background ensemble members that is feasible to use in a realistic atmospheric model is small, the estimates of weak covariances (the entries 5 with small absolute values in the background covariance matrix) tend to have large relative estimation errors. These large relative errors have a strong negative effect on the accuracy of an EnKF estimate of the analysis mean. The standard approach to alleviating this problem is to apply a physical-distance-dependent localization to the sample background covariances before their use in the state update step of the EnKF. 10 In essence, localization is a method to introduce the empirical understanding that the true background covariances tend to rapidly decrease with distance into the state estimation process.
Data assimilation schemes treat the spatially discretized state vector, x, as a multivariate random variable. We use the conventional notation x b and x a for the back- 15 ground and the analysis state vectors, respectively. We also use the notation y • for the vector of observations. In an EnKF scheme, the analysis mean, x a , is computed from the background mean, x b , by the update equation The function h(·) is the observation function, which maps the finite-dimensional state 20 vector into observables. Thus, h(x b ) is the ensemble mean of the prediction of the observations by the background. The matrix is the Kalman gain matrix, where P b is the background covariance matrix, H is the linearization of h about x b and R is the observation error covariance matrix. The entry K i j Introduction of K determines the effect of the j th observation on the i th component of the analysis mean, x a . Under the standard assumption that the observation errors are uncorrelated, the matrix, R, is diagonal. Hence, the way the effect of the observations is spread from the observations to the different locations and state variables is determined by P b and H. The sampling variability in the estimates of P b affects the accuracy of the informa-5 tion propagated in space and between the different state variables through the matrix products, P b H T and HP b H T . The goal of localization is to reduce the related effects of sampling variability on the estimates of K. Over the years, many different localization methods have been proposed. Hamill et al. (2001); Houtekamer andMitchell (1998, 2001); Hunt et al. (2007); Ott et al. 10 (2004), and Whitaker and Hamill (2002) used localization functions which set the covariance to zero beyond a certain distance (localization radius). Jun et al. (2011) proposed a nonparametric statistical method to estimate the covariance. Anderson (2007) used a hierarchical ensemble filter which estimates the covariance using an ensemble of ensemble filters. Hodyss (2007, 2009a, b) adaptively determined 15 the width of localization by computing powers of the sample correlations. Buehner and Charron (2007) examined the spectral and spatial localization of error covariance. Anderson and Lei (2013) and Lei and Anderson (2014) proposed an empirical localization function based on the output of an observing system simulation experiment.
The focus of the present paper is on the family of schemes that localize the covari-20 ances by taking the Schur (Hadamard) product of the sample background covariance matrix and a correlation matrix of the same size, whose entries are obtained by the discretization of a distance-dependent correlation function with local (compact) support (e.g., Hamill et al., 2001;Houtekamer and Mitchell, 2001;Whitaker and Hamill, 2002). Such a correlation function is usually called a localization or taper function. far apart in space to zero. This property of the filtered background covariances can also be exploited to increase the computational efficiency of the EnKF schemes.
Equations (1) and (2) provide the solution of a formulation of the data assimilation problem that assumes that P b is invertible (e.g., Sects. 4.2.1 and 4.2.3 of Szunyogh, 2014). Because P b is symmetric, its eigenvalues are real and non-negative, which 5 implies that it is invertible, only if it is also positive-definite. (An n × n symmetric matrix A is positive-definite if x T Ax > 0 for all non-zero vectors x ∈ R n .) Thus, Eqs. (1) and (2) provide a solution of the formal data assimilation problem for a localized estimate of P b , only if that localized estimate is symmetric and positive-definite.
Because the computation of the right-hand-side of Eq.
(2) does not require the in-10 vertibility of P b , singularity of the localized P b usually does not lead to a breakdown of the computations in practice. This does not change the fact, however, that the solution obtained by Eqs. (1) and (2) can no longer be the solution of the problem that they were designed to solve. In addition, an ill-conditioned estimate of P b can degrade the conditioning (increase the condition number) of HP b H T + R, making the numerical 15 computation of the right-hand side of Eq.
(2) less stable. A realistic atmospheric model has multiple scalar state variables (e.g., temperature, coordinates of the wind vector, surface pressure, humidity). If a univariate localization function, such as that described by Gaspari and Cohn (1999), is applied directly to a multivariate state vector, the resulting localized background covariance matrix 20 may not be positive-definite. This motivates us to seek rigorously-derived multivariate localization functions for ensemble Kalman filtering. As will be demonstrated, such rigorously-derived multivariate localization functions often produce more accurate analyses than those that apply the same univariate localization functions to each scalar component of the state vector. Kang et al. (2011) also introduced a multivariate lo- 25 calization method that zeros out covariances between physically unrelated variables. Their motivation to zero out cross-covariances, however, was to filter apparent spurious covariances rather than to preserve the positive-definiteness of the background error covariance matrix. In our search for proper multivariate localization functions, we take advantage of recent developments in the statistics literature. In particular, we use the localization functions developed in Porcu et al. (2012), who studied the radial basis functions to construct multivariate correlation functions with compact support. Note that Sect. 5 in Zhang and Du (2008) described general methodology for covariance tapering in the 5 case of multiple state variables. Du and Ma (2013) used a convolution approach and a mixture approach to derive covariance matrix functions with compactly supported marginal and cross-covariances. Kleiber and Porcu (2015) constructed nonstationary correlation functions with compact support for multivariate random fields. Genton and Kleiber (2015) reviewed approaches to building cross-covariance models such as com-10 pactly supported correlation functions for multivariate Gaussian random fields.
The rest of the paper is organized as follows. Section 2 briefly describes EnKF and localization for the special case of two state variables. Section 3 describes the bivariate Lorenz-95 model we use to test our ideas. Section 4 summarizes the main results of the paper.

Univariate localization
In principle, localization can be implemented by using filtered estimates of the background covariances rather than the raw sample covariances to define the matrix, P b , used in the computation of K by Eq. (2). The filtered (localized) version of covariance 20 matrix, P b , is obtained by computing the Schur (entry-wise) product: where C is a correlation matrix, which has the same dimensions as the sample covariance matrix,P b . In practice, however, the localization is often done by taking advantage of the fact that localization affects the analysis through P b H T and HP b H T , or, ultimately, Introduction

Tables Figures
Back Close

Full Screen / Esc
Printer-friendly Version

Interactive Discussion
Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | through K. In particular, because a distance, d , can be defined for each entry, K i j , of K by the distance between the i th analyzed variable and the j th observation, the simplest localization strategy is to set all entries, K i j , that are associated with a distance longer than a prescribed localization radius, R (d > R), to zero, while leaving the remaining entries unchanged (e.g., Houtekamer and Mitchell, 1998;Ott et al., 2004;Hunt et al., 5 2007). Another approach is to localize P b H T and HP b H T by a tapering function (e.g., Hamill et al., 2001;Houtekamer and Mitchell, 2001). The usual justification for this approach is that H is typically the linearization of a local interpolation function, h(·), for which the localized matrix products provide good approximations of the products computed by us-10 ing localized estimates of P b . Note that P b H T is the matrix of background covariances between the state variables at the model grid points and at the observation locations, while HP b H T is the matrix of background covariances between the state variables at the observation locations. Thus, a distance can be associated with each entry of the two matrix products, which makes the distance-dependent localization of the two prod-15 ucts possible. The approach becomes problematic, however, when h(·) is not a local function, which is the typical case for remotely sensed observations (Campbell et al., 2010).
We consider the situation where localization is applied directly to the background error covariance matrix,P b . Recall that the localized covariance matrix is expressed as 20 in Eq. (3). In particular, C is a positive-definite matrix with strictly positive eigenvalues, while the sample covariance matrix,P b , may have zero eigenvalues (as it is only nonnegative definite). The localization in Eq.
(3) helps to eliminate those zero eigenvalues ofP b and alleviate the related large relative estimation errors. The positive-definiteness of C ensures that localization does not introduce new zero eigenvalues in the process 25 of eliminating the zero eigenvalues ofP b . The proper definition of the localization function that ensures that C is positive-definite has been thoroughly investigated for the univariate case (N = 1) in the literature (e.g. Gaspari and Cohn, 1999

Multivariate localization
We now consider a model with N state variables. For instance, we take a simple model based on the hydrostatic primitive equations. This model solves the equations for the two horizontal components of wind, the surface pressure, the virtual temperature and a couple of atmospheric constituents. The state of the model is represented by the state . ., N, represents the spatially discretized state of the i th state variable in the model. The sample background covariance matrix,P b , can be partitioned aŝ The In order to get a proper matrix-valued localization function, ρ, a seemingly obvious approach to extend the results of Gaspari and Cohn (1999) would be to compute the entries of C based on a univariate correlation function for a multivariate variable. That is, for the pair of state variables i and j , we localize the corresponding sample background covariance matrix,P b i j , by multiplying a localization matrix from the same correlation 5 function for all i and j . Formally, this would be possible because the distance d is uniquely defined for each entry ofP b the same way in the multivariate case as in the univariate case. This approach, however, cannot guarantee the positive-definiteness of the resulting matrix, C. As a simple illustrative example, consider the situation where the discretized state vector has only two components that are defined by two different 10 scalar state variables at the same location (e.g., the temperature and the pressure). In this case, if n is the number of locations, the localization matrix for the two state variables together can be written as independently of the particular choice of the localization function. Here C 0 is an n × n 15 localization matrix from a univariate localization function. From Eq. (5), it is clear that n eigenvalues of C are zero and the rank of C is n, while its dimension is 2n × 2n. As in Eq. (2), although C is rank-deficient and thus so is the localized covariance matrix P b , we may still be able to calculate the inverse of We therefore propose two approaches to construct positive-definite (full rank) matrixvalued localization functions, ρ(d ). The first proposed method takes advantage of the knowledge of a proper univariate localization function, ρ. Instead of using the same correlation function to localize multiple state variables, for a certain distance lag, we let ρ = ρ · B, where B is an N × N symmetric, positive-definite matrix whose diago-5 nal entries are one. It can be easily verified that ρ is a matrix-valued positive-definite function, which makes it a valid multivariate localization function. For instance, in the hypothetical case where the two components of the state vector are two different state variables at the same location, making the choice 10 with |β| < 1, leads to rather than what is given in Eq. (5). Since the eigenvalues of the matrix B are 1±β > 0, it can be easily verified that the matrix in Eq. (7) is positive-definite. For the case with more than two state variables (N ≥ 3), the matrix B can be parametrized as B = LL T 15 with L a lower triangular matrix whose diagonal entries are positive. Furthermore, we need the constraint that for each row of L, the sum of all of the squared elements should be 1 in order to have the diagonal entries of B to be one. Other than these constraints, the elements of L can vary freely in order to guarantee the positive-definiteness of B.
An attractive feature of this approach is that we can take advantage of any known 20 univariate localization function to produce a multivariate localization function. However, the multivariate localization function from this approach is separable in the sense that the multivariate component (i.e., B) and the localization function (i.e. ρ) are factored. Another limitation of the approach is that the localization radius and decay rate are the same for each pair of state variables, leaving no flexibility to account for the potential differences in the correlation lengths and decay rate for the different state vector components. The second proposed method takes advantage of the availability of multivariate compactly supported functions from the spatial statistics literature. To the best of our knowl-5 edge, only a few papers have been published on this subject; one of them is Porcu et al. (2012). The function class they considered was essentially a multivariate extension of the Askey function (Askey, 1973) For instance, a bivariate Askey function, which is a special case of the results of Porcu et al. (2012), is given by (i , j = 1, 2) where c > 0, µ 12 = µ 21 ≤ 1 2 (µ 11 + µ 22 ), ν ≥ 1 2 s + 2, β i i = 1 (i = 1, 2), β 12 = β 21 , and |β 12 | ≤ Γ(1 + µ 12 ) Γ(1 + ν + µ 12 ) Here, Γ(·) is the gamma function (e.g., Wilks, 2006), s is the dimension of the Euclidean space where the state variable is defined, and [x] is the largest integer that is equal 15 to or smaller than x. The Askey function in Eq. (8) has the support c because it sets covariances beyond a distance c to zero. It can be seen from Eq. (9) that, if the scalars, µ i j , are chosen to be the same for all values of i and j , the condition on β 12 for ρ to be valid is |β 12 | ≤ 1. Note that, for this choice, the second method is essentially the same as the first method with the Askey function set to ρ. The localization function given by 20 Eq. (8) is more flexible than the functions of the first method with the Askey function set to ρ because µ i j can be chosen to be different for each pair of indexes, i and j . The localization length, however, is still the same for the different pairs of the state variables.
To illustrate the differences between the shape of the Gaspari-Cohn and the Askey functions, we show the Gaspari-Cohn function for c = 25 and the univariate Askey Introduction function for c = 50, ν = 1, 2, 3, and µ 11 = 0 (Fig. 1). This figure shows that for a given support, the Askey functions are narrower.

The EnKF scheme
There are many different formulations of the EnKF update equations, which produce 5 not only an updated estimate of the mean, but also the ensemble of analysis perturbations that are added to the mean to obtain an ensemble of analyses. This ensemble of analyses serves as the ensemble of initial conditions for the model integration that produce the background ensemble. In our experiments, we use the method of perturbed observations. It obtains the analysis mean and the ensemble of analysis perturbations 10 by the equations where x k , k = 1, 2, . . ., M are the ensemble perturbations and y o k , k = 1, 2, . . ., M are random draws from the probability distribution of observation errors. As the notation 15 suggests, we consider a linear observation function in our experiments. This choice is made for the sake of simplicity and limits the generality of our findings much less than the use of an idealized model of atmospheric dynamics.
For the case of multiple state variables, the ensemble members are considered to be in a single ensemble, that is, not being grouped into distinct sub-ensembles.

The Bivariate Lorenz model
The idealized model we use is the bivariate Lorenz-95 model (Lorenz, 1995). The model mimics the nonlinear dynamics of two linearly coupled atmospheric state variables, X and Y , on a latitude circle. The variable, X , is a "slow" variable represented by K discrete values, X k , and Y is a "fast" variable represented by J × K discrete values. 5 The governing equations are where Y j −J,k = Y j ,k−1 and Y j +J,k = Y j ,k+1 for k = 1, . . ., K and j = 1, . . ., J. The "boundary condition" is periodic; that is, X k−K = X k+K = X k , and Y j ,k−K = Y j ,k+K = Y j ,k . In our 10 experiments, K = 36 and J = 10. The parameter h controls the strength of the coupling between X and Y , a is the ratio of the characteristic time scales of the slow motion of X to the fast motion of Y , b is the ratio of the characteristic amplitudes of X to Y , and F is a forcing term. We choose the parameters to be a = 10, b = 10, F = 10, and h = 2. These values of the model parameters are equal to those originally suggested 15 by Lorenz (1995), except for the value of the coupling coefficient h, which is twice as large in our case. We made this change in h to increase the covariances between the errors in the estimates of X and Y , which makes the model more sensitive to the choices of the localization parameters. We use a fourth-order Runge-Kutta time integration scheme with a time step of 0.005 non-dimensional units as Lorenz (1995) 20 did. We define the physical distances between X k 1 and X k 2 , between Y j 1 ,k 1 and Y j 2 ,k 2 , and between X k 1 and Y j 1 ,k 2 by |10(k 1 − k 2 )|, |10(k 1 − k 2 ) + j 1 − j 2 |, and |10(k 1 − k 2 ) − j 1 |, respectively. Figure 2 shows a typical state of the model for the selected parameters. The figure shows that X tends to drive the evolution of Y : the hypothetical process represented by Y is more active (its variability is higher) with higher values of X .

Experimental design
Since the estimates of the cross-covariances play a particularly important role at locations where one of the variables is unobserved, we expect an improved treatment of the cross-covariances to lead to analysis improvements at locations where only one of the state variables is observed. This motivates us to consider an observation scenario 5 in which X and Y are partially observed. The variable X is observed at randomly chosen 20 % of all locations and Y is observed at randomly chosen 90 % of those locations where X is not observed. Spatial locations of the partially observed X and Y are illustrated in Fig. 3. The results from this experiment are compared to those from a control experiment, in which both X and Y are fully observed.

10
We first generate a time series of "true" model states by a 2000-time-step integration of the model. We initialize an ensemble by adding the standard Gaussian noise to the true state; then, discarding the first 3000 time steps. We then generate simulated observations by adding random observation noise of mean zero and variance 0.02 to the the appropriate components of the "true" state of X at each time step. We use 15 the same procedure to generate simulated observations of Y , except that the variance of the observation noise is 0.005. Observations are assimilated at every time step by first using a 20-member ensemble with a constant covariance inflation factor of 1.015. The error in the analysis at a given verification time is measured by the rootmean-square distance between the analysis mean and the true state. We refer to the  Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | no more than 1.5 IQR from the box. Any values that fall outside of the end points of whiskers are considered outliers and they are displayed as circles.
In the boxplot figures in the next section, we compare the RMSE for four different localization schemes. We use the following notation to distinguish between them in the figures: 5 1. S1 -the bivariate sample background covariance is used without localization; 2. S2 -same as S1 except that the cross-covariance terms are replaced by zeros; 3. S3 -a univariate localization function is used to filter the marginal-covariances, while the cross-covariance terms are replaced by zeros; 4. S4 -one of the bivariate localization methods described in Sect. 2.2.2 is used to 10 filter both the marginal-and the cross-covariances.
In the experiments identified by S4, we consider two different bivariate localization functions: The first one is ρ (1) and |β| < 1. We use the fifth-order piecewise-rational function of Gaspari and Cohn (1999) to define the univariate correlation function, ρ (1) , in the following form, This correlation function attenuates the covariances with increasing distance, setting all the covariances to zero beyond distance 2c. So this function has the support 2c. If |β| < 1 and c is the same for both the marginal-and the cross-covariances, the matrix- The second multivariate correlation function we consider, ρ (2) , is the bivariate Askey function described in Sect. 2.2. In particular, we use µ 11 = 0, µ 22 = 2, µ 12 = 1, and ν = 3. According to Eq. (9), for these choices of parameters, the one remaining parameter, β 12 , must be chosen such that |β 12 | < 0.79. Figure 4 shows the distribution of RMSE for variable X for different configurations of the localization scheme in the case where the state is only partially observed. This figure compares the Askey function and Gaspari-Cohn function which have the same support (localization radius), so setting all the covariances to zero beyond the same distance. We recall that because X is much more sparsely observed than Y , we expect to see 10 some sensitivity of the analyses of X to the treatment of the cross-covariance terms. The figure confirms this expectation. A comparison of the results for configurations S1 and S2 suggests that ignoring the cross-covariances is a better strategy than to use them without localization. This conclusion does not hold once a univariate localization is applied to the marginal covariances, as using configuration S3 produces worse results 15 than applying no localization at all (S1). Figure 4 also shows that the distribution of the state estimation error is less sensitive to the choice of localization strategy for the larger values of support. Of all localization schemes, S4 with β = 0.1 performs best regardless of the localization radius: the distribution of the state estimation error is narrow with a mean value that is lower than those 20 for the other configurations of the localization scheme. For this choice of localization scheme and β, the Askey function produces smaller errors than the Gaspari-Cohn function, particularly, for smaller localization radii. Figure 5 is the same as Fig. 4, except for variable Y rather than for variable X . A striking feature of the results shown in this figure is that the Askey function clearly 25 performs better than the Gaspari-Cohn function. Another obvious conclusion is that using a smaller localization radius (a lower value of support) is clearly advantageous for the estimation of Y . This result is not surprising, considering that Y is densely observed and its spatial variability is much higher than that of X . In contrast to the results for variable X , configuration S3 produces much more accurate estimates of variable Y than do configurations S1 or S2. In addition, configuration S4 performs only slightly better, and only for the lowest value of support, than does configuration S3.

5
The latter observations indicate that the marginal covariances play a more important role than do the cross-covariances in the estimation of the densely observed Y . The proper filtering of the marginal covariances can thus greatly increase the accuracy of the estimates of Y . In other words, the densely observed Y is primarily estimated based on observations of Y . Hence, the low signal-to-noise ratio for the sample estimate of 10 the marginal covariances for Y greatly limits the value of the observations of Y at longer distances. Figure 6 is the same as Fig. 4, except for the case of a fully observed state. By comparing the two figures, we see that the analysis is far less sensitive to the localization radius in the fully observed case than in the partially observed case. As can be ex- 15 pected, the state estimates are also more accurate in the fully observed case. In the fully observed case, localization strategy S3 performs much better than do strategies S1 and S2 and similarly to S4. This result indicates that in the fully observed case, X is primarily analyzed based on observations of X , making the analysis of X more sensitive to the localization of the marginal covariances than to the localization of the 20 cross-covariances. Similar to the partially observed case, the Askey function tends to perform better than the Gaspari-Cohn function, but the differences between the accuracy of the state estimates for the two filter functions are negligible, except for the shortest localization radius.
Finally, Fig. 7 shows the distribution of the errors for variable Y in the fully observed 25 case. The best results are obtained by using a short localization radius with the Askey function, even though the variability of the error is relatively large in that case. The fact that localization strategies S3 and S4 perform similarly well shows that the estimates of the cross-covariances do not play an important role in this case; that is, X is pri- marily estimated based on observations of X and Y is dominantly estimated based on observations of Y . Figures 7-10 show the results for the 40-member ensemble. We use an inflation factor of 1.01, because the optimal value of the inflation factor is typically smaller for a larger ensemble. Figure 7 shows that the ensemble size has little effect on the es- 5 timates of X in the case of the partially observed state. For variable Y in the partially observed case (Fig. 8) and both variables X and Y in the fully observed case (Figs. 9 and 10), however, the best results are obtained with a larger localization radius than in the case of the 20-member ensemble. This behavior is expected, as a larger ensemble can more accurately estimate the weaker covariances associated with the longer 10 distances. As for the 20-member ensemble, the localization schemes using the Askey function perform better than those using the Gaspari-Cohn function.

Discussion
The central argument of this paper is that using a univariate localization function for the localization of both the marginal and the cross-covariances in an EnKF scheme may 15 lead to a rank deficient estimate of the background covariance matrix. We suggested two different approaches for the construction of positive-definite filtered estimates of the background covariance matrix. One of them takes advantage of the knowledge of a proper univariate localization function, whereas the other uses a multivariate extension of the Askey function. The results of our numerical experiments show that a math-20 ematically proper localization function often leads to improved state estimates. The results of the numerical experiments also suggest that of the two approaches we introduced, the one based on the Askey function produces more accurate state estimates than that based on the Gaspari-Cohn function. Introduction The Gaspari-Cohn covariance function with a localization constant c = 25 (support the Askey covariance function with a support parameter c = 50 and various shape s.