Nonlinear Processes in Geophysics, 22, 645–662, 2015
doi:10.5194/npg-22-645-2015
© Copernicus GmbH, Göttingen, Germany. This work is licensed under a Creative Commons Attribution 3.0 Unported License (http://creativecommons.org/licenses/by/3.0/). The article is available at https://npg.copernicus.org/articles/22/645/2015/npg-22-645-2015.html and as a PDF at https://npg.copernicus.org/articles/22/645/2015/npg-22-645-2015.pdf.

Expanding the validity of the ensemble Kalman filter without the intrinsic need for inflation

M. Bocquet (CEREA, joint laboratory École des Ponts ParisTech and EDF R&D, Université Paris-Est, Champs-sur-Marne, France; https://orcid.org/0000-0003-2675-0347), P. N. Raanes (Nansen Environmental and Remote Sensing Center, Bergen, Norway; Mathematical Institute, University of Oxford, Oxford, UK), A. Hannart (IFAECI, CNRS-CONICET-UBA, Buenos Aires, Argentina)

Correspondence: M. Bocquet (bocquet@cerea.enpc.fr)

Received: 28 June 2015 – Published in Nonlin. Processes Geophys. Discuss.: 24 July 2015 – Revised: 7 October 2015 – Accepted: 8 October 2015 – Published: 3 November 2015
The ensemble Kalman filter (EnKF) is a powerful data assimilation method
meant for high-dimensional nonlinear systems. But its implementation requires
somewhat ad hoc procedures such as localization and inflation. The recently
developed finite-size ensemble Kalman filter (EnKF-N) does not
require multiplicative inflation meant to counteract sampling errors. Aside
from the practical interest in avoiding the tuning of inflation in perfect
model data assimilation experiments, it also offers theoretical insights and
a unique perspective on the EnKF. Here, we revisit, clarify and correct
several key points of the EnKF-N derivation. This simplifies the use of the
method, and expands its validity. The EnKF is shown to not only rely on the
observations and the forecast ensemble, but also on an implicit prior
assumption, termed hyperprior, that fills in the gap of missing
information. In the EnKF-N framework, this assumption is made explicit
through a Bayesian hierarchy. This hyperprior has so far been chosen to be
the uninformative Jeffreys prior. Here, this choice is revisited to
improve the performance of the EnKF-N in the regime where the analysis is
strongly dominated by the prior. Moreover, it is shown that the EnKF-N can be
extended with a normal-inverse Wishart informative hyperprior that introduces
additional information on error statistics. This can be identified as a
hybrid EnKF–3D-Var counterpart to the EnKF-N.
Introduction
The ensemble Kalman filter (EnKF) has become a popular data
assimilation method for high-dimensional geophysical systems.
The flow dependence of the forecast error
used in the analysis is its main strength, compared to schemes using static
background statistics such as 3D-Var and 4D-Var.
However, to perform satisfyingly, the EnKF may require the use of inflation
and/or localization, depending on the data assimilation system setup.
Localization is required in the rank-deficient regime, in which the limited
size of the ensemble leads to an empirical error covariance matrix of overly
small rank, as is often the case in realistic high-dimensional systems.
It can also be useful in a
rank-sufficient context in the presence of non-Gaussian/nonlinear effects.
Inflation is a complementary technique meant to increase the variances
diagnosed by the EnKF. It is usually intended
to compensate for an underestimation of uncertainty. This underestimation can
be caused either by sampling error, an intrinsic deficiency of the EnKF
system, or model error, an extrinsic deficiency.
A variant of the EnKF, called the finite-size ensemble Kalman filter
(EnKF-N), has been introduced in earlier work.
It has subsequently been successfully applied, notably
in an ensemble variational context. It has been shown to
avoid the need for the multiplicative inflation usually needed to counteract
sampling errors. In particular, it avoids the costly chore of tuning this
inflation.
The EnKF-N is derived by assuming that the ensemble members are drawn from
the same distribution as the truth, but makes no further assumptions about
the ensemble's accuracy. In particular, the EnKF-N, unlike the traditional
EnKFs, does not make the approximation that the sample first- and
second-order moments coincide with the actual moments of the prior (which
would be accessible if the ensemble size N were infinite).
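The size of these sampling fluctuations is easy to picture with a minimal numerical sketch (not from the article; the ensemble size and trial count are arbitrary): even though the empirical variance is an unbiased estimator, an individual small-sample estimate can be far from the true variance.

```python
import numpy as np

# Illustrative sketch: spread of the empirical variance for small ensembles.
# The true prior moments are x_b = 0 and B = 1; for each trial we estimate
# the variance from only N = 10 samples.
rng = np.random.default_rng(0)
N, trials = 10, 100_000
samples = rng.standard_normal((trials, N))
p_hat = samples.var(axis=1, ddof=1)   # one empirical variance per trial

print(p_hat.mean())   # close to 1: the estimator is unbiased on average
print(p_hat.std())    # relative spread ~ sqrt(2/(N-1)) ≈ 0.47 for N = 10
```

The large spread of the individual estimates is exactly the mismatch between the sample moments and the true moments that the EnKF-N refuses to neglect.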
Through its mathematical derivation, the scheme underlines the missing
information besides the observations and the ensemble forecast, an issue that
is ignored by traditional EnKFs. This missing information is explicitly
compensated for in the EnKF-N using a so-called hyperprior. In
earlier work, a simple choice was made for this hyperprior, namely the
Jeffreys prior, which is meant to be as non-informative as possible. While
the EnKF-N built on the Jeffreys prior often performs very well with
low-order models, it may fail in specific dynamical regimes because a finer
hyperprior is needed. Other choices made in the
derivation of the EnKF-N remain only partly justified or insufficiently
clear.
The objective of this paper is to clarify several of those choices, to answer
several questions raised in the above references, and to advocate the use of
improved or new hyperpriors. This should add to the theoretical understanding
of the EnKF, but also provide a useful algorithm. Specifically, the EnKF-N
allows the development of data assimilation systems under perfect model
conditions without worrying about tuning the inflation. In the whole paper,
we will restrict ourselves to perfect model conditions.
In Sect. , the key ideas and algorithms of the EnKF-N are recalled
and several aspects of the approach are clarified. It is shown that the
redundancy in the EnKF-centered perturbations leads to a subtle but important
correction to the EnKF-N when the analysis is performed in the affine space
defined by the mean state and the ensemble perturbations. In
Sect. , the ensemble update step of the EnKF-N is revisited
and clarified. In Sect. , the nonlinearity of the ensemble
forecast step and its handling by the EnKF-N, and more generally
multiplicative inflation, are discussed. The corrections to the EnKF-N are
illustrated with numerical experiments in Sect. .
Sections and discuss modifying or even
changing the hyperprior. In Sect. , we discuss caveats
of the method in regimes where the posterior ensemble is drawn to the prior
ensemble. Simple alternatives to the Jeffreys hyperprior are proposed.
Finally, a class of more informative priors based on the normal-inverse
Wishart distribution and permitting one to incorporate additional information
into error statistics is introduced and theoretically discussed in
Sect. . Conclusions are given in Sect. .
The finite-size ensemble Kalman filter (EnKF-N)
The key ideas of the EnKF-N are presented and clarified in this section.
Additional insights into the scheme and why it is successful are also given.
Marginalizing over potential priors
The study hereafter referred to as Boc11 recognized that the ensemble mean
x‾ and ensemble error covariance matrix P used
in the EnKF may be different from the unknown first- and second-order moments
of the true error distribution, xb and B, where
B is a positive definite matrix. The mismatch is due to the finite
size of the ensemble that leads to sampling errors, partially induced by the
nonlinear ensemble propagation in the forecast step (see
Sect. for a justification). Figure
illustrates the effect of sampling error when the prior is assumed Gaussian
and reliable, whereas the prior actually stems from an uncertain sampling
using the ensemble.
The EnKF-N prior accounts for the uncertainty in xb and
B. Denote $E = [x_1, x_2, \ldots, x_N]$ the ensemble of size $N$, formatted as an $M \times N$
matrix where $M$ is the state space dimension, $\overline{x} = E\mathbf{1}/N$ the ensemble mean, where $\mathbf{1} = (1, \ldots, 1)^\mathrm{T}$, and $X = E - \overline{x}\mathbf{1}^\mathrm{T}$ the perturbation matrix. Hence,
$P = XX^\mathrm{T}/(N-1)$ is the empirical covariance
matrix of the ensemble. Marginalizing over all potential xb
and B, the prior of x reads
$$ p(x|E) = \int \mathrm{d}x_b\, \mathrm{d}B\; p(x|E, x_b, B)\, p(x_b, B|E). $$
The symbol dB corresponds to the Lebesgue measure on all
independent entries $\prod_{1 \le i \le j \le M} \mathrm{d}[B]_{ij}$, but the
integration is restricted to the cone of positive definite matrices. Since
p(x|E,xb,B) is conditioned on the
knowledge of the true prior statistics and assumed to be Gaussian, it does
not depend on E, so that
$$ p(x|E) = \int \mathrm{d}x_b\, \mathrm{d}B\; p(x|x_b, B)\, p(x_b, B|E). $$
Bayes' rule can be applied to p(xb,B|E),
yielding
$$ p(x|E) = \frac{1}{p(E)} \int \mathrm{d}x_b\, \mathrm{d}B\; p(x|x_b, B)\, p(E|x_b, B)\, p(x_b, B). $$
Assuming independence of the samples, the likelihood of the ensemble
E can be written as
$$ p(E|x_b, B) = \prod_{n=1}^{N} p(x_n|x_b, B). $$
The last factor, p(xb,B), is the hyperprior. This
distribution represents our beliefs about the forecasted filter statistics,
xb and B, prior to actually running any filter.
This distribution is termed a hyperprior because it represents a prior for
the background information in the first stage of a Bayesian hierarchy.
Figure: schematic of the traditional standpoint on the analysis of the EnKF
(top row), what it actually does using a Gaussian prior sampled from
three particles (middle row), and using a predictive prior accounting for the
uncertainty due to sampling (bottom row). The full green line represents the
Gaussian observation error prior pdfs, and the dashed blue lines represent
the Gaussian/predictive priors if known, or estimated from an ensemble, or
obtained from a marginalization over multiple potential error statistics. The
dotted red curves are the resulting analysis pdfs.
Assuming one subscribes to this EnKF-N view of the EnKF, the derivation shows
that additional information is actually required in the EnKF, beyond the
observations and the prior ensemble, which are potentially insufficient to
make an inference on their own.
A simple choice was made in Boc11 for the hyperprior: the Jeffreys prior is
an analytically tractable and uninformative hyperprior of the form
$$ p_J(x_b, B) \propto |B|^{-\frac{M+1}{2}}, $$
where $|B|$ is the determinant of the background error
covariance matrix B of dimension M × M.
Predictive prior
With a given hyperprior, the marginalization over xb and
B, Eq. (), can in principle be carried out to
obtain p(x|E). We choose to call it a predictive prior to comply with the traditional view that sees it as a prior before
assimilating the observations. Note, however, that statisticians would rather
call it a predictive posterior distribution as the outcome of a
first-stage inference of a Bayesian hierarchy, where E is the
data.
Using Jeffreys' hyperprior, Boc11 showed that the integral can be obtained
analytically and that the predictive prior is a multivariate T distribution:
$$ p(x|E) \propto \left| \frac{(x - \overline{x})(x - \overline{x})^\mathrm{T}}{N-1} + \varepsilon_N P \right|^{-\frac{N}{2}}, $$
where |.| denotes the determinant and εN=1+1/N. The
determinant is computed in the space generated by the perturbations of the
ensemble so that it is not singular. This distribution has fat tails, thus
accounting for the uncertainty in B. The factor εN is
a result of the uncertainty in xb; if xb
were known to coincide with the ensemble mean x‾, then
εN would be 1 instead. For a Gaussian process,
εNP is an unbiased estimator of the squared error of
the ensemble mean x‾, where
εN stems from the uncertain xb, which does not
coincide with x‾. In the derivation of Boc11, the
εNP correction comes from integrating out
xb. Therefore, εN can be seen as an inflation
factor on the prior covariance matrix that should actually apply to any type
of EnKF.
This non-Gaussian prior distribution can be seen as an average over Gaussian
distributions weighted according to the hyperprior. It can be shown that
Eq. () can be re-arranged:
$$ p(x|E) \propto \left( 1 + \frac{(x - \overline{x})^\mathrm{T} (\varepsilon_N P)^{\dagger} (x - \overline{x})}{N-1} \right)^{-\frac{N}{2}}, $$
where $P^{\dagger}$ is the Moore–Penrose inverse of P.
In comparison, the traditional EnKF implicitly assumes that the hyperprior is
$\delta(B - P)\,\delta(x_b - \overline{x})$, where δ is a Dirac multidimensional
distribution. In other words, the background statistics generated from the
ensemble coincide with the true background statistics. As a result, one
obtains in this case the Gaussian prior:
$$ p(x|E) \propto \exp\!\left( -\frac{1}{2} (x - \overline{x})^\mathrm{T} P^{\dagger} (x - \overline{x}) \right). $$
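The contrast between the two priors can be illustrated with a one-dimensional sketch (M = 1; the ensemble values below are invented for illustration): built from the same small ensemble, the grid-normalized EnKF-N predictive prior keeps far more mass in the tails than the implicit Gaussian prior of a traditional EnKF.

```python
import numpy as np

ensemble = np.array([-0.9, 0.1, 0.4, 1.1, 1.6])   # N = 5 members (invented)
N = ensemble.size
eps_N = 1.0 + 1.0 / N
x_bar = ensemble.mean()
P = ensemble.var(ddof=1)                          # empirical variance

x = np.linspace(-6.0, 6.0, 1201)
dx = x[1] - x[0]
gauss = np.exp(-0.5 * (x - x_bar) ** 2 / P)                            # EnKF prior
t_pred = (1.0 + (x - x_bar) ** 2 / ((N - 1) * eps_N * P)) ** (-N / 2)  # EnKF-N prior

gauss /= gauss.sum() * dx                         # normalize both on the grid
t_pred /= t_pred.sum() * dx
central = np.abs(x - x_bar) <= 3.0 * np.sqrt(P)
tail_gauss = 1.0 - gauss[central].sum() * dx      # mass beyond 3 sigma
tail_t = 1.0 - t_pred[central].sum() * dx
print(tail_gauss, tail_t)   # the predictive prior keeps far more tail mass
```

This is the fat-tail behavior discussed above: the marginalization over uncertain background statistics spreads probability mass away from the ensemble mean.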
Analysis
Consider a given analysis step of the data assimilation cycle. The
observation vector is denoted y of dimension d. In a Bayesian
analysis, p(x|y)=p(y|x)p(x)/p(y); the
likelihood p(y|x) is decoupled from the prior probability
density function (pdf) p(x). In the EnKF-N framework we are
interested in p(x|y,E). Bayes' formula then reads
$$ p(x|y, E) = \frac{p(y|x, E)\, p(x|E)}{p(y|E)}. $$
But y does not depend on E when conditioned on x:
p(y|x,E)=p(y|x). As a consequence, Bayes'
formula now simply reads within the EnKF-N framework as
$$ p(x|y, E) = \frac{p(y|x)\, p(x|E)}{p(y|E)}. $$
This is at odds with the ill-founded claim by Boc11 that the likelihood still
depends on E. This expression clarifies one of the issues raised
in Boc11.
Let us recall and further discuss the analysis step of the EnKF-N for state
estimation. For the sake of simplicity, the observational error distribution
is assumed Gaussian, unbiased, with covariance matrix R. The
observation operator will be denoted H. Because the predictive prior
Eq. () is non-Gaussian, the analysis is performed through a
variational optimization, similarly to the maximum likelihood ensemble
filter, rather than by matrix algebra as in traditional EnKFs.
Working in ensemble space, states are parameterized by vectors w of
size N such that
$$ x = \overline{x} + X w. $$
There is at least one “gauge” degree of freedom in w due to the
fact that x is invariant under w↦w+λ1, where λ is an arbitrary scalar. This is the result of
the linear dependence of the centered perturbation vectors.
For reference, with these notations, the cost function of the ensemble
transform Kalman filter ETKF, based on
Eq. () reads as
$$ J(w) = \frac{1}{2}\, \| y - H(\overline{x} + Xw) \|_R^2 + \frac{N-1}{2}\, \| \Pi_w w \|^2, $$
where $\|z\|_R^2 = z^\mathrm{T} R^{-1} z$ and $\Pi_w$ is the
orthogonal projector onto the row space of X. Algebraically,
$\Pi_w = X^{\dagger} X$, where
$X^{\dagger}$ is the Moore–Penrose inverse of X.
Equation () is the direct result of the substitution into
Eq. () of x by w using
Eq. (). As explained in earlier work, one can add the
term $\frac{N-1}{2}\, \|(I_N - \Pi_w)\, w\|^2$
to the cost function without altering the minimum. Denoting $\|z\|^2 = z^\mathrm{T} z$, this leads to
$$ J(w) = \frac{1}{2}\, \| y - H(\overline{x} + Xw) \|_R^2 + \frac{N-1}{2}\, \|w\|^2. $$
The added term has been labelled the gauge fixing term by Boc11 using
standard physics terminology. The EnKF-N cost function in Boc11 is
$$ J(w) = \frac{1}{2}\, \| y - H(\overline{x} + Xw) \|_R^2 + \frac{N}{2} \ln\!\left( \varepsilon_N + \|w\|^2 \right). $$
It is the result of the substitution of x by w using
Eq. () into Eq. (), and of the addition of
the gauge-fixing term, albeit inside the logarithm, which was justified by
extending the same idea and invoking the monotonicity of the logarithm.
The restriction of x to the ensemble subspace is an approximation
inherent in the traditional EnKFs. By virtue of the hyperprior, it is not
necessarily part of the EnKF-N. However, it is quite justified assuming the
ensemble tracks the unstable subspace of the dynamics. When the ensemble is
of limited size and cannot span the full range of uncertain directions, such
as in high-dimensional systems, this ensemble transform representation can be
made local.
A cost function in state space could be rigorously derived from the prior
Eq. () following Boc11. The cost function
J(w) was obtained from the substitution x(w)=x‾+Xw in the state space cost function,
which, however, ignores the Jacobian of this transformation. Hence, it would
be preferable to directly obtain the probability density of the prior as a
function of w, which requires some mathematical development compared
to the immediate substitution in the cost function.
From a probabilistic standpoint, the logarithm of the determinant of the
Jacobian matrix should be added to the cost function since
$$ \ln p_w(w) = \ln p_x(x(w)) + \ln\left| \frac{\partial x(w)}{\partial w} \right|. $$
Had the transformation $w \mapsto x(w)$ been nonlinear, the
cost function would have been impacted. However, the standard ensemble transform is
linear, which should result in an irrelevant constant. Unfortunately, because
of the gauge degree(s) of freedom of the perturbations, the transformation is
not injective and therefore singular, and the determinant of the
transformation is zero, yielding an undefined constant. Hence, the issue
should be addressed more carefully. It will turn out in the following section
that the cost function should be amended in the non-quadratic case.
Accounting for the gauge degrees of freedom of the ensemble transform
Let us denote Ñ≤min(N-1,M) the rank of
X. The number of gauge degrees of freedom is then g≡N-Ñ. The most common case encountered when applying the EnKF to
high-dimensional systems is that the rank of X is N-1≪M,
that is to say g=1, because X1=0. A
nonsingular ensemble transform is obtained by restricting w to
N⟂ the orthogonal complement of the null space,
N, of X. Hence, the ensemble transform
$$ T: \mathcal{N}^{\perp} \longrightarrow T(\mathcal{N}^{\perp}), \qquad \tilde{w} \longmapsto T(\tilde{w}) = X\tilde{w}, $$
is nonsingular. This amounts to fixing the gauge at zero. With this
restriction to N⟂, the prior of the ETKF defined over
N⟂ is
$$ p(\tilde{w}) \propto \exp\!\left( -\frac{N-1}{2}\, \|\tilde{w}\|^2 \right), $$
whereas the prior pdf of the EnKF-N is
$$ p(\tilde{w}) \propto \left( \varepsilon_N + \|\tilde{w}\|^2 \right)^{-\frac{N}{2}}. $$
In principle, the analysis can be performed in $\mathcal{N}^{\perp}$ using
reduced variables $w_r \in \mathbb{R}^{\tilde{N}}$, looking
for an estimate of the form $x = \overline{x} + X_r w_r$, where $X_r$
would stand for a reduced perturbation matrix. To do so, let us introduce the
singular value decomposition of the initial perturbation matrix: $X = U\Sigma V^\mathrm{T}$, with $U \in \mathbb{R}^{M \times \tilde{N}}$ such that $U^\mathrm{T}U = I_{\tilde{N}}$, $\Sigma$ a positive diagonal
matrix in $\mathbb{R}^{\tilde{N} \times \tilde{N}}$, and $V \in \mathbb{R}^{N \times \tilde{N}}$ such that
$V^\mathrm{T}V = I_{\tilde{N}}$. The reduced
perturbation matrix $X_r$ is then simply given by
$X_r = U\Sigma$. However, the change
of variable $w \mapsto w_r$ would prevent us from using
the elegant symmetric formalism of the ensemble transform Kalman filter
because the perturbation matrix $X_r$ is not centered.
Moreover, the new perturbations, Xr, are non-trivial
linear combinations of the initial perturbations, X. This is likely
to generate imbalances with nonlinear dynamics. Indeed, it is unlikely that
the displacement of the ensemble in the analysis would be minimized, as
opposed to what happens with the ETKF when the transform matrix is chosen
symmetric. We applied this change of variable to a standard
ETKF and tested it numerically with the Lorenz-95 low-order model.
We obtained much larger displacements and intermittent
instabilities that required more inflation.
Hence, we wish to fix the gauge while keeping the initial perturbations as
much as possible. To do so, the definition of the prior pdfs defined on
$\mathcal{N}^{\perp}$ is extended to the full ensemble space $\mathbb{R}^N = \mathcal{N}^{\perp} \oplus \mathcal{N}$, while maintaining their correct
marginal over $\mathcal{N}^{\perp}$. For the EnKF, we can fix the gauge by
choosing
$$ p(w) \propto \exp\!\left( -\frac{N-1}{2}\, \|w\|^2 \right), $$
as in Eq. (), which has indeed the correct marginal since
p(w) factorizes into independent components for N and
N⟂. For the EnKF-N, we can fix the gauge while keeping the
symmetry by choosing
$$ p(w) \propto \left( \varepsilon_N + \|w\|^2 \right)^{-\frac{N+g}{2}}. $$
It can be seen that this pdf has the correct marginal by integrating out over
$\mathcal{N}$, using the change of variable $w - \tilde{w} \mapsto \sqrt{\varepsilon_N + \|\tilde{w}\|^2}\, (w - \tilde{w})$.
The use of these extended pdfs in the analysis is justified by the fact that
the Bayesian analysis pdf p(w|y) in ensemble space has the
correct marginal over N⟂. Indeed, if p(y|w)=p(y|x=x‾+Xw) is the
likelihood in ensemble space that does not depend on w̃,
then the marginal of the Bayesian analysis pdf p(w|y)∝p(y|w)p(w) is consistently given by
p(w̃|y)∝p(y|w̃)p(w̃). We conclude that it is
possible to perform an analysis in terms of the redundant w in place
of w̃.
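This marginalization argument can be checked numerically in the common case g = 1 (a sketch; the ensemble size and the test values of ‖w̃‖² are arbitrary): integrating the extended pdf, with exponent −(N+g)/2, over the single null-space coordinate must return the marginal exponent −N/2 up to a constant independent of w̃.

```python
import numpy as np

N = 4                                       # ensemble size (arbitrary)
g = 1                                       # one gauge degree of freedom
eps = 1.0 + 1.0 / N
u = np.linspace(-100.0, 100.0, 400001)      # null-space coordinate
du = u[1] - u[0]

ratios = []
for w2 in (0.0, 0.7, 2.5):                  # test values of |w_tilde|^2 (arbitrary)
    marginal = ((eps + w2 + u**2) ** (-(N + g) / 2)).sum() * du
    ratios.append(marginal / (eps + w2) ** (-N / 2))
print(ratios)   # the same constant (here 4/3) for every w2
```

The ratio is the constant ∫(1 + t²)^(−(N+1)/2) dt, independent of w̃, confirming that the exponent −(N+g)/2 is the one that leaves the marginal over the orthogonal complement unchanged.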
As opposed to the Gaussian case, the form of pdf Eq. () brings in
a change in the EnKF-N when the analysis is performed in ensemble space. The
appearance of g in the exponent is due to a non-trivial Jacobian
determinant when passing from the ungauged to the gauged variables, a
minimalist example of the so-called Faddeev–Popov determinant.
This consideration generates a modification of the
EnKF-N cost function when using Eq. () as the predictive prior.
Henceforth, we shall assume g=1, which will always be encountered in the
rest of the paper. Consequently, the modified EnKF-N has the following cost
function:
$$ J(w) = \frac{1}{2}\, \| y - H(\overline{x} + Xw) \|_R^2 + \frac{N+1}{2} \ln\!\left( \varepsilon_N + \|w\|^2 \right), $$
which replaces Eq. (). This modification, g=0→1, as
compared with Boc11, will be enforced in the rest of the paper. Such a change
will be shown to significantly impact the numerical experiments in
Sect. .
Update of the ensemble
The form of the predictive prior also has important consequences for the
EnKF-N theory. First of all, the pdfs Eqs. () or () are
multivariate T distributions, and more specifically multivariate Cauchy
distributions. They are proper, i.e., normalizable to 1, but have neither
first-order nor second-order moments.
Laplace approximation
Conditioned on B, both the prior and the posterior are Gaussian
provided the observation error distribution is Gaussian, which is assumed for
the sake of simplicity. Without this conditioning, however, they are both a
(continuous) mixture of candidate Gaussians in the EnKF-N derivation.
Therefore, the posterior p(w|y)∝p(y|w)p(w) should be interpreted with caution. As was done
in Boc11, its mode can in principle be safely estimated. However, its moments
do not generally exist. They exist only if the likelihood
p(y|w) enables it. Even when they do exist, they do not carry
the same significance as for Gaussians.
Hence, the analysis wa is safely defined using the EnKF-N
Cauchy prior as the most likely w of the posterior pdf. But, using
the mean and the error covariance matrix of the posterior is either
impossible or questionable because as explained above they may not exist.
One candidate Gaussian that does not involve integrating over the hyperprior
is the Laplace approximation of the posterior,
which is the Gaussian approximation fitted to the pdf
in the neighborhood of wa. This way, the covariance matrix
of the Laplace distribution is obtained as the inverse of the Hessian of the
cost function at the minimum wa. Refining the covariance matrix beyond the
inverse Hessian is not an option, since the exact covariance matrix of the
posterior pdf may not exist. This is a counterintuitive argument against
seeking a better approximation of the posterior covariance matrix than
the inverse Hessian.
Once a candidate Gaussian for the posterior has been obtained, the updated
ensemble of the EnKF-N is obtained from the Hessian, just as in the ETKF. The
updated ensemble is
$$ E_a = x_a \mathbf{1}^\mathrm{T} + X_a, \qquad x_a = \overline{x} + X w_a, $$
where xa is the analysis in state space;
wa is the argument of the minimum of Eq. (). The
updated ensemble of perturbations Xa is given by
$$ X_a = \sqrt{N-1}\, X H_a^{-1/2} U, $$
where U is an arbitrary orthogonal matrix satisfying
U1=1 and where
Ha is the Hessian of Eq. (),
$$ H_a = Y^\mathrm{T} R^{-1} Y + (N+1)\, \frac{\left( \varepsilon_N + w_a^\mathrm{T} w_a \right) I_N - 2\, w_a w_a^\mathrm{T}}{\left( \varepsilon_N + w_a^\mathrm{T} w_a \right)^2}, $$
with Y=HX and H the tangent linear
of H. The algorithm of this so-called primal EnKF-N is recalled in
Algorithm 1. Note that the algorithm can handle nonlinear observation
operators since it is based on a variational analysis, similarly to the
maximum likelihood ensemble filter. We will choose
U to be the identity matrix in all numerical illustrations of this
paper, and in particular Sect. , in order to minimize the
displacement in the analysis.
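For a concrete (and hedged) illustration of the primal analysis, consider a toy problem with synthetic data, where H = I and R = I are assumptions of this sketch and not of the text. A stationary point of the corrected cost function satisfies w = (YᵀY + ζI_N)⁻¹Yᵀδ together with ζ = (N+1)/(ε_N + ‖w‖²), which can be solved by bisection on ζ:

```python
import numpy as np

rng = np.random.default_rng(1)
M, N = 8, 5
E = rng.standard_normal((M, N))               # forecast ensemble (synthetic)
x_bar = E.mean(axis=1)
X = E - x_bar[:, None]                        # centered perturbations, X 1 = 0
Y = X                                         # Y = H X with H = I
delta = rng.standard_normal(M) - x_bar        # innovation y - H(x_bar)
eps_N = 1.0 + 1.0 / N

def w_of(zeta):
    # Minimizer of the quadratic surrogate with prior term (zeta/2) |w|^2
    return np.linalg.solve(Y.T @ Y + zeta * np.eye(N), Y.T @ delta)

# Bisection on the stationarity condition zeta (eps_N + |w|^2) = N + 1:
lo, hi = 1e-12, (N + 1) / eps_N               # zeta lies in ]0, (N+1)/eps_N]
for _ in range(200):
    mid = 0.5 * (lo + hi)
    w = w_of(mid)
    if mid * (eps_N + w @ w) < N + 1:
        lo = mid
    else:
        hi = mid
zeta_a = 0.5 * (lo + hi)
w_a = w_of(zeta_a)
x_a = x_bar + X @ w_a                         # analysis state

# Gradient of the corrected cost function J(w) vanishes at w_a:
grad = -Y.T @ (delta - Y @ w_a) + (N + 1) * w_a / (eps_N + w_a @ w_a)
print(np.linalg.norm(grad))                   # ~0
print(np.ones(N) @ w_a)                       # ~0: no component along 1
```

The perturbation update would then follow from the inverse square root of the Hessian at w_a, as described above.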
Theoretical equivalence between the primal and dual approaches
Boc11 showed that the functional Eq. () is generally non-convex
but has a global minimum. Yet, the cost function is only truly non-quadratic
in the direction of the radial degree of freedom $\|w\|$ of
w, because the predictive prior is elliptical. This remark led
a subsequent study (hereafter BS12) to show, assuming H is linear or
linearized, that the minimization of Eq. () can be performed
simply by minimizing the following dual cost function over
]0,(N+1)/εN]:
$$ D(\zeta) = \frac{1}{2}\, \delta^\mathrm{T} \left( R + \frac{1}{\zeta} Y Y^\mathrm{T} \right)^{-1} \delta + \frac{\varepsilon_N \zeta}{2} + \frac{N+1}{2} \ln\frac{N+1}{\zeta} - \frac{N+1}{2}, $$
where δ=y-H(x‾). Its global
minimum can easily be found since ζ↦D(ζ) is a
scalar cost function. The variable ζ is conjugate to the square radius
w2. It can be seen as the number of effective degrees
of freedom in the ensemble. Once the argument of the minimum of
D(ζ), ζa, is computed, the analysis for
w can be obtained from the ETKF-like cost function
$$ J(w) = \frac{1}{2}\, \| y - H(\overline{x} + Xw) \|_R^2 + \frac{\zeta_a}{2}\, \|w\|^2, $$
with the solution
$$ w_a = \left( Y^\mathrm{T} R^{-1} Y + \zeta_a I_N \right)^{-1} Y^\mathrm{T} R^{-1} \delta = Y^\mathrm{T} \left( \zeta_a R + Y Y^\mathrm{T} \right)^{-1} \delta. $$
Based on this effective cost function, an updated set of perturbations can be
obtained:
$$ X_a = \sqrt{N-1}\, X H_a^{-1/2} U \quad \text{with} \quad H_a = Y^\mathrm{T} R^{-1} Y + \zeta_a I_N. $$
As a consequence, the EnKF-N with an analysis performed in ensemble space can
be seen as an ETKF with an adaptive optimal inflation factor
λa applied to the prior distribution, and related to
ζa by λa=(N-1)/ζa.
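The dual route can be sketched in the same spirit (toy sizes, synthetic data; H = I and R = I are assumptions of this example): minimize the scalar dual cost on a grid, recover w_a from the ETKF-like cost, and read off the adaptive inflation factor. At the dual minimum, the optimality relation ζ_a = (N+1)/(ε_N + ‖w_a‖²) should hold:

```python
import numpy as np

rng = np.random.default_rng(2)
M, N = 8, 5
E = rng.standard_normal((M, N))                # forecast ensemble (synthetic)
x_bar = E.mean(axis=1)
X = E - x_bar[:, None]
Y = X                                          # Y = H X with H = I
delta = rng.standard_normal(M) - x_bar         # innovation y - H(x_bar)
eps_N = 1.0 + 1.0 / N
YYT = Y @ Y.T

def D(zeta):                                   # scalar dual cost function
    A = np.eye(M) + YYT / zeta                 # R + Y Y^T / zeta with R = I
    return (0.5 * delta @ np.linalg.solve(A, delta) + 0.5 * eps_N * zeta
            + 0.5 * (N + 1) * np.log((N + 1) / zeta) - 0.5 * (N + 1))

# Coarse grid search on ]0, (N+1)/eps_N], then a fine refinement:
zs = np.linspace(1e-4, (N + 1) / eps_N, 2001)
i = int(np.argmin([D(z) for z in zs]))
zs = np.linspace(zs[max(i - 1, 0)], zs[min(i + 1, len(zs) - 1)], 2001)
zeta_a = zs[int(np.argmin([D(z) for z in zs]))]

w_a = np.linalg.solve(Y.T @ Y + zeta_a * np.eye(N), Y.T @ delta)
lam_a = (N - 1) / zeta_a                       # adaptive inflation of the prior
print(zeta_a * (eps_N + w_a @ w_a) / (N + 1))  # ~1: BS12 optimality condition

# Ensemble update of the dual scheme (U = I_N), via the symmetric inverse
# square root of H_a = Y^T R^{-1} Y + zeta_a I_N:
H_a = Y.T @ Y + zeta_a * np.eye(N)
vals, vecs = np.linalg.eigh(H_a)
X_a = np.sqrt(N - 1) * X @ (vecs @ np.diag(vals**-0.5) @ vecs.T)
print(np.abs(X_a @ np.ones(N)).max())          # ~0: perturbations stay centered
```

Because the search is over a single scalar, this route is cheap even when the ensemble-space dimension N is moderate, which is what makes the dual scheme attractive in practice.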
Provided one subscribes to the EnKF-N formalism, this tells us that sampling
errors can be cured by multiplicative inflation. This is supported
by , who experimentally showed that multiplicative
inflation is well suited to account for sampling errors, whereas additive
inflation is better suited to account for model errors in a meteorological
context. Other efficient adaptive inflation methods have been proposed
elsewhere for broader uses including extrinsic
model error. Nevertheless, for the experiments described in
Sect. , they are not as successful with the specific goal of
accounting for sampling errors as the EnKF-N.
Equation (), on which the results of BS12 are based, is only an
approximation, because it does not use the Hessian of the complete cost
function Eq. (). Only the diagonal term of the Hessian of the
background term is kept:
$$ H_b \simeq \frac{N+1}{\varepsilon_N + \|w_a\|^2}\, I_N, $$
which can simply be written $H_b \simeq \zeta_a I_N$ using $\zeta_a = (N+1)/(\varepsilon_N + \|w_a\|^2)$, shown in BS12 to be one of the optimum
conditions. The off-diagonal rank-one correction,
$-2(N+1)^{-1} \zeta_a^2\, w_a w_a^\mathrm{T}$, has been neglected. This approximation is
similar to that of the Gauss–Newton method, which is an approximation of the
Newton method where the Hessian of the cost function to be minimized is
approximated by the product of first-order derivative terms and by neglecting
second-order derivative terms. The approximation actually consists in
neglecting the co-dependence of the errors in the radial ($\|w\|$) and angular ($w/\|w\|$) degrees of
freedom of w.
Since the dual EnKF-N is meant to be equivalent to the primal EnKF-N when the
observation operator is linear, the updated ensemble should actually be based
on Eq. (), which can also be written as
$$ X_a = \sqrt{N-1}\, X \left( H_a - \frac{2\zeta_a^2}{N+1}\, w_a w_a^\mathrm{T} \right)^{-1/2} U, $$
with $H_a = Y^\mathrm{T} R^{-1} Y + \zeta_a I_N$, and compared to the approximation
Eq. () used in BS12. The algorithm of this so-called dual
EnKF-N is recalled in Algorithm 2 and includes the correction. With
Eq. (), the dual scheme is strictly equivalent to the primal
scheme provided that H is linear, whereas it is only approximately so with
Eq. ().
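The algebra behind the corrected update can be checked directly: defining ζ_a = (N+1)/(ε_N + ‖w_a‖²), the exact Hessian of the background term equals ζ_a I_N − 2(N+1)⁻¹ζ_a² w_a w_aᵀ. A sketch with arbitrary toy values:

```python
import numpy as np

rng = np.random.default_rng(3)
M, N = 6, 4
Y = rng.standard_normal((M, N))               # stand-in for H X (arbitrary)
w_a = rng.standard_normal(N)                  # any w_a works for this identity
eps_N = 1.0 + 1.0 / N
s = eps_N + w_a @ w_a
zeta_a = (N + 1) / s                          # optimality relation of BS12

# Exact Hessian of the background term (N+1)/2 ln(eps_N + |w|^2) at w_a:
H_exact = Y.T @ Y + (N + 1) * (s * np.eye(N) - 2 * np.outer(w_a, w_a)) / s**2
# The same matrix written with zeta_a, as in the corrected dual update:
H_dual = Y.T @ Y + zeta_a * np.eye(N) - (2 * zeta_a**2 / (N + 1)) * np.outer(w_a, w_a)

print(np.abs(H_exact - H_dual).max())         # 0 up to round-off
```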
The co-dependence of the radial and angular degrees of freedom exposed by the
dual cost function is further explored in Appendix .
Cycling of the EnKF-N and impact of model nonlinearity
We have discussed and amended the analysis step of the EnKF-N. To complete
the data assimilation cycle, the ensemble must be forecasted between
analyses. The cycling of the EnKF-N thus alternates the ensemble forecast
with the analysis step described above.
In accounting for sampling error, the EnKF-N framework differs quite
significantly from earlier treatments of this problem.
Focusing on the bias of the EnKF gain and precision
matrix, those studies are geared towards single-cycle corrections. By
contrast, the EnKF-N enables the likelihood to influence the estimation of
the posterior covariance matrix. This can be seen by writing and recognizing
the posterior as a non-uniform mixture of Gaussians, as for the prior. The
inclusion of the likelihood is what makes the EnKF-N equipped to handle the
effects of model nonlinearity and the sequentiality of data assimilation.
Assume that an ensemble square root Kalman filter is applied to linear
forecast and observation models, and further assume that the ensemble is big
enough to span the unstable and neutral subspace. In this case, it was shown
that inflation or localization is unnecessary to regularize the error
covariance matrix. Sampling errors, if
present, can be ignored in this case. Therefore, it is inferred from this
result that inflation is actually compensating for the mis-estimation of
errors generated by model nonlinearity. Following this line of thought, Boc11
hypothesized that the finite-size scheme actually accounts for the error
generated in the nonlinear deformation of the ensemble in the forecast step
of the EnKF. What happens to the EnKF-N when the model gets more linear is
addressed in Sect. .
A recent study confirms and clarifies this
suggestion. Its authors show that the nonlinear evolution of the error in the
extended Kalman filter generates additional errors unaccounted for by the
extended Kalman filter linear propagation of the error. In a specific
example, they are able to avoid the need for inflation with the 40-variable
Lorenz-95 model using a total of 24 perturbations (14 for the unstable
and neutral subspace and 10 for the main nonlinear corrections). We checked
that the same root mean square errors as shown in Table II of that study
can be achieved by the EnKF-N and the optimally tuned
EnKF with an ensemble of size N=24. This reinforces the idea that the
EnKF-N accounts, albeit within ensemble space, for the error generated by
nonlinear corrections inside and outside the ensemble subspace. Additionally,
note that the EnKF-N does not show any sign of divergence in that regime,
even for much stronger model nonlinearity.
To picture the impact of inflation on the fully cycled EnKF, let us consider
the simplest possible one-variable, perfect, linear model $x_{k+1} = \alpha x_k$, with $k$ the time index. The model is unstable if $\alpha^2 > 1$
and stable if $\alpha^2 < 1$. In terms of uncertainty quantification,
multiplicative inflation is meant to increase the error covariances so as to
account for mis-estimated errors. Here, we apply the inflation to the prior
at each analysis step, since the EnKF-N implicitly does so. Let us denote
$b_k$ the forecast/prior error variance, $r$ the static observation error
variance, and $a_k$ the analysis error variance. $\zeta$ plays the same role
as in the EnKF-N scheme, so that a uniform inflation is
$\zeta^{-1/2}$. Sequential data assimilation implies the following
recursions for the variances:
$$ a_k^{-1} = \zeta b_k^{-1} + r^{-1} \quad \text{and} \quad b_{k+1} = \alpha^2 a_k, $$
whose asymptotic solution ($a \equiv a_\infty$) is
$$ a = 0 \ \ \text{if} \ \alpha^2 < \zeta, \qquad a = \left( 1 - \zeta/\alpha^2 \right) r \ \ \text{if} \ \alpha^2 \ge \zeta. $$
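The recursion and its asymptotic solution can be verified directly (a small sketch; r = 1 and ζ = 0.75 match the accompanying figure, while the α² values are arbitrary):

```python
# Iterate the one-variable variance recursion and compare with the
# asymptotic solution a = (1 - zeta/alpha^2) r for alpha^2 >= zeta (0 otherwise).
r, zeta = 1.0, 0.75
results = {}
for alpha2 in (0.5, 0.9, 1.2, 4.0):          # model growth factors (arbitrary)
    b = 1.0                                  # arbitrary initial prior variance
    for _ in range(500):
        a = 1.0 / (zeta / b + 1.0 / r)       # analysis step with prior inflation
        b = alpha2 * a                       # forecast step
    a_inf = 0.0 if alpha2 < zeta else (1.0 - zeta / alpha2) * r
    results[alpha2] = (a, a_inf)
    print(alpha2, a, a_inf)
```

The iterated variance matches the closed-form limit for each growth factor, including the displaced threshold α² = ζ < 1 caused by inflation.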
Now, consider a multivariate model that is the collection of several
independent one-variable models with as many growth factors α. In the
absence of inflation, ζ=1, the stable modes, α2<1, converge to
a perfect analysis (a=0), whereas the unstable modes, α2>1,
converge to a finite error (a>0) that grows with the instability of the
modes, as expected. When inflation is used, ζ<1; the picture changes
but mostly affects the modes close to neutral (see Fig. ).
The threshold is displaced and the modes with finite asymptotic errors now
include a fraction of the stable modes. The strongly unstable modes are much
less impacted.
In spite of its simplicity and its linearity, this model makes the link
between the EnKF-N, multiplicative inflation, and the dynamics.
It has been argued that, in the absence of model error, the
systematic error of the EnKF comes from the error transported from the
unstable subspace to the stable subspace by the effect of nonlinearity.
Unaccounted error would accumulate on the stable modes close to neutrality.
As seen above, the use of the EnKF-N, or multiplicative inflation on the
prior, precisely acts on these modes by increasing their error statistics
without affecting the most unstable modes that mainly drive the performance
of the EnKF.
Numerical experiments
Twin experiments using a perfect model and the EnKF-N have been carried out
on several low-order models in previous studies. In many cases the EnKF-N, or
its variant with localization (using domain localization), was reported to
perform on the Lorenz-63 and Lorenz-95 models as well as the ETKF with
optimally tuned uniform inflation. With a two-dimensional forced turbulence
model, driven by the barotropic vorticity advection equation, it was found to
perform almost as well as the ETKF with optimally tuned uniform inflation
, although the local EnKF-N has not yet been thoroughly
tested with this model.
Figure: analysis error variance when applying sequential data assimilation
to $x_{k+1} = \alpha x_k$ with ($\zeta = 0.75$, dashed line) or without
($\zeta = 1$, full line) multiplicative inflation on the prior, as a function
of the model growth $\alpha$. We chose $r = 1$.
The choice of εN has remained a puzzle in these experiments. It
has been reported that the Lorenz-63 model required εN=1+1/N,
whereas the Lorenz-95 model required εN=1, seemingly owing to
the larger ensemble size. It was also previously reported that domain
localization of the EnKF-N with both models required εN=1+1/N.
In the present study, we have revisited those experiments using the
correction g=0→1 of Sect. , sticking with the
theoretical value εN=1+1/N and the same ensemble sizes. This
essentially reproduced the results of the best choice for εN in
each case. For these low-order models, this solved the puzzle: there is no need
to adjust $\varepsilon_N$ away from its theoretical value $1+1/N$. Hence, the EnKF-N in the subsequent
experiments uses the correction g=0→1 and
εN=1+1/N.
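As an aside, the dual analysis only requires a one-dimensional minimization. The following Python sketch (assuming the dual cost function of BS12, with its innovation-driven term and the prior term $D_b$; the helper name `enkf_n_dual` is ours) illustrates how the effective ensemble size $\zeta_a$ reacts to the innovation:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def enkf_n_dual(delta, Y, R, eps_N):
    """Minimize the scalar dual cost function D(zeta) over ]0, (N+1)/eps_N].

    delta: innovation vector (d,); Y: ensemble anomalies in observation
    space (d, N); R: observation error covariance (d, d).
    Returns the effective ensemble size zeta_a.
    """
    N = Y.shape[1]
    def D(zeta):
        # Innovation-driven term with prior covariance YY^T / zeta
        Jo = 0.5 * delta @ np.linalg.solve(R + (Y @ Y.T) / zeta, delta)
        # Prior-driven term D_b(zeta)
        Db = 0.5 * eps_N * zeta + 0.5 * (N + 1) * np.log((N + 1) / zeta) \
            - 0.5 * (N + 1)
        return Jo + Db
    return minimize_scalar(D, bounds=(1e-8, (N + 1) / eps_N),
                           method="bounded").x

rng = np.random.default_rng(0)
d, N = 5, 8
Y = rng.standard_normal((d, N))
R = np.eye(d)
eps_N = 1.0 + 1.0 / N

zeta_small = enkf_n_dual(1e-8 * np.ones(d), Y, R, eps_N)  # tiny innovation
zeta_large = enkf_n_dual(10.0 * np.ones(d), Y, R, eps_N)  # large innovation
```

A small innovation drives $\zeta_a$ toward the upper bound $(N+1)/\varepsilon_N = N$ (little or no inflation), while a large innovation yields a smaller $\zeta_a$, i.e., an effective inflation of the prior.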
Average analysis RMSE for the primal EnKF-N, the dual EnKF-N, the
approximate EnKF-N, and the EnKF with uniform optimally tuned inflation,
applied to the Lorenz-95 model, as a function of the time step between
updates. The finite-size EnKFs are based on Jeffreys'
hyperprior.
Figure summarizes the corrections of Sects.
and . It also illustrates the equivalence between the primal
and dual EnKF-N. It additionally shows the performance of the dual EnKF-N
with the approximate Hessian used in BS12, and the performance of the
ensemble square root Kalman filter with optimally tuned uniform inflation.
The Lorenz-95 low-order model is chosen for this illustration
. Details about the model can be found in their article. A
twin experiment is performed with a fully observed system ($H = I_d$,
where $d=M=40$) and an observation error covariance matrix
$R = I_d$, which is also used to generate synthetic
observations from the truth. The ensemble size is N=20. The time interval
between observation updates Δt is varied, which changes the
nonlinearity strength. Varying the magnitude of a model's nonlinearity is
highly relevant because, as explained in Sect. , model
nonlinearity is the underlying cause of the need for inflation, in this
rank-sufficient context (N=20). We plot the mean analysis root mean square
error (RMSE) between the analysis state and the truth state. To obtain a
satisfying convergence of the statistics, the RMSEs are averaged over $10^5$
cycles, after a spin-up of $5\times10^3$ cycles.
The performances of the primal and dual EnKF-N are indistinguishable for the
full Δt range. The dual EnKF-N with approximate Hessian hardly
differs from the EnKF-N, i.e., using Eq. () in place of
Eq. (). However, it is slightly suboptimal for Δt=0.05
by about 5 %.
Similar experiments have been conducted with the Lorenz-63 model
, the Lorenz-05II model , and the
Kuramoto–Sivashinsky model . These
experiments have yielded the same conclusions.
The additional numerical cost of using the finite-size formalism based on
Jeffreys' hyperprior is now compared to the analysis step of an ensemble
Kalman filter or of an ensemble Kalman smoother based on the
ensemble-transform formulation. The computational cost depends on the type of
method. Let us first discuss non-iterative methods, such as the ETKF or a
smoother based on the ETKF. If the singular value decomposition (SVD) of
$R^{-\frac{1}{2}}Y$ has already been obtained, the dual
approach can be used and the additional cost of the EnKF-N, or EnKS-N, is due
to the minimization of the dual cost function Eq. (), which is
negligible. This is indeed the case in all our experiments where the SVD has
been obtained in order to compute the inverse in the state update
Eq. () or the inverse square root in the perturbation update
Eqs. () or (). If the data assimilation is
iterative (for significantly nonlinear models) such as the maximum likelihood
ensemble filter or the iterative ensemble Kalman
smoother , then the primal approach of the finite-size
scheme can be made to coincide with the iterative scheme. Examples of such
integrated schemes are given in and .
The additional cost is often negligible except if the number of expected
iterations is small, which is the case if the models are weakly nonlinear.
However, in this case, the finite-size correction is also expected to be
small, with an effective inflation value close to 1.
Moreover, it is important to notice that the perturbation update as given by
Eq. () can induce a significant extra numerical cost as compared
to the update of an ETKF. Indeed, the SVD used to compute Eq. ()
cannot be directly used to compute Eq. (), which might require
another SVD. However, using the approximate scheme that consists in
neglecting the off-diagonal term does not require the additional SVD. Even if the
off-diagonal term is included in the Hessian, the inverse square root of the
Hessian could be computed from the original SVD through a Sherman–Morrison
update because the off-diagonal term is of rank one.
Let us finally mention that no significant additional storage cost is
required by the scheme.
Performance in the prior-driven regime
The EnKF-N based on the Jeffreys hyperprior was found to fail in the limit
where the system is almost linear but remains nonlinear (BS12). This regime
is rarely explored with low-order models, but it is likely to be encountered
in less homogeneous, more realistic applications.
Figure a illustrates this failure. It extrapolates the
results of Fig. to very small time intervals between updates
where the dynamics are quasi-linear. As Δt decreases the RMSE of the
optimal inflation EnKF decreases as one would expect, while the RMSE of the
EnKF-N based on the Jeffreys prior increases.
In this regime, the EnKF-N has great confidence in the prior, as any filter
would. Therefore, the innovation-driven term becomes less important than
the prior term
\[
D_b(\zeta) = \frac{\varepsilon_N\zeta}{2} + \frac{N+1}{2}\ln\frac{N+1}{\zeta} - \frac{N+1}{2}
\]
in the dual cost function Eq. (), so that its mode $\zeta_a$ tends to the
mode of $D_b(\zeta)$, which is $\zeta_a = (N+1)/\varepsilon_N = N$. Note that an inflation of 1 corresponds to
$\zeta = N-1$. Hence, in this regime, even for moderately sized innovations,
there is deflation. The failure of the EnKF-N was empirically fixed in BS12
by capping ζa to prevent deflation.
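The deflation effect can be seen directly by locating the mode of $D_b$ numerically; a minimal sketch, assuming the expression of $D_b$ above with $\varepsilon_N = (N+1)/N$:

```python
import numpy as np

N = 20
eps_N = (N + 1) / N  # = 1 + 1/N

def D_b(zeta):
    """Prior term of the dual cost function."""
    return 0.5 * eps_N * zeta + 0.5 * (N + 1) * np.log((N + 1) / zeta) \
        - 0.5 * (N + 1)

# Locate the mode on a fine grid over ]0, (N+1)/eps_N]
zeta = np.linspace(1e-3, (N + 1) / eps_N, 200_000)
zeta_a = zeta[np.argmin(D_b(zeta))]
# zeta_a sits at (N+1)/eps_N = N, above zeta = N-1 (which corresponds to an
# inflation of 1): when the innovation term is negligible, the analysis
# is deflated.
```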
Average analysis RMSE for the EnKF-N with Jeffreys' hyperprior, with
the EnKF-N based on the Dirac–Jeffreys hyperprior, with the EnKF-N based on
the Jeffreys hyperprior but enforcing schemes R1 or R2, and the EnKF with
uniform optimally tuned inflation, applied to the Lorenz-95 model, as a
function of the time step between updates (top), and as a function of the
forcing F of the Lorenz-95 model (bottom). The analysis ensemble spread of
the EnKF-N based on the Dirac–Jeffreys hyperprior is also
shown.
More generally, we believe this problem will be encountered whenever the
prior largely dominates the analysis (the prior-driven regime). This is bound to
happen when the observations are too few and too sparsely distributed, which
could occur when using domain localization, and whenever they are unreliable
compared to the prior. Quasi-linear dynamics also fit this description, the
ratio of the observation precision to the prior precision becoming small
after a few iterations.
This failure may not be due to the EnKF-N framework. It may be due to an
inappropriate choice of candidate Gaussian posterior as described in
Sect. . Or it may be due to an inappropriate choice of
hyperprior in this regime. Although it seems difficult to devise a hyperprior
that performs optimally in all regimes, we can suggest two adjustments to
Jeffreys' hyperprior in this prior-driven regime.
Capping of the inflation
Here, deflation is avoided by capping $\zeta$. Firstly, we build the desired
dual cost function. Instead of minimizing $D(\zeta)$ over
$]0,(N+1)/\varepsilon_N]$, it is minimized over $]0,\bar{\zeta}]$, with
$0 \le \bar{\zeta} \le (N+1)/\varepsilon_N$, which defines the dual cost
function. $\bar{\zeta}$ is a tunable bound that is meant to be fixed
over a wide range of regimes. Following a derivation similar to Appendix A of
BS12, one can show that the background term of the primal cost function
corresponding to this dual cost function is
\[
J_b(w) =
\begin{cases}
\dfrac{\bar{\zeta}}{2}\left(\varepsilon_N + \|w\|^2\right) + \dfrac{N+1}{2}\ln\dfrac{N+1}{\bar{\zeta}} - \dfrac{N+1}{2} & \text{if } \|w\|^2 \le \dfrac{N+1}{\bar{\zeta}} - \varepsilon_N, \\[2ex]
\dfrac{N+1}{2}\ln\left(\varepsilon_N + \|w\|^2\right) & \text{if } \|w\|^2 > \dfrac{N+1}{\bar{\zeta}} - \varepsilon_N.
\end{cases}
\]
The dual and primal cost functions can both be shown to be convex. There is
no duality gap, which means, with our definitions of these functions, that
the minimum of the dual cost function is equal to the minimum of the primal
cost function. By construction, in the small innovation range, i.e.,
$\|w\|^2 \le (N+1)/\bar{\zeta} - \varepsilon_N$, the
EnKF-N, endowed with this new hyperprior, corresponds to the ETKF
, with an inflation of the prior by $(N-1)/\bar{\zeta} \ge 1$. Since the hyperprior assumed in the regime of small
$w$ is $p(x_b, B) = \delta(B - \bar{\zeta}P)$, this could be called the
Dirac–Jeffreys hyperprior.
Even with the Dirac–Jeffreys hyperprior, it is still necessary to introduce
a tiny amount of inflation through ζ‾ in the quasi-linear
regime. This might prove barely relevant in a high-dimensional realistic
system, as it was for the sensitive low-order models that we tested the
scheme with. Even with Lorenz-95, an instability develops over very long
experimental runs in the absence of this residual inflation. This still
remains a theoretical concern. Moreover, we could not find a rigorous
argument to support avoiding deflation in all regimes and hence the capping.
That is why we propose an alternative solution in the following.
Smoother schemes in the prior-driven regime
In the limit where $R$ becomes very large, the observations cannot
carry information, and the ensemble should not be updated at all; i.e., it
should remain close to the prior ensemble, with an inflation of 1
($\zeta=N-1$). Outside of this regime, we do not see any fundamental reason
to constrain ζ to be smaller than N-1. A criterion to characterize
this regime would be
\[
\psi = \frac{1}{N-1}\,\mathrm{Tr}\!\left(Y^{\mathrm{T}}R^{-1}Y\right),
\]
which computes the ratio of the prior variances to the observation error
variances. When ψ tends to zero, the analysis should be dominated by the
prior and ζ should tend to N-1. When ψ drifts away from zero, we
do not want to alter the hyperprior and the EnKF-N scheme, even if it implies
deflation. We found several schemes that satisfy these constraints. Two of
them, denoted R1 and R2, consist in modifying $\varepsilon_N$ into
$\varepsilon_N'$ and yield a well-behaved mode of the background part of the
dual cost function, $\zeta_b = \mathrm{argmin}_\zeta D_b(\zeta)$:
\[
\mathrm{R1}:\ \varepsilon_N' = \varepsilon_N\left(1 - \frac{1}{N}e^{-\psi}\right)^{-1} \Rightarrow \zeta_b = N - e^{-\psi};
\qquad
\mathrm{R2}:\ \varepsilon_N' = \frac{N+1}{N}\left(\frac{N}{N-1}\right)^{\frac{1}{1+\psi}} \Rightarrow \zeta_b = N\left(\frac{N-1}{N}\right)^{\frac{1}{1+\psi}}.
\]
The point of these formulae is to make ζb tend to N-1 (no
inflation) when the criterion ψ tends to zero. On the other hand, when
ψ gets bigger, ζb tends to N, i.e., to the original
dual cost function's behavior dictated by Jeffreys' hyperprior. The
implementation of these schemes is straightforward for any of the
Algorithms 1 or 2, since only εN needs to be modified either in
the dual or primal cost functions.
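The two rectified schemes can be sketched in a few lines of Python; the functions below simply restate $\psi$, R1 and R2 as given above (function names are ours) and allow the limiting behavior to be checked: $\zeta_b \to N-1$ as $\psi \to 0$, and $\zeta_b \to N$ as $\psi \to \infty$.

```python
import numpy as np

def psi_criterion(Y, R):
    """psi = Tr(Y^T R^{-1} Y) / (N - 1): prior-to-observation variance ratio."""
    N = Y.shape[1]
    return np.trace(Y.T @ np.linalg.solve(R, Y)) / (N - 1)

def eps_R1(psi, N):
    # R1: eps_N' chosen so that the mode zeta_b = (N+1)/eps_N' is N - exp(-psi)
    return ((N + 1) / N) / (1.0 - np.exp(-psi) / N)

def eps_R2(psi, N):
    # R2: eps_N' = (N+1)/N * (N/(N-1))**(1/(1+psi)),
    # so that zeta_b = N * ((N-1)/N)**(1/(1+psi))
    return (N + 1) / N * (N / (N - 1)) ** (1.0 / (1.0 + psi))

N = 20
zeta_b_R1 = (N + 1) / eps_R1(0.1, N)  # close to N - exp(-0.1)
zeta_b_R2 = (N + 1) / eps_R2(0.1, N)
```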
Numerical illustrations
The performance of the Dirac–Jeffreys EnKF-N, where we choose
$(N-1)/\bar{\zeta} = 1.005$, and of the EnKF-N with the hyperprior
corrections (R1) and (R2), are illustrated with a twin experiment on the
Lorenz-95 model in the quasi-linear regime. Also included are the EnKF-N with
the Jeffreys prior and the ensemble square root Kalman filter with optimally
tuned inflation. The RMSEs are plotted as a function of Δt in
[0.01,0.5] in Fig. a.
Another way to make a data assimilation system based on the Lorenz-95 model
more linear, rather than decreasing $\Delta t$, is to decrease the forcing
parameter F. Figure b
illustrates this when F is varied from 4 (linear) to 12 (strongly
nonlinear), with Δt=0.05 and the same setup as in
Sect. . As anticipated, the EnKF-N based on Jeffreys'
hyperprior fails for F<7.5. However, the EnKF-N based on the
Dirac–Jeffreys hyperprior and the EnKF-N with schemes R1 and R2 show
performances equivalent to the EnKF with optimally tuned inflation. We note a
slight underperformance of the EnKF-N in the very strongly chaotic regimes
compared to the optimally tuned EnKF. We have also checked that these good
performances also apply to the Lorenz-63 model.
The spread of the ensemble for the Dirac–Jeffreys EnKF-N has also been
plotted in Fig. a and b. The value of the spread is
consistent with the RMSE except in significantly nonlinear regimes such as
when Δt>0.15 and F=8, or to a lesser extent when Δt=0.05 and F>8. In those nonlinear regimes and with such non-iterative
EnKFs, the Gaussian error statistics approximation is invalidated so that the
RMSE could differ significantly from the ensemble spread.
Informative hyperprior, covariance localization and hybridization
So far, the EnKF-N has relied on a noninformative hyperprior. In this section
we examine, mostly at a formal level, the possibility of accounting for
additional, possibly independent, information on the error statistics, like a
hybrid EnKF–3D-Var is meant to do . Only a single
numerical illustration is provided, since extended results would involve many
more developments and would be very model dependent.
In a perfect model context, we observed that uncertainty in the variances
usually addressed by inflation could be taken care of by the EnKF-N based on
Jeffreys' hyperprior. However, it does not take care of the correlation (as
opposed to variance) and rank-deficiency issues, which are usually addressed
by localization. Localization has to be superimposed onto the finite-size
scheme to build a local EnKF-N without the intrinsic need for inflation
. Nonetheless, by marginalizing over limited-range
covariance matrices (Sect. 5 of ), we also argued that
the use of an informative hyperprior would produce covariance localization
within the EnKF-N framework. A minimal example where the hyperprior is
defined over B matrices that are positive diagonal, and hence very
short-ranged, was given and supported by a numerical experiment. Hence, it is
likely that the inclusion of an informative prior is a way to elegantly
impose localization within the EnKF-N framework.
An informative hyperprior is the normal-inverse Wishart (NIW) pdf:
\[
p_{\mathrm{NIW}}(x_b, B) \propto |B|^{-\frac{M+2+\nu}{2}} \exp\left(-\frac{\kappa}{2}\left\|x_b - x_c\right\|_B^2 - \frac{1}{2}\mathrm{Tr}\left(B^{-1}C\right)\right).
\]
It is convenient because, with this hyperprior, Eq. ()
remains analytically integrable. The location state xc, the
scale matrix C, which is assumed to be full-rank, κ and
ν are hyperparameters of the distribution from which the true error
moments xb and B are drawn. The pdf
pNIW is proper only if ν>M-1, but this is not an
imperative requirement provided that the integral in Eq. ()
is proper.
The resulting predictive prior can be deduced from ,
Sect. 3.6:
\[
p(x|E) \propto \left[1 + \frac{N+\kappa}{N+\kappa+1}\left\|x - \hat{x}\right\|^2_{\frac{\kappa N}{N+\kappa}\left(x_c-\bar{x}\right)\left(x_c-\bar{x}\right)^{\mathrm{T}} + XX^{\mathrm{T}} + C}\right]^{-\frac{1}{2}(N+1+\nu)},
\]
where $\hat{x} = (\kappa x_c + N\bar{x})/(N+\kappa)$. From these expressions,
xc could be interpreted as some climatological state and
C would be proportional to some static error covariance matrix,
which could be estimated from a prior, long and well-tuned EnKF run. They
could also be parameterized by tunable scalars that could be estimated by a
maximum likelihood principle .
A subclass of hyperpriors is obtained when the degree of freedom
xc is taken out, leading to the inverse Wishart (IW)
distribution
\[
p_{\mathrm{IW}}(x_b, B) \propto |B|^{-\frac{M+1+\nu}{2}}\exp\left(-\frac{1}{2}\mathrm{Tr}\left(B^{-1}C\right)\right),
\]
and to the predictive prior
\[
p(x|E) \propto \left[1 + \frac{N}{N+1}\left\|x - \bar{x}\right\|^2_{XX^{\mathrm{T}}+C}\right]^{-\frac{1}{2}(N+\nu)}.
\]
Jeffreys' hyperprior is recovered from the IW hyperprior in the limit where
$\nu \to 0$ and $C \to 0$, well within the
region $\nu \le M-1$ where the IW pdf is improper. Note that the use of an IW
distribution was advocated, owing to its natural conjugacy, in a remarkable
paper by , where a hierarchical stochastic EnKF was first
proposed and developed.
Because the scale matrix C is assumed to be full rank, updating in
state space is preferred to an analysis in ensemble space. Based on the
marginals Eqs. () and (), the
$J_b$ term of the analysis cost function is of the form
\[
J_b(x) = \frac{\gamma}{2}\ln\left(\varepsilon_N + \left\|x - \hat{x}\right\|_\Gamma^2\right) \quad\text{with}\quad \Gamma = XX^{\mathrm{T}} + \widehat{C}.
\]
In the case of the NIW hyperprior, one has
\[
\gamma = N+1+\nu, \qquad \varepsilon_N = 1 + \frac{1}{N+\kappa}, \qquad \widehat{C} = C + \frac{\kappa N}{N+\kappa}\left(x_c - \bar{x}\right)\left(x_c - \bar{x}\right)^{\mathrm{T}}.
\]
In the case of the IW hyperprior, one has
\[
\gamma = N+\nu, \qquad \varepsilon_N = 1 + \frac{1}{N}, \qquad \hat{x} = \bar{x}, \qquad \widehat{C} = C.
\]
We observe that the Jb term is formally similar to that
of the EnKF-N with Jeffreys' hyperprior that is directly obtained in state
space from Eq. (). Hence, the sequential data assimilation
schemes built from the NIW and IW hyperpriors formally follow that of the
EnKF-N. But, to do so, the analysis must be written in state space, whereas
it has been expressed in ensemble space so far.
Primal analysis and dual analysis
The primal analysis in state space is obtained from $x_a = \mathrm{argmin}_x J(x)$, where
\[
J(x) = J_o(x) + J_b(x) = \frac{1}{2}\left\|y - H(x)\right\|_R^2 + \frac{\gamma}{2}\ln\left(\varepsilon_N + \left\|x - \hat{x}\right\|_\Gamma^2\right).
\]
For the dual analysis, we further assume that the observation operator H is
linear – and hence denoted H – for the primal/dual
correspondence to be exact. The derivation of the dual cost function follows
that of BS12. The following Lagrangian is introduced to separate the radial
and angular degrees of freedom of x:
\[
L(x, \rho, \zeta) = J_o(x) + \frac{\zeta}{2}\left(\left\|x - \hat{x}\right\|_\Gamma^2 - \rho\right) + \frac{\gamma}{2}\ln\left(\varepsilon_N + \rho\right),
\]
where ζ is a Lagrange multiplier. The saddle-point equations of this
Lagrangian are
\[
\rho_a = \left\|x_a - \hat{x}\right\|_\Gamma^2, \qquad \rho_a = \frac{\gamma}{\zeta_a} - \varepsilon_N, \qquad x_a = \hat{x} + \Gamma H^{\mathrm{T}}\left(\zeta_a R + H\Gamma H^{\mathrm{T}}\right)^{-1}\hat{\delta} \quad\text{with}\quad \hat{\delta} = y - H\hat{x}.
\]
$x_a$, $\rho_a$, and $\zeta_a$ are the
saddle-point values of the variables. Using these saddle-point equations, it
can be shown that the minimization of Eq. () is equivalent to
the minimization of the following scalar dual cost function over
]0,γ/εN],
\[
D(\zeta) = L\left(x_a, \rho_a, \zeta\right) = \frac{1}{2}\hat{\delta}^{\mathrm{T}}\left(R + \zeta^{-1}H\Gamma H^{\mathrm{T}}\right)^{-1}\hat{\delta} + \frac{\varepsilon_N\zeta}{2} + \frac{\gamma}{2}\ln\frac{\gamma}{\zeta} - \frac{\gamma}{2},
\]
a mild generalization of Eq. (). As in BS12, ζ is
interpreted as an effective size of the ensemble as seen by the analysis.
Note that, in this context, it could easily be larger than N-1 if the added
information content of the informative hyperprior is significant.
State space update of the ensemble perturbations
Recall that the square root ensemble update corresponding to
Eq. () and Jeffreys' hyperprior is
\[
X_a = \sqrt{N-1}\,X\left[Y^{\mathrm{T}}R^{-1}Y + \zeta_a I_N - \frac{2\zeta_a^2}{N+1}w_a w_a^{\mathrm{T}}\right]^{-\frac{1}{2}}U.
\]
Note that covariance localization cannot be implemented in ensemble space
using Eq. (). To make the covariance matrix explicit, we wish to
write this in state space. Firstly, from Eq. (),
$w_a$ can be written as $w_a = Y^{\mathrm{T}}z$, where $z = \left(\zeta_a R + YY^{\mathrm{T}}\right)^{-1}\delta$.
Then, by the matrix shift lemma that asserts that Af(BA)=f(AB)A for any two
matrices A and B of compatible sizes and f an
analytic function
Assuming $f(x) = \sum_{k=0}^{\infty} a_k x^k$, one
has $A f(BA) = \sum_{k=0}^{\infty} a_k A(BA)^k = \sum_{k=0}^{\infty} a_k (AB)^k A = f(AB)A$.
, we can turn this right-transform into a
left-transform
Let $A$ be a diagonalizable, not necessarily
symmetric, matrix, $A = \Omega\Lambda\Omega^{-1}$ with
$\Lambda$ diagonal. If $\Lambda \ge 0$,
then the square root matrix $A^{\frac{1}{2}}$ is defined by
$\Omega\Lambda^{\frac{1}{2}}\Omega^{-1}$.
:
\[
X_a = \sqrt{N-1}\left[\zeta_a I_M + XY^{\mathrm{T}}\left(R^{-1} - \frac{2\zeta_a^2}{N+1}zz^{\mathrm{T}}\right)H\right]^{-\frac{1}{2}}XU.
\]
When $\zeta_a = N-1$ and $z = 0$, one recovers the
ensemble square root Kalman update formula written in state space:
$X_a = \left[I_M + PH^{\mathrm{T}}R^{-1}H\right]^{-\frac{1}{2}}X$. Note that we could absorb
$-\frac{2\zeta_a^2}{N+1}zz^{\mathrm{T}}$ into
$R$ using the Sherman–Morrison formula, leading to an effective
observation error covariance matrix $R_e$ that is bigger
than $R$ (in the order of the positive symmetric matrices). To
superimpose localization on this Jeffreys hyperprior EnKF-N, a Schur product
can easily be applied to XYT in
Eq. (), while the transformation still applies to the initial
perturbations X without any explicit truncation.
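The matrix shift lemma used above, $A f(BA) = f(AB)A$, holds for any analytic $f$ and compatible shapes; a minimal numerical check with $f = \exp$ (any analytic function would do, the shapes below are illustrative):

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(42)
A = rng.standard_normal((5, 3))  # M x N, plays the role of X
B = rng.standard_normal((3, 5))  # N x M

# Right-transform A f(BA) equals left-transform f(AB) A
lhs = A @ expm(B @ A)  # f applied to the small (N x N) matrix
rhs = expm(A @ B) @ A  # f applied to the large (M x M) matrix
```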
Here, however, we wish to obtain a similar left-transform but for the NIW
EnKF-N. The Hessian of the primal cost function Eq. () is
\[
\mathcal{H} = H^{\mathrm{T}}R^{-1}H + \frac{\gamma\,\Gamma^{-1}}{\varepsilon_N + \left\|x - \hat{x}\right\|_\Gamma^2} - \frac{2\gamma\,\Gamma^{-1}\left(x - \hat{x}\right)\left(x - \hat{x}\right)^{\mathrm{T}}\Gamma^{-1}}{\left(\varepsilon_N + \left\|x - \hat{x}\right\|_\Gamma^2\right)^2},
\]
yielding at the minimum
\[
\mathcal{H}_a = H^{\mathrm{T}}R^{-1}H + \zeta_a\Gamma^{-1} - \frac{2\zeta_a^2}{\gamma}\Gamma^{-1}\left(x_a - \hat{x}\right)\left(x_a - \hat{x}\right)^{\mathrm{T}}\Gamma^{-1} \equiv H^{\mathrm{T}}R^{-1}H + \zeta_a\Gamma_e^{-1},
\]
where the correction term has been absorbed into an effective symmetric
positive definite matrix Γe. Henceforth,
Γ will stand for Γe, and
any correction term is assumed to have been absorbed into
$\widehat{C}$ in $\Gamma$. Decomposing
$\zeta_a^{-1}\Gamma$, which is the effective background
error covariance matrix, into as many modes as required,
\[
\zeta_a^{-1}\Gamma = ZZ^{\mathrm{T}},
\]
and applying Eq. (), it is not difficult to obtain a square root
matrix of the analysis error covariance matrix Pa:
\[
P_a^{\frac{1}{2}} = \left[\zeta_a I_M + \Gamma H^{\mathrm{T}}R^{-1}H\right]^{-\frac{1}{2}}\Gamma^{\frac{1}{2}}.
\]
However, this does not constitute a limited-size ensemble of perturbations,
since Pa12 is full-rank as C was
assumed full-rank. To obtain an ensemble update of N perturbations, the
scale matrix $\widehat{C}$ in $\Gamma = XX^{\mathrm{T}} + \widehat{C}$ can be projected onto the
ensemble space generated by the initial perturbations. Then, $\Pi_X\widehat{C}\,\Pi_X$ replaces $\widehat{C}$, where
$\Pi_X$ is the orthogonal projector on the columns of $X$,
$\Pi_X = XX^{\dagger}$. Following
, we can form an effective set of perturbations
$X_c$ that satisfy
\[
X_c X_c^{\mathrm{T}} = XX^{\mathrm{T}} + \Pi_X\widehat{C}\,\Pi_X = X\left(I_N + X^{\dagger}\widehat{C}X^{\mathrm{T}\dagger}\right)X^{\mathrm{T}} \quad\text{by using}\quad X_c = X\left(I_N + X^{\dagger}\widehat{C}X^{\mathrm{T}\dagger}\right)^{\frac{1}{2}},
\]
or alternatively a left-transform equivalent formula that is
obtained from the matrix shift lemma
\[
X_c = \left[I_M + XX^{\dagger}\widehat{C}\left(XX^{\mathrm{T}}\right)^{\dagger}\right]^{\frac{1}{2}}X = \left[I_M + \Pi_X\widehat{C}\,\Pi_X\left(XX^{\mathrm{T}}\right)^{\dagger}\right]^{\frac{1}{2}}X.
\]
Substituting this Xc into
Γ12 in Eq. (), we finally obtain
an update of the perturbations X as a new set of perturbations of
the same size N:
\[
X_a = \sqrt{N-1}\left[\zeta_a I_M + \Gamma H^{\mathrm{T}}R^{-1}H\right]^{-\frac{1}{2}}\left[I_M + XX^{\dagger}\widehat{C}\left(XX^{\mathrm{T}}\right)^{\dagger}\right]^{\frac{1}{2}}XU.
\]
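The projection of the scale matrix onto the ensemble space can be verified numerically; a minimal sketch using the Moore–Penrose pseudo-inverse (toy dimensions and a diagonal $\widehat{C}$ are illustrative assumptions):

```python
import numpy as np
from scipy.linalg import sqrtm

rng = np.random.default_rng(1)
M, N = 12, 4
X = rng.standard_normal((M, N))            # initial perturbations
C_hat = np.diag(rng.uniform(0.5, 1.5, M))  # full-rank scale matrix

X_pinv = np.linalg.pinv(X)   # X^dagger
Pi_X = X @ X_pinv            # orthogonal projector on the columns of X

# X_c = X (I_N + X^dagger C_hat X^{T dagger})^{1/2}
Xc = X @ np.real(sqrtm(np.eye(N) + X_pinv @ C_hat @ X_pinv.T))

# X_c X_c^T reproduces X X^T plus the projected scale matrix
lhs = Xc @ Xc.T
rhs = X @ X.T + Pi_X @ C_hat @ Pi_X
```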
Covariance localization and EnKF-3D-Var hybridization
The state space formulation of the analysis enables covariance localization,
which was not possible in ensemble space. To regularize
$P = XX^{\mathrm{T}}/(N-1)$ by covariance
localization, one can apply a Schur product with a short-range correlation
matrix $\Theta$. In that case, Eq. () is
unchanged but with $\Gamma = \widehat{C} + \Theta\circ XX^{\mathrm{T}}$, with
$\circ$ the Schur product symbol. Note that this type of covariance
localization is not induced by the hyperprior, but is superimposed on the
EnKF-N whatever its hyperprior. The state update is obtained from
Eqs. () and () by letting $H\Gamma H^{\mathrm{T}} \longrightarrow \Theta\circ YY^{\mathrm{T}} + H\widehat{C}H^{\mathrm{T}}$ or $\Gamma H^{\mathrm{T}} \longrightarrow \Theta\circ XY^{\mathrm{T}} + \widehat{C}H^{\mathrm{T}}$.
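A minimal sketch of this Schur-product localization follows; the Gaussian taper below is a stand-in for a short-range correlation matrix such as a Gaspari–Cohn function, and the periodic 1-D domain, sizes, and length scale are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
M, N = 40, 10
X = rng.standard_normal((M, N))  # ensemble perturbations (rank <= N)

# Short-range correlation matrix Theta on a periodic 1-D domain
i = np.arange(M)
dist = np.minimum(np.abs(i[:, None] - i[None, :]),
                  M - np.abs(i[:, None] - i[None, :]))
L = 2.0  # localization length scale (arbitrary here)
Theta = np.exp(-0.5 * (dist / L) ** 2)

P = X @ X.T / (N - 1)  # rank-deficient sample covariance
P_loc = Theta * P      # Schur (elementwise) product: localized covariance
```

By the Schur product theorem, `P_loc` remains positive semi-definite while its rank is boosted above that of the raw sample covariance.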
An alternative is to use the $\alpha$ control variables
. A mathematically equivalent cost function to
Eq. (), but with $\Gamma = \widehat{C} + \Theta\circ XX^{\mathrm{T}}$, is
\[
J\left(\delta x, \{\alpha_n\}\right) = J_o\!\left(\hat{x} + \delta x + \sum_{n=1}^{N}\alpha_n\circ\left(x_n - \bar{x}\right)\right) + \frac{\gamma}{2}\ln\left(\varepsilon_N + \left\|\delta x\right\|_{\widehat{C}}^2 + \sum_{n=1}^{N}\left\|\alpha_n\right\|_\Theta^2\right).
\]
The $\{\alpha_n\}_{n=1,\ldots,N}$ are $N$ ancillary
control vectors of size $M$ related to the dynamical errors, whereas $\delta x$ is a control vector of size $M$ related to the background errors.
The control vector x is related to αn and δx by identifying x with the argument of
Jo in the cost function. This expression of the cost
function is obtained by first passing from Eqs. ()
to (), then along the lines of . It can
be seen from the cost function that the EnKF-N based on the NIW hyperprior
yields a generalization of the EnKF–3D-Var hybrid data assimilation method
to the EnKF-N framework.
Average analysis RMSE as a function of (α,β) for the
EnKF-N based on the IW hyperprior, without inflation or enforced
localization, for ensemble sizes of N=20 (left) and of N=10 (right). The
RMSEs above 1, i.e., worse than an analysis based only on observations, are
in white.
Moreover, the above derivation suggests the following perturbation update
needed to complete the NIW EnKF-N scheme:
\[
X_a = \sqrt{N-1}\left[\zeta_a I_M + \left(\widehat{C}H^{\mathrm{T}} + \Theta\circ XY^{\mathrm{T}}\right)R^{-1}H\right]^{-\frac{1}{2}}\times\left[I_M + \widehat{C}\left(\Theta\circ XX^{\mathrm{T}}\right)^{\dagger}\right]^{\frac{1}{2}}XU.
\]
Numerical illustration
Here we wish to illustrate the use of the EnKF-N based on the IW hyperprior.
We consider again the same numerical setup as in Sect. with
the Lorenz-95 model. The ν hyperparameter and the C scale
matrix are chosen to be
\[
\nu = 1 + N\frac{\alpha}{1-\alpha}, \qquad C = \frac{\beta}{1-\beta}I_M,
\]
with α and β two real parameters in the interval [0,1[.
Synthetic experiments are performed for a wide range of $(\alpha,\beta)$
couples and for two ensemble sizes: N=20, which is larger than the
dimension of the unstable and neutral subspace (14) and, for traditional
EnKFs, would require inflation but not localization; and N=10, which, for
traditional EnKFs, would require both localization and inflation. We do not
use inflation since it is meant to be accounted for by the finite-size
scheme. We do not superimpose domain or covariance localization. Analysis
RMSEs are computed for each run and reported in Fig. .
This is a preliminary experiment. In particular, we do not perform any
optimization of α and β based for instance on empirical Bayesian
estimation. For N=20, we barely note any improvement in terms of RMSEs due
to the use of the IW hyperprior as compared to the EnKF-N based on Jeffreys'
hyperprior, i.e., $(\alpha,\beta)=(0,0)$. However, we observe that for
N=10, localization is naturally enforced via the hyperprior due to a
mechanism known in statistics as shrinkage. Although there is no
dynamical tuning of α and β, and even though the choice for
C is gross, good RMSEs can be obtained. An RMSE of 0.33 is
achieved for $(\alpha,\beta)=(0.50, 0.57)$, as compared to a typical analysis
RMSE of 0.20 for the EnKF-N with optimally tuned, superimposed
localization. Interestingly, the average optimal effective size in this case
is ζa=15, above the unstable subspace dimension, validating
its potential use as a diagnostic.
Conclusions
In this article, we have
revisited the finite-size ensemble Kalman filter, or EnKF-N. The scheme
offers a Bayesian hierarchical framework to account for the uncertainty in
the forecast error covariance matrix of the EnKF that is inferred from a
limited-size ensemble. We have discussed, introduced additional arguments
for, and sometimes improved several of the key steps of the EnKF-N
derivation. Our main findings are the following.
A proper account of the gauge degrees of freedom in the redundant ensemble
of perturbations and the resulting analysis led to a small but important
modification of the ensemble transform-based EnKF-N analysis cost function
(g=0→1, as seen in Eq. ).
Consequently, the marginal posterior distribution of the system state is
a Cauchy distribution, which is proper but does not have first- and
second-order moments. Hence, only the maximum a posteriori estimator is
unambiguously defined. Moreover, this suggests that the Laplace approximation
should be used to estimate the full posterior.
The modification g=0→1 frees us from the inconvenient tweaking
of $\varepsilon_N$ to 1 or to $1+1/N$: now, only
$\varepsilon_N = 1+1/N$ is required.
The connection to dynamics has been clarified. It had already been assumed
that the EnKF-N compensates for the nonlinear deformation of the ensemble in
the forecast step. This conjecture was substantiated here by arguing that the
effect of the nonlinearities is similar to sampling error, thus explaining
why multiplicative inflation, and the EnKF-N in particular, can compensate
for it.
The ensemble update of the dual EnKF-N was amended to offer a perfect
equivalence with the primal EnKF-N. It was shown that the additional term in
the posterior error covariance matrix accounts for the error co-dependence
between the angular and the radial degrees of freedom. However, this
correction barely affected the numerical experiments we tested it with.
The EnKF-N based on Jeffreys' hyperprior led to unsatisfying performance
in the limit where the analysis is largely driven by the prior, especially in
the regime where the model is almost (but not) linear. We proposed two new
types of schemes that rectify the hyperprior. These schemes have been
successfully tested on low-order models, meaning that the performance of the
EnKF-N becomes as good as the ensemble square root Kalman filter with
optimally tuned inflation in all the tested dynamical regimes.
As originally mentioned in Boc11, the EnKF-N offers a
broad framework to craft variants of the EnKF with alternative hyperpriors.
Inflation was shown to be addressed by a noninformative hyperprior, whereas a
localization seems to require an informative hyperprior. Here, we showed that
choosing the informative normal-inverse Wishart distribution as a hyperprior
for xb, B leads to a formally similar EnKF-N,
albeit expressed in state space rather than ensemble space. The EnKF-N based
on this informative hyperprior is a finite-size variant of the hybrid
EnKF-3D-Var. It has a potential for tuning the balance between static and
dynamical errors. Moreover, we showed on a preliminary numerical experiment
that localization can be naturally carried out through shrinkage induced by
the scale matrix of the normal-inverse Wishart hyperprior.
With the corrections and new interpretations on the EnKF-N based on Jeffreys'
hyperprior, we have obtained a practical and robust tool that can be used in
perfect model EnKF experiments in a wide range of conditions without the
burden of tuning the multiplicative inflation. This has saved us a great deal of
computational time in recently published methodological studies.
An EnKF-N based on an informative hyperprior, the normal-inverse Wishart
distribution, has been described and its equations derived. We plan to
evaluate it thoroughly in extensive numerical experiments. Several optional
uses of the method are contemplated. Hyperparameters xc,
C, ν and κ could be diagnosed from the statistics of a
prior well-tuned data assimilation run. Empirical Bayesian approaches could
then be used to objectively balance the static errors and the dynamical
errors. Alternatively, the hyperparameters could be estimated online in the
course of the EnKF, rather than being obtained from prior statistics, using a
more systematic empirical Bayesian approach.
The EnKF-N is not designed to handle model error, which is critical for
realistic applications. Other adaptive inflation techniques currently in
operation would be more robust in such contexts. We are working on a
consistent merging of the finite-size approach that accounts for sampling
errors and of a multiplicative inflation scheme designed to account for model
error.
Coupling of the radial and angular degrees of freedom
Section separately identified angular and radial degrees
of freedom in the EnKF-N cost function. This led to the dual cost function,
and an alternative interpretation of the EnKF-N as an adaptive inflation
scheme that accounts for sampling errors.
Here we wish to interpret the contributions in the Hessian
Eq. () that come from the angular and from the radial degrees
of freedom. To do so, we study the evidence p(y), i.e., the
likelihood of the observation vector, as estimated from the EnKF-N. This
evidence is usually computed by marginalizing over all possible model states,
which reads in our case as
\[
p(y) = \int_{\mathbb{R}^N}\mathrm{d}w\,p(y|w)\,p(w) = A_N\int_{\mathbb{R}^N}\mathrm{d}w\,e^{-\frac{1}{2}\left\|y - H\left(\bar{x} + Xw\right)\right\|_R^2 - \frac{N+1}{2}\ln\left(\varepsilon_N + \|w\|^2\right)},
\]
where $A_N$ is a normalization constant (depending on $N$, $\varepsilon_N$, and $|R|$). This integral is also
called the partition function of the system in statistical physics since it
sums up the contributions of all possible states to the evidence. To untangle
the angular and radial degrees of freedom, we apply the following identity,
valid for any $\alpha>0$ and $\beta>0$, to the prior:
\[
\alpha^{-\beta} = \frac{1}{\Gamma(\beta)}\int_{-\infty}^{\infty}\mathrm{d}t\,e^{-\alpha e^{t} + \beta t}.
\]
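This identity follows from the substitution $u = e^{t}$, which turns the integral into $\int_0^\infty u^{\beta-1}e^{-\alpha u}\,\mathrm{d}u = \Gamma(\beta)\,\alpha^{-\beta}$; a quick numerical check (the values of $\alpha$ and $\beta$ are arbitrary):

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import gamma

alpha, beta = 1.7, 2.5  # arbitrary positive values

# Right-hand side of the identity, computed by quadrature
integral, _ = quad(lambda t: np.exp(-alpha * np.exp(t) + beta * t),
                   -np.inf, np.inf)
lhs = alpha ** (-beta)
rhs = integral / gamma(beta)
```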
Additionally assuming here that the observation operator is linear, we obtain
\[
p(y) = B_N\int_{\mathbb{R}^{N+1}}\mathrm{d}w\,\mathrm{d}t\,e^{-\frac{1}{2}\left\|\delta - Yw\right\|_R^2 - \frac{1}{2}e^{t}\|w\|^2 - \frac{1}{2}e^{t}\varepsilon_N + \frac{N+1}{2}t},
\]
where $B_N = A_N\big/\left[2^{\frac{N+1}{2}}\Gamma\!\left(\frac{N+1}{2}\right)\right]$.
The main contribution to the evidence can be obtained by applying the Laplace
method to this integral. Let $L(w,t)$ denote
minus the argument of the exponential in the integrand. If the saddle point
of $L(w,t)$ is $(w_\star, t_\star)$, and if its
Hessian at the saddle point is $\mathcal{H}_{w,t}(w_\star, t_\star)$, then an estimate of the evidence is
\[
p(y) \simeq B_N\sqrt{\frac{(2\pi)^{N+1}}{\left|\mathcal{H}_{w,t}(w_\star, t_\star)\right|}}\,e^{-L(w_\star, t_\star)}.
\]
The normalization by the Hessian represents a correction due to Gaussian
fluctuations of the variables (w,t) around the saddle point. The
saddle-point conditions are
\[
w = \left(Y^{\mathrm{T}}R^{-1}Y + e^{t}I_N\right)^{-1}Y^{\mathrm{T}}R^{-1}\delta, \qquad e^{t} = \frac{N+1}{\varepsilon_N + \|w\|^2},
\]
which are equivalent to the dual EnKF-N saddle-point equations (BS12). The
Hessian is
\[
\mathcal{H}_{w,t}(w_\star, t_\star) =
\begin{pmatrix}
Y^{\mathrm{T}}R^{-1}Y + e^{t_\star}I_N & e^{t_\star}w_\star \\
e^{t_\star}w_\star^{\mathrm{T}} & \frac{N+1}{2}
\end{pmatrix}.
\]
Hence, the integral is dominated by the saddle-point solution found in the
dual EnKF-N derivation. It corresponds to a standard ETKF analysis with a
prior correction by the et⋆ factor. Moreover, the fluctuations are
due to the standard ETKF fluctuations
$Y^{\mathrm{T}}R^{-1}Y + e^{t_\star}I_N$,
with additional corrections due to the radial degree of freedom. When
computing a precision matrix Hw for the variables
w from the Hessian Eq. () using the Schur
complement, i.e., the precision on the w variables conditioned on the
knowledge of t⋆, we find
\[
\mathcal{H}_w(w_\star, t_\star) = Y^{\mathrm{T}}R^{-1}Y + e^{t_\star}I_N - \frac{2}{N+1}e^{2t_\star}w_\star w_\star^{\mathrm{T}},
\]
which coincides with Eq. (). This says that the correction
$-2(N+1)^{-1}\zeta^2 w_a w_a^{\mathrm{T}}$ in
Eq. () is due to the fluctuation of $\zeta(=e^{t})$ and its
coupling to the angular degrees of freedom.
Acknowledgements
We are grateful to two anonymous reviewers and to the editor, Zoltan Toth,
for their valuable and helpful suggestions to improve this paper. This study
is a contribution to INSU/LEFE project DAVE.
Edited by: Z. Toth
Reviewed by: two anonymous referees
References
Anderson, J. L.: An adaptive covariance inflation error correction algorithm
for ensemble filters, Tellus A, 59, 210–224, 2007.
Anderson, J. L. and Anderson, S. L.: A Monte Carlo Implementation of the
Nonlinear Filtering Problem to Produce Ensemble Assimilations and Forecasts,
Mon. Weather Rev., 127, 2741–2758, 1999.
Bishop, C. H., Etherton, B. J., and Majumdar, S. J.: Adaptive Sampling with
the Ensemble Transform Kalman Filter. Part I: Theoretical Aspects, Mon.
Weather Rev., 129, 420–436, 2001.
Bishop, C. M. (Ed.): Pattern Recognition and Machine Learning,
Springer-Verlag New York Inc., 2006.
Bocquet, M.: Ensemble Kalman filtering without the intrinsic need for
inflation, Nonlin. Processes Geophys., 18, 735–750,
10.5194/npg-18-735-2011, 2011.
Bocquet, M. and Sakov, P.: Combining inflation-free and iterative ensemble
Kalman filters for strongly nonlinear systems, Nonlin. Processes Geophys.,
19, 383–399, 10.5194/npg-19-383-2012, 2012.
Bocquet, M. and Sakov, P.: Joint state and parameter estimation with an
iterative ensemble Kalman smoother, Nonlin. Processes Geophys., 20, 803–818,
10.5194/npg-20-803-2013, 2013.
Bocquet, M. and Sakov, P.: An iterative ensemble Kalman smoother, Q. J.
Roy. Meteor. Soc., 140, 1521–1535, 2014.
Brankart, J.-M., Cosme, E., Testut, C.-E., Brasseur, P., and Verron, J.:
Efficient adaptive error parameterization for square root or ensemble Kalman
filters: application to the control of ocean mesoscale signals, Mon. Weather
Rev., 138, 932–950, 2010.
Buehner, M.: Ensemble-derived stationary and flow-dependent background-error
covariances: Evaluation in a quasi-operational NWP setting, Q. J. Roy.
Meteor. Soc., 131, 1013–1043, 2005.
Evensen, G.: Data Assimilation: The Ensemble Kalman Filter, 2nd Edn.,
Springer-Verlag Berlin Heidelberg, 2009.
Fletcher, S. J. and Zupanski, M.: A data assimilation method for log-normally
distributed observational errors, Q. J. Roy. Meteor. Soc., 132, 2505–2519,
2006.
Furrer, R. and Bengtsson, T.: Estimation of high-dimensional prior and
posterior covariance matrices in Kalman filter variants, J. Multivariate
Anal., 98, 227–255, 2007.
Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., and
Rubin, D. B.: Bayesian data analysis, 3rd Edn., Taylor & Francis, Boca
Raton, 2014.
Gurumoorthy, K. S., Grudzien, C., Apte, A., Carrassi, A., and Jones,
C. K. R. T.: Rank deficiency of Kalman error covariance matrices in linear
perfect model, preprint, arXiv:1503.05029, 2015.
Hamill, T. M. and Snyder, C.: A Hybrid Ensemble Kalman Filter–3D Variational
Analysis Scheme, Mon. Weather Rev., 128, 2905–2919, 2000.
Hamill, T. M., Whitaker, J. S., and Snyder, C.: Distance-dependent filtering
of background error covariance estimates in an ensemble Kalman filter, Mon.
Weather Rev., 129, 2776–2790, 2001.
Hannart, A. and Naveau, P.: Estimating high dimensional covariance matrices:
A new look at the Gaussian conjugate framework, J. Multivariate Anal., 131,
149–162, 2014.
Houtekamer, P. L. and Mitchell, H. L.: A sequential ensemble Kalman filter
for atmospheric data assimilation, Mon. Weather Rev., 129, 123–137, 2001.
Hunt, B. R., Kostelich, E. J., and Szunyogh, I.: Efficient data assimilation
for spatiotemporal chaos: A local ensemble transform Kalman filter,
Physica D, 230, 112–126, 2007.
Kuramoto, Y. and Tsuzuki, T.: On the formation of dissipative structures in
reaction-diffusion systems: Reductive Perturbation Approach, Prog. Theor.
Phys., 54, 687–699, 1975.
Li, H., Kalnay, E., and Miyoshi, T.: Simultaneous estimation of covariance
inflation and observation errors within an ensemble Kalman filter, Q. J. Roy.
Meteor. Soc., 135, 523–533, 2009.
Liang, X., Zheng, X., Zhang, S., Wu, G., Dai, Y., and Li, Y.: Maximum
likelihood estimation of inflation factors on error covariance matrices for
ensemble Kalman filter assimilation, Q. J. Roy. Meteor. Soc., 138,
263–273, 2012.
Lorenc, A. C.: The potential of the ensemble Kalman filter for NWP –
a comparison with 4D-Var, Q. J. Roy. Meteor. Soc., 129, 3183–3203, 2003.
Lorenz, E. N.: Deterministic nonperiodic flow, J. Atmos. Sci., 20, 130–141,
1963.
Lorenz, E. N.: Designing Chaotic Models, J. Atmos. Sci., 62, 1574–1587, 2005.
Lorenz, E. N. and Emanuel, K. E.: Optimal sites for supplementary weather
observations: simulation with a small model, J. Atmos. Sci., 55, 399–414,
1998.
Miyoshi, T.: The Gaussian Approach to Adaptive Covariance inflation and Its
Implementation with the Local Ensemble Transform Kalman Filter, Mon. Weather
Rev., 139, 1519–1535, 2011.
Myrseth, I. and Omre, H.: Hierarchical Ensemble Kalman Filter, SPE J., 15,
569–580, 2010.
Ng, G.-H. C., McLaughlin, D., Entekhabi, D., and Ahanin, A.: The role of
model dynamics in ensemble Kalman filter performance for chaotic systems,
Tellus A, 63, 958–977, 2011.
Ott, E., Hunt, B. R., Szunyogh, I., Zimin, A. V., Kostelich, E. J., Corazza,
M., Kalnay, E., Patil, D. J., and Yorke, J. A.: A local ensemble Kalman filter
for atmospheric data assimilation, Tellus A, 56, 415–428, 2004.
Palatella, L. and Trevisan, A.: Interaction of Lyapunov vectors in the
formulation of the nonlinear extension of the Kalman filter, Phys. Rev. E, 91,
042905, doi:10.1103/PhysRevE.91.042905, 2015.
Pham, D. T., Verron, J., and Roubaud, M.: A Singular Evolutive Extended
Kalman Filter for Data Assimilation in Oceanography, J. Marine Syst., 16,
323–340, 1998.
Raanes, P. N., Carrassi, A., and Bertino, L.: Extending the square root
method to account for additive forecast noise in ensemble methods, Mon.
Weather Rev., 143, 3857–3873, 2015.
Sacher, W. and Bartello, P.: Sampling Errors in Ensemble Kalman Filtering.
Part I: Theory, Mon. Weather Rev., 136, 3035–3049, 2008.
Sakov, P. and Bertino, L.: Relation between two common localisation methods
for the EnKF, Comput. Geosci., 15, 225–237, 2011.
Sakov, P. and Oke, P. R.: Implications of the Form of the Ensemble
Transformation in the Ensemble Square Root Filters, Mon. Weather Rev., 136,
1042–1053, 2008.
Sivashinsky, G. I.: Nonlinear analysis of hydrodynamic instability in laminar
flames – I. Derivation of basic equations, Acta Astronaut., 4, 1177–1206,
1977.
van Leeuwen, P. J.: Comment on "Data Assimilation Using an Ensemble Kalman
Filter Technique", Mon. Weather Rev., 127, 1374–1377, 1999.
Wang, X. and Bishop, C. H.: A Comparison of Breeding and Ensemble Transform
Kalman Filter Ensemble Forecast Schemes, J. Atmos. Sci., 60, 1140–1158,
2003.
Wang, X., Hamill, T. M., and Bishop, C. H.: A comparison of Hybrid Ensemble
Transform Kalman-Optimum Interpolation and Ensemble Square Root Filter
Analysis Schemes, Mon. Weather Rev., 135, 1055–1076, 2007a.
Wang, X., Snyder, C., and Hamill, T. M.: On the Theoretical Equivalence of
Differently Proposed Ensemble-3DVAR Hybrid Analysis Schemes, Mon. Weather
Rev., 135, 222–227, 2007b.
Whitaker, J. S. and Hamill, T. M.: Evaluating Methods to Account for System
Errors in Ensemble Data Assimilation, Mon. Weather Rev., 140, 3078–3089,
2012.
Ying, M. and Zhang, F.: An adaptive covariance relaxation method for ensemble
data assimilation, Q. J. Roy. Meteor. Soc., doi:10.1002/qj.2576, online
first, 2015.
Zheng, X. G.: An adaptive estimation of forecast error covariance parameters
for Kalman filtering data assimilation, Adv. Atmos. Sci., 26, 154–160, 2009.
Zinn-Justin, J.: Quantum Field Theory and Critical Phenomena, International
Series of Monographs on Physics, Clarendon Press, Oxford, 2002.
Zupanski, M.: Maximum Likelihood Ensemble Filter: Theoretical Aspects, Mon.
Weather Rev., 133, 1710–1726, 2005.