Usually data assimilation methods evaluate observation-model misfits using
weighted

This paper proposes adapting variational data assimilation to the use of such a measure. It provides a short introduction to optimal transport theory and discusses the importance of a proper choice of scalar product for computing the cost-function gradient. It also extends the discussion to the way the descent is performed within the minimization process.

These algorithmic changes are tested on a nonlinear shallow-water model, leading to the conclusion that optimal transport-based data assimilation is promising for capturing position errors in the model trajectory.

Understanding and forecasting the evolution of a given system is a crucial
topic in an ever-increasing number of application domains. To achieve this
goal, one can rely on multiple sources of information, namely observations of
the system, a numerical model describing its behavior, and additional
a priori knowledge such as statistical information or previous
forecasts. To combine these heterogeneous sources of information, it is common
practice to use so-called data assimilation methods (e.g., see
reference books

In data assimilation, the control vector, i.e., the set of elements to be estimated, is adjusted by comparing the observations with their model counterparts. The control vector should be tuned so that the corresponding model outputs fit the observations, while accounting for the fact that these observations are imperfect and corrupted by noise and errors.

Data assimilation methods are divided into three distinct classes. First,
there is statistical filtering based on Kalman filters. Then, there are variational
data assimilation methods based on optimal control theory. More recently,
hybrids of both approaches have been developed

Thus, the cost function contains the misfit between the data (a priori and observations) and their control and model counterparts.
Minimizing the cost function aims at reaching a compromise in which these
errors are as small as possible. The errors can be decomposed into amplitude
and position errors. Position errors occur when the structural elements are
present in the data but misplaced. Some methods have been proposed in order
to deal with position errors

A distance has to be chosen in order to compare the different data and
measure the misfits. Usually, a Euclidean distance is used, often weighted to
take into account the statistical errors. But Euclidean distances have
trouble capturing position errors. This is illustrated in Fig.
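This limitation is easy to reproduce numerically. The following minimal sketch (the Gaussian feature, grid, and shift sizes are illustrative assumptions, not taken from the paper) shows that once the supports of the reference and the displaced feature stop overlapping, the Euclidean misfit saturates and no longer reflects how far the feature was displaced:

```python
import numpy as np

x = np.linspace(0.0, 10.0, 2001)
dx = x[1] - x[0]

def bump(center, width=0.3):
    """A Gaussian-shaped feature placed at `center` (illustrative)."""
    return np.exp(-((x - center) ** 2) / (2.0 * width ** 2))

ref = bump(3.0)    # reference feature
near = bump(4.5)   # same feature, misplaced by 1.5
far = bump(8.0)    # same feature, misplaced by 5.0

def l2(f, g):
    """Euclidean (L2) distance between two discretized fields."""
    return np.sqrt(np.sum((f - g) ** 2) * dx)

# The Euclidean misfit is nearly identical for both position errors:
# it saturates as soon as the supports stop overlapping.
print(l2(ref, near), l2(ref, far))
```

A distance based on optimal transport, in contrast, grows with the displacement itself, which is the motivation for the rest of the paper.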

Wasserstein (

Optimal transport theory has been pioneered by

Optimal transport has a wide spectrum of applications: from pure mathematical
analysis on Riemannian spaces to applied economics; from functional
inequalities

Actual use of optimal transport in variational data assimilation has been
proposed by

The goal of this paper is to perform variational data assimilation with a cost function written in terms of the Wasserstein distance. The approach may be extended to other types of data assimilation methods, such as filtering methods, but this largely exceeds the scope of this paper.

The present paper is organized as follows: first, in Sect.

This section presents the concepts and method of variational data
assimilation on the one hand, and the concepts, principles and main theorems
of optimal transport and the Wasserstein distance on the other hand. Section

This paper focuses on variational data assimilation in the framework of
initial state estimation. Let us assume that a system state is described by a
variable

Data assimilation aims to find a good estimate of

The distances to the observations

Euclidean distances, such as the
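For concreteness, the classical weighted Euclidean misfit can be sketched in a toy linear setting (all sizes, operators, and covariances below are illustrative assumptions, not those of the paper; the notation J, B, R, H is the standard variational one):

```python
import numpy as np

# Illustrative sizes and (diagonal) covariances; real systems are far larger.
n = 4
xb = np.zeros(n)                 # background state
y = np.array([0.9, 1.1])         # observations
H = np.array([[1., 0., 0., 0.],  # observation operator (here: a selection)
              [0., 0., 1., 0.]])
B_inv = np.eye(n) / 0.5 ** 2     # inverse background-error covariance
R_inv = np.eye(2) / 0.1 ** 2     # inverse observation-error covariance

def cost(x):
    """Weighted Euclidean cost: background term + observation term."""
    db = x - xb
    do = H @ x - y
    return 0.5 * db @ B_inv @ db + 0.5 * do @ R_inv @ do

def grad(x):
    return B_inv @ (x - xb) + H.T @ R_inv @ (H @ x - y)

# For this quadratic cost the minimizer solves a linear system.
xa = np.linalg.solve(B_inv + H.T @ R_inv @ H,
                     H.T @ R_inv @ y + B_inv @ xb)
assert np.allclose(grad(xa), 0.0)
```

The weights B_inv and R_inv encode the error statistics; the distance itself, however, remains Euclidean, with the limitations discussed above.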

This section presents the essentials of optimal transport theory and of the Wasserstein distance required for data assimilation.

We define, in this order, the space of mass functions on which the Wasserstein distance is defined, then the Wasserstein distance itself and finally the Wasserstein scalar product, a key ingredient for variational assimilation.

We consider the case where the observations can be represented as positive
fields that we will call “mass functions”. A mass function is a
nonnegative function of space. For example, a grey-scale image is a mass
function; it can be seen as a function from space to the interval

Let

Let us remark here that, in the mathematical framework of optimal transport,
mass functions are continuous and they are called “probability densities”.
In the data assimilation framework the concept of probability densities is
mostly used to represent errors. Here, the positive functions we consider
actually serve as

Given the set of all transportations between two mass functions, the optimal transport is the one minimizing the kinetic energy. A
transportation between two mass functions

Let us be clear here that the time

The Wasserstein distance

A remarkable property is that the optimal velocity field

A remarkable property of the Kantorovich potential allows the computation of the
Wasserstein distance through the Benamou–Brenier formula (see

The classical example for optimal transport is the transport of
Gaussian mass functions. For
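For reference, in one dimension the Wasserstein distance between two Gaussians has a well-known closed form (a standard result, recalled here as a complement):

\[
W_2\big(\mathcal{N}(m_1,\sigma_1^2),\,\mathcal{N}(m_2,\sigma_2^2)\big)^2
  = (m_1 - m_2)^2 + (\sigma_1 - \sigma_2)^2 .
\]

In particular, a pure translation ($\sigma_1 = \sigma_2$) has Wasserstein cost exactly $|m_1 - m_2|$, whereas the Euclidean distance between the two densities saturates as soon as their supports barely overlap.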

Finally, a few words should be said about the numerical computation of the
Wasserstein distance. In one dimension, the optimal transport
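The one-dimensional computation can be sketched with the quantile-function formula $W_2^2 = \int_0^1 |F_1^{-1}(s) - F_2^{-1}(s)|^2\,\mathrm{d}s$, where $F_1$ and $F_2$ are the cumulative distributions of the two (normalized) mass functions; the grid and test functions below are illustrative:

```python
import numpy as np

def wasserstein_1d(f, g, x):
    """W2 distance between two 1-D mass functions (normalized to unit mass),
    via the quantile formula W2^2 = int_0^1 |F^-1(s) - G^-1(s)|^2 ds."""
    dx = x[1] - x[0]
    f = f / (f.sum() * dx)
    g = g / (g.sum() * dx)
    F = np.cumsum(f) * dx                      # cumulative distribution of f
    G = np.cumsum(g) * dx                      # cumulative distribution of g
    s = np.linspace(1e-6, 1.0 - 1e-6, 4000)    # quantile levels
    qf = np.interp(s, F, x)                    # inverse CDFs by interpolation
    qg = np.interp(s, G, x)
    return np.sqrt(np.sum((qf - qg) ** 2) * (s[1] - s[0]))

x = np.linspace(0.0, 10.0, 2001)
f = np.exp(-((x - 3.0) ** 2) / (2 * 0.3 ** 2))
g = np.exp(-((x - 8.0) ** 2) / (2 * 0.3 ** 2))
# Two equal-width Gaussians: the distance is the translation itself, here 5.
print(wasserstein_1d(f, g, x))
```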

For two- or three-dimensional problems, there exists no general formula for
the Wasserstein distance and more complex algorithms have to be used, such as
the (iterative) primal-dual one
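As an illustration of such iterative solvers, the following hedged sketch uses a different but widely used scheme, entropic regularization solved by Sinkhorn iterations, on a tiny discrete problem (the grid, regularization strength, and marginals are illustrative assumptions; this is not the algorithm used in the paper):

```python
import numpy as np

def sinkhorn(a, b, C, eps=0.05, n_iter=5000):
    """Entropy-regularized optimal transport between histograms a and b
    with ground-cost matrix C, solved by Sinkhorn iterations."""
    K = np.exp(-C / eps)
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)
        u = a / (K @ v)
    P = u[:, None] * K * v[None, :]   # approximate optimal transport plan
    return np.sum(P * C)              # approximate (regularized) transport cost

x = np.linspace(0.0, 1.0, 50)
a = np.exp(-((x - 0.3) ** 2) / 0.005); a /= a.sum()
b = np.exp(-((x - 0.7) ** 2) / 0.005); b /= b.sum()
C = (x[:, None] - x[None, :]) ** 2    # squared Euclidean ground cost
# The square root approximates the W2 distance: close to the 0.4 translation,
# up to a small bias introduced by the entropic regularization.
print(np.sqrt(sinkhorn(a, b, C)))
```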

A scalar product between two functions is required for data assimilation
and optimization: as we will recall later, the choice of scalar product
defines the gradient. This paper will consider the classical

Let us first recall that the Euclidean, or

The Wasserstein inner product

This section is our main contribution. First, we will consider the Wasserstein distance to compute the observation term of the cost function; second, we will discuss the choices of the scalar product and of the gradient descent method and their impact on the efficiency of the assimilation algorithm.

In the framework of Sect.

The variables

As for the classical

To find the minimum of

If

The associated gradients are respectively denoted as

The following theorem allows the computation of both gradients of

For

(A proof of this Theorem can be found in Appendix

The adjoint

Note that the no-flux boundary condition assumption for

The minimizer of

We will now explain how to adapt the gradient descent to the optimal
transport framework. With the Wasserstein gradient Eq. (

This iteration has two drawbacks. First, for

For the gradient iteration, we choose the geodesic starting from
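In one dimension, such a geodesic (displacement) interpolation can be sketched by interpolating quantile functions, $F_t^{-1} = (1-t)F_0^{-1} + tF_1^{-1}$ (a hedged illustration with assumed test functions, not the paper's implementation):

```python
import numpy as np

x = np.linspace(0.0, 10.0, 2001)
dx = x[1] - x[0]

def quantile(f, s):
    """Inverse CDF of the mass function f at quantile levels s."""
    F = np.cumsum(f) * dx
    return np.interp(s, F / F[-1], x)

s = np.linspace(1e-6, 1.0 - 1e-6, 4000)
f0 = np.exp(-((x - 3.0) ** 2) / (2 * 0.3 ** 2))   # current iterate
f1 = np.exp(-((x - 7.0) ** 2) / (2 * 0.3 ** 2))   # descent target

q0, q1 = quantile(f0, s), quantile(f1, s)
t = 0.25                       # step length along the geodesic
qt = (1 - t) * q0 + t * q1     # interpolated quantile function

# Following the geodesic *moves* the feature (its median goes from 3
# towards 7), instead of fading it out and in as a Euclidean step would.
median_t = qt[len(s) // 2]
print(median_t)                # about 3 + 0.25 * (7 - 3) = 4
```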

The comparison of Eqs. (

Comparison of iteration Eqs. (

Let us recall that in the data assimilation vocabulary, the word “analysis” refers to the minimizer of the cost function at the end of the data assimilation process.

In this section the analyses resulting from the minimization of the
Wasserstein cost function defined previously in Eq. (

The experiments are all one-dimensional and

Only a single variable is controlled. This variable

In this paper we chose to work in the twin experiments framework. In this
context the true state, denoted
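The twin-experiment setup can be sketched as follows (the toy periodic-advection model, grid, and noise level are illustrative assumptions, not the paper's shallow-water configuration):

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_steps = 64, 10

def model_step(u):
    """Toy model: periodic advection by one grid cell per step."""
    return np.roll(u, 1)

x = np.linspace(0.0, 1.0, n, endpoint=False)
truth0 = np.exp(-((x - 0.3) ** 2) / 0.005)        # true initial state

# The "observations" are the known true trajectory plus synthetic noise.
traj = [truth0]
for _ in range(n_steps):
    traj.append(model_step(traj[-1]))
obs = [u + 0.05 * rng.standard_normal(n) for u in traj]

# A deliberately misplaced first guess; any assimilation method can then be
# scored against truth0, which is available here by construction only.
background = np.exp(-((x - 0.45) ** 2) / 0.005)
print(np.linalg.norm(background - truth0))
```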

Both the Wasserstein Eq. (

The first example involves a linear evolution model as

As anticipated in the introduction, see e.g., Fig.

The issue of amplitude of the analysis of

Both of the algorithms (DG2) and (DG#) give the
same analysis – the minimum of

Decrease of

To conclude this first test case, we managed to write and minimize a cost
function which, in the case of position errors, gives a relevant analysis,
contrary to what is obtained with the classical Euclidean cost function. We
also noticed that the success of the minimization of

Further results are shown when a nonlinear model is used in place of

The true state is

Data assimilation is performed by minimizing either the

In Fig.

Figure

The conclusion of this second test case is that, even with nonlinear models, our Wasserstein-based algorithm can give interesting results in the case of position errors.

In this section, noise in position and shape has been added to the
observations. This type of noise typically occurs in satellite images.
For example, Fig.

Analyses of this noisy experiment using

For the

For the Wasserstein cost function, analyses

This example shows that the Wasserstein cost function is more robust than

We showed through some examples that, if not taken into account, position errors can lead to unrealistic initial conditions when using classical variational data assimilation methods. Indeed, such methods use the Euclidean distance which can behave poorly under position errors. To tackle this issue, we proposed instead the use of the Wasserstein distance to define the related cost function. The associated minimization algorithm was discussed and we showed that using descent iterations following Wasserstein geodesics leads to more consistent results.

On academic examples the corresponding cost function produces an analysis lying close to the Wasserstein average between the true and background states; it therefore has the same shape as them and is well suited to correcting position errors. This also yields more realistic predictions. This is a preliminary study, and some issues have yet to be addressed for realistic applications, such as relaxing the constant-mass and positivity hypotheses and extending the problem to 2-D applications.

Also, the interesting question of transposing this work to the filtering community (Kalman filter, EnKF, particle filters, etc.) raises the issue of writing a probabilistic interpretation of the Wasserstein cost function, which is beyond the scope of our study for now.

In particular, the important theoretical aspect of the representation of error
statistics still needs to be thoroughly studied. Indeed, classical
implementations of variational data assimilation generally make use of

No data sets were used in this article.

To prove the Theorem, one first needs to differentiate
the Wasserstein distance. The following lemma from

Let

Proof of the Theorem.
Let

To get the Wasserstein gradient of

The last equality comes from Stokes' theorem and from the fact that

The authors declare that they have no conflict of interest.

The authors would like to thank the anonymous reviewers and the editor, whose comments helped to improve the paper, and Christopher Eldred for his editing. Nelson Feyeux is supported by the Région Rhône Alpes Auvergne through the ARC3 Environment PhD fellowship program.

Edited by: Olivier Talagrand
Reviewed by: two anonymous referees