Journal topic
Nonlin. Processes Geophys., 27, 209–234, 2020
https://doi.org/10.5194/npg-27-209-2020
Nonlin. Processes Geophys., 27, 209–234, 2020
https://doi.org/10.5194/npg-27-209-2020

Research article 16 Apr 2020

Research article | 16 Apr 2020

# Data-driven versus self-similar parameterizations for stochastic advection by Lie transport and location uncertainty

Data-driven versus self-similar parameterizations for stochastic advection by Lie transport and location uncertainty
Valentin Resseguier1, Wei Pan2, and Baylor Fox-Kemper3,4 Valentin Resseguier et al.
• 1Lab, SCALIAN DS, Espace Nobel, 2 Allée de Becquerel, 35700 Rennes, France
• 2Department of Mathematics, Imperial College London, London SW7 2AZ, UK
• 3Department of Earth, Environmental and Planetary Sciences (DEEPS), Brown University, Providence, RI 02912, USA
• 4Institute at Brown for Environment and Society (IBES), Brown University, Providence, RI 02912, USA

Correspondence: Valentin Resseguier (valentin.resseguier@scalian.com)

Abstract

Stochastic subgrid parameterizations enable ensemble forecasts of fluid dynamic systems and ultimately accurate data assimilation (DA). Stochastic advection by Lie transport (SALT) and models under location uncertainty (LU) are recent and similar physically based stochastic schemes. SALT dynamics conserve helicity, whereas LU models conserve kinetic energy (KE). After highlighting general similarities between LU and SALT frameworks, this paper focuses on their common challenge: the parameterization choice. We compare uncertainty quantification skills of a stationary heterogeneous data-driven parameterization and a non-stationary homogeneous self-similar parameterization. For stationary, homogeneous surface quasi-geostrophic (SQG; QG) turbulence, both parameterizations lead to high-quality ensemble forecasts. This paper also discusses a heterogeneous adaptation of the homogeneous parameterization targeted at a better simulation of strong straight buoyancy fronts.

1 Introduction

Geophysical fluid dynamics and other turbulent flows cover a wide range of spatial and temporal scales (see, e.g. Fox-Kemper2018). However, numerical simulations and observations are limited in the range of scales that can be simulated or observed. Accordingly, deterministic subgrid models or parameterizations and regularizations (e.g. nugget in optimal interpolation, Tandeo et al.2014) are incorporated into numerical simulations and measurement processes. The limited resolution of these numerical simulations introduces unknown errors which are continuously amplified during the course of the simulation and which compound with the measurement errors to the extent that the simulations model the true chaotic nature of fluid dynamics. Improvements in accuracy of these simulations can sometimes be made by judicious use of data assimilation (DA) techniques. In the data assimilation process, observations (data) are integrated into the model state of the numerical simulation limited by imperfect physics and imperfect initial conditions. DA proceeds by alternating between forecast and analysis cycles. In each analysis cycle, observations are combined with the results from a prediction model (the forecast) to produce a so-called analysis. The analysis is expected to be more accurate than either predictions based on physical models without incorporating observations or predictions from physics-free interpolations of the observations. In order to optimize the analysis cycle, confidence in both the observations and the prediction needs to be quantified. Observational accuracy depends on the instrumental precision and sampling; and well-known uncertainty quantification (UQ) techniques are available to estimate the uncertainty associated with these limitations. In contrast, the uncertainty due to imperfect model physics is more difficult to quantify because often a “perfect model”, exact solution, or direct numerical simulation is not available or feasible for the intended application.

Historically introduced to address the limited range of scales simulated (e.g. Orszag1970; Leith1971), stochastic subgrid parameterization is now a widely used UQ tool for assessing the impact of imperfect physics in numerical simulations. In particular, it is generally thought that building such stochastic parameterizations from physical principles promises accuracy and robustness . Several authors have addressed the issue of stochastic subgrid parameterization evaluation through averaging or homogenization procedures (e.g. Majda–Timofeyev–Vanden-Eijnden – MTV – method; see Majda et al.1999). Correlated additive and multiplicative noises usually play a key role in these approaches (e.g. Gottwald and Melbourne2013). , , and highlight the relevance of skew-symmetric and multiplicative noises in geophysical fluid dynamic applications. Mimetic symmetries built into the fluid dynamics and stochastic subgrid model formulations may preserve some important physical invariants – a desired properties of many UQ models and DA algorithms . Other stochastic subgrid parameterizations involve memory terms, justified by the Mori–Zwanzig equations .

The dynamics under location uncertainty (LU) are a type of systematic stochastic subgrid parameterization introduced by and for the theory of well-posedness of stochastic partial differential equations and by for more applied purposes. More theoretical results about models under location uncertainty are discussed in . This framework has been successfully applied to uncertainty quantification , geophysics , attractor exploration , and optical flow .

A similar class of models called stochastic advection by Lie transport (SALT) was first derived in Holm (2015) for the time-homogeneous subgrid parameterization and extended to the inhomogeneous case in . In the following, we will refer to this class of models as SALT. The approach used is Hamilton's variation principle with the constraint that fluid parcels move according to a Eulerian stochastic subgrid model; see Holm (2015) and . A local well-posedness result for the 3D SALT Euler equations is proved in , and a global well-posedness result for the 2D case is proved in . Numerical studies for the implementation of the SALT subgrid parameterization are discussed in for a 2D Euler model and two-layer 2D QG dynamics. A first study on employing SALT dynamics for data assimilation is provided in .

In both LU and SALT frameworks, the velocity is decomposed into a random large-scale component, w, and a time-uncorrelated component, $\mathbit{\sigma }\stackrel{\mathrm{˙}}{\mathbit{B}}=\mathbit{\sigma }\mathrm{d}{\mathbit{B}}_{t}/\mathrm{d}t$. The latter is Gaussian (conditionally on some large-scale quantities), correlated in space, with possible heterogeneities and anisotropy. Hereafter, this unresolved velocity component will further be assumed to be solenoidal. To parameterize those spatial correlations, we apply an infinite-dimensional linear operator, σ, to a d-dimensional space–time white noise, $\stackrel{\mathrm{˙}}{\mathbit{B}}$. Holm and coauthors rely on a different but equivalent notation. It directly explicates the spectral decomposition of the time-uncorrelated velocity spatial covariance. Specifically, the small-scale velocity reads $\mathbit{\sigma }\left(\mathbit{x}\right)\stackrel{\mathrm{˙}}{\mathbit{B}}={\sum }_{p}{\mathbit{\xi }}_{p}\left(\mathbit{x}\right)\mathrm{d}{W}_{p}\left(t\right)/\mathrm{d}t$, where (Wp)p is a set of independent 1D Brownian motions and (ξp)p is a set of orthogonal functions.

That unresolved velocity is the central object of both SALT and LU approaches. The statistics of this velocity – defined either by σ or (ξp)p – constitute the parameterization of the random model. In this paper, we discuss several choices for this parameterization and dedicate significant discussion to the physical principles and assumptions about the flow regime underlying each choice. To better motivate this work, we will highlight the strong links and some formal differences between the LU and the SALT approaches. However, a full – theoretical and numerical – comparison of LU and SALT is beyond the scope of this paper.

The paper is structured as follows.

• Section 2.1 focuses on parametric non-data-driven choices. Specifically, we propose a non-stationary and tuning-free improvement of the work of based on self-similar assumptions and possible heterogeneous modulation. Links with the Smagorinsky subgrid parameterization and non-linear energy fluxes are also discussed.

• Section 2.2 discusses a data-driven stationary but possibly heterogeneous parameterization implemented by . The method relies on decomposition of empirical orthogonal functions (EOF) also known as proper orthogonal decomposition (POD) or principal component analysis (PCA).

• Section 3 begins by presenting the deterministic and stochastic surface quasi-geostrophic (SQG) models, their numerical setup, and simulation parameters. These are followed by detailed discussions of uncertainty quantification skills in Sect. 3.4.

• Sections 4 and 5 concludes this work.

Additionally, in Appendix A we include a mathematical summary of LU and SALT formulations for the interested readers. The aim here is to highlight the similarities and differences between the two approaches. Their differing sets of invariants are listed.

The choice of surface quasi-geostrophic flow is convenient, since the SALT and LU versions of the SQG dynamics coincide; thus the results presented apply to both SALT and LU models. Furthermore, present interest in oceanic submesoscale motions for which SQG dynamics are often a reasonable approximation and the fact that SQG offers the possibility of reconstructing subsurface flow properties from surface satellite observations have resulted in a recent surge in the use of the SQG equations for geophysics. Accordingly, the discussion can focus on the parameterization choices in this timely application.

2 Parameterization of SALT and LU models: the statistics of the unresolved velocity

## 2.1 Parametric and self-similar model for the unresolved velocity

In this section, we propose parametric non-data-driven models for the unresolved velocity statistics. Similar to the Kraichnan model , introduce an unresolved velocity which is homogeneous and isotropic. Even though this method leads to good numerical results, some parameters need to be tuned. To overcome this drawback, we propose here a parameter-free improvement based on what we call the absolute diffusivity spectral density (ADSD). New subgrid statistics will be defined at each time step from the resolved velocity kinetic energy (KE) spectrum. Finally, we also propose a heterogeneous modulation of the previous method based on the local energy flux and discuss the link with Smagorinsky-like subgrid parameterizations.

### 2.1.1 Parameter-free and non-stationary homogeneous model

We first define the statistics of the small-scale velocity through what we call the ADSD. Then, we explicate how to sample the unresolved velocity from this ADSD.

In , the absolute diffusivity (i.e. KE times correlation time; ) of the unresolved velocity is twice the variance tensor trace, $\mathrm{tr}\left(\mathbit{a}\right)=\mathbb{E}{∥\mathbit{\sigma }\mathrm{d}{\mathbit{B}}_{t}∥}^{\mathrm{2}}/\mathrm{d}t$, whereas the unresolved kinetic energy is tr(a)∕dt. Clearly, this kinetic energy has no physical meaning. Indeed, it depends on the simulation time step, and one should have the possibility to choose the time step as close as possible to zero. Thus, the unresolved velocity amplitude is specified through an absolute diffusivity rather than through a KE. In the mathematics literature of homogenization, Kubo-type formulas may be seen as what physicists call absolute diffusivities. More generally, since the variance of a time-continuous white noise is infinite, it is more relevant to deal with absolute diffusivity rather than kinetic energy in order to describe the statistics of the time-uncorrelated velocity. Thus, keeping a spectral approach, we define – for any spatiotemporal field – an absolute diffusivity spectral density denoted A(κ) at the wavenumber κ. We will rely on this ADSD rather than on the KE spectrum, E(κ). Since the absolute diffusivity is the variance multiplied by the correlation time, it is naturally to define the ADSD as follows:

$\begin{array}{}\text{(1)}& A\left(\mathit{\kappa }\right)\stackrel{\mathrm{△}}{=}E\left(\mathit{\kappa }\right)\mathit{\tau }\left(\mathit{\kappa }\right),\phantom{\rule{0.25em}{0ex}}\phantom{\rule{0.25em}{0ex}}\text{where}\phantom{\rule{0.25em}{0ex}}\phantom{\rule{0.25em}{0ex}}\mathit{\tau }\left(k\right)=\frac{\mathrm{1}/\mathit{\kappa }}{{v}_{\mathit{\kappa }}}=\frac{\mathrm{1}/\mathit{\kappa }}{\sqrt{\mathit{\kappa }E\left(\mathit{\kappa }\right)}}\end{array}$

is the eddy turnover time at the scale 1∕κ and vκ is at characteristic velocity at this scale. Accordingly, we have

$\begin{array}{}\text{(2)}& A\left(\mathit{\kappa }\right)={\mathit{\kappa }}^{-\mathrm{3}/\mathrm{2}}{E}^{\mathrm{1}/\mathrm{2}}\left(\mathit{\kappa }\right).\end{array}$

If in addition we assume a KE self-similar distribution,

$\begin{array}{}\text{(3)}& E\left(\mathit{\kappa }\right)={C}^{\mathrm{2}}{\mathit{\kappa }}^{-s},\end{array}$

and we obtain

$\begin{array}{}\text{(4)}& A\left(\mathit{\kappa }\right)=C{\mathit{\kappa }}^{-r},\end{array}$

where $r=\left(s+\mathrm{3}\right)/\mathrm{2}$.

We aim at defining the unresolved velocity ADSD from the large-scale velocity. For this purpose, we will assume the self-similar model (Eq. 4) is valid at all spatial scales. At each time step, we compute the ADSD of the large-scale velocity, Aw, from Eq. (2). Then, we fit the coefficients C and r of Eq. (4), as illustrated by Fig. 1. Let us denote these coefficients with Cw and rw. Note that they are time-dependent because they depend on w which is. More precisely, we estimate the coefficients Cw and rw in a wavenumber interval which approximately represents a inertial range of fully resolved scales (i.e. before the spectrum roll-off). In the left panel of Fig. 1, this interval is delimited by two vertical lines.

Figure 1Buoyancy field (m s−2(a, b), KE spectrum (m2 s${}^{-\mathrm{2}}/$(rad m−1)) (c, d), and ADSD (m2 s${}^{-\mathrm{1}}/$(rad m−1)) (e, f) of the resolved velocity, at t=0 d (a, ce) and t=15 d (b, d, f), in blue; ADSD of the unresolved velocity (without a multiplicative constant), in green; and slope $-\frac{\mathrm{5}}{\mathrm{3}}$ (c, d) and corresponding ADSD (e, f) in a black solid line. The two dashed vertical lines define the interval where coefficients Cw and rw are fitted. The right-hand-side dashed vertical line is fixed (see Appendix B for its specification). The left-hand-side dashed line corresponds to the energy-injecting scale. This scale is estimated – at each time step – as corresponding to the maximum of the compensated KE spectrum (i.e. KE spectrum divided by a theoretical spectrum). The dashed oblique line is the resulting fit (it is set to meet Aw in the left bound of the fitting interval). The unresolved velocity is still restricted to a narrow spectral band, specifically between the dotted dashed line and the right end of the plot. Nevertheless, the form of its spectrum varies in time and is determined by the spectrum of the large-scale velocity. This particular initial condition corresponds to case 2 studied by at a resolution of 1282.

From there, we can define the unresolved velocity ADSD in such a way that the total velocity – resolved plus unresolved – meets Eq. (4) at small spatial scales with the same coefficients (Cw and rw). Since the two velocity components are not correlated, the total ADSD is the sum of the ADSD of each velocity component. In the inertial range, this reads

$\begin{array}{}\text{(5)}& A\left(\mathit{\kappa }\right)=\underset{\begin{array}{c}\text{Leading at}\\ \text{large scales}\end{array}}{\underbrace{{A}_{\mathbit{w}}\left(\mathit{\kappa }\right)}}+\underset{\begin{array}{c}\text{Leading at}\\ \text{small scales}\end{array}}{\underbrace{{A}_{\mathbit{\sigma }\stackrel{\mathrm{˙}}{\mathbit{B}}}\left(\mathit{\kappa }\right)}}\approx {C}_{\mathbit{w}}{\mathit{\kappa }}^{-{r}_{\mathbit{w}}}.\end{array}$

Therefore, the unresolved ADSD is chosen to compensate the resolved ADSD roll-off – introduced by the deterministic subgrid parameterization – at small scales. Specifically, the unresolved ADSD is set to

$\begin{array}{}\text{(6)}& {A}_{\mathbit{\sigma }}\stackrel{\mathrm{˙}}{\mathbit{B}}\left(\mathit{\kappa }\right)=max\left(\mathrm{0},{C}_{\mathbit{w}}{\mathit{\kappa }}^{-{r}_{\mathbit{w}}}-{A}_{\mathbit{w}}\left(\mathit{\kappa }\right)\right){f}_{\mathrm{BP}}^{\mathrm{2}}\left(\mathit{\kappa }\right).\end{array}$

As previously, ${f}_{\mathrm{BP}}^{\mathrm{2}}$ is a band-pass filter between κm and κM. In practice, we set κM to the theoretical resolution,

$\begin{array}{}\text{(7)}& {\mathit{\kappa }}_{M}=\frac{\mathit{\pi }}{\mathrm{\Delta }x},\end{array}$

and κm to the effective resolution, which is estimated as follows:

$\begin{array}{}\text{(8)}& {\mathit{\kappa }}_{m}={\left(\frac{\mathrm{ln}\left(\mathrm{0.95}\right)}{\mathrm{ln}\left(\mathrm{0.1}\right)}\right)}^{\mathrm{1}/p}{\mathit{\kappa }}_{M},\end{array}$

where p is the order of the Laplacian used as a deterministic subgrid tensor (i.e. $\frac{Db}{Dt}=-\mathit{\nu }\left(-\mathrm{\Delta }{\right)}^{p}b$). The justification of the above formula is left in Appendix B. Compared to the work of , the value of κm is less critical. Indeed, Eq. (6) implies a weaker unresolved ADSD at larger scales where the resolved ADSD, Aw, is stronger. This softens the threshold effect introduced by the band-pass filter fBP.

In practice, we set an upper bound for the estimation of rw. Without this upper bound, a concentration of energy at relatively large wavenumbers – scales smaller than κm – in the resolved fields can become unstable. Indeed, this localized energy concentration would decrease the rw estimation, and hence increase the unresolved ADSD ${\mathbf{A}}_{\mathbit{\sigma }\stackrel{\mathrm{˙}}{\mathbit{B}}}$ through Eq. (6) at large wavenumbers – larger than κm. This implies a larger noise intake, which can induce a larger concentration of energy at relatively large wavenumbers in the resolved fields, resulting in a positive feedback loop. To prevent these unphysical instabilities, the slope rw is bounded.

We have proposed a way to compute the unresolved ADSD ${A}_{\mathbit{\sigma }\stackrel{\mathrm{˙}}{\mathbit{B}}}$ from the large-scale velocity statistics. From that ADSD, it is possible to sample the unresolved velocity, as explained in the following.

A 2D homogeneous divergence-free small-scale velocity can be constructed by filtering a 1D white noise $\stackrel{\mathrm{˙}}{B}$ with a 2D divergence-free filter $\stackrel{\mathrm{˘}}{\mathbit{\sigma }}={\mathbf{\nabla }}^{\perp }{\stackrel{\mathrm{˘}}{\mathit{\psi }}}_{\mathit{\sigma }}$ .

$\begin{array}{}\text{(9)}& \mathbit{\sigma }\stackrel{\mathrm{˙}}{\mathbit{B}}=\stackrel{\mathrm{˘}}{\mathbit{\sigma }}\star \stackrel{\mathrm{˙}}{B}={\mathbf{\nabla }}^{\perp }{\stackrel{\mathrm{˘}}{\mathit{\psi }}}_{\mathit{\sigma }}\star \stackrel{\mathrm{˙}}{B},\end{array}$

where denotes a spatial convolution. This velocity field can be easily sampled in Fourier space.

$\begin{array}{}\text{(10)}& \stackrel{\mathrm{^}}{\mathbit{\sigma }\stackrel{\mathrm{˙}}{\mathbit{B}}}\left(\mathbit{k}\right)=i{\mathbit{k}}^{\perp }\stackrel{\mathrm{^}}{{\stackrel{\mathrm{˘}}{\mathit{\psi }}}_{\mathit{\sigma }}}\left(\mathbit{k}\right)\stackrel{\mathrm{^}}{\stackrel{\mathrm{˙}}{B}}\left(\mathbit{k}\right)=\frac{\mathrm{1}}{\sqrt{\mathrm{\Delta }t}}i{\mathbit{k}}^{\perp }\stackrel{\mathrm{^}}{{\stackrel{\mathrm{˘}}{\mathit{\psi }}}_{\mathit{\sigma }}}\left(‖\mathbit{k}‖\right)\stackrel{\mathrm{^}}{\frac{\mathrm{d}{B}_{t}}{\sqrt{\mathrm{\Delta }t}}}\left(\mathbit{k}\right),\end{array}$

where ${\stackrel{\mathrm{^}}{\mathrm{d}B}}_{t}$ is the spatial Fourier transform of dBt and $\frac{\mathrm{d}{B}_{t}}{\sqrt{\mathrm{\Delta }t}}$ is a discrete scalar white-noise process of unit variance in space and time. Thus, the small-scale velocity is defined by the Fourier transform of the streamfunction kernel, $\stackrel{\mathrm{^}}{{\stackrel{\mathrm{˘}}{\mathit{\psi }}}_{\mathit{\sigma }}}$.

In order to link the unresolved ADSD to the kernel $\stackrel{\mathrm{˘}}{\mathbit{\sigma }}$ which defines the unresolved velocity, we note that

$\begin{array}{}\text{(11)}& \begin{array}{rl}{A}_{\mathbit{\sigma }\stackrel{\mathrm{˙}}{\mathbit{B}}}\left(\mathit{\kappa }\right)& ={E}_{\mathbit{\sigma }\stackrel{\mathrm{˙}}{\mathbit{B}}}\left(\mathit{\kappa }\right)\mathrm{d}t=\frac{\mathrm{1}}{\mathit{\mu }\left(\mathrm{\Omega }\right)}\underset{\left[\mathrm{0},\mathrm{2}\mathit{\pi }\right]}{\oint }\mathrm{d}{\mathit{\theta }}_{\mathbit{k}}\mathit{\kappa }{∥\stackrel{\mathrm{^}}{\stackrel{\mathrm{˘}}{\mathbit{\sigma }}}\left(t,\mathit{\kappa }\right)∥}^{\mathrm{2}}\\ & =\mathrm{2}\mathit{\pi }{\mathit{\kappa }}^{\mathrm{3}}{\left|\stackrel{\mathrm{^}}{{\stackrel{\mathrm{˘}}{\mathit{\psi }}}_{\mathit{\sigma }}}\left(t,\mathit{\kappa }\right)\right|}^{\mathrm{2}},\end{array}\end{array}$

where μ(Ω) is the surface of the spatial domain Ω and θk is the angle of the wave vector k. From Eqs. (6)–(11), we can finally express the unresolved velocity as follows:

$\begin{array}{}\text{(12)}& \begin{array}{rl}& \stackrel{\mathrm{^}}{\mathbit{\sigma }\stackrel{\mathrm{˙}}{\mathbit{B}}}\left(t,\mathbit{k}\right)\\ & =\frac{\mathrm{1}}{\sqrt{\mathrm{\Delta }t}}i{\mathbit{k}}^{\perp }\stackrel{\mathrm{^}}{{\stackrel{\mathrm{˘}}{\mathit{\psi }}}_{\mathit{\sigma }}}\left(t,‖\mathbit{k}‖\right)\stackrel{\mathrm{^}}{\frac{\mathrm{d}{B}_{t}}{\sqrt{\mathrm{\Delta }t}}}\left(\mathbit{k}\right),\end{array}\text{(13)}& \begin{array}{rl}& =\frac{\mathrm{1}}{\sqrt{\mathrm{\Delta }t}}i{\mathbit{k}}^{\perp }\sqrt{\frac{max\left(\mathrm{0},{C}_{\mathbit{w}}‖\mathbit{k}{‖}^{-{r}_{\mathbit{w}}}-{A}_{\mathbit{w}}\left(‖\mathbit{k}‖\right)\right)}{\mathrm{2}\mathit{\pi }‖\mathbit{k}{‖}^{\mathrm{3}}}}\\ & \phantom{\rule{0.25em}{0ex}}\phantom{\rule{0.25em}{0ex}}\phantom{\rule{0.25em}{0ex}}\phantom{\rule{0.25em}{0ex}}{f}_{\mathrm{BP}}\left(‖\mathbit{k}‖\right)\stackrel{\mathrm{^}}{\frac{\mathrm{d}{B}_{t}}{\sqrt{\mathrm{\Delta }t}}}\left(\mathbit{k}\right).\end{array}\end{array}$

Again the simulated unresolved velocity ADSD is physically relevant, while the KE spectrum is not. Indeed, the simulated unresolved velocity ADSD is expected to match the true (time-correlated) unresolved velocity ADSD, whereas the KE spectra of the simulated and true unresolved velocities differ. Indeed, the true unresolved velocity correlation time spectral distribution τ(κ) is not restricted to the time step dt.

The tuning-free Eq. (12) is attractive because it is not a priori restricted to stationary flows or to a particular type of 2D incompressible dynamics. In order to appreciate that, we can numerically compare it with the previous method of . For stationary, fully developed turbulence and after including a tuning stage to optimize the match, both parameterizations give approximately the same results. Therefore, in order to perceive improvements between the two simulations, we must either work with non-stationary flows or with erroneous tuning. This latter scenario is of course the meaningful one for the application of such models to the real world, where turbulence is non-stationary and heterogeneous, and so region-by-region and season-by-season tuning is impossible to do with quality checks in place.

We believe that non-stationarity may yield a fair, yet idealized, comparison. So, we here compare the two parameterizations in a non-stationary case. Figure 2 below shows simulated buoyancy fields initialized with “case 2” of and corresponding errors. After 2 d of advection, there is no turbulence yet, and a 128×128 resolution is sufficient to correctly resolve every scale. Therefore, no stochastic subgrid parameterization is needed. The self-similar method automatically adapts to the situation, whereas the method of introduces spurious buoyancy isoline roughness by randomly folding the isolines. Accordingly, the method of introduces more errors.

Figure 2Buoyancy field (m s−2(a, b, c) and corresponding normalized error (dimensionless) (d, e) after 1 d of advection for the 1024×1024 reference deterministic simulation (a), the 128×128 LU simulation with the self-similar parameterization (b, d), and the 128×128 LU simulation with the method of  (c, e).

### 2.1.2 Heterogeneity

Our stochastic parameterization randomly folds tracer isolines. This process is often desirable. For instance, it can trigger physically relevant instabilities, such as the filament instabilities highlighted by . After these instabilities have been randomly triggered, eddies are formed by non-linear processes. In a similar way, Figs. 3 and 4 show that our stochastic dynamics enable a more-realistic eddy distribution than deterministic simulations. However, a homogeneous small-scale velocity may also perturb the tracer isolines which should remain still (i.e. which remain still in high-resolution deterministic simulations), e.g. sharp, straight, and coherent fronts. Figure 4 also highlights this drawback. A typical application of this problem in more realistic flow simulations is the simulation of jets like the Gulf Stream and regions of the Antarctic Circumpolar Current. These real-world jets are associated with diffusivity suppression , and this effect is not present in our formulation so far. If we seek to preferentially perturb some tracer gradients, a heterogeneous small-scale velocity is required. Note that the heterogeneity discussed here needs to be non-stationary and thus cannot be represented by the stationary EOF presented later in this paper. Besides, in a small ensemble of realizations, relevant heterogeneity of the small scales may make the spreading more accurate for UQ and enable comparable ensemble forecast accuracy with fewer members. We here propose a possible heterogeneous version of the previous method.

Figure 3Presented from left to right are the buoyancy field (m s−2), KE spectrum (m2 s${}^{-\mathrm{2}}/$(rad m−1)), ADSD (m2 s${}^{-\mathrm{1}}/$(rad m−1)), and variance tensor at t=15 d for (from top to bottom) a 10242-resolution deterministic dynamics and a 1282-resolution deterministic dynamics. This particular initial condition corresponds to case 2 studied by at a resolution of 1282. The low-resolution deterministic dynamics shows too many filaments and not enough eddies.

Figure 4Presented from left to right are the buoyancy field (m s−2), KE spectrum (m2 s${}^{-\mathrm{2}}/$(rad m−1)), ADSD (m2 s${}^{-\mathrm{1}}/$(rad m−1)), and variance tensor at t=15 d for 1282-resolution stochastic dynamics with (from top to bottom) homogeneous unresolved velocity, unresolved velocity modulated by $‖\mathbf{\nabla }b{‖}^{\mathrm{1}/\mathrm{4}}$, and unresolved velocity modulated by ${\mathit{ϵ}}_{\mathrm{F}}^{\mathrm{1}/\mathrm{6}}$. This particular initial condition corresponds to case 2 studied by at a resolution of 1282. The low-resolution stochastic simulations do not show too many filaments and show enough eddies. Among the stochastic simulations, the energy-flux modulation better preserves the sharp straight fronts (e.g. front from (0 m, 2×105 m) to (2.5×105 m, 5×105 m)).

In order to obtain a heterogeneous model of the unresolved velocity, we need a heterogeneous version of the ADSD (Eq. 4). Since the wavenumber, κ, cannot depend on the position, x, the constant, Cw, and/or the spectrum slope, r, should do. A spatially varying spectrum slope is probably difficult to estimate. Hence, we restrict ourselves to a spatially varying constant, Cw, and a spatially homogeneous spectrum slope. The constant may also varies with the time and the wavenumber. According to the Kolmogorov theory (e.g. Frisch1995) and Eq. (3):

$\begin{array}{}\text{(14)}& {C}_{\mathbit{w}}\left(\mathbit{x},t,\mathit{\kappa }\right)=\text{cst.}{\mathit{ϵ}}_{\mathrm{F}}^{p}\left(\mathbit{x},\mathit{\kappa }\right),\end{array}$

where ϵF is the energy flux through the spatial scales and $p=\mathrm{1}/\mathrm{3}$ for a SQG flow (cst. is an abbreviation for constant). More specifically, the energy flux describes the energy moving from scales larger than 1∕κ toward smaller scales and can be computed as follows:

$\begin{array}{}\text{(15)}& {\mathit{ϵ}}_{\mathrm{F}}\left(\mathbit{x},t,\mathit{\kappa }\right)\stackrel{\mathrm{△}}{=}\stackrel{\mathrm{‾}}{{q}_{\mathit{\kappa }}^{<}{\left(\left(\mathbit{w}\mathbf{\cdot }\mathbf{\nabla }\right)q\right)}_{\mathit{\kappa }}^{<}},\end{array}$

where q is the transported (up to possible source terms) quantity, ${g}_{\mathit{\kappa }}^{<}$ is the low-pass filtered version of g (setting to zero the Fourier modes of g which have frequencies larger than κ), and $\stackrel{\mathrm{‾}}{•}$ stands for the spatial average (Frisch1995). For a SQG flow, q corresponds to the buoyancy normalized by the stratification: $q=b/N$. The energy flux is essentially a third-order moment. It is very important because it describes the cascade of the flow by non-linear energy transfers (Frisch1995).

If the energy flux through scale is understood locally in space (as indeed , also assumes), Eq. (14) provides a natural parameterization of the unresolved velocity heterogeneities. We simply modulate the unresolved ADSD (Eq. 6) by the heterogeneous ratio ${\mathit{ϵ}}_{\mathrm{F}}^{p}/\stackrel{\mathrm{‾}}{{\mathit{ϵ}}_{\mathrm{F}}^{p}}$, averaged over the resolved inertial range wavenumbers.

$\begin{array}{}\text{(16)}& \begin{array}{rl}\stackrel{\mathrm{^}}{\stackrel{\mathrm{̃}}{\mathbit{\sigma }\stackrel{\mathrm{˙}}{\mathbit{B}}}}\left(t,\mathbit{k}\right)& \phantom{\rule{0.25em}{0ex}}=\frac{\mathrm{1}}{\sqrt{\mathrm{\Delta }t}}i{\mathbit{k}}^{\perp }\sqrt{\frac{max\left(\mathrm{0},{C}_{\mathbit{w}}‖\mathbit{k}{‖}^{-{r}_{\mathbit{w}}}-{A}_{\mathbit{w}}\left(‖\mathbit{k}‖\right)\right)}{\mathrm{2}\mathit{\pi }‖\mathbit{k}{‖}^{\mathrm{3}}}}\\ & {f}_{\mathrm{BP}}\left(‖\mathbit{k}‖\right)\phantom{\rule{0.33em}{0ex}}\stackrel{\mathrm{^}}{\frac{\mathrm{d}{B}_{t}}{\sqrt{\mathrm{\Delta }t}}}\left(\mathbit{k}\right),\end{array}\text{(17)}& \mathbit{\sigma }\stackrel{\mathrm{˙}}{\mathbit{B}}\left(t,\mathbit{x}\right)=\mathcal{P}\mathit{\left\{}\underset{\begin{array}{c}\text{Heterogeneous}\\ \text{modulation}\end{array}}{\underbrace{\sqrt{\frac{{\mathit{ϵ}}_{\mathrm{F}}^{p}\left(t,\mathbit{x}\right)}{\stackrel{\mathrm{‾}}{{\mathit{ϵ}}_{\mathrm{F}}^{p}}\left(t\right)}}}}\underset{\begin{array}{c}\text{Homogeneous}\\ \text{velocity}\end{array}}{\underbrace{\stackrel{\mathrm{̃}}{\mathbit{\sigma }\stackrel{\mathrm{˙}}{\mathbit{B}}}\left(t,\mathbit{x}\right)}}\mathit{\right\}},\end{array}$

where $\mathcal{P}={\mathbf{I}}_{\mathrm{d}}-{\mathrm{\Delta }}^{-\mathrm{1}}\mathbf{\nabla }{\mathbf{\nabla }}^{T}$ is the projector onto the space of free-divergence functions. This parameterization is physically meaningful, since, locally in space, a stronger direct cascade at large scales (larger ϵF and thus larger Cw) suggests that the unresolved velocity (large) should maintain this cascade by folding smaller-scale tracer structures. Furthermore, considering that the energy flux is a third-order structure makes this parameterization relevant to differentiate between strait fronts and curved structures (e.g. eddies). Indeed, at least three points are needed to define a curvature and differentiate between these structures. Figure 4 confirms that this modulation enables a more accurate spatial distribution of the stochastic folding.

In order to keep a divergence-free velocity and the ensuing properties (e.g. energy conservation), the modulated velocity is projected onto the space of free-divergence functions, using the operator 𝒫. Because of that we do not consider the advection correction ${\mathbit{w}}^{*}-\mathbit{w}=-\frac{\mathrm{1}}{\mathrm{2}}\left(\mathbf{\nabla }\mathbf{\cdot }\mathbf{a}{\right)}^{T}$ of the LU formalism. Indeed, here the variance tensor has the simple form $\mathbf{a}=\frac{\mathrm{1}}{d}\text{tr}\left(\mathbf{a}\right){\mathbf{I}}_{\mathrm{d}}$. As such, the advection correction is a gradient field and is hence removed by the projection onto the space of free-divergence functions.

Although relevant, the comparison of Figs. 3 and 4 – about front dynamics – remains qualitative. For a quantitative demonstration, adapted metrics should be used. Indeed, simple isotropic, homogeneous second-order statistics (e.g. KE spectra) cannot distinguish between eddies and filaments. Bi-spectra may overcome this drawback, since they express three-point statistics. This quantitative analysis would necessitate studies of the metrics themselves. Therefore, these analyses will be addressed in future work.

Many other closures rely on the Kolmogorov model (Eqs. 314): in particular, the famous Smagorinsky model and its variants provide a path to developing deterministic and dissipative scale-aware subgrid models. Typically, these models result in a Laplacian dissipation which involves a heterogeneous eddy diffusivity or viscosity coefficient, νSm. Aside from their heuristic theoretical justification, these Smagorinsky-type subgrid terms are formally equivalent to the turbulent dissipation of our stochastic model: $\mathbf{\nabla }\mathbf{\cdot }\left(\frac{\mathrm{1}}{\mathrm{2}}\mathbf{a}\mathbf{\nabla }q\right)$ , where a=2νSmId and q is a transported quantity. This similarity suggests that a Smagorinsky-type model could provide a good estimate for our variance tensor and thus for the heterogeneity of the unresolved velocity.

The goal of the Smagorinsky model is to optimize the KE spectrum by targeting a specific turbulent diffusive scale adapted to the simulation resolution. A turbulent dissipation coefficient expression can be derived from the Kolmogorov model (Eqs. 3 and 14) and the closure,

$\begin{array}{}\text{(18)}& {\mathit{ϵ}}_{\mathrm{F}}={\mathit{ϵ}}_{\mathrm{D}},\end{array}$

where ϵD is the dissipation, a second-order moment related to the molecular or turbulent diffusion. To develop a Smagorinsky-type model, the resolved flux of the energy-like conserved invariant is equated to the dissipation of the energy invariant such that

$\begin{array}{}\text{(19)}& {\mathit{ϵ}}_{\mathrm{D}}={\mathit{\nu }}_{\mathrm{Sm}}‖\mathbf{\nabla }q{‖}^{\mathrm{2}}.\end{array}$

From there, one can obtain an eddy diffusivity or viscosity coefficient proportional to $‖\mathbf{\nabla }q{‖}^{h}$. For an SQG flow, the exponent is $h=\mathrm{1}/\mathrm{2}$, where q=b is the buoyancy .

In the Kolmogorov theory of homogeneous and stationary turbulence, the energy flux is a constant of the flow, and the closure (Eq. 19) is an exact result of a simple energy budget over an ensemble mean. Indeed, there is no accumulation of energy at any scales in a stationary regime. This closure is very useful since the dissipation (Eq. 19) is generally much simpler to compute than the energy flux. Nevertheless, in every flow realization, the energy flux and the diffusion vary with space and time, and they do not match each other locally. For instance, a strong straight front of an SQG flow involves a large dissipation but no energy cascade because the velocity is aligned with front (see Eq. 15). Moreover, in any bounded, limited resolution situation, the inertial cascade range is limited so the energy flux through scale varies with the wavenumber, especially outside the inertial range.

The discrepancy between energy flux and dissipation is not so much of an issue for the Smagorinsky model because its aim is the optimization of a second-order statistics at small scales. Unfortunately, this closure cannot be used to simplify the modulation computation in our parametric random model. Indeed, the stochastic dynamics relies on processes – such as folding – associated with higher-order statistics. Figure 4 (middle right) illustrates this statement. The following unresolved velocity parameterization is used there:

$\begin{array}{}\text{(20)}& \begin{array}{rl}\stackrel{\mathrm{^}}{\stackrel{\mathrm{̃}}{\mathbit{\sigma }\stackrel{\mathrm{˙}}{\mathbit{B}}}}\left(t,\mathbit{k}\right)& \phantom{\rule{0.25em}{0ex}}=\frac{\mathrm{1}}{\sqrt{\mathrm{\Delta }t}}i{\mathbit{k}}^{\perp }\sqrt{\frac{max\left(\mathrm{0},{C}_{\mathbit{w}}‖\mathbit{k}{‖}^{-{r}_{\mathbit{w}}}-{A}_{\mathbit{w}}\left(‖\mathbit{k}‖\right)\right)}{\mathrm{2}\mathit{\pi }‖\mathbit{k}{‖}^{\mathrm{3}}}}\\ & {f}_{\mathrm{BP}}\left(‖\mathbit{k}‖\right)\stackrel{\mathrm{^}}{\frac{\mathrm{d}{B}_{t}}{\sqrt{\mathrm{\Delta }t}}}\left(\mathbit{k}\right),\end{array}\text{(21)}& \mathbit{\sigma }\stackrel{\mathrm{˙}}{\mathbit{B}}\left(t,\mathbit{x}\right)=\mathcal{P}\mathit{\left\{}\underset{\begin{array}{c}\text{Heterogeneous}\\ \text{modulation}\end{array}}{\underbrace{\frac{‖\mathbf{\nabla }b{‖}^{\mathrm{1}/\mathrm{4}}\left(t,\mathbit{x}\right)}{\sqrt{\stackrel{\mathrm{‾}}{‖\mathbf{\nabla }b{‖}^{\mathrm{1}/\mathrm{2}}}\left(t\right)}}}}\underset{\begin{array}{c}\text{Homogeneous}\\ \text{velocity}\end{array}}{\underbrace{\stackrel{\mathrm{̃}}{\mathbit{\sigma }\stackrel{\mathrm{˙}}{\mathbit{B}}}\left(t,\mathbit{x}\right)}}\mathit{\right\}}.\end{array}$

Along sharp straight fronts, the dissipation will be larger. Accordingly, the associated modulation $‖\mathbf{\nabla }b{‖}^{\mathrm{1}/\mathrm{4}}$ enhances the stochastic folding where one would need it to weaken.

## 2.2 Data-driven model for the unresolved velocity

In this section, we detail a procedure proposed by for the estimation of the (weighted) EOFs, (ξk)k, involved in the unresolved velocity definition,

$\begin{array}{}\text{(22)}& \mathbit{\sigma }\stackrel{\mathrm{˙}}{\mathbit{B}}=\sum _{k}{\mathbit{\xi }}_{k}d{W}_{k}/\mathrm{d}t.\end{array}$

If we denote λk as the L2 norm of those EOFs, their normalized versions $\left({\mathbit{\xi }}_{k}={\mathbit{\xi }}_{k}/{\mathit{\lambda }}_{k}{\right)}_{k}$ are the eigenvectors of the self-adjoint operator,

$\begin{array}{}\text{(23)}& \mathbit{f}↦\underset{\mathrm{\Omega }}{\int }\left[\mathbit{\sigma }\left(\mathbit{x}\right){\mathbit{\sigma }}^{T}\left(\mathbit{y}\right)/\mathrm{d}t\right]\mathbit{f}\left(\mathbit{y}\right)\mathrm{d}\mathbit{y},\end{array}$

defined by the small-scale velocity covariance [σ(x)σT(y)∕dt]. (λk∕dt)k is the set of eigenvalues of this operator.

The data-driven methods of relies on Lagrangian paths defined at two “resolutions”. The first paragraph of this section defines these two types of Lagrangian paths. Then, we propose several ways to identify the time increments of the infinite-dimensional Brownian motion, σBt, from these Lagrangian paths. Preprocessing of the increments are needed in order to meet some structural assumptions. After this, we relate the increment covariance to the EOFs.

### 2.2.1 Preliminary definitions

We introduce two types of velocity field:

• a high-resolution velocity, v, on a fine mesh-grid (5122),

• a low-resolution velocity, $\stackrel{\mathrm{‾}}{\mathbit{v}}$, on a coarse mesh-grid (642). This velocity field is a spatially low-pass-filtered version of f.

Then, two types of Lagrangian path are defined:

• a “high-resolution flow”, Xij(t0,t), defined by the high-resolution velocity, u:

$\begin{array}{}\text{(24)}& \frac{\mathrm{d}{\mathbit{X}}_{ij}}{\mathrm{d}t}\left(t\right)=\mathbit{u}\left(t,{\mathbit{X}}_{ij}\left({t}_{\mathrm{0}},t\right)\right)\phantom{\rule{0.25em}{0ex}}\phantom{\rule{0.25em}{0ex}}\text{and}\phantom{\rule{0.25em}{0ex}}\phantom{\rule{0.25em}{0ex}}{\mathbit{X}}_{ij}\left({t}_{\mathrm{0}},{t}_{\mathrm{0}}\right)={\mathbit{x}}_{ij},and\end{array}$

with ${\mathbit{x}}_{ij}=\left(i\mathrm{\Delta }x,j\mathrm{\Delta }y\right)$.

• a “low-resolution flow”, ${\stackrel{\mathrm{‾}}{\mathbit{X}}}_{ij}\left({t}_{\mathrm{0}},t\right)$, defined by the low-resolution velocity $\stackrel{\mathrm{‾}}{\mathbit{u}}$:

$\begin{array}{}\text{(25)}& \frac{\mathrm{d}{\stackrel{\mathrm{‾}}{\mathbit{X}}}_{ij}}{\mathrm{d}t}\left({t}_{\mathrm{0}},t\right)=\stackrel{\mathrm{‾}}{\mathbit{u}}\left(t,{\stackrel{\mathrm{‾}}{\mathbit{X}}}_{ij}\left({t}_{\mathrm{0}},t\right)\right)\phantom{\rule{0.25em}{0ex}}\phantom{\rule{0.25em}{0ex}}\text{and}\phantom{\rule{0.25em}{0ex}}\phantom{\rule{0.25em}{0ex}}{\stackrel{\mathrm{‾}}{\mathbit{X}}}_{ij}\left({t}_{\mathrm{0}},{t}_{\mathrm{0}}\right)={\mathbit{x}}_{ij}.\end{array}$

### 2.2.2 Candidate for the increment realization

In order to estimate the EOFs, (ξi)i, involved in Eq. (A5), we assume that we can observe increments

$\begin{array}{}\text{(26)}& \mathbit{\sigma }\mathrm{\Delta }{\mathbit{B}}_{m\mathrm{\Delta }T}=\mathbit{\sigma }\left({\mathbit{B}}_{\left(m+\mathrm{1}\right)\mathrm{\Delta }T}-{\mathbit{B}}_{m\mathrm{\Delta }T}.\right)\end{array}$

We will interpret the following residual flow increments as a realization of the above:

$\begin{array}{}\text{(27)}& \begin{array}{rl}{\stackrel{\mathrm{̃}}{\mathrm{\Delta }\mathbit{X}}}_{ij}^{m}& \stackrel{\mathrm{△}}{=}{\mathbit{X}}_{ij}\left(m\mathrm{\Delta }T,\left(m+\mathrm{1}\right)\mathrm{\Delta }T\right)\\ & -{\stackrel{\mathrm{‾}}{\mathbit{X}}}_{ij}\left(m\mathrm{\Delta }T,\left(m+\mathrm{1}\right)\mathrm{\Delta }T\right).\end{array}\end{array}$

### 2.2.3 Preprocessing

The increments are supposed to be centred and divergence free (as $\mathbf{\nabla }\mathbf{\cdot }{\mathbit{\xi }}_{i}=\mathrm{0}$, i). Therefore, after computing the residual flow increments, ${\left({\stackrel{\mathrm{̃}}{\mathrm{\Delta }\mathbit{X}}}_{ij}^{m}\right)}_{m}$, they are centred,

$\begin{array}{}\text{(28)}& \mathrm{\Delta }{{\mathbit{X}}^{\prime }}_{ij}^{m}\stackrel{\mathrm{△}}{=}{\stackrel{\mathrm{̃}}{\mathrm{\Delta }\mathbit{X}}}_{ij}^{m}-\stackrel{\mathrm{^}}{\mathbb{E}}\mathit{\left\{}{\stackrel{\mathrm{̃}}{\mathrm{\Delta }\mathbit{X}}}_{ij}^{m}\mathit{\right\}},\end{array}$

with the estimator

$\begin{array}{}\text{(29)}& \stackrel{\mathrm{^}}{\mathbb{E}}\mathit{\left\{}{\stackrel{\mathrm{̃}}{\mathrm{\Delta }\mathbit{X}}}_{ij}^{m}\mathit{\right\}}\stackrel{\mathrm{△}}{=}\frac{\mathrm{1}}{N}\sum _{m=\mathrm{0}}^{N-\mathrm{1}}{\stackrel{\mathrm{̃}}{\mathrm{\Delta }\mathbit{X}}}_{ij}^{m},\end{array}$

and projected onto the space of divergence-free functions,

$\begin{array}{}\text{(30)}& \mathrm{\Delta }{\mathbit{X}}_{ij}^{m}\stackrel{\mathrm{△}}{=}\mathcal{P}\mathit{\left\{}\mathrm{\Delta }{{\mathbit{X}}^{\prime }}_{ij}^{m}\mathit{\right\}},\phantom{\rule{0.25em}{0ex}}\phantom{\rule{0.25em}{0ex}}\text{with}\phantom{\rule{0.25em}{0ex}}\phantom{\rule{0.25em}{0ex}}\mathcal{P}={\mathbf{I}}_{\mathrm{d}}-{\mathrm{\Delta }}^{-\mathrm{1}}\mathbf{\nabla }{\mathbf{\nabla }}^{T}.\end{array}$

### 2.2.4 Covariance, quadratic covariation, and EOF

Then, we can define the EOFs by an estimate of the spatial covariance of the residual flow increments (averaging over the time index, m).

$\begin{array}{}\text{(31)}& \begin{array}{rl}& \sum _{k}{\mathbit{\xi }}_{k}\left({\mathbit{x}}_{ij}\right){{\mathbit{\xi }}_{k}}^{T}\left({\mathbit{x}}_{pq}\right)\\ & \phantom{\rule{0.25em}{0ex}}\phantom{\rule{0.25em}{0ex}}\phantom{\rule{0.25em}{0ex}}=\mathbit{\sigma }\left({\mathbit{x}}_{ij}\right){\mathbit{\sigma }}^{T}\left({\mathbit{x}}_{pq}\right)\end{array}\text{(32)}& \phantom{\rule{0.25em}{0ex}}\phantom{\rule{0.25em}{0ex}}\phantom{\rule{0.25em}{0ex}}\approx \frac{\mathrm{1}}{\left(N-\mathrm{1}\right)\mathrm{\Delta }T}\sum _{m=\mathrm{0}}^{N-\mathrm{1}}\mathrm{\Delta }{\mathbit{X}}_{ij}^{m}{\left(\mathrm{\Delta }{\mathbit{X}}_{pq}^{m}\right)}^{T}\end{array}$

In order to properly define the EOFs, we must add the following orthogonal constraint:

$\begin{array}{}\text{(33)}& \underset{\mathrm{\Omega }}{\int }{\mathbit{\xi }}_{i}\left(\mathbit{x}{\right)}^{T}{\mathbit{\xi }}_{j}\left(\mathbit{x}\right)\mathrm{d}\mathbit{x}=\mathrm{0}\phantom{\rule{0.25em}{0ex}}\phantom{\rule{0.25em}{0ex}}\text{if}\phantom{\rule{0.25em}{0ex}}\phantom{\rule{0.25em}{0ex}}i\ne j.\end{array}$

Finally, after estimating the (ξk)k offline, the ensemble forecast can generated online with Eq. (22).

3 Numerical simulations and uncertainty quantification

## 3.1 Surface quasi-geostrophic model

### 3.1.1 Deterministic SQG

This is a simplified model to describe the ocean surface dynamics at mesoscales (i.e. horizontal length scale of the order of 100 km) . The buoyancy, b, is transported at the ocean surface by a horizontal velocity field,

$\begin{array}{}\text{(34)}& \frac{Db}{Dt}=S,\end{array}$

where $\frac{D}{Dt}$ stands for the horizontal (deterministic or stochastic) material derivative and S represents possible sources and sinks. As the potential vorticity is assumed to be zero in the fluid interior, the Fourier transform of the velocity streamfunction, $\stackrel{\mathrm{^}}{\mathit{\psi }}$, is related to the Fourier transform of the buoyancy, $\stackrel{\mathrm{^}}{b}$, by the following SQG relationship:

$\begin{array}{}\text{(35)}& \stackrel{\mathrm{^}}{\mathit{\psi }}=\frac{\mathrm{1}}{N‖\mathbit{k}‖}\stackrel{\mathrm{^}}{b},\end{array}$

where N is the stratification and k is the wave vector.

### 3.1.2 Stochastic SQG

The LU and SALT versions of the SQG dynamics are formally similar to the deterministic model. However, the buoyancy transport (Eq. 34) has to be understood in the stochastic sense (Eq. A4). Furthermore, the SQG relationship (Eq. 35) must be interpreted with

$\begin{array}{}\text{(36)}& {\mathbit{w}}^{*}={\mathbf{\nabla }}^{\perp }\mathit{\psi }={\mathbf{\nabla }}^{\perp }\left(-\mathrm{\Delta }{\right)}^{-\mathrm{1}/\mathrm{2}}\left(b/N\right),\end{array}$

in the SALT context, and $\mathbit{w}={\mathbf{\nabla }}^{\perp }\mathit{\psi }={\mathbf{\nabla }}^{\perp }\left(-\mathrm{\Delta }{\right)}^{-\mathrm{1}/\mathrm{2}}\left(b/N\right)$, in the LU one. Besides, in the LU SQG, the advecting drift w has to be divergence-free. We enforce this constraint by projecting the drift correction $\left({\mathbit{w}}^{*}-\mathbit{w}\right)$ onto the space of free-divergence functions. So, we have

$\begin{array}{}\text{(37)}& \begin{array}{rl}{\mathbit{w}}^{*}& =\mathcal{P}\left({\mathbit{w}}^{*}-\mathbit{w}\right)+\mathbit{w}=-\frac{\mathrm{1}}{\mathrm{2}}\mathcal{P}{\left(\mathbf{\nabla }\mathbf{\cdot }\mathbf{a}\right)}^{T}\\ & +{\mathbf{\nabla }}^{\perp }\left(-\mathrm{\Delta }{\right)}^{-\mathrm{1}/\mathrm{2}}\left(b/N\right),\end{array}\end{array}$

where $\mathcal{P}={\mathbf{I}}_{\mathrm{d}}-{\mathrm{\Delta }}^{-\mathrm{1}}\mathbf{\nabla }{\mathbf{\nabla }}^{T}$. Indeed, the SQG model is derived from a QG model with a transport of the buoyancy at the surface. Then, the potential vorticity (PV) is assumed to be zero inside the fluid. In the SALT framework, the PV reads $\mathrm{0}={\text{PV}}_{\mathrm{slt}}={\mathbf{\nabla }}^{\perp }\mathbf{\cdot }{\mathbit{w}}^{*}+f+{\left({f}_{\mathrm{0}}/N\right)}^{\mathrm{2}}{\partial }_{z}^{\mathrm{2}}\mathit{\psi }$, ${\mathbit{w}}^{*}={\mathbf{\nabla }}^{\perp }\mathit{\psi }$ and $b={f}_{\mathrm{0}}{\partial }_{z}\mathit{\psi }$. For the LU dynamics, $\mathrm{0}={\text{PV}}_{\mathrm{lu}}={\mathbf{\nabla }}^{\perp }\mathbf{\cdot }\mathbit{w}+f+{\left({f}_{\mathrm{0}}/N\right)}^{\mathrm{2}}{\partial }_{z}^{\mathrm{2}}\mathit{\psi }$, $\mathbit{w}={\mathbf{\nabla }}^{\perp }\mathit{\psi }$ and $b={f}_{\mathrm{0}}{\partial }_{z}\mathit{\psi }$.

Nevertheless, the slight difference between the SQG SALT and the SQG LU models are not considered in this section since we neglect the advection correction of the LU framework $\left({\mathbit{w}}^{*}-\mathbit{w}\right)$. So, we simulate the stochastic transport (Eq. 34) coupled with the SQG relation (Eq. 36). The unresolved velocity statistics encoded in σ are specified either by the self-similar method of Sect. 2.1.1 or by the data-driven method of Sect. 2.2.

### 3.1.3 Flow simulation

We perform a high-resolution simulation (5122 spatial grid) of the deterministic SQG model with the following initial condition:

$\begin{array}{}\text{(38)}& \begin{array}{rl}b\left(\mathbit{x},t=\mathrm{0}\right)& =\mathrm{0.2}{B}_{\mathrm{0}}\mathrm{cos}\left(\frac{\mathrm{2}\mathit{\pi }}{L}\left(x+y\right)\right)\\ & +F\left(\mathbit{x}-{\mathbit{x}}_{\mathrm{00}}^{\mathrm{0}}\right)+F\left(\mathbit{x}-{\mathbit{x}}_{\mathrm{10}}^{\mathrm{0}}\right)\\ & -F\left(\mathbit{x}-{\mathbit{x}}_{\mathrm{01}}^{\mathrm{0}}\right)-F\left(\mathbit{x}-{\mathbit{x}}_{\mathrm{11}}^{\mathrm{0}}\right),\end{array}\text{(39)}& \text{with}\phantom{\rule{0.25em}{0ex}}\phantom{\rule{0.25em}{0ex}}F\left(\mathbit{x}\right)\phantom{\rule{0.25em}{0ex}}\phantom{\rule{0.25em}{0ex}}={B}_{\mathrm{0}}\mathrm{exp}\left(-\frac{\mathrm{1}}{\mathrm{2}}{\left(\frac{\mathrm{15}}{L}\right)}^{\mathrm{2}}\left({x}^{\mathrm{2}}+{\left(\frac{y}{\mathrm{2}}\right)}^{\mathrm{2}}\right)\right),\text{(40)}& \text{and}\phantom{\rule{0.25em}{0ex}}\phantom{\rule{0.25em}{0ex}}{\mathbit{x}}_{ij}^{\mathrm{0}}\phantom{\rule{0.25em}{0ex}}\phantom{\rule{0.25em}{0ex}}=\frac{L}{\mathrm{4}}\left(\begin{array}{c}\mathrm{1}\\ \mathrm{1}\end{array}\right)+\frac{L}{\mathrm{2}}\left(\begin{array}{c}i\\ j\end{array}\right).\end{array}$

The source term, S, involves a hyperviscosity, a linear drag, and an additive stationary forcing such that

$\begin{array}{}\text{(41)}& S\stackrel{\mathrm{△}}{=}-{\mathit{\nu }}_{\text{HV}}{\mathrm{\Delta }}^{\mathrm{4}}b-\frac{\mathrm{1}}{{\mathit{\tau }}_{\mathrm{F}}}b+{A}_{\mathrm{F}}\mathrm{sin}\left({k}_{\mathrm{F}}^{x}x\right)\mathrm{sin}\left({k}_{\mathrm{F}}^{y}y\right).\end{array}$

The parameters of the simulations are summed up in Table 1. The influence of the initial condition remains for about a month. Then, the turbulence is maintained by the forcing as illustrated by Fig. 5.

Table 1Parameters of the simulation.

Figure 5Buoyancy (m s−2) (left), KE spectrum (m2 s${}^{-\mathrm{2}}/$(rad m−1)) (middle), and ADSDs (m2 s${}^{-\mathrm{1}}/$(rad m−1)) (right) at t=0, 30, 50 and 70 d of advection, for the deterministic SQG model at a resolution of 5122.

From t=50 d to t=100 d, the EOFs of the data-driven method are learned. From t=100 d to t=130 d, the deterministic high-resolution simulation is used as a reference. In this time interval, several stochastic low-resolution (642) simulations are performed and compared. These simulations are initialized at t=100 d with the reference high-resolution simulation projected at low resolution (i.e. keeping only the Fourier modes associated with a coarse-resolution grid).

## 3.2 Learning and analysis of the EOF

Here, we describe the convergence of the EOF estimation. The accuracy of this estimation is in particular a function of the number of snapshots used and of the Lagrangian advection time, ΔT. We first describe the convergence with the number of snapshots and then – as a remark – the convergence with the Lagrangian advection time, ΔT.

The EOFs are learned between t=50 d and t=100 d from 12 465 spatial fields of residual flow increments. Even though a large number of snapshots are used, the correlation time of the residual flow increments is about 1 d. This is not negligible compared to the estimation time window: 50 d. Accordingly, the EOFs are not fully converged. But, we expect this convergence to be sufficient for the present work. Moreover, for real applications at this spatial scale, we expect the unresolved velocity statistics to be non-stationary on temporal scales larger 50 d. Thus, learning a unresolved velocity stationary statistical representation on a larger time window might be difficult in practice. Therefore, even if our EOF estimation is not converged, it represents a realistic test case.

Remark 1. The integration time of the flows, ΔT, is a critical parameter for the definition of the EOF. Indeed, for small advection time, ΔT, the length of an increment of a Lagrangian flow path is proportional to the velocity and to the advection time. This is the so-called ballistic regime . In particular, residual flow increments squared norm would be proportional to ΔT2; the variance tensor estimator would be proportional to ΔT; and the estimated EOFs would be proportional to $\sqrt{\mathrm{\Delta }T}$. With a larger advection time, ΔT, the Lagrangian velocity decorrelates – along the flow path – from the initial Lagrangian velocity. When the advection time becomes larger than the correlation time of the Lagrangian velocity, the flow path begins to act as a Brownian motion . The length of a displacement scales as $\sqrt{\mathrm{\Delta }T}$; i.e. the Lagrangian velocity acts as a white noise in time. This is the so-called diffusive regime. Figure 6 illustrates this convergence with the average tensor ${a}_{\mathrm{0}}=\frac{\mathrm{1}}{d}\stackrel{\mathrm{‾}}{\mathit{tr}\left(\mathbf{a}\right)}$ with ΔT. In this paper, the Lagrangian advection time ΔT is computed from a CFL (Courant–Friedrichs–Lewy condition) at the coarse resolution of 642 – about 300 s. Although it corresponds to the ballistic regime (i.e. the flow increments are correlated), this choice is coherent with the work of and gives very good UQ results. Moreover, the residual flow increments – and hence the EOFs – are spatially aliased, since a large part of those increments lives at small spatial scales but are spatially sampled on a coarse spatial grid, xij. Figure 7 confirms this idea. If all the $\mathrm{2}×{\mathrm{64}}^{\mathrm{2}}-\mathrm{1}=\mathrm{8191}$ EOFs are considered, the ADSD of the unresolved velocity reveals a strong spatial aliasing.

Figure 6Estimation of the spatial average of the trace of the variance tensor divided by d=2, $\frac{\mathrm{1}}{Md}{\sum }_{q=\mathrm{1}}^{M}\mathrm{tr}\left(\mathbf{a}\left({\mathbit{x}}_{q}\right)\right)=\frac{\mathrm{1}}{Md}{\sum }_{q,i}‖{\mathbit{\xi }}_{i}\left({\mathbit{x}}_{q}\right){‖}^{\mathrm{2}}$, as a function of the Lagrangian advection time ΔT, for method 1 (described in Sect. 2.2) (blue plot) and for method 2 (the flow increments defined by the integration of the small-scale velocity along the high-resolution flow: ${\stackrel{\mathrm{̃}}{\mathrm{\Delta }\mathbit{X}}}_{ij}^{m}={\int }_{k\mathrm{\Delta }T}^{\left(k+\mathrm{1}\right)\mathrm{\Delta }T}\left(\mathbit{u}-\stackrel{\mathrm{‾}}{\mathbit{u}}\right)\left(t,{\mathbit{X}}_{ij}\left(m\mathrm{\Delta }T,t\right)\right)\mathrm{d}t$) (black plot). The red plot relies on another method not described here. The bottom plot is an enhancement of the top plot. For low ΔT values, the Lagrangian velocities remain highly correlated during the interval [0,ΔT], i.e. during the Lagrangian advection. It is the ballistic regime where the flow increments scale linearly in time: $‖{\stackrel{\mathrm{̃}}{\mathrm{\Delta }\mathbit{X}}}_{ij}^{m}‖\propto \mathrm{\Delta }T$. In that regime, methods 1 and 2 are very similar, since the high-resolution, Xij(mΔT,t), and low-resolution, ${\stackrel{\mathrm{‾}}{\mathbit{X}}}_{ij}\left(m\mathrm{\Delta }T,t\right)$, flows remain close to each other. Above ΔT=1 d, the final and initial small-scale Lagrangian velocities, $\left(\mathbit{u}-\stackrel{\mathrm{‾}}{\mathbit{u}}\right)$, are decorrelated. It is the diffusive regime where the flow increments scale as the square root of time: $‖{\stackrel{\mathrm{̃}}{\mathrm{\Delta }\mathbit{X}}}_{ij}^{m}‖\propto \sqrt{\mathrm{\Delta }T}$. This is the relevant regime for the estimation of the EOF, since we meet the fundamental assumption of our model: a Brownian behaviour for the small-scale flow. For method 1, the diffusive regime is not visible, at least in this advection time window, and the EOF estimation is theoretically not possible. In contrast, method 2 provides a converged estimator of the variance tensor for an advection time ΔT>1 d.

Both the data-driven and the non-data driven methods exhibit large unresolved velocities at the smallest scales of the coarse resolution grid (see Fig. 7). Nevertheless, the two ADSDs are distinct. In particular, the data-driven method involves some large spatial scales. One could think that these large-scale components would disappear if the flow X and $\stackrel{\mathrm{‾}}{\mathbit{X}}$ are integrated in a Eulerian way rather than in a Lagrangian way. However, with the advection time, ΔT, being very small, few differences are expected between a Eulerian and a Lagrangian advection. New numeric experiments confirm this idea (not shown). A complete study of these effects is beyond the scope of this paper.

Figure 7Buoyancy field (m s−2(a), kinetic energy spectrum (m2 s${}^{-\mathrm{2}}/$(rad m−1)) (b), and ADSD (m2 s${}^{-\mathrm{1}}/$(rad m−1)) (c) of w, at t=105 d, ADSD of $\mathbit{\sigma }\stackrel{\mathrm{˙}}{\mathbit{B}}$ from the non-data-driven method (without a multiplicative constant); ADSD of $\mathbit{\sigma }\stackrel{\mathrm{˙}}{\mathbit{B}}$ from the data-driven method, with 2, 20, 200, 2000, and 8000 EOFs, and a slope of $-\frac{\mathrm{5}}{\mathrm{3}}$ (b); and a corresponding ADSD (c) in a black solid line. The two dashed vertical lines define the interval where coefficients Cw and rw are fitted. The dashed oblique line is the resulting fit (it is set to match Aw in the left bound of the interval).

## 3.3 One realization

We now simulate the LU SQG dynamics (equivalent to SALT SQG) (Eqs. 3436), at low resolution (642), with two possible parameterizations for the unresolved velocity: either the data-driven model (see Sect. 2.2) or the parametric and self-similar model (see Sect. 2.1). For the data-driven model, we keep 200 EOFs. This choice will be explained in the next section. For all simulations, there is a unique reference initial condition at t=100 d. This initial condition is the low-resolution (642) projection of the reference deterministic high-resolution (5122) simulation (i.e. the Fourier modes associated with large wave vectors are set to zero, and the obtained spatial field is subsampled at the low resolution). Spatial fields, KE spectra and ADSD are plotted in Figs. 8 and 10. For comparison purposes, we also show the low-resolution deterministic SQG simulations and the low-resolution (642) projection of the reference deterministic high-resolution simulation. As already pointed out by for free-decaying turbulence, the realizations of the LU dynamics are no worse than a low-resolution deterministic simulation. For the short-term simulations, our stochastic subgrid parameterizations have often weak improvements on the low-resolution simulations, even though, sometimes, the stochastic subgrid parameterization can improve the simulation. Indeed, show that the LU dynamics at a resolution of 128×128 can trigger filament instabilities by random destabilization and hence obtain a more realistic proportion of eddies and filaments. This is confirmed by Figs. 3 and 4, also at a resolution of 128×128. In Fig. 8, the resolution is coarser (64×64). Therefore, the stabilizing deterministic subgrid tensor (hyper viscosity) is stronger. This may explain an inhibition of filament instabilities here and hence less difference between deterministic and stochastic coarse simulations. Nevertheless, our main goal is not improving a single simulation. Our main goal is improving the uncertainty quantification – as developed below – without deteriorating single simulations.

At t=110 d, all the spatial fields of Figs. 8 and 9 are still similar, whereas at t=120 d, the spatial fields strongly differ. Thus, the predictability timescale – i.e. the time over which initial conditions are forgotten – is between 10 and 20 d. The larger features evolve more slowly, leading to a longer predictability timescale. This effect can be seen in the similar location across simulations of larger vortices in Fig. 10, while the filaments between the vortices have lost all coherence. In Fig. 11, the larger, more-coherent features persist in the ensemble mean of the coarse-resolution simulations, while the smaller scales cancel. It is the persistent features which are the basis for robust forecasts as will become clear in the next section, and it is the improvement of these robust features that is the goal of the parameterizations, not the improvement of individual filaments in individual ensemble members.

Figure 8Buoyancy fields (m s−2) (left), kinetic energy spectra (m2 s${}^{-\mathrm{2}}/$(rad m−1)) (middle), and ADSDs (m2 s${}^{-\mathrm{1}}/$(rad m−1)) (right) of w, in blue, ADSDs of $\mathbit{\sigma }\stackrel{\mathrm{˙}}{\mathbit{B}}$, in green, at t=110 d, for (from top to bottom) the low-resolution (642) projection of the reference deterministic high-resolution (5122) SQG dynamics; the low-resolution (642) deterministic SQG dynamic; one realization of the low-resolution (642) LU SQG dynamics (equivalent to SALT SQG) with the self-similar parameterization; and one realization of the low-resolution (642) LU SQG dynamics (equivalent to SALT SQG) with data-driven parameterization.

Figure 9Buoyancy fields (m s−2(a, b), kinetic energy spectra (m2 s${}^{-\mathrm{2}}/$(rad m−1)) (c, d), and ADSDs (m2 s${}^{-\mathrm{1}}/$(rad m−1)) (e, f) of w, at t=110 d, for ensemble mean of the low-resolution (642) LU SQG dynamics (equivalent to SALT SQG) with self-similar parameterization (a, c, e) and with data-driven parameterization (b, d, f).

Figure 10Buoyancy fields (m s−2) (left); kinetic energy spectra (m2 s${}^{-\mathrm{2}}/$(rad m−1)) (middle); and ADSDs (m2 s${}^{-\mathrm{1}}/$(rad m−1)) (right) of w, in blue, and ADSDs of $\mathbit{\sigma }\stackrel{\mathrm{˙}}{\mathbit{B}}$, in green, at t=120 d, for (from top to bottom) the low-resolution (642) projection of the reference deterministic high-resolution (5122) SQG dynamics, the low-resolution (642) deterministic SQG dynamic, one realization of the low-resolution (642) LU SQG dynamics with self-similar parameterization, and one realization of the low-resolution (642) LU SQG dynamics (equivalent to SALT SQG) with data-driven parameterization.

Figure 11Buoyancy fields (m s−2(a, b), kinetic energy spectra (m2 s${}^{-\mathrm{2}}/$(rad m−1)) (c, d), and ADSDs (m2 s${}^{-\mathrm{1}}/$(rad m−1)) (e, f) of w, at t=120 d, for the ensemble mean of the low-resolution (642) LU SQG dynamics (equivalent to SALT SQG) with self-similar parameterization (a, c, e) and with data-driven parameterization (b, d, f).

## 3.4 Uncertainty quantification

We now forecast two ensembles of 200 realizations following the stochastic SQG dynamics (Eqs. 3436), with the same unique reference initial condition at t=100 d. Again, the ensembles members evolve at low resolution (642). The first ensemble is generated with the data-driven model for the unresolved velocity (see Sect. 2.2), while the second ensemble is generated with the parametric and self-similar model for the unresolved velocity (see Sect. 2.1). For the data-driven model, we keep only 200 EOFs, since the number of EOFs cannot be larger than the ensemble size without increasing the complexity of the algorithm.

With such ensemble forecasts, we aim at representing the variety of possible behaviours of the fluid dynamic system. In particular, the spreading (i.e. the variance increase) of an ensemble is expected to make an ensemble be closer to the reference. In such a case, the standard deviation – at each point and at each time – is expected to be of the order of the bias. Figure 12 shows that both the data-driven and the self-similar parameterizations achieve this goal everywhere and for every time. The pointwise biases of the data-driven method, of the self-similar method, and of a deterministic simulation (at the same low resolution) are very similar (not shown). Therefore, we only plot the pointwise bias of the data-driven method. In Fig. 12, the represented biases and error estimations (mean ±1.96 times the pointwise standard deviation1) are normalized by the squared energy of the reference solution. Note that those relative pointwise biases increase very quickly with time.

Figure 12Normalized buoyancy bias absolute value, $|\stackrel{\mathrm{^}}{\mathbb{E}}\mathit{\left\{}b\mathit{\right\}}-{b}^{\mathrm{ref}}|$, (dimensionless) of the stochastic SQG model (left) and its estimation (1.96 times the standard deviation of the ensemble) for the data-driven method (middle) and the self-similar one (right) at a resolution of 642 at (from top to bottom) t=105, 110, 115, and 120 d. The reference is the usual SQG model at a resolution of 5122, adequately filtered and subsampled. The stochastic simulations and the reference have a common initial condition at t=100 d. Points used for the local UQ analysis of Figs. 13 and 14 are highlighted in green.

After t=105 d and especially after t=115 d, a bifurcation plays a large role in the simulation error and its estimation. This bifurcation is due to the chaotic trajectory of an eddy – centred on (525, 300 km) at t=105 d. Due to the incorrect trajectory in the ensemble mean, a large yellow spot develops in the bias images. Both ensembles capture well this variability by creating similar spots. According to the doubling of those spots, there are probably only two likely trajectories for this eddy, at least until t=118 d.

Similar UQ results have been obtained on a free-turbulence flow . This close analysis has also compared the UQ potential of LU/SALT algorithms against that of a deterministic dynamics with random initial conditions. The latter has shown an underestimation of errors by one order of magnitude.

Figure 12 offers a visual validation of our methods' UQ potential in the whole spatial domain. Nevertheless, the pointwise laws of the ensembles are not Gaussian, since we consider a non-linear evolution law with multiplicative noise. Therefore, the pointwise confidence interval is not in general a symmetric interval centred on the pointwise mean with width 2×1.96 times the pointwise standard deviation. Such an interval is only a first approximation. In order to be more accurate in our analysis, we now focus on few spatial points. There, we compute – at each time – the true ensemble-based confidence intervals at 95 % and at 50 % and compare them with the reference value. Figure 13 display the results at two grid points. Again, the results are impressively good. Similar UQ proxies have been presented by with a SALT 2D Euler dynamics with Dirichlet boundary conditions. This work also highlights the excellent UQ skills of our frameworks.

Figure 13Confidence interval at 95 % (light purple) and at 50 % (dark purple) along time, computed from the low-resolution stochastic SQG 200-member ensemble with the data-driven (a, b) and with the self-similar (c, d) parameterization, at the points (500, 250 km) (a, c) and (250, 500 km) (b, d). The high-resolution reference is superimposed in red.

To conclude this quantitative UQ analysis, we study the effect of the number of realizations needed. How many is a main issue in operational weather forecast centres, since each realization is costly. Accordingly, we start again the UQ analysis but with 20 ensemble members only. Figure 14 shows that the 97.5 % and the 2.5 % quantiles get closer, compared to the 200-ensemble-member case. It means that the probability density tail estimations – i.e. the representations of extremes – are slower to converge than the more likely values. Nevertheless, the ensembles still capture well the reference dynamics of the centre of the distribution.

Figure 14Confidence interval at 95 % (light purple) and at 50 % (dark purple) along time, computed from the low-resolution stochastic SQG 20-member ensemble with the data-driven (a, b) and with the self-similar (c, d) parameterization, at the points (500, 250 km) (a, c) and (250, 500 km) (b, d). The high-resolution reference is superimposed in red.

4 Conclusion

This paper develops the SALT and LU models which coexist in a single family of stochastic schemes. In addition to their general theoretical properties, we have discussed and compared possible parameterizations for this new scheme family.

Appendix A highlights the strong theoretical similarities between SALT and LU stochastic subgrid tensors at the heart of their parameterizations. These frameworks assume that tracers are transported by the sum of a resolved large-scale velocity and an unresolved time-uncorrelated velocity. As already mentioned in the literature, these subgrid models can conserve some but not all invariants. The SALT framework imposes helicity and circulation conservation in two and three dimensions and enstrophy conservation in two dimensions, whereas LU dynamics strictly conserves kinetic energy. Yet, for a homogeneous unresolved turbulence, we have proved that – on average – LU dynamics also conserve the helicity and circulation.

This paper mainly focuses on numerical parameterization. We have formulated and described several parameterizations which apply to both SALT and LU frameworks. Parameterization for SALT or LU means choosing a spatial covariance for the unresolved small-scale velocity (i.e. choosing σ). The stationary homogeneous self-similar parameterization of has been improved to make it non-stationary and tuning-free. Spectral properties are learned “on the fly” from the resolved large-scale velocity as is common in the practice of large-eddy simulation. A heterogeneous modulation of that parameterization – based on the energy flux through scales – is also proposed. This modulation naturally comes into play when considering Kolmogorov-like energy cascades and spectra. Because the energy flux quantifies the energy cascade, which is assumed to be constant across scales, the modulated unresolved velocity acts only where there is a large-scale energy cascade. This parameterization improvement enables a better simulation of straight strong fronts in SQG dynamics. However, the usual convenient approximation produced in the Smagorinsky diagnosis of the energy flux – the dissipation – cannot be used accurately here. We also recall the stationary heterogeneous data-driven parameterization method of . Here, small Lagrangian displacement increments are computed at distinct resolutions from high-resolution simulation outputs. The spectral decomposition of the Lagrangian displacements covariance leads to a light and convenient representation of the unresolved velocity statistics: the Lagrangian displacement EOFs.

Finally, two tuning-free parameterizations have been numerically compared: the new non-stationary homogeneous self-similar parameterization for LU and the stationary heterogeneous data-driven one of for SALT. The test case is a homogeneous and stationary forced SQG turbulence whose LU and SALT versions are equivalent. Single realizations of the LU test case are found to be – at least – as good as the result of the corresponding deterministic simulation at the same resolution. For both parameterizations, the ensemble forecasts are found to predict the right amplitudes and positions of numerical errors. The uncertainty skills of the ensemble forecasts remain impressively good even with only 20 realizations.

5 Discussion

A lot of theoretical and practical questions remain about SALT and LU schemes.

The numerical explorations of this paper have focused on SQG dynamics because – except for the neglected advection correction – SALT and LU SQG models coincide. This simplification has enabled a clearer parameterization comparison. An interesting dual study would be the comparison of distinct SALT and LU models but with a fixed parameterization (i.e. fixed σ model).

For this purpose, the simplest dynamics to consider would be 2D incompressible Navier–Stokes equations. The 3D or quasi-geostrophic (QG) dynamics would be appropriate, although they possess a larger parameter space and more costly computation. The SALT version will conserve helicity and circulation, while the LU one will conserve kinetic energy. It would be a very interesting exercise to examine the inertial cascades of energy, enstrophy, and potential enstrophy in these systems and how they are affected by the SALT versus LU assumptions. Nonetheless, even an understanding of these cascades is probably not sufficient to objectively conclude which framework is more appropriate in general, as the answer will likely depend on the application, model resolution, etc. A variety of idealized and applied numerical studies will probably be necessary to gain insight into how to optimize in a variety of settings. And even for one specific situation, the quality and skill metrics of choice are not obvious. Even in the simple cases studied here, the usual second-order statistics are probably not sufficient to discriminate between these dynamics. Furthermore, it is important to note that the exact conservation of vorticity by the SALT scheme and energy by the LU scheme may be counterproductive in certain heterogeneous settings where the small-scale features are required to exhibit systematic property transport. One interesting example is the oceanic wind-driven gyre, whose circulation magnitude depends critically on the gyre and its turbulence to relocate the source of vorticity and energy from the wind to the regions where dissipation occurs. An eddy vorticity transport across large-scale streamlines (i.e. affecting the interpretation of the Kelvin circulation theorem) is needed . A cross-streamline energy transport is also part of the system equilibration . In the real ocean, the eddies shed in the Agulhas retroflection are a classic example of organized small-scale transport (Gordon1985) as are the “eddy cannons” that fire mesoscale eddies across the Antarctic Circumpolar Current .

In the presence of turbulence heterogeneity, another distinction between SALT and LU models is the advection correction. It is also the single distinction between LU dynamics and . So far, it is not clear what constitutes a “large-scale velocity” in practice. Again, this probably depends on the situation. Even though have provided some first theoretical clues, further explorations are needed. Numerical and experimental work will probably help if one identifies and studies simple heterogeneous flows meeting the main assumption of SALT and LU models: the velocity timescale gap.

The parameterizations for SALT and LU presented in this study (e.g. the ADSD method) could be naturally extended to physical-domain-based implementation, as is an important step in evaluating schemes for operational use, e.g. . Indeed, Sect. 2.1 details this tuning-free parameterization in the Fourier space only. A convenient dual of self-similar spectra (i.e. spectra scaling as $‖\mathbit{k}{‖}^{\mathit{\alpha }}$) in the physical space is the widely used Matérn covariance . Thus, the spatial filter $\stackrel{\mathrm{˘}}{\mathbit{\sigma }}$ – appearing in $\mathbit{\sigma }\mathrm{d}{\mathbit{B}}_{t}={\stackrel{\mathrm{˘}}{\mathbit{\sigma }}}^{*}\mathrm{d}{B}_{t}$ – of a possible future 2D physical-domain-based ADSD parameterization would probably be built from the curl of a Matérn covariance.

Nevertheless, the homogeneity of the ADSD parameterization is a drawback. Heterogeneous solutions need to be developed. Unfortunately, the ADSD heterogeneous modulation of this paper remains expensive to compute, even making use of the Fourier space. Moreover, even though its third-order physical meaning and its natural association with SALT and LU is very instructive, its additive value for UQ is not clear.

The heterogeneous parameterization method of is valuable in the presence of some turbulence heterogeneities (e.g. heterogeneities induced by fixed boundary conditions). However, the practical usefulness of this parameterization remains debatable, because its heterogeneity is assumed to be stationary. Other ways for calibrating the subgrid unresolved statistics using machine learning is also on-going research.

Many parameterizations for SALT and LU are now available. Regardless of their differences, the resulting stochastic dynamics and associated UQ skill seems to be relatively independent of the particular parameterization choice. Notwithstanding, we think that learning-free heterogeneous and non-stationary parameterizations may further improve SALT and LU UQ skills and/or enable smaller ensemble size and thus more efficient computation.

Finally, the main goal of all these studies is SALT- and LU-based data assimilation. have opened the way, using a particle filter and the heterogeneous parameterization method of . Observed data from a dynamical system of O(106) degrees of freedom were successfully assimilated into the parameterized system of O(104) degrees of freedom. We expect that many other data assimilation studies will follow.

Appendix A: Theoretical motivations

Here, we briefly highlight the similarities and differences between the location uncertainty model and the stochastic Lie transport model. To simplify the comparison, we work in the Stratonovich representation and restrict ourselves to the incompressible case.

## A1 Lagrangian path

From a large-scale under-resolved point of view, SALT and LU methods assume that the fluid velocity is partially uncorrelated in time. Hence, the position of a Lagrangian particle Xt evolves in time according to

$\begin{array}{}\text{(A1)}& \frac{\mathrm{d}{\mathbit{X}}_{t}}{\mathrm{d}t}={\mathbit{w}}^{*}\left({\mathbit{X}}_{t},t\right)+\left(\mathbit{\sigma }\stackrel{\mathrm{˙}}{\mathbit{B}}\right)\left({\mathbit{X}}_{t},t\right),\end{array}$

where $\mathbit{\sigma }\stackrel{\mathrm{˙}}{\mathbit{B}}$ is time-uncorrelated and w* is the Stratonovich large-scale drift term. Formally, for a spatial domain Ω⊂ℝd, the process (tBt) is a cylindrical ${{\mathbf{I}}_{\mathrm{d}}}_{{\left({\mathcal{L}}^{\mathrm{2}}\left(\mathrm{\Omega }\right)\right)}^{d}}$-Wiener process (see , and , for more information on infinite-dimensional Wiener processes and cylindrical Id-Wiener processes). Recently, have rigorously shown that such a decomposition corresponds to the limit of a deterministic flow when the correlation time of the small-scale velocity goes to zero.

## A2 Notation correspondences

The Lagrangian equation can also be written with Itō notations as follows:

$\begin{array}{}\text{(A2)}& {\mathbit{X}}_{t+\mathrm{d}t}-{\mathbit{X}}_{t}=\mathbit{w}\left({\mathbit{X}}_{t},t\right)\mathrm{d}t+\mathbit{\sigma }\left({\mathbit{X}}_{t},t\right)\left({\mathbit{B}}_{t+\mathrm{d}t}-{\mathbit{B}}_{t}\right).\end{array}$

Note the difference in the drift term when compared with Eq. (A1). Table A1 summarizes the notations differences between SALT and LU both for Itō and Stratonovich notations.

Table A1Notation equivalences between LU and SALT approaches.

## A3 Scalar transport

For a – possibly active – scalar-tracer denoted q, the SALT prescribes the same type of evolution equation as the LU models.

$\begin{array}{}\text{(A3)}& \frac{Dq}{Dt}\left(\mathbit{x},t\right)\stackrel{\mathrm{△}}{=}\frac{\mathrm{d}}{\mathrm{d}t}{\left(q\left({\mathbit{X}}_{t},t\right)\right)}_{{|}_{{\mathbit{X}}_{t}=\mathbit{x}}}=\mathrm{0},\end{array}$

where the material derivative, $\frac{Dq}{\mathrm{d}t}$, (in Stratonovich notations) is simply

$\begin{array}{}\text{(A4)}& \frac{Dq}{Dt}=\frac{\partial }{\partial t}+{\stackrel{\mathrm{˙}}{\mathbit{x}}}_{t}\mathbf{\cdot }\mathbf{\nabla },\end{array}$

with

$\begin{array}{}\text{(A5)}& {\stackrel{\mathrm{˙}}{\mathbit{x}}}_{t}\left(\mathbit{x}\right)\stackrel{\mathrm{△}}{=}{\mathbit{w}}^{*}\left(\mathbit{x},t\right)+\left(\mathbit{\sigma }\stackrel{\mathrm{˙}}{\mathbit{B}}\right)\left(\mathbit{x},t\right).\end{array}$

In Itō form, the transport Eq. (A3) makes explicit the turbulent diffusion and the centred anti-symmetric multiplicative noise .

We again highlight that this analysis holds for both SALT and LU approaches.

## A4 Euler models

Nonetheless, the incompressible stochastic transports of velocity and vorticity differ between LU and SALT. First, the SALT approach considers the transport – through a specific Lagrangian choice and up to some forcings – of the Stratonovich large-scale linear momentum, ρw*, whereas the LU Euler derivation assumes the transport of the Itō large-scale linear momentum, ρw. Furthermore, due to its geometrical approach, the SALT Euler equation involves an additional term.

Specifically, the LU Euler equation with neither viscosity nor Coriolis force reads

$\begin{array}{}\text{(A6)}& \frac{D}{Dt}\underset{\begin{array}{c}\text{Due to the}\\ \text{transport}\\ \text{of}\phantom{\rule{0.25em}{0ex}}\mathit{\rho }\mathbit{w}\end{array}}{\underbrace{\mathbit{w}}}=-\frac{\mathrm{1}}{\mathit{\rho }}\mathbf{\nabla }p,\end{array}$

whereas the SALT version is

$\begin{array}{}\text{(A7)}& \frac{D}{Dt}\underset{\begin{array}{c}\text{Due to the}\\ \text{transport}\\ \text{of}\phantom{\rule{0.25em}{0ex}}\mathit{\rho }{\mathbit{w}}^{*}\end{array}}{\underbrace{{\mathbit{w}}^{*}}}+\underset{\begin{array}{c}\text{Additional}\\ \text{term}\end{array}}{\underbrace{\mathbf{\nabla }{\stackrel{\mathrm{˙}}{\mathbit{x}}}_{t}^{T}{\mathbit{w}}^{*}}}=-\frac{\mathrm{1}}{\mathit{\rho }}\mathbf{\nabla }p.\end{array}$

Unlike in the SALT model, the large-scale transported velocity of the LU Euler, w, differs from the large-scale transporting velocity, w*. The latter implicitly appears in the stochastic transport operator, $\frac{D}{Dt}$, of both the SALT and LU equations. Thus, in the LU equations, we can see the correction ${\mathbit{w}}^{*}-\mathbit{w}=-\frac{\mathrm{1}}{\mathrm{2}}{\left(\mathbf{\nabla }\mathbf{\cdot }\mathbit{a}\right)}^{T}$ as a modification of the large-scale advection. Note that this modification cancels for a homogeneous small-scale velocity. Otherwise, whether the Itō drift,

$\begin{array}{}\text{(A8)}& \mathbit{w}\left(\mathbit{x},t\right)\stackrel{\mathrm{△}}{=}\mathbb{E}\left({\mathbit{X}}_{t+\mathrm{d}t}-{\mathbit{X}}_{t}|{\mathbit{X}}_{t}=\mathbit{x}\right)/\mathrm{d}t,\end{array}$

or the Stratonovich drift,

$\begin{array}{}\text{(A9)}& {\mathbit{w}}^{*}\left(\mathbit{x},t\right)\stackrel{\mathrm{△}}{=}\mathbb{E}\left({\mathbit{X}}_{t+\mathrm{d}t/\mathrm{2}}-{\mathbit{X}}_{t-\mathrm{d}t/\mathrm{2}}|{\mathbit{X}}_{t}=\mathbit{x}\right)/\mathrm{d}t,\end{array}$

should be (randomly) transported is still an open question. A related question is how to interpret the large-scale velocity derived from observations or from numerical simulations. Depending on the situation, this velocity may better correspond to the Itō drift or to the Stratonovich drift. may help answer this question from a theoretical perspective.

On top of this large-scale advection difference between SALT and LU modelling, the SALT additional term, $\mathbf{\nabla }{\stackrel{\mathrm{˙}}{\mathbit{x}}}_{t}^{T}{\mathbit{w}}^{*}$, has major consequences on the dynamics invariants, as explained in Sect. A6.

## A5 Other dynamical models

The differences between LU and SALT momentum equations – Eqs. (A6) and (A7) respectively – resemble the distinctions between other classical and geophysical fluid dynamic models. Table A2 recalls some of these classical models in SALT and LU stochastic formulations. The LU Euler equations can be found in without noise and in including noise. The vorticity equation in each case is easily derived by taking the curl of the Euler equation. The LU QG equations under moderate influence of turbulence has been derived by with Itō notation. In Table A2, we have written the same equation in Stratonovich notation. The SALT Euler, vorticity and quasi-geostrophic equations are derived by Holm (2015).

Both SALT and LU assumes the stochastic transport of tracers (e.g. temperature and salinity). For QG models, it implies the transport of buoyancy, b, at the surface (z=0). Therefore, up to the drift correction ${\mathbit{w}}^{*}-\mathbit{w}$ in the streamfunction definition, SALT and potential vorticity (PV) coincide. In the SQG framework, the PV is assumed to be zero in the ocean interior (z<0). This leads to the same relationship between buoyancy and streamfunction, ψ, in LU and SALT dynamics. Therefore, in the Sect. 3 of this paper, we will choose this model to compare two parameterization (i.e. choice of σ) for SALT and LU methods in a common ground.

Table A2Some LU and SALT dynamics models with Stratonovich notations.

## A6 Invariants

To interpret these results which apply more generally, mutatis mutandis the KE is conserved by the LU scheme and its variants , while the enstrophy, its generalizations (e.g. PV), the circulation, and the helicity are conserved by the SALT scheme and its variants (Holm2015). In the specific case of homogeneous small-scale velocity, the LU dynamics also conserve circulation and helicity in average (see Appendix A7 for circulation mean conservation; the proof of helicity mean conservation is similar). However, in the homogeneous case, this stochastic dynamics increases enstrophy mean and its generalizations , while SALT models increase KE mean .

These distinctions between conservation of vorticity in the SALT approach and conservation of energy in the LU approach would be of critical importance when choosing a model for 2D or QG turbulence, as and show that the conservation of energy and enstrophy lead to turbulence typified by an inverse energy cascade at large scales and a direct (potential) enstrophy cascade at small scales . In SQG, shows that a dual cascade results from conservation of depth-integrated energy and available potential energy on level boundaries. The former is not singled out for special treatment in the SALT or LU SQG formulation, but the latter is exactly conserved in both formulations.

## A7 Kelvin theorem under location uncertainty

The Kelvin theorem describes the variation of the circulation, which is defined as follows for LU dynamics as follows:

$\begin{array}{}\text{(A10)}& \mathrm{\Gamma }=\underset{C\left(t\right)}{\oint }\mathbit{w}\left(\mathbit{x},t\right)\mathbf{\cdot }d\mathbit{l}=\underset{C\left(\mathrm{0}\right)}{\oint }\mathbit{w}\left({\mathbit{X}}_{t}\left({\mathbit{l}}_{\mathrm{0}}\right),t{\right)}^{T}{\mathbf{J}}_{M}\phantom{\rule{0.25em}{0ex}}d{\mathbit{l}}_{\mathrm{0}},\end{array}$

where l=Xt(l0) is the Lagrangian path labelled by the initial position l0, C(t)=Xt(C(0)) is a material loop at time t and ${\mathbf{J}}_{M}={\mathbf{J}}_{M}\left({\mathbit{l}}_{\mathrm{0}},t\right)={\left[\mathbf{\nabla }{\mathbit{X}}_{t}^{T}\right]}^{T}\left({\mathbit{l}}_{\mathrm{0}}\right)$ is the Jacobian matrix of the flow. The time differentiation of the circulation involves the time variation of the Jacobian matrix ($\mathrm{d}{\mathbf{J}}_{M}={\left[\mathbf{\nabla }{\stackrel{\mathrm{˙}}{\mathbit{x}}}_{t}^{T}\right]}^{T}{\mathbit{J}}_{M}$) as follows:

$\begin{array}{}\text{(A11)}& \begin{array}{rl}\frac{\mathrm{d}\mathrm{\Gamma }}{\mathrm{d}t}& =\underset{C\left(\mathrm{0}\right)}{\oint }\left({\frac{D\mathbit{w}}{Dt}}^{T}{\mathbit{J}}_{M}+{\mathbit{w}}^{T}{\left[\mathbf{\nabla }{\stackrel{\mathrm{˙}}{\mathbit{x}}}_{t}^{T}\right]}^{T}{\mathbit{J}}_{M}\right)d{\mathbit{l}}_{\mathrm{0}}\\ & =\underset{C\left(t\right)}{\oint }\left(-\frac{\mathrm{1}}{\mathit{\rho }}\mathbf{\nabla }p+\left[\mathbf{\nabla }{\stackrel{\mathrm{˙}}{\mathbit{x}}}_{t}^{T}\right]\mathbit{w}\right)\mathbf{\cdot }d\mathbit{l}\\ & =\underset{C\left(t\right)}{\oint }\left(\mathbf{\nabla }{\stackrel{\mathrm{˙}}{\mathbit{x}}}_{t}^{T}\mathbit{w}\right)\mathbf{\cdot }d\mathbit{l}.\end{array}\end{array}$

This equation is the equivalent of the Reynolds transport theorem for vorticity but in the stochastic framework. This noise term is a priori not centred, since it is a Stratonovich noise. However, in the homogeneous case, its ensemble mean cancels. Indeed, using Itō notations, we simply need to compute a quadratic cross variation. This calculus is possible in Lagrangian coordinates (i.e. when functions are composed by xXt) by noticing that $\frac{D\mathbit{w}}{Dt}$ is a term in dt, $\frac{D\mathbit{\sigma }}{Dt}=\left({\stackrel{\mathrm{˙}}{\mathbit{x}}}_{t}\mathbf{\cdot }\mathbf{\nabla }\right)\mathbit{\sigma }$, and $\frac{D{\mathbit{J}}_{M}}{Dt}=\frac{\mathrm{d}{\mathbit{J}}_{M}}{\mathrm{d}t}={\left[\mathbf{\nabla }{\stackrel{\mathrm{˙}}{\mathbit{x}}}_{t}^{T}\right]}^{T}{\mathbit{J}}_{M}$ as follows:

$\begin{array}{}\text{(A12)}& \begin{array}{rl}& \frac{\mathrm{d}}{\mathrm{d}t}\mathbb{E}\mathit{\left\{}\mathrm{\Gamma }\mathit{\right\}}\\ & =\frac{\mathrm{1}}{\mathrm{d}t}\mathbb{E}\underset{C\left(\mathrm{0}\right)}{\oint }\left({\mathbit{w}}^{T}{\left[\mathbf{\nabla }{\left(\mathbit{\sigma }\circ \mathrm{d}{\mathbit{B}}_{t}\right)}^{T}\right]}^{T}{\mathbf{J}}_{M}\right)\phantom{\rule{0.25em}{0ex}}d{\mathbit{l}}_{\mathrm{0}},\\ & =\frac{\mathrm{1}}{\mathrm{2}}\mathbb{E}\underset{C\left(\mathrm{0}\right)}{\oint }\sum _{k}\frac{D}{Dt}〈{\mathbit{w}}^{T}{\left[\mathbf{\nabla }{\mathbit{\sigma }}_{•k}^{T}\right]}^{T}{\mathbit{J}}_{M}d{\mathbit{l}}_{\mathrm{0}},{\left({\mathbit{B}}_{t}\right)}_{k}〉,\\ & =\frac{\mathrm{1}}{\mathrm{2}}\mathbb{E}\underset{C\left(\mathrm{0}\right)}{\oint }\sum _{k}{\mathbit{w}}^{T}\left(\left({\mathbit{\sigma }}_{•k}\mathbf{\cdot }\mathbf{\nabla }\right)+{\left[\mathbf{\nabla }{\mathbit{\sigma }}_{•k}^{T}\right]}^{T}\right)\\ & \phantom{\rule{0.25em}{0ex}}\phantom{\rule{0.25em}{0ex}}\phantom{\rule{0.25em}{0ex}}\phantom{\rule{0.25em}{0ex}}\phantom{\rule{0.25em}{0ex}}{\left[\mathbf{\nabla }{\mathbit{\sigma }}_{•k}^{T}\right]}^{T}{\mathbit{J}}_{M}d{\mathbit{l}}_{\mathrm{0}},\\ & =\frac{\mathrm{1}}{\mathrm{2}}\mathbb{E}\underset{C\left(t\right)}{\oint }\sum _{pqk}{w}_{q}\mathbf{\nabla }\mathbf{\cdot }{\partial }_{p}\underset{\mathrm{cst}.}{\underbrace{\left({\mathit{\sigma }}_{qk}{\mathbit{\sigma }}_{•k}\right)}}d{l}_{p},\\ & =\mathrm{0}.\end{array}\end{array}$

Therefore, the mean LU circulation is conserved in the homogeneous case.

Appendix B: Effective resolution and inertial range

Let us assume the simulated evolution law is ${D}_{t}b=-\mathit{\nu }\left(-\mathrm{\Delta }{\right)}^{p}b\mathrm{d}t$. The deterministic subgrid model $-\mathit{\nu }\left(-\mathrm{\Delta }{\right)}^{p}b$ acts, in a finite time t, as a low-pass filter. In Fourier space, this filter is

$\begin{array}{}\text{(B1)}& F\left(‖\mathbit{k}‖\right)=\mathrm{exp}\left(-\mathit{\nu }t‖\mathbit{k}{‖}^{\mathrm{2}p}\right).\end{array}$

If the hyperviscosity ν is well chosen, we may expect that at the Shannon resolution $\mathit{\pi }/\mathrm{\Delta }x={\mathit{\kappa }}_{M}$, only 10 % of the energy is left by the filter, i.e.

$\begin{array}{}\text{(B2)}& F\left({\mathit{\kappa }}_{M}\right)=\mathrm{1}/\mathrm{10}.\end{array}$

A ratio smaller than 10 % may lead to an over-damped simulation. Moreover, the precise value of this ratio does not influence much our final estimate.

We may define the effective resolution as the scale κ=κm, where the deterministic subgrid model influence is negligible. There, we may expect the filter to be equal to 95 %, i.e.

$\begin{array}{}\text{(B3)}& F\left({\mathit{\kappa }}_{m}\right)=\mathrm{95}/\mathrm{100}.\end{array}$

The ratio κmκM can then be derived from Eqs. (B1), (B2), and (B3).

Data availability
Data availability.

No experimental data were collected nor used in this study. All data used were numerically generated from model implementations. The complete code can be found at the GitHub repository https://github.com/vressegu/LU_SALT_SelfSim (last access: 27 March 2020, ).

Author contributions
Author contributions.

VR and WP decided to perform this SALT versus LU numerical study. BFK obtained the funding and supervised the project. VR and BFK designed the ADSD methodology and its heterogeneous variants. VR developed the code and performed the simulations. WP helped to reproduce the data-driven parameterization of . All authors prepared the paper.

Competing interests
Competing interests.

One of the author is currently employed by a private company named SCALIAN.

Acknowledgements
Acknowledgements.

We would like to warmly thank Darryl D. Holm, Dan Crisan, Colin Cotter, Igor Shevchenko, Bertrand Chapron, and Etienne Mémin for helpful discussions and for having enabled the collaborations which have lead to this paper.

Financial support
Financial support.

This research has been supported by the National Science Foundation Division of Ocean Sciences (grant no. 1350795), the Office of Naval Research Office of Naval Research Global (grant no. N00014-17-1-2963), and the Engineering and Physical Sciences Research Council (grant no. EP/N023781/1).

Review statement
Review statement.

This paper was edited by Wansuo Duan and reviewed by two anonymous referees.

References

Bachman, S. D., Fox-Kemper, B., and Pearson, B.: A scale-aware subgrid model for quasi-geostrophic turbulence, J. Geophys. Res.-Oceans, 122, 1529–1554, 2017. a, b, c

Berner, J., Achatz, U., Batte, L., Camara, A. D. L., Crommelin, D., Christensen, H., Colangeli, M., Dolaptchiev, S., Franzke, C., Friederichs, P., Imkeller, P., Jarvinen, H., Juricke, S., Kitsios, V., Lott, F., Lucarini, V., Mahajan, S., Palmer, T. N., Penland, C., Storch, J.-S. V., Sakradzija, M., Weniger, M., Weisheimer, A., Williams, P. D., and Yano, J.-I.: Stochastic Parameterization: Towards a new view of Weather and Climate Models, B. Am. Meteorol. Soc., 98, 565–588, 2017. a

Blumen, W.: Uniform potential vorticity flow: part I. theory of wave interactions and two-dimensional turbulence, J. Atmos. Sci., 35, 774–783, 1978. a

Blumen, W.: Wave-Interactions in Quasi-Geostrophic Uniform Potential Vorticity Flow, J. Atmos. Sci., 39, 2388–2396, 1982. a

Cai, S., Mémin, E., Dérian, P., and Xu, C.: Motion estimation under location uncertainty for turbulent fluid flows, Exp. Fluids, 59, 8, https://doi.org/10.1007/s00348-017-2458-z, 2018. a

Callies, J., Flierl, G., Ferrari, R., and Fox-Kemper, B.: The role of mixed-layer instabilities in submesoscale turbulence, J. Fluid Mech., 788, 5–41, 2016. a

Chapron, B., Dérian, P., Mémin, E., and Resseguier, V.: Large-scale flows under location uncertainty: a consistent stochastic framework, Q. J. Roy. Meteor. Soc., 144, 251–260, 2018. a

Charney, J. G.: Geostrophic Turbulence, J. Atmos. Sci., 28, 1087–1095, 1971. a

Chekroun, M. D., Kondrashov, D., and Ghil, M.: Predicting stochastic systems by noise sampling, and application to the El Niño-Southern Oscillation, P. Natl. Acad. Sci. USA, 108, 11766–11771, 2011. a

Constantin, P., Majda, A., and Tabak, E.: Formation of strong fronts in the 2-D quasigeostrophic thermal active scalar, Nonlinearity, 7, 1495–1533, 1994. a, b, c, d, e

Constantin, P., Nie, Q., and Schörghofer, N.: Front formation in an active scalar equation, Phys. Rev. E, 60, 2858, https://doi.org/10.1103/PhysRevE.60.2858, 1999. a, b, c, d

Constantin, P., Lai, M., Sharma, R., Tseng, Y., and Wu, J.: New numerical results for the surface quasi-geostrophic equation, J. Sci. Comput., 50, 1–28, 2012. a, b, c, d

Cotter, C., Crisan, D., Holm, D. D., Pan, W., and Shevchenko, I.: Modelling uncertainty using circulation-preserving stochastic transport noise in a 2-layer quasi-geostrophic model, arXiv preprint arXiv:1802.05711, 2018. a, b, c, d, e, f

Cotter, C., Crisan, D., Holm, D., Pan, W., and Shevchenko, I.: Numerically Modelling Stochastic Lie Transport in Fluid Dynamics, SIAM Multiscale Modeling & Simulation, 17, 192–232, https://doi.org/10.1137/18M1167929, 2019a. a, b, c, d, e, f, g, h, i, j, k

Cotter, C., Crisan, D., Holm, D. D., Pan, W., and Shevchenko, I.: A Particle Filter for Stochastic Advection by Lie Transport (SALT): A case study for the damped and forced incompressible 2D Euler equation, arXiv preprint arXiv:1907.11884, 2019b. a, b

Cotter, C. J., Gottwald, G. A., and Holm, D. D.: Stochastic partial differential fluid equations as a diffusive limit of deterministic Lagrangian multi-time dynamics, P. Roy. Soc. A-Math. Phy., 473, 20170388, https://doi.org/10.1098/rspa.2017.0388, 2017. a, b, c, d

Crisan, D. and Lang, O.: Well-posedness for 2D Euler equations with stochastic Lie transport noise, arXiv 1907.00451, 2018. a

Crisan, D., Flandoli, F., and Holm, D. D.: Solution properties of a 3D stochastic Euler fluid equation, J. Nonlinear Sci., 29, 813–870, 2019. a

Da Prato, G. and Zabczyk, J.: Stochastic equations in infinite dimensions, Cambridge University Press, https://doi.org/10.1017/CBO9781107295513, 1992. a

Falkovich, G., Gawedzki, K., and Vergassola, M.: Particles and fields in fluid turbulence, Rev. Mod. Phys., 73, 913, https://doi.org/10.1103/RevModPhys.73.913, 2001. a, b, c

Ferrari, R. and Nikurashin, M.: Suppression of Eddy Diffusivity across Jets in the Southern Ocean, J. Phys. Oceanogr., 40, 1501–1519, 2010. a

Flandoli, F.: The interaction between noise and transport mechanisms in PDEs, Milan J. Math., 79, 543–560, 2011. a

Fox-Kemper, B.: New Frontiers in Operational Oceanography, chap.: Notions for the Motions of the Oceans, GODAE OceanView, 27–73, https://doi.org/10.17125/gov2018, 2018. a

Fox-Kemper, B. and Menemenlis, D.: Can Large Eddy Simulation Techniques Improve Mesoscale-Rich Ocean Models?, in: Ocean Modeling in an Eddying Regime, edited by: Hecht, M. and Hasumi, H., AGU Geophysical Monograph Series, 177, 319–338, 2008. a, b

Fox-Kemper, B. and Pedlosky, J.: Wind-driven barotropic gyre I: Circulation control by eddy vorticity fluxes to an enhanced removal region, J. Mar. Res., 62, 169–193, 2004. a, b

Frank, J. and Gottwald, G.: Stochastic homogenization for an energy conserving multi-scale toy model of the atmosphere, Physica D, 254, 46–56, 2013. a

Frisch, U.: Turbulence: the legacy of AN Kolmogorov, Cambridge university press, https://doi.org/10.1017/CBO978113917066670666, 1995. a, b, c

Gawedzky, K. and Kupiainen, A.: Anomalous scaling of the passive scalar, Phys. Rev. Lett., 75, 3834–3837, 1995. a

Gay-Balmaz, F. and Holm, D. D.: Stochastic Geometric Models with Non-stationary Spatial Correlations in Lagrangian Fluid Flows, J. Nonlinear Sci., 28, 873–904, https://doi.org/10.1007/s00332-017-9431-0, 2018. a

GitHub repository: https://github.com/vressegu/LU_SALT_SelfSim, last access: 27 March 2020. a

Gordon, A. L.: Indian-Atlantic transfer of thermocline water at the Agulhas retroflection, Science, 227, 1030–1033, 1985. a

Gottwald, G. and Melbourne, I.: Homogenization for deterministic maps and multiplicative noise, P. Roy. Soc. A-Math. Phy., 469, 20130201, https://doi.org/10.1098/rspa.2013.0201, 2013. a

Hallberg, R. and Gnanadesikan, A.: The role of eddies in determining the structure and response of the wind-driven Southern Hemisphere overturning: Results from the Modeling Eddies in the Southern Ocean (MESO) project, J. Phys. Oceanogr., 36, 2232–2252, 2006. a

Held, I., Pierrehumbert, R., Garner, S., and Swanson, K.: Surface quasi-geostrophic dynamics, J. Fluid Mech., 282, 1–20, 1995. a

Holm, D. D.: Variational principles for stochastic fluid dynamics, P. Roy. Soc. A-Math. Phy., 471, 20140963, https://doi.org/10.1098/rspa.2014.0963, 2015. a, b, c, d

Isern-Fontanet, J., Chapron, B., Lapeyre, G., and Klein, P.: Potential use of microwave sea surface temperatures for the estimation of ocean currents, Geophys. Res. Lett., 33, L24608, https://doi.org/10.1029/2006GL027801, 2006. a

Isern-Fontanet, J., Lapeyre, G., Klein, P., Chapron, B., and Hecht, M. W.: Three-dimensional reconstruction of oceanic mesoscale currents from surface information, J. Geophys. Res., 113, C09005, https://doi.org/10.1029/2007JC004692, 2008. a

Janjić, T., McLaughlin, D., Cohn, S. E., and Verlaan, M.: Conservation of mass and preservation of positivity with ensemble-type Kalman filter algorithms, Mon. Weather Rev., 142, 755–773, 2014. a

Keating, S., Smith, S., and Kramer, P.: Diagnosing lateral mixing in the upper ocean with virtual tracers: Spatial and temporal resolution dependence, J. Phys. Oceanogr., 41, 1512–1534, 2011. a, b

Klein, P., Isern-Fontanet, J., Lapeyre, G., Roullet, G., Danioux, E., Chapron, B., Le Gentil, S., and Sasaki, H.: Diagnosis of vertical velocities in the upper ocean from high resolution sea surface height, Geophys. Res. Lett., 36, L12603, https://doi.org/10.1029/2009GL038359, 2009. a

Klyatskin, V.: Stochastic equations through the eye of the physicist: Basic concepts, exact results and asymptotic approximations, Elsevier, 2005. a, b

Kolmogorov, A. N.: The local structure of turbulence in incompressible viscous fluid for very large Reynolds numbers, Cr Acad. Sci. URSS, 30, 301–305, 1941. a

Kraichnan, R.: Small-scale structure of a scalar field convected by turbulence, Phys. Fluids, 11, 945–963, 1968. a

Kraichnan, R.: Anomalous scaling of a randomly advected passive scalar, Phys. Rev. Lett., 72, 1016, https://doi.org/10.1103/PhysRevLett.72.1016, 1994. a

Kraichnan, R. H.: Inertial Ranges in Two-Dimensional Turbulence, Phys. Fluids, 16, 1417–1423, 1967. a

LaCasce, J. and Mahadevan, A.: Estimating subsurface horizontal and vertical velocities from sea-surface temperature, J. Mar. Res., 64, 695–721, 2006. a

Lapeyre, G.: Surface quasi-geostrophy, Fluids, 2, 7, https://doi.org/10.3390/fluids2010007, 2017. a

Lapeyre, G. and Klein, P.: Dynamics of the upper oceanic layers in terms of surface quasigeostrophy theory, J. Phys. Oceanogr., 36, 165–176, 2006a. a

Lapeyre, G. and Klein, P.: Dynamics of the upper oceanic layers in terms of surface quasigeostrophy theory, J. Phys. Oceanogr., 36, 165–176, 2006b. a

Leith, C.: Atmospheric predictability and two-dimensional turbulence, J. Atmos. Sci, 28, 145–161, 1971. a

Lilly, J. M., Sykulski, A. M., Early, J. J., and Olhede, S. C.: Fractional Brownian motion, the Matérn process, and stochastic modeling of turbulent dispersion, Nonlin. Processes Geophys., 24, 481–514, https://doi.org/10.5194/npg-24-481-2017, 2017. a

Lim, S. and Teo, L.: Generalized Whittle–Matérn random field as a model of correlated fluctuations, J. Phys. A-Math. Theor., 42, 105202, https://doi.org/10.1088/1751-8113/42/10/105202, 2009. a

Lu, F., Lin, K. K., and Chorin, A. J.: Data-based stochastic model reduction for the Kuramoto–Sivashinsky equation, Physica D, 340, 46–57, 2017. a

Majda, A. and Kramer, P.: Simplified models for turbulent diffusion: Theory, numerical modelling, and physical phenomena, Phys. Rep., 314, 237–574, 1999. a

Majda, A., Timofeyev, I., and Vanden-Eijnden, E.: Models for stochastic climate prediction, P. Natl. Acad. Sci. USA, 96, 14687–14691, 1999. a

Majda, A. J.: Statistical energy conservation principle for inhomogeneous turbulent dynamical systems, P. Natl. Acad. Sci. USA, 112, 8937–8941, 2015. a

Margolin, L. G., Rider, W. J., and Grinstein, F. F.: Modeling turbulent flow with implicit LES, J. Turbul., 7, N15, https://doi.org/10.1080/14685240500331595, 2006. a

Mémin, E.: Fluid flow dynamics under location uncertainty, Geophys. Astro. Fluid, 108, 119–146, https://doi.org/10.1080/03091929.2013.836190, 2014. a, b

Mikulevicius, R. and Rozovskii, B.: Stochastic Navier–Stokes equations for turbulent flows, SIAM J. Math. Anal., 35, 1250–1310, 2004a. a

Mikulevicius, R. and Rozovskii, B.: Stochastic Navier–Stokes equations for turbulent flows, SIAM J. Math. Anal., 35, 1250–1310, 2004b. a

Mitchell, L. and Gottwald, G.: Data assimilation in slow fast systems using homogenized climate models, J. Atmos. Sci., 69, 1359–1377, 2012. a

Orszag, S.: Analytical theories of turbulence, J. Fluid Mech., 41, 363–386, 1970. a

Pearson, B. and Fox-Kemper, B.: Log-Normal Turbulence Dissipation in Global Ocean Models, Phys. Rev. Lett., 120, 094501, https://doi.org/10.1103/PhysRevLett.120.094501, 2018. a

Pearson, B., Fox-Kemper, B., Bachman, S. D., and Bryan, F. O.: Evaluation of scale-aware subgrid mesoscale eddy models in a global eddy-rich model, Ocean Model., 115, 42–58, 2017. a

Penland, C.: A Stochastic Approach to Nonlinear Dynamics: A Review (extended version of the article – “Noise Out of Chaos and Why it Won't Go Away”), B. Am. Meteorol. Soc., 84, 925–925, 2003. a, b

Prévôt, C. and Röckner, M.: A concise course on stochastic partial differential equations, in: Lecture Notes in Mathematics, 1905, 148 pp., Springer-Verlag, Berlin Heidelberg, 2007. a

Resseguier, V.: Mixing and fluid dynamics under location uncertainty, PhD thesis, Université Rennes 1, 2017. a, b

Resseguier, V., Mémin, E., and Chapron, B.: Geophysical flows under location uncertainty, Part I: Random transport and general models, Geophys. Astro. Fluid, 111, 149–176, https://doi.org/10.1080/03091929.2017.1310210, 2017a. a, b, c, d

Resseguier, V., Mémin, E., and Chapron, B.: Geophysical flows under location uncertainty, Part II: Quasi-geostrophic models and efficient ensemble spreading, Geophys. Astro. Fluid, 111, 177–208, https://doi.org/10.1080/03091929.2017.1312101, 2017b. a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q

Resseguier, V., Mémin, E., and Chapron, B.: Geophysical flows under location uncertainty, Part III: SQG and frontal dynamics under strong turbulence, Geophys. Astro. Fluid, 11, 209–227, https://doi.org/10.1080/03091929.2017.1312102, 2017c. a

San, O., Staples, A. E., and Iliescu, T.: Approximate deconvolution large eddy simulation of a stratified two-layer quasigeostrophic ocean model, Ocean Model., 63, 1–20, 2013.  a

Sapsis, T. and Majda, A.: A statistically accurate modified quasilinear Gaussian closure for uncertainty quantification in turbulent dynamical systems, Physica D, 252, 34–45, 2013. a

Sardeshmukh, P. D. and Sura, P.: Reconciling non-Gaussian climate statistics with linear dynamics, J. Climate, 22, 1193–1207, 2009. a

Sardeshmukh, P. D., Compo, G. P., and Penland, C.: Need for caution in interpreting extreme weather statistics, J. Climate, 28, 9166–9187, 2015. a

Scott, R. B. and Straub, D. N.: Small Viscosity Behavior of a Homogeneous, Quasigeostrophic, Ocean Circulation Model, J. Mar. Res., 56, 1225–1258, 1998. a

Smagorinsky, J.: General circulation experiments with the primitive equation: I. The basic experiment, Mon. Weather Rev., 91, 99–165, 1963. a, b

Tandeo, P., Autret, E., Chapron, B., Fablet, R., and Garello, R.: SST spatial anisotropic covariances from METOP-AVHRR data, Remote Sens. Environ., 141, 144–148, 2014. a

Vallis, G.: Atmospheric and oceanic fluid dynamics: fundamentals and large-scale circulation, Cambridge University Press, https://doi.org/10.1017/9781107588417, 2006. a, b, c

Voosen, P.: The Earth Machine, Science, 361, 344–347, 2018. a

Wang, J., Flierl, G. R., LaCasce, J. H., McClean, J. L., and Mahadevan, A.: Reconstructing the ocean's interior from surface data, J. Phys. Oceanogr., 43, 1611–1626, 2013. a

Williams, C. K. and Rasmussen, C. E.: Gaussian processes for machine learning, in: Adaptive Computation and Machine Learning series, The MIT Press, 272 pp., 2006. a

Here, the buoyancy is not Gaussian. However, it is reasonable to believe that mean ±1.96 times the pointwise standard deviation remains a simple and convenient approximate metric to define an acceptable bias.