Formulation of Scale Transformation in a Stochastic Data Assimilation Framework

Understanding the errors caused by spatial scale transformation in Earth observations and simulations requires a rigorous definition of scale. These errors are also an important component of representativeness errors in data assimilation. Several relevant studies have been conducted, but the theory of the scale associated with representativeness errors is still not well developed. We address these problems by reformulating the data assimilation framework using measure theory and stochastic calculus. First, measure theory is used to propose that the spatial scale is a Lebesgue measure with respect to the observation footprint or model unit, and Lebesgue integration by substitution is used to describe the scale transformation. Second, a scale-dependent geophysical variable is defined to account for heterogeneities and dynamic processes. Finally, the structures of the scale-dependent errors are studied in the Bayesian framework of data assimilation based on stochastic calculus. All the results are presented under the condition that the scale is one-dimensional, and the variations in these errors depend on the differences between scales. This new formulation provides a more general framework for understanding the representativeness error in a nonlinear and stochastic sense and is a promising way to address the spatial scale issue.


Introduction
The spatial scale in Earth observations and simulations refers to the observation footprint or model unit in which a geophysical variable is observed or modelled (scale is used below as an abbreviation for spatial scale). Scale is traditionally defined in terms of distance, which is inadequate both because distance is a one-dimensional quantity while scale generally refers to a two- or three-dimensional space, and because the scale may change in a very complicated manner (for example, from an irregular observation footprint to a square observation footprint). Generally, the scale is not explicitly expressed in the dynamics of a geophysical variable, partially because a rigorous definition of scale is difficult to find, beyond an intuitive conception (Goodchild and Proctor, 1997) and certain qualitative classifications of scale (Vereecken et al., 2007). This reflects the complexity of scale and consequently calls for a more rigorous mathematical conceptualisation of scale.
Data assimilation could be an ideal tool to explore the scale transformation because it presents a unified and generalised framework in Earth system modelling and observation (Talagrand, 1997). Geophysical data are typically observed by various Earth observation systems; thus, updating the observation data in a data assimilation system may result in scale transformations between the observation space and the system state space. If the observation operator is strongly non-linear and complex, the errors caused by the scale transformation are even more serious (Li, 2014). An important concept related to the scale transformation in data assimilation is the "representativeness error", which is associated with the inconsistency in the spatial and temporal resolutions between states, observations and operators (Lorenc, 1986; Janjić and Cohn, 2006; van Leeuwen, 2014; Hodyss and Nichols, 2015), and with the physical information missing from a numerical operator compared to the ideal operator (van Leeuwen, 2014), such as the discretisation of a continuum model or the neglect of necessary physical processes. The representativeness error and the instrument error together make up the observation error of data assimilation. Under the Gaussian assumption, they are independent of each other (Lorenc, 1995; van Leeuwen, 2014). This study does not consider the instrument error when formulating the scale transformation in data assimilation.
Recently, several approaches have been developed to assess the representativeness error. Janjić and Cohn (2006) studied the representativeness error by treating the system state as the sum of resolved and unresolved portions. Bocquet et al. (2011) used a pair of operators, namely restriction and prolongation, to connect the finest regular scale with a coarse scale, and determined the representativeness error in a multi-scale data assimilation framework. van Leeuwen (2014) considered two complicated cases, i.e. an observation vector with a finer resolution than the system state vector, and the assimilation of retrieved variables. The solutions were formulated using an agent in observation or state space, and a particle filter was proposed to treat the non-linear relationships between observations, states and retrieved values. Hodyss and Nichols (2015) also estimated the representativeness error by investigating the difference between the truth and the inaccurate value generated by the forecasting model.
Although these approaches explored the structure of the representativeness error and offered various solutions, improvements are still necessary to obtain an exact expression of the errors caused by scale transformation in data assimilation. The authors believe that these approaches are optimal in linear systems but may not be suitable when observations are heterogeneous and sparse, or when the operators between states and observations are non-linear, even though general equations for the non-linear case were given. Without taking heterogeneities and non-linear operators into account, the representativeness error cannot be fully understood. However, heterogeneity varies depending on the situation and is difficult to formulate in a general theoretical study.
Data assimilation studies based on stochastic processes (Apte et al., 2007; Miller, 2007) or stochastic dynamic models (Miller et al., 1999; Eyink et al., 2004) have been proposed recently. Compared to deterministic models, stochastic data assimilation is more applicable in an integrated and time-continuous theoretical study (Bocquet et al., 2010) and creates an infinite sampling space of the system state (Apte et al., 2007). Although the theorems of calculus based on stochastic processes (i.e. stochastic calculus) differ from those of ordinary calculus, these advantages suggest that stochastic data assimilation offers a more general framework for studying scale transformation.
We attempt to establish mathematical definitions of scale and scale transformation, and then to formulate the errors caused by scale transformation in stochastic data assimilation in a general theoretical study. The next section introduces the basic concepts and theorems of measure theory, stochastic calculus and data assimilation. In Sect. 3, we present the definitions of scale and scale transformation. The posterior probability of the system state is also reformulated under scale transformation in a stochastic data assimilation framework. In the final section, the contributions and limitations of this study are discussed.

Basic knowledge
The scale greatly depends on the geometric features of a certain observation footprint or model unit. The model unit is a specified subspace in which a geophysical variable evolves in the model space; it could be a point, a rectangular grid, or an irregular unit such as a response unit (watershed, landscape patch, etc.). We offer a solution in which the definition of scale uses measure theory and a geophysical variable is expressed as a stochastic process using stochastic calculus. Therefore, we first introduce several basic concepts of measure theory and stochastic calculus.

Measure theory
Let Ω be an arbitrary non-empty space. F is a σ-algebra (or σ-field) of subsets of Ω that satisfies the following conditions:

i. Ω ∈ F, and the empty set ∅ ∈ F;

ii. A ∈ F implies that its complementary set A^c ∈ F;

iii. A_1, A_2, ... ∈ F implies that the countable union ∪_{i=1}^∞ A_i ∈ F.

A set function µ on F is called a measure if it satisfies the following conditions:

1. µ(A) ∈ [0, ∞) and µ(∅) = 0;

2. for any sequence of pairwise disjoint sets A_1, A_2, ... ∈ F, µ(∪_{i=1}^∞ A_i) = Σ_{i=1}^∞ µ(A_i) (countable additivity).

If µ(Ω) = 1, µ can be replaced by the probability measure p, and if µ is finite, p can be obtained as p(A) = µ(A)/µ(Ω). The triples (Ω, F, µ) and (Ω, F, p) are a measure space and a probability measure space, respectively. Let Ω be the set of real numbers R and the σ-algebra B be the Borel algebra, which is generated by all closed intervals in R. Then, for any A = [a, b] ∈ B, a Lebesgue measure on R is defined as I(A) = b − a. Intuitively, the Lebesgue measure on R coincides with length.
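The two facts used repeatedly below — that the 1-D Lebesgue measure of an interval is its length, and that a finite measure becomes a probability measure after normalisation by µ(Ω) — can be sketched in a few lines (a minimal illustration of our own, not code from the paper):

```python
# One-dimensional Lebesgue measure of a closed interval [a, b] is its
# length b - a; normalising a finite measure by mu(Omega) turns it into
# a probability measure p(A) = mu(A) / mu(Omega).

def lebesgue_1d(a: float, b: float) -> float:
    """Lebesgue measure (length) of the closed interval [a, b]."""
    return b - a

def prob_measure(mu_A: float, mu_Omega: float) -> float:
    """Probability of A under the normalised measure p = mu / mu(Omega)."""
    return mu_A / mu_Omega

# The interval A = [2, 5] inside Omega = [0, 10]:
length = lebesgue_1d(2.0, 5.0)                     # 3.0
p = prob_measure(length, lebesgue_1d(0.0, 10.0))   # 0.3
```
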
An n-dimensional Lebesgue volume is defined to measure the standard volumes of subsets of R^n. For an n-dimensional regular cell A = [a_1, b_1] × ... × [a_n, b_n] in R^n, the Lebesgue volume is vol(A) = Π_{k=1}^n (b_k − a_k). The n-dimensional Lebesgue volume is an ordinary volume, such as length (n = 1), area (n = 2) and volume (n = 3).
Next, the outer measure of a set A ⊆ R^n is defined as m*_n(A) = inf { Σ_{i=1}^∞ vol(A_i) : each A_i is an n-dimensional regular cell and A ⊆ ∪_{i=1}^∞ A_i }. Thus, if A is any subset of R^n, one can collect many sets of n-dimensional regular cells {A_i} to cover A. Among these covers, the outer measure selects the one whose union has the smallest n-dimensional Lebesgue volume.
The outer measure by itself does not satisfy the conditions of a measure on all subsets, but one can restrict the outer measure m_n to obtain a Lebesgue measure on the measure space (R^n, L^n, m_n), where L^n is the Lebesgue σ-algebra of R^n. The construction of the Lebesgue σ-algebra is based on the Carathéodory condition (Bartle, 1995, definition 13.3). Fortunately, almost all observation footprints and model units are finite and closed; therefore, they are Lebesgue measurable. This ensures that the Lebesgue measure m_n is indeed a measure and that the triple (R^n, L^n, m_n) is a measure space. The Lebesgue measure of a Lebesgue measurable subset of R^n also coincides with its volume.
The n-dimensional Lebesgue integral in (R^n, L^n, m_n) is ∫ f dm_n, where f is a real function on R^n. The Lebesgue integral can be further denoted by ∫ f dm_n = ∫ f(x) dx, where x ∈ R^n and x = (x_1, ..., x_n).
In the two-dimensional case (n = 2), the Lebesgue integral is ∫_A f dm_2 = ∫∫_A f(x_1, x_2) dx_1 dx_2, where A ∈ L². Next, we consider Lebesgue integration by substitution on R². Let T(x_1, x_2) = [t_1(x_1, x_2), t_2(x_1, x_2)] = (y_1, y_2) be a one-to-one mapping of a subset X onto another subset Y in R². Assume that T is continuous and has a continuous matrix of partial derivatives ∂t_i/∂x_j, with Jacobian determinant J(x_1, x_2) = det(∂t_i/∂x_j). Then ∫∫_Y f(y_1, y_2) dy_1 dy_2 = ∫∫_X f(T(x_1, x_2)) |J(x_1, x_2)| dx_1 dx_2. By this means, any observation footprint or model unit can be regarded as a Lebesgue measurable subset of the two-dimensional space R². Additional details regarding measure theory can be found in the literature (for example, Billingsley, 1986; Bartle, 1995).
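The substitution rule can be verified numerically. The sketch below (our own construction, with our own choice of map) uses the linear map T(x_1, x_2) = (2x_1, 2x_2), which carries the unit square X onto Y = [0, 2]²; its Jacobian matrix is diag(2, 2), so |J| = 4 everywhere and the measure of Y equals the integral of |J| over X:

```python
import numpy as np

# Midpoint-rule check of Lebesgue integration by substitution for the
# linear map T(x1, x2) = (2*x1, 2*x2): m2(Y) = integral_X |J| dx1 dx2.

n = 200
cell = (1.0 / n) ** 2                  # area element of the midpoint grid
x = (np.arange(n) + 0.5) / n           # midpoints over [0, 1]
X1, X2 = np.meshgrid(x, x)
absJ = np.full_like(X1, 4.0)           # |det diag(2, 2)| = 4
area_Y = absJ.sum() * cell             # integral over X, approximately 4.0
```

Because the integrand is constant, the midpoint rule recovers the measure of [0, 2]² essentially exactly.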

Stochastic calculus
We now introduce some necessary concepts and theorems of stochastic calculus without proofs; their detailed derivations can be found in the literature (Itô, 1944; Karatzas and Shreve, 1991; Shreve, 2005).
Stochastic calculus defines integrals with respect to stochastic processes. One of the simplest stochastic processes defined on (Ω, F, p) is Brownian motion W. It is characterised as follows: (i) W(0) = 0; (ii) W has independent increments; (iii) each increment W(t) − W(u), t > u, is normally distributed with mean zero and variance t − u; and (iv) the paths of W are continuous. The last two conditions represent that the increments are Gaussian with variance growing linearly in the length of the interval, while the paths remain continuous. Stochastic calculus based on Brownian motion produces an Ito process. The differential form of the time-dependent Ito process is

dI(t) = ϕ(t) dt + σ(t) dW(t), (1)

where ϕ(t), σ(t) and W(t) are the drift rate, volatility rate and Brownian motion, respectively. The integral form of Eq. (1) is

I(t) = I(0) + ∫_0^t ϕ(u) du + ∫_0^t σ(u) dW(u). (2)

Theorem 1: For any Ito process defined as in Eq. (1), the quadratic variation accumulated on the interval [0, t] is

[I, I](t) = ∫_0^t σ²(u) du, (3)

and the drift of Eq. (1) is ∫_0^t ϕ(u) du. As distinguishing features of stochastic calculus, the quadratic variation and drift can be regarded as stochastic versions of the variance and expectation, respectively. That is, the variance and expectation are instances of their stochastic counterparts along a certain integral path. Therefore, rather than being constants, the quadratic variation and drift are given in terms of probability.
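Theorem 1 is easy to check by simulation. The sketch below (our own construction, with arbitrarily chosen constant rates ϕ = 0.2 and σ = 0.5) builds one discretised path of the Ito process in Eq. (1) on [0, 1] and accumulates the sum of squared increments, which should approach ∫_0^t σ²(u) du = 0.25:

```python
import numpy as np

# Simulate dI = phi dt + sigma dW and check that the accumulated
# quadratic variation approximates the integral of sigma^2 du.

rng = np.random.default_rng(0)
n = 200_000
dt = 1.0 / n
phi, sigma = 0.2, 0.5
dW = rng.normal(0.0, np.sqrt(dt), n)   # Brownian increments
dI = phi * dt + sigma * dW             # Ito-process increments
quad_var = np.sum(dI ** 2)             # approximately sigma^2 * t = 0.25
```

Note that the drift term contributes only O(dt) to the quadratic variation, which is why only σ² survives in the limit.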
Theorem 2 (Ito's Lemma): Suppose the partial derivatives of a function f(u, I), viz. f_u(u, I), f_I(u, I) and f_II(u, I), are defined and continuous. Then, for t ≥ 0,

f(t, I(t)) = f(0, I(0)) + ∫_0^t f_u(u, I(u)) du + ∫_0^t f_I(u, I(u)) dI(u) + (1/2) ∫_0^t f_II(u, I(u)) σ²(u) du. (4)

Ito's Lemma is typically used to build the differential of a stochastic model with Ito processes. In this study, Ito's Lemma is applied to study the scale-dependent relationship between the observation and the state and the errors caused by scale transformation.
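A quick numerical illustration of Ito's Lemma (our own example, not from the paper): for f(u, W) = W², the lemma gives d(W²) = 2W dW + du, where the extra du term is the second-order correction absent from ordinary calculus. Simulating one path and comparing both sides at t = 1:

```python
import numpy as np

# Check W(1)^2 == 2 * integral_0^1 W dW + 1 up to discretisation error.

rng = np.random.default_rng(1)
n = 200_000
dt = 1.0 / n
dW = rng.normal(0.0, np.sqrt(dt), n)
W = np.concatenate(([0.0], np.cumsum(dW)))       # Brownian path, W(0) = 0
ito_rhs = 2.0 * np.sum(W[:-1] * dW) + 1.0        # 2 * Ito integral + t
lhs = W[-1] ** 2                                 # f(1, W(1)) - f(0, 0)
# lhs and ito_rhs agree closely; dropping the "+ 1.0" term would not.
```
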

Traditional formulation of data assimilation in the Bayesian theorem framework
We use the well-accepted Bayesian theory of data assimilation (Lorenc, 1995; van Leeuwen, 2015) to investigate its time- and scale-dependent errors. The state and observation are first assumed to be one-dimensional.
A non-linear forecasting system can be described by

X(t_k) = M_{k−1:k}(X(t_{k−1})) + η(t_k), (5)

where M_{k−1:k}(·), X(t_k) and η(t_k) represent a non-linear forecasting operator that transits the state from the discrete time k − 1 to k, the state with prior probability distribution function (PDF) p(X), and the model error at time k, respectively.
If a new observation is available at time k, the observation system is given by

Y^o(t_k) = H_k(X(t_k)) + ε(t_k), (6)

where H_k(·), Y^o(t_k) and ε(t_k) represent the non-linear observation operator, the true observation with prior PDF p(Y), and the observation error at time k, respectively. Previous studies (e.g. Janjić and Cohn, 2006; Bocquet et al., 2011) described the origins of the components of ε(t_k) and η(t_k), such as white noise, the discretisation error of a continuum model, the errors caused by missing physical processes, and the scale-dependent bias. In this study, we assume that both the forecasting and observation operators are perfect models; thus, errors caused by missing physical processes are discarded.
According to Bayesian theory, the posterior PDF of the state after adding a new observation to the system is

p(X|Y) = p(Y|X) p(X) / p(Y), (7)

where p(X|Y) is the posterior PDF, i.e. the PDF of the state X given an available observation Y; p(Y|X) is the likelihood function, i.e. the probability that an observation is Y given a state X; and p(X) and p(Y) are the prior PDFs of the state and observation, respectively. Here, p(X) is assumed known and p(Y) acts as a normalisation constant (van Leeuwen, 2014). The aim of data assimilation is equivalent to finding the posterior PDF p(X|Y).
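In the scalar Gaussian case, the Bayesian update of Eq. (7) has a closed form; the sketch below (our own minimal example, with prior X ~ N(x_b, s_b) and a direct observation with error variance s_o) computes the posterior mean and variance via the familiar precision weighting:

```python
# Scalar Gaussian Bayes update: posterior mean and variance of X given
# a direct observation y with error variance s_o.

def gaussian_posterior(x_b: float, s_b: float, y: float, s_o: float):
    """Return (posterior mean, posterior variance)."""
    k = s_b / (s_b + s_o)              # Kalman-style gain
    return x_b + k * (y - x_b), (1.0 - k) * s_b

mean, var = gaussian_posterior(x_b=0.0, s_b=1.0, y=1.0, s_o=1.0)
# with equal prior and observation variances, mean = 0.5 and var = 0.5
```
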
3 Reformulation of scale transformation in the data assimilation framework

Definition of scale
We define the scale based on the measure theory introduced in Sect. 2. The relationship between the Lebesgue measure in (R², L², m_2) and scale is first illustrated by the following measures of Earth observations. (i) Measure of a single-point observation: when the observation footprint is very small and homogeneous, we assume that the footprint approaches zero, and its measure is accordingly zero under the Lebesgue measure.
(ii) Measure along a line: the measure is a one-dimensional Lebesgue measure.
(iii) Measure of a rectangular pixel (for example, a remote sensing observation): for A = [a_1, b_1] × [a_2, b_2], the measure is µ_iii(A) = (b_1 − a_1)(b_2 − a_2). (iv) Measure of a footprint-scale observation: the footprint is any bounded closed domain A, which need not be a regular rectangle but can also be a circle or an ellipse. We use the Lebesgue measure on R², i.e. µ_iv(A) = m_2(A).
Clearly, measures (i)-(iii) are special cases of the measure of a footprint-scale observation.
All of the above measures depend mainly on the shape and size of A. The Lebesgue measure on R² coincides with area; thus, the Lebesgue integral form of µ_iv(A) is ∫_A dx_1 dx_2, where the real function f ≡ 1. Now, we can generalise the above examples by defining the scale as the Lebesgue measure with respect to the observation footprint. This definition can also be extended to a certain model unit. Thus, for any subset A ∈ L², the scale is s = m_2(A) = ∫_A dx_1 dx_2, where the real function f ≡ 1. From a geometric perspective, the measure function m_2(·) refers to the shape of the subset, and the scale further indicates its size.
We represent the scale as s, and let s_0 = m_2(A_0) = ∫_{A_0} dx_1 dx_2 = 1 be the standard scale, where A_0 is the unit square. The standard scale can be regarded as a basic unit of scale. It provides a standard reference by which one can make quantitative comparisons between different scales. The standard scale is also the origin of scales, which lets scales vary in the same way as other physical quantities, such as time.
We can now define scale transformation. For any A_1, A_2 ∈ L², if there are two different scales, s_1 = m_2(A_1) = ∫_{A_1} dx_1 dx_2 and s_2 = m_2(A_2) = ∫_{A_2} dy_1 dy_2, then we can obtain s_2 = ∫_{A_2} dy_1 dy_2 = ∫_{A_1} |J(x_1, x_2)| dx_1 dx_2 based on Lebesgue integration by substitution, where the Jacobian matrix J(x_1, x_2) represents the geometric transformation from A_1 to A_2. In particular, if J(x_1, x_2) = diag(ξ, ξ), ξ ∈ R, which also indicates that the geometric transformation is linear, then the following expression is valid based on Lebesgue integration by substitution:

s_2 = ∫_{A_1} ξ² dx_1 dx_2 = ξ² s_1, (8)

and s_1 and s_2 are said to follow the one-dimensional rule of scale transformation.
If two scales follow the one-dimensional rule, they are geometrically similar. This rule simplifies scale into a one-dimensional variable, which corresponds to the scale transformations between most remote sensing images with various spatial resolutions. For example, consider the square A = {x : a ≤ x_k ≤ b, k = 1, 2}; A and the unit square A_0 are geometrically similar, and the scale s = µ_iii(A) = (b − a)² s_0 can be expressed by the one-dimensional rule of scale transformation. For another example, let s = ∫_A dy_1 dy_2 be the scale of a disc footprint A with radius r. A mapping function of A_0 onto A is T(x_1, x_2) = (r x_1 cos(2π x_2), r x_1 sin(2π x_2)), with Jacobian determinant |J(x_1, x_2)| = |det [r cos(2π x_2), −2π r x_1 sin(2π x_2); r sin(2π x_2), 2π r x_1 cos(2π x_2)]| = 2π r² x_1. Therefore, s = ∫_{A_0} 2π r² x_1 dx_1 dx_2 = π r² s_0, which is equal to the area of the disc. However, s_0 and s do not obey the one-dimensional rule because the Jacobian matrix is not diagonal.
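The disc-footprint computation above can be checked numerically. The sketch below (our own construction) integrates the Jacobian determinant |J| = 2π r² x_1 of the unit-square-to-disc mapping over A_0 by the midpoint rule; the result should be the disc scale π r² s_0:

```python
import numpy as np

# Scale of a disc footprint of radius r, obtained by integrating the
# Jacobian determinant of the unit-square-to-disc mapping over A0.

r = 1.5                                 # disc radius (arbitrary choice)
n = 1000
x1 = (np.arange(n) + 0.5) / n           # midpoints over [0, 1]
absJ = 2.0 * np.pi * r**2 * x1          # |J| does not depend on x2
scale = absJ.mean()                     # integral over A0, with s0 = 1
# scale equals pi * r^2 = 9*pi/4 for r = 3/2
```

Because |J| is linear in x_1, the midpoint rule is exact here.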
Layer 1 in Fig. 1 shows the relationship between the Lebesgue measure and scale. Let m²_C1(C1), m²_C2(C2) and m²_C3(C3) be the Lebesgue measures of the disc observation footprints C1, C2 and C3, respectively. Then m²_C1(·) = m²_C2(·) = m²_C3(·), because they are the same Lebesgue measure function. That is, if {A_i} is the set with the smallest volume that covers C1, then the similar sets {A_i + 2} and {A_i × 3 + 2} can be used (with the origin located in the upper-left corner) to cover C3 and C2 with the smallest volumes, respectively. Here, m²_C1(·), m²_C2(·) and m²_C3(·) collect the desired sets based on the same scheme; therefore, they are identical. Additionally, s_C1 = s_C3 ≠ s_C2; the scale of C2 is not equal to the two other scales because the volumes of their covering subsets differ. However, their scales are governed by the one-dimensional rule because their measures are identical and the Jacobian matrices between them are diagonal.

Stochastic variables in data assimilation
Instead of using Eqs. (5) and (6), which are discrete in time, we use Ito-process-formed expressions with the one-dimensional infinitesimals ds and dt to formulate a continuous-time (or continuous-scale) state and observation.
A geophysical variable can be regarded as a real function V(s, t) that maps the space (R², L², m_2) onto R, where s is the scale, s = m_2(A), A ∈ L², and t is the time. In n-dimensional data assimilation, a geophysical variable V is related to an element of the state vector X at a specific scale s and time t. In Fig. 1, layer 2 presents a heterogeneous geophysical variable over the entire region. If we aggregate layer 2 into layer 1 and let each pixel intensity be the value of the geophysical variable in that pixel, then the measure space is heterogeneous. A geophysical variable represents a spatial average over a specific observation footprint with a specific scale. Therefore, the geophysical variables in C1 and C3 are not equal because their observation footprints are different, and the geophysical variables in C2 and C3 are also different because the scale changes. The former shows that geophysical variables vary with location, and the latter that geophysical variables are scale dependent.
If the statistical properties of the geophysical variable are available, we can construct an explicit stochastic equation for it. We introduce the time-dependent Ito process Eq. (1) to define the geophysical variable process:

dV = p(t) dt + q(t) dW(t). (9)

Similarly, the geophysical variable is supposed to evolve via a stochastic process in which the dynamic process and uncertainty are allowed to vary with scale:

dV = ϕ(s) ds + σ(s) dW(s), (10)

where ϕ(s) and σ(s) are the scale-based drift rate and volatility rate, respectively. The geophysical variable is a probabilistic process with respect to scale and thus has scale-dependent errors, where the scale may shift forward or backward provided that it follows the one-dimensional rule. Equation (9) can be regarded as a continuous-time version of Eq. (5), i.e. the estimation of the state is equal to the integral of Eq. (9) over a time interval. Here, p(t) indicates the physical process with respect to time, and q(t) is the error caused solely by the evolution of time; thus, the model error η in Eq. (5) contains more components than q(t). Equation (10) implies that the value and variance of a geophysical variable may change if the scale changes. The formulation of ϕ(s) should consider the spatial heterogeneities and the variations of physical processes among different scales, which together constitute the deterministic part of a geophysical variable. However, neither is well understood in a general theoretical study; therefore, ϕ(s) is only conceptualised in Eq. (10). In particular, if the study region is homogeneous, then the values of a variable observed at the same place are identical between the large scale and the fine scale, and ϕ(s) can be omitted. Owing to the integral over the space of Brownian motion, σ(s) is the stochastic part, meaning that scale transformation produces uncertainties.
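The claim that the variance of a scale-dependent variable grows with the scale difference can be illustrated by a Monte Carlo sketch of Eq. (10) in the homogeneous case ϕ(s) = 0, σ(s) = 1 (our own parameter choices): then V(s) = V_0 + W(s) − W(s_0), so an ensemble of realisations keeps its mean at V_0 while its variance approaches s − s_0.

```python
import numpy as np

# Ensemble of V(s) = V0 + (W(s) - W(s0)) with phi = 0, sigma = 1:
# the ensemble mean stays near V0 and the variance near s - s0.

rng = np.random.default_rng(2)
s0, s = 1.0, 3.0                        # standard scale and target scale
members = 100_000
V0 = 10.0                               # value at the standard scale
V = V0 + rng.normal(0.0, np.sqrt(s - s0), members)
# V.mean() ~ 10.0, V.var() ~ s - s0 = 2.0
```
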
The state in the forecasting step can be expressed by Eq. (9) because only time is involved. In the analysis step of data assimilation, the state does not evolve in time, and we assume that the scale has a quantifiable effect on the errors in this step; thus, both the states and the observations can be defined by Eq. (10).

Expression of scale transformation in a stochastic data assimilation framework
First, we provide the following lemma.

Lemma 1: Let W(s), s ≥ 0, be a Brownian motion. Then the shifted process W*(s) = W(s_0 + s) − W(s_0), s ≥ 0, is also a Brownian motion.
Remark on Lemma 1: Note that in the definition of Brownian motion, the parameter starts at zero. However, the scale is realistically greater than zero, which means that it cannot directly serve as the parameter of a Brownian motion. Lemma 1 resolves this because it implies that W(s), s ≥ s_0, is an equivalent expression of W*(s), s ≥ 0. Therefore, beginning with the standard scale, Brownian motion and stochastic calculus with respect to scale can be further developed.
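Lemma 1 can be checked empirically: increments of Brownian motion taken past s_0 should behave as a fresh Brownian motion restarted at zero. The sketch below (our own construction) samples W(s_0) and W(s_0 + s) and verifies that W*(s) = W(s_0 + s) − W(s_0) has mean 0 and variance s:

```python
import numpy as np

# Increments past s0 form a Brownian motion restarted at zero:
# W*(s) = W(s0 + s) - W(s0) should satisfy W*(s) ~ N(0, s).

rng = np.random.default_rng(3)
s0, s = 1.0, 0.8
members = 100_000
W_s0 = rng.normal(0.0, np.sqrt(s0), members)             # W(s0)
W_s0_plus = W_s0 + rng.normal(0.0, np.sqrt(s), members)  # W(s0 + s)
W_star = W_s0_plus - W_s0                                # shifted process
# W_star has mean ~ 0 and variance ~ s = 0.8, independent of W(s0)
```
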
In the following, we use Brownian motion with a parameter that starts at s_0 to define the scale-dependent geophysical variables; therefore, the classic expressions above are changed. According to Lemma 1, Eq. (3) becomes

[V, V](s) = ∫_{s_0}^{s} σ²(u) du. (11)

Additionally, the integral form of Eq. (10) is

V(s) = V_0 + ∫_{s_0}^{s} ϕ(u) du + ∫_{s_0}^{s} σ(u) dW(u), (12)

where V_0 = V(s_0), and the drift of Eq. (12) is

∫_{s_0}^{s} ϕ(u) du. (13)

Similarly, Eq. (4) becomes

f(s, I(s)) = f(s_0, I(s_0)) + ∫_{s_0}^{s} f_u(u, I(u)) du + ∫_{s_0}^{s} f_I(u, I(u)) dI(u) + (1/2) ∫_{s_0}^{s} f_II(u, I(u)) σ²(u) du. (14)

Now, we make the following assumptions.
Assumption 1: the scale transformations between the state and observation spaces of data assimilation obey the one-dimensional rule as defined in Sect. 3.1.
Assumption 2: in the forecasting step, the model unit equals the scale of the state space, and both of them are constant.
Assumption 3: in the analysis step, the state, observation and observation operator are scale dependent. Only one observation is added into the data assimilation system at a time.
In assumption 1, the one-dimensional rule ensures that scale changes in a sense of geometric similarity (for example, from a larger square observation footprint to a smaller square observation footprint, or from C2 to C3 as presented in Fig. 1). Therefore, based on assumption 1, scale only varies in one-dimensional space, meaning that the corresponding scale transformation is an integral over a one-dimensional space.
Assumption 2 indicates that the model unit and the state scale are supposed to be the same and both invariant in space and time. Thus, there is no scale transformation in the forecasting step, and Eq. (9) can adequately describe this step.
Based on assumption 3, the analysis step is related to the scale. The scale transformation is only involved in the process of mapping the state vector from the state space to the observation space. According to Eq. (10), the state and observation in the analysis step are dX = ϕ_X(s) ds + σ_X(s) dW(s) and dY = ϕ_Y(s) ds + σ_Y(s) dW(s), where ϕ_X(s), σ_X(s), ϕ_Y(s) and σ_Y(s) represent the scale-dependent drift rates and volatility rates of the state X and observation Y, respectively. ϕ(s) also encodes the heterogeneities and physical processes from the standard scale to a specific scale, which may be hard to formulate. σ(u) can be regarded as the stochastic perturbation with respect to scale. Based on the above discussion, the integral form of the state is

X(s_X) = X_0 + ∫_{s_0}^{s_X} ϕ_X(u) du + ∫_{s_0}^{s_X} σ_X(u) dW(u). (15)

For the observation, we have

Y(s_Y) = Y_0 + ∫_{s_0}^{s_Y} ϕ_Y(u) du + ∫_{s_0}^{s_Y} σ_Y(u) dW(u). (16)

In Eqs. (15) and (16), the time t is omitted, and s_X, s_Y, X_0 and Y_0 represent the scale of the state space, the scale of the observation space, the state at s_0 and the observation at s_0, respectively. These formulas show that the value of the state varies as the scale changes. The Bayesian equation of data assimilation (Eq. 7) produces the posterior PDF p(X|Y), which is associated with the likelihood function p(Y|X) and the distributions of the state and observation. In addition, under the condition that the variances exist, assumption 1 states that the scales vary in one-dimensional space, which results in

X ~ N( X_0 + ∫_{s_0}^{s_X} ϕ_X(u) du, ∫_{s_0}^{s_X} σ_X²(u) du ), (17)

Y ~ N( Y_0 + ∫_{s_0}^{s_Y} ϕ_Y(u) du, ∫_{s_0}^{s_Y} σ_Y²(u) du ). (18)

Equations (17) and (18) are the prior PDFs of the state and observation with respect to scale in the state space and observation space, respectively. These two prior PDFs are introduced into the Bayesian theorem reformulated by scale; then, we calculate the posterior PDF. The scale-dependent observation operator is H(s, I), which indicates that the observation operator and its parameters are both susceptible to the scale. If H(s, I) is defined, its continuous partial derivatives are H_s(s, I), H_I(s, I) and H_II(s, I). In line with Ito's Lemma, we obtain an estimation of the observation in the observation space (the
notations (u, X(u)) and (u) are omitted; H_s = H_s(u, X(u)), σ_X = σ_X(u), etc.). Assumption 1 suggests that the observation and state spaces have the same probability measure; thus, the Brownian motions in these two spaces are equivalent. Equation (19) can also be rewritten by substituting for the stochastic term ∫ H_I σ_X dW(u), and then we obtain Eq. (20). Equation (20) can be regarded as an Ito process, and its drift is given by Eq. (21). The last integral term in Eq. (21) is the difference in the first-order differential of the observation operator between the state scale s_X and the observation scale s_Y. This term illustrates that the mapping process should consider not only the observation operator but also the first-order differential term when the state is mapped to the observation space. The former is typically determined from the literature, whereas the latter is derived in this study for the first time. This result prompts us to further consider the first-order differential of the observation operator when calculating the representativeness error.
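The Ito expansion underlying this mapping can be written out explicitly. Using only the terms named in the text (H_s, H_I, H_II, ϕ_X, σ_X), a standard application of Ito's Lemma between the scales s_X and s_Y reads (our reconstruction of the structure, not a verbatim copy of Eq. 19):

```latex
H\bigl(s_Y, X(s_Y)\bigr) = H\bigl(s_X, X(s_X)\bigr)
  + \int_{s_X}^{s_Y} \Bigl( H_s + H_I \, \varphi_X
      + \tfrac{1}{2} H_{II} \, \sigma_X^{2} \Bigr)\, \mathrm{d}u
  + \int_{s_X}^{s_Y} H_I \, \sigma_X \, \mathrm{d}W(u)
```

The deterministic integral carries the drift, and the stochastic integral contributes a quadratic variation of the form ∫_{s_X}^{s_Y} (H_I σ_X)² du, which matches the structure of the drift and uncertainty terms described around Eqs. (21) and (22).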
The quadratic variation of Eq. (20) is given by Eq. (22). This equation suggests that the uncertainty in the observation error includes the change in the observation operator from scale s_X to s_Y. Therefore, Eqs. (21) and (22) can be combined to produce Eq. (23). Based on Eqs. (17), (18) and (23), p(Y|X), p(X) and p(Y) are stochastic functions that depend on the scale; thus, the posterior PDF of the state is scale dependent as well.
In particular, if Y is a direct observation, which means that the observation is of the same physical quantity and scale as the state, then, for simplicity, assume that X is only influenced by scale-dependent Gaussian noises, viz.
dX = dW(u), so that X(s_Y) − X(s_X) = ∫_{s_X}^{s_Y} dW(u). (24) In Eq. (24), the integral ∫_{s_X}^{s_Y} dW(u) can be regarded as noise based on the increment of Brownian motion with respect to scale, and its expectation equals zero.
The significance of Eqs. (20)-(25) is that the effect of scale on the posterior PDF can be determined quantitatively. In addition to the model error and instrument error (neither was introduced explicitly in this study because they have little influence on the error caused by scale transformation), a new type of error in data assimilation was identified in the analysis step. The expectation of the posterior PDF may vary with the scale of the state space if Y is an indirect observation, and the variance of the drift depends on the difference between s_Y and s_X (based on Eq. 22). In addition, if Y is a direct observation and X is only influenced by scale-dependent Gaussian noises (Eqs. 24 and 25), the expectation of the posterior PDF is the difference between Y and X, and the variance is equal to the increment of Brownian motion with respect to the scale. Additionally, if the results are not derived under assumption 1, i.e. the scale varies randomly, the posterior PDF is more complex because the Jacobian matrix in the Lebesgue integration of the scale transformation is arbitrary.

Example: the stochastic radiative transfer equation (SRTE)
To show explicitly how stochastic scale transformations affect assimilation, we introduce an illustrative example based on the scales presented in Fig. 1. Assume that in the analysis step, the state has the standard scale s_0, whose observation footprint is the unit square A_0. If the scale of the observation space is s_C1 and its observation footprint is the disc C1, then the Jacobian matrix of the transformation between the scales of the state space and observation space is not diagonal according to the statements in Sect. 3.1, so the two scales do not obey the one-dimensional rule, violating assumption 1. However, if the scales of the state space and observation space are s_C3 and s_C2, respectively, assumption 1 is met, and it can be determined that s_X = s_C3 = (π/4) s_0 and s_Y = s_C2 = (9π/4) s_0. Now the scales of the state space and observation space obey the one-dimensional rule, and we further presume that the measure space in Fig. 1 is free of spatial heterogeneities and of scale-dependent variations in the dynamic processes. Consequently, the drift rate ϕ(s) = 0. If the value of the state at the standard scale is denoted as X_0 and we assume that σ(s) = 1, then the prior PDF of the state is X ~ N(X_0, (π/4) s_0 − s_0) according to Eq. (17), where (π/4) s_0 − s_0 is not a real number but is only used to indicate the variation when the scale changes. If H(s, X(s)) = X(s), the observation has the same physical quantity as the state, and according to Eq.
(25), the likelihood function follows. To formulate the likelihood function in the case where the observation differs from the state, the SRTE is employed in the following text. The SRTE is a stochastic integro-differential equation that describes radiative transfer through a stochastically mixed immiscible medium. Scientists have developed analytical and numerical methods for finding the stochastic moments of its solution, such as the ensemble average and the variance of the radiation intensity (Pomraning, 1998; Shabanov et al., 2000; Kassianov and Veron, 2011).
Consider the general expression of the SRTE (leaving out scattering and emission),

µ dI(τ)/dτ = I(τ), (26)

where I(τ), µ and τ are the radiation intensity, the coefficient of the radiation direction and the optical depth, respectively.
To account for the more substantially random optical properties of the transfer medium, such as absorption and scattering, the optical depth τ is assumed to be stochastic. This suggests that the optical depth is a scale-dependent Ito process and can be expressed as

dτ(s) = ϕ_τ(s) ds + σ_τ(s) dW(s). (27)

This causes the radiation intensity to depend on scale. The analytical solution of Eq. (26) is I(τ) = I_0 e^{τ/µ}, where I_0 = I(τ(s_0)).
The SRTE can be considered a concrete instance of a stochastic observation operator by defining H(s, x(s)) = I(x) = I_0 e^(x/µ). Its first- and second-order derivatives are therefore H_s(s, x(s)) = 0, H_x(s, x(s)) = (1/µ)I_0 e^(x/µ) and H_xx(s, x(s)) = (1/µ²)I_0 e^(x/µ). Based on Ito's Lemma, dI(τ(s)) = dH(s, τ(s)) = H_s(s, τ(s))ds + H_x(s, τ(s))dτ(s) + (1/2)H_xx(s, τ(s))(dτ(s))² = I(τ(s))[(ϕ_τ(s)/µ + σ_τ²(s)/(2µ²))ds + (σ_τ(s)/µ)dW(s)]. (28) The radiation intensity is a scale-dependent Ito process. The difference between Eq. (28) and the general Ito process is that there is a primitive function I(τ(s)) in the integral term. Therefore, the uncertainty of the radiation intensity is more complex because it is related to both the change of scale and the primitive function.
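A minimal numerical check of the Ito correction in Eq. (28), assuming constant ϕ_τ and σ_τ and τ(s_0) = 0 so that τ(s) is exactly Gaussian; all parameter values are illustrative:

```python
import numpy as np

I0, mu, phi, sigma, s = 1.0, 2.0, 0.1, 0.5, 1.0
rng = np.random.default_rng(1)

# Sample tau(s) = phi*s + sigma*W(s) directly: tau(s) ~ N(phi*s, sigma**2 * s)
tau = phi * s + sigma * rng.normal(0.0, np.sqrt(s), 200000)
I_mc = np.mean(I0 * np.exp(tau / mu))   # Monte Carlo mean of I(tau(s))

# Eq. (28) predicts dE[I]/ds = (phi/mu + sigma**2/(2*mu**2)) * E[I]:
# the mean intensity grows with the extra sigma^2/(2 mu^2) Ito drift term,
# which a naive substitution of E[tau] into I(tau) would miss.
I_ito = I0 * np.exp((phi / mu + sigma ** 2 / (2 * mu ** 2)) * s)
print(I_mc, I_ito)
```

The agreement of the two values illustrates why the primitive function I(τ(s)) makes the uncertainty of the intensity richer than that of the optical depth alone.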
Integrating both sides of Eq. (28) yields the general solution of the radiation intensity, I(τ(s)) = C e^(τ(s)/µ), (29) where the constant C ∈ R. Equation (29) further indicates that I(τ(s)) is a scale-dependent Ito process.
Considering that the optical depth τ is the state, the radiation intensity I is the observation and I(τ(s)) is the observation operator, the results in Sect. 3.3 can easily be applied here, for example, Eqs. (20) and (23). The posterior PDF of the data assimilation can then be determined by Eqs. (27), (29) and (31).
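For the linear-Gaussian illustrative case above (identity observation operator, Gaussian prior and Gaussian likelihood), determining the posterior PDF reduces to the standard conjugate Gaussian update; the following sketch uses illustrative numbers rather than values from the paper:

```python
def gaussian_posterior(mu_prior, var_prior, y, var_obs):
    """Conjugate Bayes update: prior N(mu_prior, var_prior), y ~ N(x, var_obs)."""
    k = var_prior / (var_prior + var_obs)        # Kalman-type gain
    mu_post = mu_prior + k * (y - mu_prior)      # posterior mean
    var_post = (1.0 - k) * var_prior             # posterior variance
    return mu_post, var_post

# Illustrative numbers: prior N(1.0, 0.5), observation y = 1.4 with variance 0.25
mu_post, var_post = gaussian_posterior(1.0, 0.5, 1.4, 0.25)
print(mu_post, var_post)
```

In the full framework the two variances would be the scale-difference terms supplied by Eqs. (17) and (25); with a non-linear operator such as the SRTE, the likelihood is no longer conjugate and the posterior must be evaluated from Eqs. (27), (29) and (31) directly.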
4 Discussion and conclusions

4.1 Discussion
Our study offers a stochastic data assimilation framework to formulate the errors that are caused by scale transformations.
The necessity of the methodology, its differences from previous works by other investigators, and the advantages and limitations of this study are discussed as follows.
The reasons that the methodology focuses on a stochastic framework are as follows. First, the stochastic data assimilation framework is essentially consistent with the concepts of scale and scale transformation; both are associated with corresponding measure spaces (Ω, F, µ). Therefore, it is natural to regard the state space and observation space as two different measure spaces, and each element of the state (or observation) vector can be seen as a geophysical variable that maps the state (or observation) measure space onto R. Correspondingly, as the integrals of random processes with respect to random processes, stochastic calculus was ultimately adopted. Second, stochastic calculus can also formulate the errors caused by scale transformations. The study advances the understanding of representativeness error in terms of scale. The results not only proved the conventional point that the uncertainties of these errors mainly depend on the differences between scales but also indicated that the first-order differential of the non-linear observation operator should be incorporated in the representativeness error. Third, the error caused by scale transformation was presented in a general form. The drift and quadratic variation of the error were formulated by Eqs. (21) and (22), respectively, and both define the probability distribution space of p(Y|X). Last, stochastic calculus can be extended to a general scale transformation and can formulate the corresponding representativeness error, which was unattainable in previous work. For example, if the scale changes randomly, say, from one irregular footprint to another irregular footprint, the stochastic approach offers a multiple integral to represent this type of scale transformation, such as ∫∫ dW_1(x)dW_2(y), where W_1(x) and W_2(y) are two independent Brownian motions.
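For the simplest, constant-integrand case, the double stochastic integral mentioned above reduces to the product W_1(a)W_2(b), a zero-mean random variable with variance ab; a minimal Monte Carlo check with illustrative values:

```python
import numpy as np

# For independent Brownian motions W1, W2 and a constant integrand, the
# double integral over [0, a] x [0, b] reduces to W1(a) * W2(b):
# integrating dW2 first gives W2(b), a constant with respect to x,
# so the remaining integral of W2(b) dW1(x) is W2(b) * W1(a).
a, b = 2.0, 3.0
rng = np.random.default_rng(2)
w1 = rng.normal(0.0, np.sqrt(a), 100000)   # W1(a) ~ N(0, a)
w2 = rng.normal(0.0, np.sqrt(b), 100000)   # W2(b) ~ N(0, b)
z = w1 * w2
print(z.mean(), z.var())   # approximately 0 and a*b = 6
```

This is only the degenerate case; a general irregular-to-irregular scale transformation would require a non-trivial integrand and genuine iterated stochastic integration.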
The significant innovations of this work are as follows. We developed a more rigorous formulation of scale and scale transformation based on the Lebesgue measure, which places the related concepts in a rigorous mathematical framework and provides a new understanding of the errors caused by scale transformation. In addition, owing to the Ito process-formed state and observation, a stochastic data assimilation framework was proposed that considers non-linear operators, the heterogeneity of a geophysical variable and a general Gaussian representativeness error. The scale transformation is also non-linear if the one-dimensional rule is not applied. Additionally, the Ito process-formed state and observation offer the drift rate (i.e., ϕ(s) in Eq. 10) to formulate the heterogeneity associated with scale transformation. They also permit the representativeness error to be general Gaussian in this framework. If all the integrands in Eqs. (13) and (14) are non-linear functions instead of constants, then these two equations can be integrated over the field of Brownian motion, and the state and observation are general Gaussian processes of scale. Based on these functions, the representativeness error is a general Gaussian process.
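The claim that non-constant integrands yield general Gaussian processes can be illustrated with a one-dimensional sketch: X(s) = ∫_0^s σ(u)dW(u) is Gaussian with variance ∫_0^s σ(u)² du (the choice σ(u) = u below is purely illustrative):

```python
import numpy as np

def ito_integral(sigma_fn, s_end, n_steps, n_paths, seed=3):
    """Monte Carlo samples of X(s_end) = int_0^s_end sigma(u) dW(u), left-point Ito sums."""
    rng = np.random.default_rng(seed)
    du = s_end / n_steps
    grid = np.arange(n_steps) * du          # left endpoints, as Ito calculus requires
    x = np.zeros(n_paths)
    for u in grid:
        x += sigma_fn(u) * rng.normal(0.0, np.sqrt(du), n_paths)
    return x

# sigma(u) = u gives a zero-mean Gaussian process with
# Var X(s) = int_0^s u^2 du = s^3 / 3 (here s = 1.5, so Var = 1.125)
x = ito_integral(lambda u: u, 1.5, 300, 50000)
print(x.mean(), x.var())
```

The variance now depends on the whole integrand rather than on the scale difference alone, which is what makes the resulting representativeness error a general, rather than stationary-increment, Gaussian process.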
As a theoretical exploration of scale transformation and stochastic data assimilation, there is still much room for improvement. First, we reduced the scale transformation by the one-dimensional rule and let the variables in data assimilation evolve regularly according to Assumptions 1-3; thus, only the ideal result was investigated. An in-depth and comprehensive exploration should therefore be conducted in the future to describe other situations in the real world. However, the use of either an arbitrary scale transformation or a geophysical variable whose drift rates are not ignored will produce lengthy results. Therefore, the second improvement focuses on how to make the formulation more concise. Lastly, noting that all the results in our framework were given in terms of probability, it is necessary to implement real-world applications of these theoretical results, such as introducing concrete dynamic models to formulate the Ito process-formed geophysical variable of scale.

4.2 Conclusions
In this study, we mainly addressed two basic problems associated with scale transformation in Earth observation and simulation.First, we produced a mathematical formalism of scale and scale transformation by employing measure theory.Second, we demonstrated how scale transformation and its associated errors could be presented in a stochastic data assimilation framework.
We revealed that the scale is the Lebesgue measure with respect to the observation footprint or model unit. The scale is related to the shape and size of a footprint, and scale transformation depends on the spatial change between different footprints. We then defined the scale-dependent geophysical variable, which further considers the heterogeneities and physical processes. A geophysical variable consequently expresses the spatial average at a specific scale.
We formulated the expression of scale transformation and investigated the error structure caused by scale transformation in data assimilation using basic theorems of stochastic calculus. The formulations make explicit that the first-order differential of the non-linear observation operator should be considered in the representativeness error and that the uncertainty of the representativeness error is directly associated with the difference between scales. A concrete physical model (the SRTE) was introduced to demonstrate the results when the observation operator is non-linear.
This work conducted a theoretical exploration of formulating the errors caused by scale transformation in a stochastic data assimilation framework.We hope that the stochastic methodology can benefit the study of these errors.

Figure 1. Diagram of the relationships among a Lebesgue measure, scale and geophysical variable.