Estimation of the local response to a forcing in a high dimensional system using the fluctuation-dissipation theorem

The fluctuation-dissipation theorem (FDT) has been proposed as a method of calculating the response of the earth’s atmosphere to a forcing. For this problem the high dimensionality of the relevant data sets makes truncation necessary. Here we propose a method of truncation based upon the assumption that the response to a localised forcing is spatially localised, as an alternative to the standard method of choosing a number of the leading empirical orthogonal functions. For systems where this assumption holds, the re p e to any sufficiently small non-localised forcing may be estimated using a set of truncations that are chosen algorithmically. We test our algorithm using 36 and 72 variable versions of a stochastic Lorenz 95 system of ordinary differential equations. We find that, for long integrations, the bias in the response estimated by the FDT is reduced from ∼ 75 % of the true response to ∼ 30 %.


Introduction
An important problem in atmospheric sciences is how the climate responds to a forcing. For example how does the mean local temperature respond to an increase in carbon dioxide or a change in the incident solar radiation. Explicit simulation can estimate the effects of a particular forcing; however, simulation can be prohibitively expensive if the response to a large set of possible forcings is required. One method of reducing the cost of a local estimate is to truncate the calculation to a local simulation at a high resolution forced by boundary conditions given by a low resolution global simulation. For estimating the response to a forcing applied over a short time this regional approach seems effective, (see for example the UK Met office numerical weather prediction model Davies et al., 2005); however, it remains to be seen if it can be extended to the long time climate response. The hypothesis explored in this paper is that a statistical approximation based upon the fluctuation-dissipation theorem (FDT), using only data on the unforced system, will have some skill in predicting the response to a forcing. The form of the FDT that we consider predicts only the linear component of the response, i.e. the response of a (fully non-linear) system to a sufficiently small forcing, and much in the same way that simulation is expensive in computational effort, the FDT is expensive in data. Some sort of truncation is necessary to reduce the quantity of data required. In this paper we describe a method of truncating a data set locally for the purposes of applying the FDT. This method, inspired by the technique of local simulation at high resolution, is a practical alternative to the globally distributed truncation to a particular number of empirical orthogonal functions (EOFs); see Joliffe (2002) chapter 6. The linear nature of the FDT then allows for the responses calculated with local truncation to many small localised forcings to be added together to give the global response to a forcing that is not localised.
The expected response δx(t) of a system to a forcing as a function of time t is denoted by where x(t) represents the state vector of the system without forcing applied, x f (t) represents the state vector of the system with the forcing applied and the angled brackets . . . denote the ensemble average over a large number of independent realisations. Assuming that the system is sufficiently random and that its probability density function (PDF) ρ where δf (t) represents a forcing vector at time t and T denotes the vector transpose; see for example Marconi et al. (2008) for a recent review. Throughout this paper we make the assumption that the mean of x has been subtracted and therefore x(t) = 0. For practical evaluation of (1) knowledge of ρ (x) is required. If it is assumed that ρ (x) is well approximated by a Gaussian, (1) reduces to the Gaussian FDT (sometimes called the quasi-Gaussian FDT) where C(τ ) is the lag τ covariance matrix. Gritsun and Branstator (2007) amongst others have applied (2) to atmospheric general circulation model simulations with promising results. The Gaussian FDT requires the inversion of a covariance matrix which can be difficult when considering a large state vector x. In order to evaluate (2) Gritsun and Branstator reduced the size of their state vector by first selecting particular variables and truncating their system from 18 352 components to 1800 EOFs. Majda et al. (2010) suggests some strategies for truncation in EOF space, however, a general quantitative theory for such truncation, e.g. predicting any biases introduced, has not yet been developed.
In the context of global climate modelling aspects of the dynamical system, such as separation of timescales associated with hydrostatic and geostrophic balance, can be exploited to facilitate truncation. For example, Ring and Plumb (2008) use the solution of an "Eliassen problem" based on geostrophic balance to truncate an idealised general circulation model by expressing all variables in terms of the zonal wind. Their truncation is then used, for example, to convert a particular forcing of global temperature into an equivalent forcing of the zonal wind, that can then be used in a truncated version of (1).
One problem with (2) is that the assumption of Gaussianity where it is not appropriate may cause unacceptably large errors in a response prediction (Cooper and Haynes, 2013). An alternative approach, appropriate for short time responses, is to use an ensemble adjoint technique (Eyink et al., 2004). Equation (1) is recast into a different form that depends upon a tangent linear model. This technique overcomes the problem of an unknown PDF, but is valid only up to a maximum lag τ , and requires the availability of the tangent linear model (and hence cannot be used with data derived from measurements). Abramov and Majda (2007) introduce a "blended response algorithm" that extends the Eyink et al. (2004) approach to longer lags by assuming that (2) is accurate for sufficiently large τ . An alternative suggested by Cooper and Haynes (2011) is to use the methods of non-parametric statistics to approximate ρ(x). Unfortunately, the non-parametric FDT they suggest is susceptible to problems due to a high dimensional state vector and some form of truncation appears helpful.
To explore possible truncation strategies a simple toy model is an appropriate first step. We consider as our test bed a stochastic version of the Lorenz 95 system (Lorenz, 1995) whose state vector x = (x 1 , x 2 , x 3 , . . . , x d ) contains d > 3 variables and is governed by where f is a constant equal to 8 in this paper. This choice of f makes the system chaotic. j = 1, 2, 3, . . . , d, δf j represents an additional small forcing, ξ represents a Gaussian white noise term with unit variance with a a constant controlling its amplitude. The state vector is periodic, so that in (3) x 0 = x d , x −1 = x d−1 and x d+1 = x 1 . The precise details of our integrations are detailed in Appendix A. (The Lorenz 95 system is sometimes referred to as the Lorenz 96 system in the literature due to a confusion regarding the dates of first publication.) This particular system is appropriate to this investigation because the dimensionality of its phase space is easy to vary and can be made large enough to allow exploration of the effectiveness of truncation, it is symmetric in the sense that each point has the same relation to its neighbours, and its solution with an appropriate choice of f , is chaotic. In addition, the system is local in the sense that each element of the state vector only directly effects its close neighbours. More distant elements of the state vector are only affected indirectly allowing for the possibility of correlation decaying with distance as in many physical systems. Lucarini and Sarno (2011) demonstrate that the Lorenz 95 system possesses many of the properties required for investigation using some form of linear response theory and Abramov and Majda (2007) tested the various forms of their blended response algorithm (that requires a tangent linear model) to predict the response of the Lorenz 95 system. They reported predictions with similar accuracy to those of this paper. We must bear in mind that the Lorenz 95 system may have a very different underlying nature to that of the true climate system, so the results reported here may or may not be applicable to that case. The response after a long time has elapsed of each element of x to a continuous unit forcing applied to a single point x k of the Lorenz 95 system (3) is plotted in Fig. 1. The response is estimated from two long integrations of (3), with δf k = 0 and 1, respectively. An initial period, longer than any correlation times, is discarded and the experiment is repeated ten times for robustness; see Appendix A for details. The final element of x in each plot is linked by (3) to the first. The response to the left of the applied force is relatively small and almost completely restricted to two points. The response to Nonlin. Processes Geophys., 20, 239-248, 2013 www.nonlin-processes-geophys.net/20/239/2013/ the right is larger in places with the amplitude reducing in a non-monotonic manner with distance. The overall impression is that the response is somewhat local to the forcing and does not strongly depend upon the dimensionality of x. In effect, the extra points in the d = 72 system do not make a significant contribution to the response and we should be able to simply ignore them when making a response estimate. The experiment repeated for d = 18 and d = 144 does not alter this conclusion. Figure 1 suggests that the response will only be well captured if the system is truncated locally rather than in a non-local Fourier or EOF space. For the purpose of applying (1), we are interested in the linear component of the response. For a forcing sufficiently small to be in the linear regime, the response to forcing more than one element of x may be estimated as the sum of the responses due to forcing the single elements separately. In the case of the Lorenz 95 system, all grid points are equivalent and we can calculate the response to any linear forcing by considering the response of a single grid point. In other words, the linear response to any combination of δf j may be estimated using Fig. 1. Therefore, we restrict ourselves to forcing δf only in the direction of x k , so that δf k = 1 and δf j = 0 for j = k.
To investigate possible truncations, we first split the state vector into three components where x k is the element to which a forcing is applied, x d is a vector of the components of x that are closely related to x k , and x i is a vector of the components of x that are in some sense approximately independent of x k during an integration. We now introduce a localised response approximation (LRA) that the PDF ρ k|(d,i) of x k given a particular value of all of the other state vector elements is equal to the PDF ρ k|d of x k given only the values of x d , This assumption is justified by the fact that changes to ρ k|(d,i) are negligible given sufficiently small variations in x i , i.e. typical random or chaotic variations in x i are not large enough to influence the value of x k . The utility of the LRA therefore depends upon the relative dimension of x d and x i . We use the LRA of (4) and the fact that and to write where and Substituting (7) into (1) and restricting forcing to the direction of x k , we obtain Note that the response in the direction of x i due to δf k is zero: The PDF ρ k,d in (8) is a function only of x k and x d . Dependence upon x i has been eliminated. If we now assume Gaussianity, we can obtain (2) for the system that involves smaller covariance matrices obtained using only x k and x d . Now suppose we are interested in the response to a forcing δf that is not restricted to δf k , we may use the fact that we are in the linear regime to compute the linear response to forcing a single element δf j for each j and sum the results.

A truncation algorithm
An algorithm that finds the appropriate truncation of phase space according to (8) may now be developed. We define Now we do not know in advance which elements of x to include in x d and therefore truncate iteratively towards numerical convergence of our estimate of the response δx(t) .
Our starting point will be the maximum truncation possible, i.e. setting x r to have only the element to which the forcing is applied, x r (t) = (x k (t)). We then obtain an estimate for δx(t) using a non-parametric FDT algorithm (A1) which we denote by δx r (t); see Appendix A for details. Next we try a truncation that in addition includes another grid point and see if the estimated response of grid point k changes. If it does change, then using two grid points gives us a more accurate estimate of ρ (x) present in (1) and approximated in (8), see Fig. 2. On the other hand if it does not change then perhaps point k and the tested point are independent. We repeat this process, checking point k with all other grid points and pick the grid point that contributed the largest change in δx r (t) to include in our truncated state vector x r . For example, if grid point k + 3 caused the estimated response to change the most, then for the next round of testing we choose Integrating ρ 2 (x 1 , x 2 ) in the x 2 direction gives the one dimensional marginal PDF ρ 1 (x 1 ) (b). In this case the plots illustrate that the quantities ∂ρ 2 /∂x 1 and ∂ρ 1 /∂x 1 , which may be used in (8)  The entire process is then repeated to find the next points to add to x r , this time looking at the largest change in the response at points k and k + 3, (estimated using the normalised root mean squared difference, RMS) where δy r,j (t) is the response estimated from the previous round of tests and d indicates the current dimensionality of x r . We continue to successively add more and more grid points to x r until the estimated response no longer changes. At this point we assume that we have a good approximation of the optimal truncation so that (8) is a good approximation of (1).
One problem with our truncation algorithm is that we are comparing a response estimated using d grid points with a response estimated using d + 1 grid points. Our estimate δx r (t) is biased by an amount depending somehow upon the dimensionality of phase space; see Eq. (21) of Cooper and Haynes (2011). We must account for this or otherwise any Nonlin. Processes Geophys., 20, 239-248, 2013 www.nonlin-processes-geophys.net/20/239/2013/ change in the estimated response may be due to bias rather than a genuine interaction of variables. We do so by adding a variable of Gaussian distributed random numbers (with approximately the same variance as that of x k ) to the truncated state vector x d . So in the example above we would at some point compare the response given the vectors where x G represents the Gaussian random numbers.

Further truncation
It is necessary to estimate a subset of the domain to use as a starting point for the above calculation. This is because it is computationally expensive to apply the FDT to every single grid point, several times, when the number of grid points is large. One choice of region in an atmospheric context could be the set D of all points where the integral of absolute positive lagged correlations is greater than some threshold c abs,min , i.e. the set Then the elements of c abs that are greater than c abs,min would be the only points considered when looking for an appropriate truncation. The zero in the lower limit indicates that we are looking for the response to a forcing, rather than all variables that have some relation to x k . For the Lorenz 95 system, c abs is plotted in Fig. 3 which shows that the (normalised) absolute value of the response lies under the (normalised) values of c abs . Note that in any suitably random system, elements of c abs cannot be zero because we take the absolute value of the correlation estimate which, in turn, is subject to statistical uncertainty. Therefore, if we are able to estimate the contribution of this uncertainty, for example from two independent integrations, we can pick an appropriate value for our threshold c abs,min . For the Lorenz 95 integrations considered here the normalised value of c abs,min ∼ 0.12 and from Fig. 3 we would be able to eliminate about one third of the points from the 72-element system and none of the points from the 36-element system. Equation (10) specifies a rather conservative estimate of the region to consider and as such may not solve the problem given limited computational resources. For this investigation, rather than c abs , a more useful truncation region is based upon c 0 which is given by with a threshold c 0,min estimated in the same way. This is in fact the absolute value of row k of the matrix ∞ 0 C(τ )dτ that appears in the expression for the Gaussian FDT (2). It is related to the spectral density via Parseval's theorem. The motivation for not simply using (2) to estimate our truncation region is that given a large state vector and limited data, which is typical in a climate context, it can be difficult or even impossible to accurately estimate C(0) −1 . In any case where (11) is zero, (2) gives zero response anyway. Figure 3 shows that in our case the region estimated by (11) is a much tighter fit to the response region and the number of points to consider is reduced considerably. However, to find an appropriate threshold value c 0,min , accurate evaluation of (11) is necessary. This may not always be possible and is not such a problem for (10). Another problem with (11) is that it is only a good approximation for systems with close to Gaussian statistics. If a system is extremely non-Gaussian, (11) may be a poor approximation and we must resort to using (10) instead.

Application of the algorithm
By invoking the LRA (4), we now have an algorithm that we can actually apply to high dimensional data sets provided the response to a localised forcing is sufficiently local. Using Fig. 3c and d, we consider only the region between points k− 7 to k + 16 inclusive and apply the FDT for each truncation. The RMS difference between the true response at all grid points, (found by integrating (3)), and the response predicted by our estimation of (8) and (9) is plotted as a function of the number of elements included in x r in Fig. 4 (black circles). Figure 4 demonstrates that for the Lorenz 95 system, predictions of the response using our estimator of (8) have some skill. This figure illustrates that for a truncation to 10 grid points of the 36-element system, with the noise term a = 0, the RMS error squared (i.e. the mean squared difference), is made up by variance due to a finite run length ∼ 5 %, missing responses due to truncation ∼ 29 % and bias in the FDT algorithm ∼ 66 %. For the 36-and 72-element systems, the results have similarities which indicates that for the Lorenz 95 system we may be close to the point where the predictive skill of our algorithm is independent of the dimensionality of the full state vector x. Similar results (not shown here) were obtained with an 18-element system.
An interesting point to make about Fig. 4 is that there appears to be an optimal truncation at around d = 9. Unfortunately, we did not predict the precise location of this optimum in advance. Initially the skill increases nonmonotonically as more grid points are added before reaching the optimum truncation. The RMS error then increases slowly up to its value when all grid points are included (without any truncation). We can hypothesise as to the reasons behind this behaviour. Firstly, we have no reason to believe that high skill at predicting the response of the first point on its own would translate to other dynamical systems in some general case. Secondly, the reduction in RMS error with more than four grid points could be due to the FDT calculation including a better approximation of the underlying PDF in the manner described in Fig. 2. Thirdly, the increase in RMS error beyond the optimal truncation is at least partly due to the bias present in our estimator of (8). An intuitive explanation of the cause of this bias could be simply that complex shapes present in a PDF become harder and harder to approximate accurately as their dimensionality increases. In our case we are approximating the underlying PDF by a convolution of the data (from integrations) with a kernel function; see Ap-pendix A. As the dimensionality d of phase space increases, the integral over all x r of the kernel function must remain constant. Fixing an acceptable level of variance in the predictions of the FDT and increasing the dimensionality has the effect of requiring a broader kernel to approximate the PDF with. Eventually, the kernel is so broad that one may as well approximate the convolution of the data and the kernel with a single Gaussian blob with non-isotropic covariance. If this is true for our system then the response predicted by the Gaussian form of the FDT (2) and Eq. (1) using our estimator, should be approximately the same; see Fig. 5.
The fact that our estimator approximates a smoothing convolution of the PDF leads us to suspect that it has skill when applied to stochastic systems where the white noise term leads to a genuinely smooth underlying PDF. Although it may appear slight, the FDT (A1) has more skill for the stochastic case where a = 2 than the deterministic case where a = 0; see Fig. 4. Given sufficient smoothing (by increasing the value of a), we eventually get to the situation where the non-Gaussian aspects of the underlying PDF are small corrections to a d dimensional Gaussian. In this case the Gaussian approximation (2) is a good one.

Conclusions
If we wish to apply the fluctuation-dissipation theorem (FDT) to predict the response to forcing in general circulation models, reanalysis data sets, and measurements of the climate system, truncation of the huge state vector present in these systems seems necessary. We have argued that truncation of the state vector to include only the locality of the forcing (according to variations in predictions of the FDT) may be appropriate to some chaotic high dimensional systems and have tested this hypothesis using a 36-and 72element Lorenz 95 system. The practical implications of this localised response approximation (LRA) are, for example, that if we consider the atmospheric response to a heating over one small part of the planet, perhaps there are regions that contribute very little to the response. If these regions exist then they may be safely ignored in a FDT calculation.
One method of truncation that has previously been applied (see for example Gritsun and Branstator, 2007) is to use only a given number of leading EOFs. Each EOF is described by a fixed global pattern and the associated principal component time series describes its evolution in time. For systems where the response to a local forcing is spatially localised, a localised truncation seems preferable. To illustrate this we note that the responses of the 36-and 72-element Lorenz 95 systems to a good level of accuracy are the same, i.e. the extra points in the 72-element system can be ignored. Truncation in EOF space does not take advantage of this fact but localised truncations do. In addition, if the response to a local forcing is sufficiently localised, truncation to these points allows us to apply the FDT to data sets of any dimensionality.  (8) and (9). The grey circles indicate the RMS difference of only the elements to which the calculation is applied using (8). The solid line at the bottom of each plot indicates the minimum possible RMS difference of the grey circles with this data set. I.e. the difference between the responses estimated using two independent sets of data found by integrating (3). The straight horizontal line is the RMS difference between integrating (3) and the response estimated using the Gaussian form of the FDT (2) for the 36 element system, see figure (5) for a plot of this response. The dashed lines, indicating uncertainty, represent two standard deviations divided by the square root of the number of independent integrations. An estimate with zero skill, (given by (9) with the entire state vector being independent of the forcing), has a RMS difference of one using this normalisation. It is possible to do worse than this by overestimating the response by a factor of greater than 2.
The FDT applied here gives only the linear component of the response of the non-linear system considered. This is a good approximation for the full response if the forcing is sufficiently small. Being linear, the sum of the responses estimated separately for two different forcings is identical to the single response estimated for the sum of these two forcings. We may therefore estimate the entire set of possible linear responses by appropriate summing of the responses to forcing each grid point individually. Similarly, quantities such as the response in the mean, or any other combination of several variables, may be estimated from the appropriate combination of responses to separate local forcings. Forcings sufficiently large to make the response strongly non-linear are in general not susceptible to any simple linear decomposition into localised forcings. In addition (1) only applies to the linear component of any response calculation and large forcings  (8) and (9). The grey circles indicate the RMS difference of only the elements to which the calculation is applied using (8). The solid line at the bottom of each plot indicates the minimum possible RMS difference of the grey circles with this data set, i.e. the difference between the responses estimated using two independent sets of data found by integrating (3). The straight horizontal line is the RMS difference between integrating (3) and the response estimated using the Gaussian form of the FDT (2) for the 36-element system; see Fig. (5) for a plot of this response. The dashed lines, indicating uncertainty, represent two standard deviations divided by the square root of the number of independent integrations. An estimate with zero skill, (given by (9) with the entire state vector being independent of the forcing), has a RMS difference of one using this normalisation. It is possible to do worse than this by overestimating the response by a factor of greater than 2.
The FDT applied here gives only the linear component of the response of the non-linear system considered. This is a good approximation for the full response if the forcing is sufficiently small. Being linear, the sum of the responses estimated separately for two different forcings is identical to the single response estimated for the sum of these two forcings. We may therefore estimate the entire set of possible linear responses by appropriate summing of the responses to forcing each grid point individually. Similarly, quantities such as the response in the mean, or any other combination of several variables, may be estimated from the appropriate combination of responses to separate local forcings. Forcings sufficiently large to make the response strongly non-linear are in general not susceptible to any simple linear decomposition into localised forcings. In addition (1) only applies to the linear component of any response calculation and large forcings require a generalisation (e.g. Boffetta et al., 2003). In some situations, even with simple systems similar to that considered here, the response may be entirely non-linear (Lacorata and Vulpiani, 2007).
The non-parametric FDT algorithm requires that we use an approximation of a PDF using a convolution of the data and a kernel function. The kernel function fills the gaps between data points. Given more data these gaps are smaller and the kernel function can also be narrower leading to a better PDF approximation. However, increasing dimensionality necessarily increases the width of the kernel function relative to the distance between data points leading to a worse PDF approximation. In a high dimensional space, the kernel functions significantly overlap and although we have not shown it in any analytical way, the non-parametric FDT seems to approach the Gaussian FDT (Fig. 5). If the response to a local forcing is spread over a large number of grid points (and an accurate transformation to a smaller number of grid points is not available), then the dimensionality of phase space in which the non-parametric FDT must approximate a PDF is also large. If the PDF has a complex shape then its accurate approximation is extremely difficult and is likely to require more data than is possible to obtain. Deterministic systems may even have a fractal PDF that makes use of (1) difficult to justify. Other expressions such as the linear response relation of Ruelle (1998) may have to be considered.
In addition to using truncation, there are several improvements that could be made to the non-parametric FDT algorithm. It may be possible to obtain insight into its bias by considering different kernels with the same data set, or to reduce the bias via adaptive density estimation, see Silverman (1986) and Cooper and Haynes (2011). If restrictions such as conservation laws can be placed upon a system's PDF, perhaps by analytical consideration of the underlying equations, it may also be easier to approximate. Another approach may be to adopt the tangent linear algorithm of Eyink et al. (2004) with the "blended response" modification of Abramov and Majda (2007) which replaces estimation of the PDF by estimation (or calculation if the underlying equations are available) of a tangent linear system.

Experiment specification
We test the FDT (1) using the stochastic Lorenz 95 system introduced above where we choose the constant f = 8. It is approximated by applying a fourth-order Runge-Kutta method (Press et al., 2002) to the deterministic part of (3) and the method of Rümelin (Rümelin, 1982) to the stochastic part using the Mersenne Twister algorithm (Matsumoto and Nishimura, 1998) for random numbers. We choose state vectors of size d = 36 and d = 72 and starting with random initial conditions we use a time step of t = 10 −3 and after discarding 10 5 time steps iterate from t = 0 to t = 10 6 . This may be compared with ∼ 1.74, the value of the largest Lyapunov exponent for the deterministic system (Abramov and Majda, 2007). We perform 10 identically specified integrations with different initial conditions for each experiment in order to get a measure of the statistical uncertainty. Equation (8) is only valid in the linear regime, i.e. for a sufficiently small forcing. In order to discover the range over which the linear approximation is valid, we perform several integrations of (3) with −8 ≤ δf k ≤ 8 and δf j = 0 for j = k and k = 18. The average value of elements 18 and 19 of the state vector of the d = 36, a = 0 integrations averaged over 10 10 time steps is plotted in Fig. A1. For a forcing of δf 18 = ±1 the non-linear component of the response is small in comparison with the size of the total response. A similar range and accuracy of linearity is observed for all elements of the state vector, and if the integrations are carried out with d = 72.
Taking care that the relevant integrals have sufficiently converged, (10 is used to approximate infinity in the upper limit of the integral), the Gaussian FDT (2)  the standard unbiased estimator for the covariance C(τ ). Following Cooper and Haynes (2011), we approximate (8) by the non-parametric method where δx(h) is the estimated response as a function of the free parameter h and X is the appropriate segment of the state vector recorded at a particular time. Here a unit of the indices i and j correspond to a time between recordings of X of 10 and a unit of the index s corresponds to a time between recordings of 10 −3 . So for example if i = 5, j = 7 and s = 137 then the values of t that X i , X j and X j +s correspond to are 50, 70 and 70.137, respectively. To evaluate (A1) we use r = 10 4 (representing 10 time units to approximate infinity as the upper limit of the integral in (8)), m = 10 5 , n = 10 5 , µ 0 = 0.5 t and µ s = t for s > 0 with our choice being motivated by a trade off between accuracy and available computational power; see Cooper and Haynes (2011). In Cooper and Haynes (2011), the function E was represented by N, a d dimensional Gaussian with isotropic covariance h. Here for computational efficiency we use the multidimensional Epanechnikov kernel (Silverman, 1986) without normalisation To estimate the response using (A1), a value of the free parameter h must be chosen. We choose the smallest guess of h that corresponds to the standard deviation of the estimated response of element x 18 being no larger than five percent of the estimate of its mean. Our choice of h is not well optimised because this is computationally expensive and the benefits of improving upon a simple guess appear small; see Cooper and Haynes (2011) for a discussion.