Nonlinear Processes in Geophysics

Climate spectrum estimation in the presence of timescale errors

We introduce an algorithm (called REDFITmc2) for spectrum estimation in the presence of timescale errors. It is based on the Lomb-Scargle periodogram for unevenly spaced time series, in combination with Welch's Overlapped Segment Averaging procedure, bootstrap bias correction and persistence estimation. The timescale errors are modelled parametrically and included in the simulations for determining (1) the upper levels of the spectrum of the red-noise AR(1) alternative and (2) the uncertainty of the frequency of a spectral peak. Application of REDFITmc2 to ice core and stalagmite records of palaeoclimate allowed a more realistic evaluation of spectral peaks than when ignoring this source of uncertainty. The results qualitatively support the intuition that stronger effects on the spectrum estimate (decreased detectability and increased frequency uncertainty) occur at higher frequencies. The added value of algorithm REDFITmc2 is that it quantifies those effects. Regarding timescale construction, not only the fixpoints, the dating errors and the functional form of the age-depth model play a role; the joint distribution of all time points (serial correlation, stratigraphic order) also influences spectrum estimation.


Introduction
A classical decomposition of a climate process is into a trend and a noise component. Spectral analysis investigates the noise component. A Fourier transform into the frequency domain makes it possible to separate short-term from long-term variations and to distinguish between cyclical forcing mechanisms of the climate system and broad-band resonances. Spectral analysis thus allows us to learn about the climate physics.

Correspondence to: M. Mudelsee (mudelsee@mudelsee.com)
The major task is to estimate the spectral density function. This poses more difficulties than, for example, linear regression (which may be used for trend estimation) because here we estimate a function and not just two regression parameters (intercept and slope). Spectral smoothing therefore becomes necessary, and this brings a tradeoff between estimation variance and frequency resolution.
In the case of evenly spaced records, the multitaper smoothing method achieves the optimal tradeoff (Thomson, 1982). The focus of this paper, however, is spectrum estimation for unevenly spaced records. Such a situation is ubiquitous in palaeoclimatology. In this case the method of choice is to combine the Lomb-Scargle periodogram with Welch's Overlapped Segment Averaging (Welch, 1967; Lomb, 1976; Scargle, 1982; Schulz and Mudelsee, 2002), which yields an estimate computed in the time domain and avoids distortions caused by interpolation.
Bootstrap resampling enhances Lomb-Scargle methods by providing a bias correction. It also supplies a detection test for a spectral peak against realistic noise alternatives in the form of an AR(1) model (which belongs to the wider class of "red noise" processes). We introduce a bootstrap algorithm, called REDFITmc2, which takes into account the effects of timescale uncertainties on detectability and frequency resolution. REDFITmc2 is a further development of the REDFIT algorithm (Schulz and Mudelsee, 2002), which assumes negligible timescale errors.
Palaeoclimate time series from two different archives serve to illustrate our concept. The first example is an ice core from Antarctica, which covers the past 800 ka and constitutes, at present, the longest glacial archive. The second is a stalagmite from southern Arabia (Oman), which covers a large portion of the Holocene (last ∼11 ka).

Method
Let X(T) be a stationary climate process in continuous time, T. The one-sided non-normalized power spectral density function, h(f), where f ≥ 0 is frequency, can be defined (Priestley, 1981, Chapter 4 therein) via a truncation of X(T) and a Fourier transformation. h(f) integrates to VAR[X(T)] = S^2, the variance of X(T). The power is a measure of the variability ("energy") within a frequency interval. The spectral theory in the case of a process X(i) in discrete time, T(i), is similar (Priestley, 1981, Sect. 4.8.3 therein), except that the frequency range has an upper bound and the discrete Fourier transform is invoked to calculate h(f).
The objective of spectrum analysis is to estimate h(f) using a time series sample, {t(i), x(i)} (i = 1, ..., n). The time points, t(i), may be unevenly spaced; n is the sample size. The spacing is given by d(i) = t(i) − t(i−1), and its average is denoted as d. A spectral peak at a frequency, f, corresponds to a period we denote as T_period = 1/f.
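To fix notation, these basic sampling quantities can be computed directly; a minimal Python sketch with hypothetical example values:

```python
import numpy as np

# Hypothetical unevenly spaced timescale t(i) (example values, in ka)
t = np.array([0.0, 1.2, 2.1, 3.9, 5.0, 6.4])
n = t.size                    # sample size n

d = np.diff(t)                # spacing d(i) = t(i) - t(i-1)
d_mean = d.mean()             # average spacing

f = 0.025                     # assumed frequency (cycles per ka)
T_period = 1.0 / f            # corresponding period: 40 ka
```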

Lomb-Scargle periodogram
An early spectrum estimator is Schuster's (1898) periodogram, I(f_j), which is evaluated at discrete frequency points, f_j. Scargle (1982) suggested for the case of uneven spacing a new version, I_LS(f_j), where Lomb's (1976) time shift, τ_Lomb, is given via

tan(4π f_j τ_Lomb) = [Σ_{i=1}^{n} sin(4π f_j T(i))] / [Σ_{i=1}^{n} cos(4π f_j T(i))].

For even spacing (d(i) = d), even n and frequency points f_j = 1/(nd), ..., 1/(2d), it follows that τ_Lomb = 0 and I_LS(f_j) = I(f_j). To calculate I_LS(f_j) on the sample level, that is, using the time series sample, plug in x(i) for X(i) and t(i) for T(i). Scargle's objective behind introducing the Lomb-Scargle periodogram was that the probability distribution of I_LS(f_j) should be equal to the distribution of I(f_j). Scargle (1982, 1989) showed that this is so (chi-squared distribution) when X(i) is a Gaussian white noise process (with mean zero and variance S^2), X(i) = E_N(0, S^2)(i).
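The estimator can be sketched in a few lines of Python (our own minimal implementation after Lomb (1976) and Scargle (1982); normalization conventions may differ from those used in REDFIT):

```python
import numpy as np

def lomb_scargle(t, x, freqs):
    """Minimal Lomb-Scargle periodogram for unevenly spaced data (sketch)."""
    x = x - x.mean()
    power = np.empty(len(freqs))
    for k, f in enumerate(freqs):
        w = 2.0 * np.pi * f
        # Lomb's time shift tau makes the estimate invariant to the time origin
        tau = np.arctan2(np.sum(np.sin(2 * w * t)),
                         np.sum(np.cos(2 * w * t))) / (2 * w)
        c = np.cos(w * (t - tau))
        s = np.sin(w * (t - tau))
        power[k] = 0.5 * ((x @ c) ** 2 / (c @ c) + (x @ s) ** 2 / (s @ s))
    return power
```

Applied to a sinusoid sampled at uneven times, the estimate peaks at the input frequency.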

Welch's overlapped segment averaging
The periodogram is a suitable estimator for line spectra, where harmonic signal components appear as sharp peaks. To overcome the unfavourable inconsistency of the periodogram (i.e., its variance does not decrease with increasing n) when (mis-)applied to estimating continuous spectra, Welch (1967) advanced the idea of Bartlett (1950) to divide a time series {t(i), x(i)} (i = 1, ..., n) into different time segments, calculate the periodograms segment-wise and average them to obtain a reduced estimation variance. Welch (1967) allowed the segments to overlap (for example, by 50%), and the method is called "Welch's Overlapped Segment Averaging" or WOSA procedure. Overlapping has the positive effect that the number of segments, and therefore the number of averaged periodograms, is increased.
The negative effect of using WOSA (number of segments, n50 > 1) is that the frequency points at which the periodograms are calculated are spaced more widely than for n50 = 1. More precisely, the spacing of the f_j is (n50 + 1)/(2nd) > 1/(nd) for n50 > 1. This is the smoothing problem in the spectral domain, the tradeoff between spectral estimation variance and frequency resolution.
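The segmentation scheme can be sketched as follows (a simplified sketch assuming exactly 50% overlap and ignoring REDFIT's boundary handling; the helper names are our own):

```python
import numpy as np

def wosa_segments(n, n50):
    """Start/stop indices of n50 segments with 50% overlap covering n points.

    Simplified sketch: segment length nseg = 2n // (n50 + 1); REDFIT's
    boundary handling may differ.
    """
    nseg = 2 * n // (n50 + 1)
    step = nseg // 2
    return [(k * step, k * step + nseg) for k in range(n50)]

def wosa_average(x, n50, estimator):
    """Average a segment-wise spectral estimator over overlapped segments."""
    segs = [estimator(x[i:j]) for i, j in wosa_segments(len(x), n50)]
    return np.mean(segs, axis=0)
```

With n = 100 and n50 = 3, the segments are (0, 50), (25, 75) and (50, 100).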
Instead of calculating the Lomb-Scargle periodogram at the frequencies f_j = 1/(nd), 2/(nd), ..., there is no hindrance to using a finer frequency grid. The oversampling factor (OFAC) (Schulz and Stattegger, 1997) gives the increase in frequency resolution. Oversampling is mainly for "cosmetic" reasons, to obtain a smoother curve and to be able to determine precisely the frequency of a spectral peak. Like interpolation in the time domain, oversampling does not generate new information and cannot solve the smoothing problem.

Welch (1967) also showed that tapering (weighting) the data points X(i) within segments improves the spectrum estimate in terms of bias, variance and suppression of spurious peaks (sidelobes). The overview by Harris (1978) lists the properties of many taper functions, including the Welch taper (negative parabola) that is used in the present paper.

Bandwidth
The degrees of freedom of the chi-squared distribution of a Lomb-Scargle spectrum estimate based on WOSA with 50% overlap and Gaussian distributed X(i) are approximately

ν = 2 n50 / (1 + 2c^2 − 2c^2/n50),

where c ≤ 0.5 is a constant representing the taper. A uniform taper has c = 0.5, a Welch taper c = 0.344; further values are listed by Harris (1978). The discrete Fourier transform of a pure harmonic process (X(i) composed of sine and cosine terms with frequency f) has a peak at f. The decay on the flanks of the peak to a value of 10^(−6/10) ≈ 0.251 times the maximum value defines a width in frequency, the 6-dB spectral bandwidth, B_s. This is a useful quantity for assessing the frequency resolution, that is, how well closely neighbouring spectral peaks can theoretically be separated (Harris, 1978; Nuttall, 1981). The 6-dB bandwidth depends on n, n50 and the type of taper.
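As a sketch, the degrees of freedom can be computed with a common approximation for 50%-overlap WOSA (Welch, 1967; Harris, 1978); the exact expression used by REDFIT may differ slightly:

```python
def wosa_dof(n50, c):
    """Approximate chi-squared degrees of freedom of a WOSA estimate with
    50% segment overlap; c is the taper constant (0.5 uniform, 0.344 Welch).

    Assumed approximation: nu = 2*n50 / (1 + 2*c**2 - 2*c**2/n50).
    """
    return 2.0 * n50 / (1.0 + 2.0 * c**2 - 2.0 * c**2 / n50)
```

For a single segment (n50 = 1) this reduces to 2 degrees of freedom, the raw-periodogram value; overlap correlation (c > 0) reduces ν below the no-overlap value of 2 n50.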

Bootstrap bias correction
If X(i) is a red-noise process on an unevenly spaced timescale, perhaps with superimposed peaks in the frequency domain, the distribution of I LS (f j ) cannot be calculated analytically. Here, simulation methods can be used to infer the distributional properties of the Lomb-Scargle periodogram. Of particular interest is its bias (systematic error).
Monte Carlo experiments (Schulz and Mudelsee, 2002) reveal the bias of the Lomb-Scargle periodogram for an AR(1) process and uneven spacing. The "absolute bias" of I_LS(f_j) as an estimator of non-normalized power, h(f), can be substantial. The bias disappears (i.e., becomes smaller than the "simulation noise") for an AR(1) process and even spacing. Also in peak detection, where normalized power, g(f) = h(f)/S^2, is analysed, the Lomb-Scargle periodogram exhibits bias. That means that even if the normalization is known, the bias of the Lomb-Scargle periodogram is significant, and it is frequency-dependent.
Spectrum estimation for unevenly spaced time series can be enhanced by combining the WOSA procedure with the Lomb-Scargle periodogram (Schulz and Stattegger, 1997). A bias correction for such estimates was devised by Schulz and Mudelsee (2002). It uses a bootstrap simulation approach based on artificially generated AR(1) time series ("surrogate data"), for which the theoretical spectrum is known (Priestley, 1981). The frequency-dependent bias correction factor is calculated as the ratio of the theoretical AR(1) spectrum and the average Lomb-Scargle spectrum of the surrogates. The bias-corrected spectrum estimate is denoted as h′(f_j).
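The correction loop can be sketched as follows (our own simplified interfaces; the AR(1) surrogate recursion follows the unevenly spaced form of Mudelsee (2002)):

```python
import numpy as np

def ar1_surrogate(t, tau, rng):
    """Unit-variance AR(1) surrogate on a given (possibly uneven) timescale."""
    n = len(t)
    x = np.empty(n)
    x[0] = rng.standard_normal()
    for i in range(1, n):
        a = np.exp(-(t[i] - t[i - 1]) / tau)   # spacing-dependent autocorrelation
        x[i] = a * x[i - 1] + np.sqrt(1.0 - a * a) * rng.standard_normal()
    return x

def bias_correction_factor(t, tau, freqs, spectrum, h_theory, nsim=200, seed=42):
    """Frequency-dependent correction: theoretical AR(1) spectrum divided by
    the average simulated spectrum estimate. spectrum(t, x, freqs) and
    h_theory(freqs) are user-supplied callables (hypothetical interfaces)."""
    rng = np.random.default_rng(seed)
    mean_est = np.mean([spectrum(t, ar1_surrogate(t, tau, rng), freqs)
                        for _ in range(nsim)], axis=0)
    return h_theory(freqs) / mean_est
```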

Peak detection by red-noise test
A hypothesis test helps to assess whether peaks (or lows) in the estimate of h(f), denoted as ĥ(f), are significant or not. Such information is of major relevance for the climate time series analyst because it helps to filter out the variability and to construct and test conceptual climate models; the absolute value of ĥ(f) is less important. The typical test performed in climate spectral analysis is of H0: "X(i) is an AR(1) process, with continuous, red spectrum", the red-noise hypothesis, against H1: "X(i) has a mixed spectrum, with a peak at frequency f_j on a red-noise background." Schulz and Mudelsee (2002) devised the following red-noise test. The null distribution of ĥ(f) is obtained by fitting an AR(1) process to {t(i), x(i)} (i = 1, ..., n), that is, estimating the persistence time, τ, followed by bootstrap resampling.
The process is given by

X(1) = E_N(0,1)(1),
X(i) = exp{−[T(i) − T(i−1)]/τ} X(i−1) + E_N(0, 1−exp{−2[T(i)−T(i−1)]/τ})(i), i = 2, ..., n.

The parameter τ defines the "equivalent autocorrelation coefficient", ā, via ā = exp(−d/τ) for the case of uneven time spacing. For even spacing, ā equals a, the ordinary AR(1) autocorrelation parameter. Even the estimation of a (even spacing) is not trivial because of estimation bias. For an AR(1) process with unknown mean, which is the usual case in climatology, the estimator of a is

â = Σ_{i=2}^{n} [X(i) − X̄][X(i−1) − X̄] / Σ_{i=2}^{n} [X(i−1) − X̄]^2,

where X̄ = Σ_{i=1}^{n} X(i)/n is the sample mean. The approximate expectation of â is (Kendall, 1954)

E(â) ≈ a − (1 + 3a)/(n − 1). (6)
That means, â underestimates a. Eq. (6) can be used to correct for the negative bias. However, Eq. (6) is valid only for a not too large (Kendall, 1954). A bias formula applicable also to large values of a (above, say, 0.9) has so far been derived only for the simpler case of known mean (Mudelsee, 2001). Regarding the estimation of τ (uneven spacing), Monte Carlo simulations (Mudelsee, M.: Climate Time Series Analysis: Classical Statistical and Bootstrap Methods, Springer, Heidelberg, manuscript in preparation) show that the least-squares estimator of ā = exp(−d/τ) has a bias similar in size to the bias of â. Mudelsee (2002) presented a least-squares estimation procedure of τ with bias correction and bootstrap percentile confidence intervals. Our conclusions for the practice of persistence time estimation (Mudelsee, 2002) are as follows.
- Large data sizes and not too large values of τ make bias correction via the application of Eq. (6) to â accurate.

- It may in practice be difficult to show statistically significant changes of τ over short time intervals; it is therefore often advisable to assume a constant persistence time. We use this approach because our own analyses (not shown), with the ice core records (Sect. 3.1) divided into sub-intervals, do not indicate significant changes in τ. An option that may apply to a situation where the physics of the sampled climate system indicates a change in the persistence properties, and where very long time series are available, is to consider a model that divides the time interval into sub-intervals with specific persistence times. Divine et al. (2008) have recently suggested a similar model (for even spacing), accompanied by a maximum likelihood estimation procedure.
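A sketch of bias-corrected estimation of a for even spacing, applying Eq. (6) as a first-order correction (the exact correction used in practice may differ):

```python
import numpy as np

def ar1_autocorr_biascorrected(x):
    """Estimate the AR(1) parameter a (even spacing, unknown mean) and apply
    Kendall's (1954) bias correction from Eq. (6).

    Sketch: E(a_hat) ~ a - (1 + 3a)/(n - 1) is inverted to first order,
    which is adequate for a not too large.
    """
    n = len(x)
    xm = x - x.mean()
    a_hat = np.sum(xm[1:] * xm[:-1]) / np.sum(xm[:-1] ** 2)
    return a_hat + (1.0 + 3.0 * a_hat) / (n - 1)
```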
Resampling is done (Schulz and Mudelsee, 2002) via the surrogate data approach. Instead of (or in addition to) performing bootstrap resampling for deriving the null distribution, one may calculate upper (lower) bounds via the chi-squared distribution (Schulz and Mudelsee, 2002). Of practical relevance in climatology, although conventionally ignored, should be not only peaks but also lows in ĥ(f), that is, frequency intervals where less power resides than under an AR(1) hypothesis. The information about the absence of power in a frequency range is also helpful; it may, for example, support an argument that a certain mechanism does not play a role in generating an observed climate phenomenon.
So far we have assumed as a realistic noise alternative an AR(1) process with positive persistence time τ (uneven spacing) or positive autocorrelation parameter a (even spacing). This corresponds to a "red" spectrum, with more power at lower than at higher frequencies. Fisher et al. (1985) and Andersen et al. (2006) describe the interesting observation that postdepositional effects (wind erosion) at the site of an ice core may lead to "blue" noise, with more power at higher frequencies. However, testing against this spectrum alternative would be problematic: uneven spacing does not allow one to embed a discrete-time, blue-noise process within a continuous-time, blue-noise process (Robinson, 1977; Chan and Tong, 1987). This means that for uneven spacing it is not possible to infer uniquely the parameters of the underlying continuous-time process. Embedding is an important property because it allows a foundation on physics, which works in continuous time (differential equations). A practical approach to ice core spectrum estimation may be to use extra caution in the interpretation of high-frequency peaks.

Multiple tests
Assume for convenience an even data size, even spacing and no oversampling. If the red-noise hypothesis test for the existence of a spectral peak is to be carried out for one single, pre-defined frequency f_j from the set {f_j} (j = 1, ..., n/2), then selection of the 100(1−α)th percentage point of the red-noise distribution leads to a one-sided hypothesis test with a P-value equal to α. If, as is usually the case, the test is multiple, that is, it is to be carried out for many (if not all) frequencies from the set {f_j} (j = 1, ..., n/2), then a higher frequency-point-wise confidence level, (1−α′) with α′ < α, has to be employed to yield an overall P-value of α. If a test is performed multiple times, it becomes more likely to find a significant single result purely by chance.
One may define a "maximum effective number of test frequencies", M, via the overall prescribed P-value:

1 − (1 − α′)^M = α.

For small α and large M this leads to α′ ≈ α/M. The effective number of frequencies refers to a hypothetical situation where M frequencies f_j are tested and the spectrum estimates ĥ(f_j), a set of size M, are mutually independent. For the simple case of even data size, even spacing, Gaussian distributional shape and periodogram estimation, independence is fulfilled and the maximum number is M = n/2. If n is odd (other settings unchanged), M = (n−1)/2. Also, if the Gaussian assumption is violated not too strongly, the effects on M should be negligible.
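The adjustment can be computed exactly by inverting the defining relation; α′ ≈ α/M is the small-α approximation:

```python
def alpha_single(alpha, M):
    """Frequency-point-wise significance level alpha' such that the overall
    P-value over M effectively independent test frequencies equals alpha:
    1 - (1 - alpha')**M = alpha."""
    return 1.0 - (1.0 - alpha) ** (1.0 / M)
```

For alpha = 0.05 and M = 50 this gives alpha' ≈ 0.00103, close to the approximation alpha/M = 0.001.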
A stronger influence on M can come from uneven spacing with Lomb-Scargle periodogram estimation (no WOSA). Because the periodogram estimates are then no longer independent, M is reduced. Horne and Baliunas (1986) and VanDongen et al. (1997) studied the effects by means of Monte Carlo simulations. If the {t(i)} (i = 1, ..., n) are more or less uniformly distributed, the approximation M ≈ n/2 is still acceptable. This formula may also be applied to series where the timescale is even with the exception of a few missing observations. However, if the time points are strongly clustered in time, one should use not the number of points, n, but rather the number of clusters to determine M (VanDongen et al., 1997). The effects of segmenting (WOSA) on M with Lomb-Scargle or ordinary periodogram estimation (no tapering) can be taken into account by using the number of points per segment instead of n: M = NINT[n/(n50 + 1)]; see Schulz and Mudelsee (2002). NINT is the nearest-integer function. The effects of tapering could in principle be studied by means of Monte Carlo simulations. Restricting the frequency range in which peaks are studied is another way to reduce M.
What practical conclusions can be drawn for peak detection in climate spectra? A typical situation is an unevenly spaced timescale without strong clustering, where the researcher is interested also in the longer periods of variation recorded by the time series. Here, Lomb-Scargle periodogram estimation with tapering, WOSA and n50 not too high (less than, say, 10) is an option. To gain more reliability in the low-frequency portion of the spectrum, one may follow a rule of thumb (Bendat and Piersol, 1986) and require at least two cycles per segment, that is, set the minimum test frequency f_j equal to (2 n50)/(nd). This also reduces M. Regarding the high-frequency portion of the spectrum, the uneven spacing theoretically allows inferences also for frequencies above 1/(2d); see Scargle (1982). On the other hand, an archive may a priori be known not to preserve a high-frequency signal, for example a marine sediment core affected by bioturbation. Then it would make sense to ignore part of the high frequencies, leading to a further reduction of M.

Timescale errors
In palaeoclimatology the timescale is usually expected to have uncertainties. The time assigned to a sample, T(i), determined by dating and constructing an age-depth curve, is expected to deviate from the true time value, T_true(i). In such cases we write the measured times as

T(i) = T_true(i) + T_noise(i), i = 1, ..., n, (7)

where T_noise(i) is the error owing to age uncertainties. Equivalently, the spacing, d(i), has an error component.
To recapitulate, T(i) is a sequence of random variables representing the observed time of a sequence of samples, that is, T(i) is a discrete-time stochastic process ("process level"). The values t(i) are the actually observed times ("sample level"), a realization of the T(i). The function T_true(i) gives the true times: uniquely defined, but unknown, numbers. Because of dating errors we cannot expect that T(i) equals T_true(i). Hence, there is a noise component, T_noise(i). The noise process is not required to have zero mean; this allows for measurement bias, that is, systematic over- or underestimation. Eq. (7) is a rather general description of uncertain timescales; it may be used for various combinations of climate archives and dating techniques.
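Eq. (7) suggests a simple simulation scheme; a sketch assuming independent Gaussian age noise (real dating errors are usually correlated and non-Gaussian) and rejecting draws that violate stratigraphic order:

```python
import numpy as np

def simulate_timescale(t_true, sigma, rng, max_tries=1000):
    """Simulate t*(i) = t_true(i) + noise (Eq. 7), redrawing until the
    timescale increases strictly monotonically.

    Sketch with independent Gaussian noise of standard deviation sigma.
    """
    for _ in range(max_tries):
        t_star = t_true + sigma * rng.standard_normal(len(t_true))
        if np.all(np.diff(t_star) > 0.0):
            return t_star
    raise RuntimeError("no strictly monotonic timescale found")
```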
Timescale errors lead to a distortion of the estimated spectrum. Two effects are expected: (1) a reduced significance (detectability) of peaks compared with a situation with an exact timescale and (2) an increased frequency uncertainty for a detected spectral peak.
Moore and Thomson (1991) and Thomson and Robinson (1996) studied the influence of a "jittered" spacing on the process level. The simple case of independent Gaussian jitter,

d(i) = d + E_N(0, δ_d^2)(i), (8)

is analytically tractable. δ_d is the standard deviation of the spacing error distribution. Its effect on the true continuous-time spectrum, h(f), amounts (Moore and Thomson, 1991; Wunsch, 2000) to a multiplication by a frequency-dependent factor:

h_distort(f) = h(f) exp[−(2π f δ_d)^2] + c_0, (9)

where the constant c_0 serves to give the distorted spectrum, h_distort(f), the nominal area of S^2. This means that timescale errors in the form of independent jitter add white noise (c_0). As a result, spectral peaks have a reduced detectability. Several assumptions went into the derivation of Eq. (9) by Moore and Thomson (1991), which limits its applicability to the practice of climate spectrum estimation.
- No aliasing (h(f) = 0 for f > f_Ny = 1/(2d)). This may in practice be violated to some degree, leading to power "folded" back into the interval [0; f_Ny] and, possibly, to spurious spectral peaks. In addition, for unevenly spaced time series the Nyquist frequency is not well defined.
- Independent jitter. This is not realistic for many records (e.g., from ice or sediment cores). Moore and Thomson (1991) studied AR(1) dependence in the jitter equation (Eq. 8), finding potential for larger effects on the spectrum if the dependence is strong. Still, it is questionable how relevant the AR(1) jitter model is. Ice core data could exhibit heteroscedastic jitter owing to compaction. Timescales derived from layer counting might be better described by means of a random walk (i.e., a nonstationary AR(1) process with parameter a = 1) than by a jitter model. The argument is as follows: if a layer is miscounted (e.g., missed), then this error propagates without "damping" to the next layer to be counted. It is rather difficult to obtain analytical results in such cases.
- Gaussian jitter distribution. This assumption is not fulfilled without imposing a constraint to guarantee a monotonic age-depth curve. (Moore and Thomson (1991) had the purpose of studying other data, in the spatial domain, where no such constraint is required.)

- Process level. The mentioned paper does not study the spectrum estimators on the sample level, in particular, Lomb-Scargle estimation.
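The contrast between independent jitter and accumulating layer-counting errors can be illustrated with toy error models (our own sketch, not part of REDFITmc2):

```python
import numpy as np

def jitter_errors(n, sigma, rng):
    """Independent jitter: each age error is drawn afresh (toy model)."""
    return sigma * rng.standard_normal(n)

def random_walk_errors(n, sigma, rng):
    """Layer-counting errors: miscounts accumulate without damping, i.e. a
    random walk (nonstationary AR(1) with a = 1; toy model)."""
    return np.cumsum(sigma * rng.standard_normal(n))
```

The jitter error keeps a constant standard deviation sigma, whereas the random-walk error grows with depth, roughly like sigma times the square root of the layer index.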
Based on the limited relevance of available analytical results on the effects of realistic types of timescale errors on spectrum estimates (Lomb-Scargle), we suggest a novel numerical algorithm based on simulations of the timescale. One mode of it quantifies the reduced detectability, the other the increased frequency uncertainty.

2.7.1 Algorithm REDFITmc2, mode "a": peak detection

Algorithm REDFITmc2 adapts the REDFIT algorithm (Schulz and Mudelsee, 2002) to take into account timescale errors. Mode "a" of the algorithm performs peak detection.

Determine area, A_h, under spectrum within ... after Eq. (4).

Go to Step 5 until b = B replications exist.
9. Test at each f_j whether ĥ(f_j) exceeds a pre-defined percentile of the simulated red-noise distribution. The selection of B depends on the size of the percentile; higher percentiles require higher values of B.
The sets of frequencies f_j at Steps 2 and 7 are identical. Taking into account that the test is usually performed not only at one single frequency but at f_j = 0, 1/(nd), ..., 1/(2d) requires adjusting the significance level for a single test; see Sect. 2.6.

2.7.2 Algorithm REDFITmc2, mode "b": frequency uncertainty

Mode "b" of the algorithm quantifies the effect of timescale errors on the uncertainty of the frequency of a spectral peak.
12. Calculate the standard error, se_fj, by taking the standard deviation over the B simulation replications.

Examples
We illustrate spectrum estimation using two types of palaeoclimate archives: ice cores and speleothems. We briefly describe the records (proxy variables) and their timescales. The results of the application of the REDFITmc2 algorithm are discussed with a focus on statistical aspects; climatic interpretations will be presented elsewhere.
3.1 Ice core: EPICA Dome C (Antarctica)

The European Project for Ice Coring in Antarctica (EPICA) core from the Dome C site (75° S, 123° E) has a length of ∼3260 m and covers the past ∼800 000 a (EPICA Community Members, 2006; Wolff et al., 2006; Jouzel et al., 2007). We refer to this as the EDC core.

Data
The deuterium isotope (δD) time series records variations in air temperature at the inversion height over the EDC site. This proxy may also indicate, at reduced accuracy, the temperature changes over the entire southern hemisphere. The non-sea-salt calcium (nssCa) flux record (Wolff et al., 2006; Röthlisberger et al., 2008) documents variations in the climate variable dust (transported from South America to Antarctica). The sea-salt sodium (ssNa) flux series (Wolff et al., 2006; Röthlisberger et al., 2008) is a proxy for changes in the extent of sea ice around Antarctica. The nssCa flux and ssNa flux records have rather strongly skewed data distributions; we employ the logarithmic transformation (Röthlisberger et al., 2008) to bring the distributions closer to a symmetric/normal shape. All three climate variables display long-term variations during the late Pleistocene (Fig. 1) and potentially belong to the set of variables governing the ice-age climate. Spectrum estimates may shed further light on the current understanding (Raymo and Huybers, 2008; Röthlisberger et al., 2008) of glacial-interglacial transitions. We follow Jouzel et al. (2007) and divide the record into an earlier (400-800 ka) and a later (0-400 ka) interval.

Spectra, without timescale errors
First, we present the raw periodograms (Fig. 2), then the climate spectra calculated with timescale errors ignored (Fig. 3), and finally (Sect. 3.1.4) compare some of these results with those obtained when taking timescale errors into account.
In the δD spectrum for the earlier interval (400-800 ka) (Fig. 3b), only one peak at a Milankovitch period appears to exist, at the period of obliquity variations (41 ka). Jouzel et al. (2007) find this peak and also another, at a period of roughly 100 ka. For the later interval (0-400 ka) there are three peaks in the Lomb-Scargle spectrum (Fig. 3a), at the periods 109 ka, 40 ka and 23 ka. This agrees with the findings of Jouzel et al. (2007).
The question of the statistical significance of the Milankovitch peaks of the δD spectra is deferred to Sect. 3.1.4. Here we remark that the peaks at 64 ka and 30 ka period in the unsmoothed periodogram for δD (Fig. 2a) do not appear in the (more reliable) smoothed spectrum estimate (Fig. 3a, b). The reason could be that the shortness of the 400-ka intervals leads to a large bandwidth, B_s, which in turn prevents detection and separation of those peaks. Another reason could be that their appearance in the unsmoothed periodogram is spurious owing to the large estimation variance.
The log(nssCa flux) record displays remarkably "clean" spectra ( Fig. 3c, d), with power concentrated in two frequency bands: approximately 100-109 ka and approximately 41-50 ka. The log(ssNa flux) spectra (Fig. 3e, f), on the other hand, show more peaks that lie above the upper red-noise levels.
We also show spectra for the Holocene (Fig. 4). These were calculated using records of very high temporal resolution (δD, 55-cm resolution; nssCa flux and ssNa flux, downsampled to 2-a resolution). A number of peaks appear, mainly in the centennial band.

Timescale
The "official" timescale, called EDC3, was produced by employing a snow accumulation and ice-flow model and invoking constraints in the form of a set of independent age markers. The markers are given, on the one hand, by events such as a reversal of the Earth's magnetic field or a volcanic eruption that has been recorded by an absolutely dated palaeoclimatic archive. On the other hand, pattern matching of the proxy climate series with insolation series produced another kind of absolutely dated fixpoints.
Like any palaeoclimatic timescale, the EDC3 timescale relies on the validity of the assumptions made (e.g., model formulation, model parameter values) and the precision of numerical input values (e.g., absolute dates of markers). The EDC3 timescale is, thus, an estimate of the true, but unknown, timescale (Parrenin et al., 2007).

The application of the REDFITmc2 algorithm requires a strictly monotonically increasing timescale. We utilized the regression model of Heegaard et al. (2005) in a Bayesian manner to produce copies of {t*(i)} (i = 1, ..., n) by means of the posterior age distributions. The input to this method are the EDC3 dating fixpoints and their errors (Parrenin et al., 2007, Table 1 therein). The same method has recently been used for timescale construction on speleothem data (Spötl et al., 2008).
A summary of the Bayesian method is as follows. Owing to their finite precision, the determined fixpoint ages may overlap within the dating error bars. This may result in age reversals in the simulated fixpoint timescale. To avoid such reversals, we used a version of the Gibbs sampler, which is a Markov-chain Monte Carlo implementation (Buck and Millard, 2004). In a first step, an initial fixpoint timescale is simulated by drawing random numbers from the age distribution of each individual fixpoint. Then a sample age for the youngest fixpoint (Parrenin et al., 2007, Table 1 therein) is obtained taking into account (1) its age distribution and (2) the condition that it is younger than all other fixpoints. If that condition is violated, the age simulation is repeated until the stratigraphic constraint is fulfilled. Similarly, the second fixpoint age of the section is simulated using its age distribution and the condition that it is older than the fixpoint above but younger than all other samples. This simulation is performed for all fixpoint ages of the section, which completes one iteration of the Gibbs sampler and results in a strictly monotonically increasing sequence of simulated fixpoints. In the next step we applied the Heegaard et al. (2005) nonparametric regression model to the simulated fixpoints, providing a simulated timescale for the ice core proxy data. It may occur that the timescale produced by the Heegaard et al. (2005) algorithm does not increase strictly monotonically even though the simulated fixpoints do. (Such cases were rare in the case of the EDC simulations.) Thus, in a last step, every simulated timescale, {t*(i)} (i = 1, ..., n), is tested for apparent age reversals, and timescales with age reversals are discarded.
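The constrained sampling step can be sketched as follows (a simplification: Gaussian age distributions are assumed, and each age is only required to exceed the previously accepted one; Buck and Millard (2004) describe full MCMC implementations):

```python
import numpy as np

def sample_fixpoint_ages(ages, errors, rng, max_tries=1000):
    """Draw one monotonic realization of fixpoint ages, youngest first.

    Simplified sketch: each draw is repeated until it exceeds the previously
    accepted age (stratigraphic order); the full Gibbs sampler conditions on
    all other fixpoints.
    """
    out = np.empty(len(ages))
    prev = -np.inf
    for k in range(len(ages)):
        for _ in range(max_tries):
            draw = ages[k] + errors[k] * rng.standard_normal()
            if draw > prev:                # stratigraphic constraint
                out[k] = draw
                prev = draw
                break
        else:
            raise RuntimeError("stratigraphic constraint not satisfied")
    return out
```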
We apply the Bayesian method here to the ice core records for heuristic reasons. Other approaches for constructing {t*(i)} (i = 1, ..., n) can be conceived, for example, piecewise-linear least-squares regression (Sect. 3.2.2). The similarity between that approach (which may be labelled "frequentist") and the "Bayesian" approach described in the preceding paragraph emphasizes the notion that the crucial point is not the statistical school a method comes from but whether it reflects the underlying physics of the sampling of the climate archive. A further option is a "perturbed" version of physical timescale modelling. In the latter method, the B model runs are performed with model parameters not fixed but randomly drawn for each run.

Spectra, with timescale errors
The δD spectrum for the interval from 0 to 400 ka shows that the 40-ka obliquity cycle is significant also when taking into account timescale errors (Fig. 5a). The overall confidence level depends on the number of test frequencies, M. If we adopt a position that investigates only manifestations of the three Milankovitch cycles (eccentricity, obliquity and precession) in the EDC records, then M = 3 and the overall confidence level of the peak is approximately equal to 1 − (1 − 99%) × M = 97% or higher (Fig. 5a). The uncertainty of the estimated peak frequency, se_fj, is of the same order as the half-bandwidth.
Also the significance of the EDC δD cycle at T_period=109 ka is robust against errors in the timescale (Fig. 5a). The major uncertainty associated with this cycle is the large bandwidth, which in turn results from the relative shortness of the selected time interval.
The significance of the precession cycle (T_period=23 ka) in δD is already not high when ignoring timescale errors (Fig. 3a). We cannot therefore expect elevated significance levels in the more realistic test (obtained using t*) and do not show that frequency range in the plot (Fig. 5a). This does not mean that no precession-driven variations exist in Late Pleistocene temperature variations over Antarctica; making such a rather general statement would require the analysis of the large body of previous evidence (multiple records and ice cores) by means of multiple statistical tests (Sect. 2.6). Such tests would also have to evaluate the degree of circular reasoning invoked by astronomically tuning the palaeoclimate timescales. The less general statement that obliquity has a stronger power than precession (temperature variations, 0-400 ka), on the other hand, appears to be well supported by the spectral analyses shown here. The EDC log(nssCa flux) record in the Holocene has a clear signal at T_period=213 a (Fig. 4b) that is significant at a high level also when taking into account timescale errors (Fig. 5b). These errors do not largely decrease the accuracy of T_period. The reason for the small effects is that the EDC timescale is rather accurately known for the Holocene interval: Parrenin et al. (2007, Table 1 therein) give timescale errors ranging from 1-6 a in the latest part (last 800 a) to 50-180 a in the earlier part of the Holocene. On the basis of the height of the peak, we suppose that the significance of the 213-year cycle holds also when taking into account the multiplicity of the test (no calculations were performed with higher percentiles).
There exists a considerable body of proxy evidence (Stuiver and Braziunas, 1993) for centennial solar activity variations during the Holocene; the cycle with a period of somewhat above 200 years is named after Hessel de Vries and Hans Suess. The coincidence of this solar cycle with the peak in the spectrum of log(nssCa flux) should not be interpreted as proof of solar forcing of dust flux variations in the southern hemisphere during the Holocene. Rather, this finding may encourage exploring such relationships using physical climate models.

Stalagmite: Q5 (Oman)
Stalagmite Q5 from Qunf Cave (17° N, 54° E) in Oman has a length of nearly 1 m and covers the interval from 10 300 a to 2740 a before the present (B.P.); see Fleitmann et al. (2003). Note that "present" is set to A.D. 1950.

Data
The oxygen isotope (δ18O) time series (Fleitmann et al., 2003) is a proxy mainly for changes in the intensity of Indian Ocean monsoonal rainfall (Neff et al., 2001; Fleitmann et al., 2003, 2007): high δ18O values correspond to low rainfall amounts. The Q5 δ18O record (Fig. 6) displays certain long-term (trend) features: the onset of the Indian Ocean monsoon in the Early Holocene, a high ("optimum") level until ∼7200 a B.P. and a subsequent gradual decrease in rainfall intensity, which is thought to be related to the change in solar insolation (Fleitmann et al., 2003). The trend is punctuated by events of extreme dryness (heavy δ18O); the most prominent are the 8.2-ka and the 9.2-ka events (Fleitmann et al., 2008). We focus here on the interval [2740 a; 8000 a], that is, the time span after those events; it contains n=973 points with an average spacing of d̄=5.4 a. The fitted trend, a ramp (Mudelsee, 2000) in the early part and a sinusoid fitted by ordinary least squares in the late part, was subtracted prior to spectrum estimation to extract the trend-stationary component. (REDFITmc2 subtracts, as does its predecessor REDFIT (Schulz and Mudelsee, 2002), segment-wise a linear fit.) The timescale, t(i)=f(z(i)), i=1, ..., n, is given by a function f(·) that is piecewise linear through the dating points (Fleitmann et al., 2003). To resample the times, t*, for feeding them into the REDFITmc2 algorithm (Sects. 2.7.1 and 2.7.2), we adopt the piecewise-linear model and impose a constraint to ensure that the {t*(i)}, i=1, ..., n, increase strictly monotonically.
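The resampling of the piecewise-linear timescale can be sketched as follows. This is an illustration under simplifying assumptions (Gaussian dating errors, rejection of non-monotone draws); all depths, ages and errors are invented for the example and are not the Q5 dating points.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_timescale(depth_fix, age_fix, age_se, depth_sample,
                       rng=rng, max_tries=1000):
    """One simulated timescale t*(i) from a piecewise-linear age-depth model.

    depth_fix, age_fix, age_se: dating points and their dating errors
    depth_sample: depths of the n proxy samples
    Perturbed fixpoint ages are rejected until they, and hence the
    interpolated sample ages, increase strictly with depth.
    """
    for _ in range(max_tries):
        age_star = rng.normal(age_fix, age_se)
        if np.all(np.diff(age_star) > 0):        # stratigraphic order at fixpoints
            # piecewise-linear interpolation through the perturbed dating points
            t_star = np.interp(depth_sample, depth_fix, age_star)
            if np.all(np.diff(t_star) > 0):      # strict monotonicity of t*(i)
                return t_star
    raise RuntimeError("no monotone timescale found")

depth_fix = np.array([0.0, 0.3, 0.6, 1.0])             # metres (illustrative)
age_fix = np.array([2740.0, 4500.0, 6500.0, 8000.0])   # a B.P. (illustrative)
age_se = np.array([50.0, 80.0, 80.0, 60.0])
depth_sample = np.linspace(0.0, 1.0, 20)
t_star = simulate_timescale(depth_fix, age_fix, age_se, depth_sample)
assert np.all(np.diff(t_star) > 0)
```

Repeating this B times yields the ensemble of simulated timescales fed into modes "a" and "b" of the algorithm.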

Spectrum, with timescale errors
The Q5 spectrum (Fig. 7) exhibits a number of peaks above the upper bounds for the AR(1) hypothesis. Peak I (T_period=10.9 a) is significant also when taking the test multiplicity into account. This peak from a Holocene monsoon proxy record may correspond to the cycle found in the record of the number of sunspots, a proxy for solar activity variations, observed since the early 17th century (Hoyt and Schatten, 1997).

[Fig. 7 caption: The spectrum is studied using algorithm REDFITmc2. The 90% bootstrap bound (the lower of the grey, wiggly lines in a and b) is obtained from mode "a" (Sect. 2.7.1) with B=10 000; the 99.9% bound (upper grey lines) with B=100 000. The frequency uncertainties se_fj (horizontal bars) due to timescale errors (mode "b", Sect. 2.7.1), calculated with B=2000, are compared with half of the bandwidth.]

Lower confidence levels are achieved by the three peaks in the centennial band (II, T_period=107 a; III, T_period=137 a; IV, T_period=221 a). The last peak (V, T_period=963 a), again strong, may be related to a peak in the spectrum of radiocarbon variations (Stuiver and Braziunas, 1993), which contain information about changes in solar activity. The timescale of stalagmite Q5 is not exactly known; it has errors stemming from dating uncertainties. How do these influence the detectability and the frequency accuracy of the monsoon peaks?
The upper percentiles of the red-noise alternative obtained with timescale simulations (Fig. 7) are, as expected, higher over the whole frequency interval than the corresponding percentiles obtained when ignoring dating uncertainties. In the case of stalagmite Q5, this effect does not seem excessively large, except for higher frequencies (Fig. 7b). Especially the 99.9% level becomes so inflated by the timescale errors that peak I at T_period=10.9 a is no longer significant in a multiple test. The only peak in the monsoon spectrum passing the hard test (timescale errors, multiplicity) is that at T_period=963 a.
Feeding the resampled times into the REDFITmc2 algorithm, mode "b" allows us to quantify the standard error, se_fj, of the frequencies of the monsoon peaks owing to timescale errors. Again, the effects are larger on the high-frequency (Fig. 7b) than on the low-frequency side (Fig. 7a). There, half of the 6-dB bandwidth is of the same order of magnitude as the frequency standard error, se_fj. On the high-frequency side, the error interval for the period of peak I (T_period=10.9 a) ranges from 1/(1/10.9+se_fj)=10.6 years to 1/(1/10.9−se_fj)=11.4 years.
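The conversion from a symmetric frequency error to an asymmetric period interval can be written down directly; the se_f value below is an illustrative number, not the value computed by REDFITmc2 for peak I.

```python
def period_interval(T, se_f):
    """Period bounds implied by a symmetric frequency standard error.

    T: estimated period; se_f: standard error of the frequency f = 1/T.
    The nonlinear transform f -> 1/f makes the period interval asymmetric:
    the upper period bound lies farther from T than the lower one.
    """
    f = 1.0 / T
    return 1.0 / (f + se_f), 1.0 / (f - se_f)

# peak I of stalagmite Q5 (T_period = 10.9 a); se_f is illustrative
lo, hi = period_interval(10.9, 0.003)
assert lo < 10.9 < hi
assert (hi - 10.9) > (10.9 - lo)   # upper side of the interval is wider
```

The asymmetry grows with T×se_f, which is one reason period uncertainties at low frequencies (long periods) must be quoted with both bounds rather than a single ± value.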
To summarize, the contribution of spectral analysis to answering the question of the existence of solar peaks in the spectrum of the Holocene monsoon proxy record from stalagmite Q5 is as follows. Peak I corresponds within frequency error bars to the sunspot cycle, but taking into account timescale errors reduces its multiple-test significance considerably. Peaks II (T_period=107 a), III (T_period=137 a) and IV (T_period=221 a), which are partly at periods similar to those found for the Holocene radiocarbon record (Stuiver and Braziunas, 1993), are not statistically significant (multiple test) even when ignoring timescale errors. Only peak V at T_period=963 a, also a solar-cycle candidate, is significant.
It would be premature for an analysis of the Sun-monsoon relation to stop at this point. Four lines should be explored. First, the stability of the phasing between both signals should be examined by means of bivariate spectral analysis. Second, the relation can be further investigated, using the same data sets, in the time domain by means of bandpass or harmonic filtering (Ferraz-Mello, 1981). Third, the climate physics of the Sun-monsoon link can be considered. This has been done by Kodera (2004), who explained a positive correlation between solar activity and Indian monsoon strength via a weakening of the Brewer-Dobson circulation in the stratosphere. However, this was established using measurement data from 1958-1999, and the feasibility of this or other mechanisms on longer timescales is still elusive. Fourth, other records of Holocene monsoon variations need to be analysed. For example, Neff et al. (2001) analysed a δ18O record from a stalagmite from a different cave than that of Q5, finding monsoon peaks at T_period=10.7 a, 226 a and 1018 a. Combining this evidence with the information from Q5 in a new multiple test should raise the overall statistical significance. A synopsis of evidence pro and contra the Sun-monsoon hypothesis in a multiple statistical test, with timescale errors taken into account, is a major task that remains to be done.

Conclusions
The results from applying the novel REDFITmc2 algorithm for spectrum estimation in the presence of timescale errors support in a quantitative manner the intuition that stronger effects on the spectrum estimate (decreased detectability and increased frequency uncertainty) occur for higher frequencies. Analogously, the results agree with the spectrum distortion (Eq. 9) that is based on assuming timescale errors in the form of independent Gaussian jitter.
The surplus information brought by the algorithm consists of: 1. the corrected upper levels of the spectrum of the red-noise AR(1) alternative, conjectured to be valid for timescale errors of arbitrary form; 2. the uncertainty of the frequency of a spectral peak.
This paper is a first, introductory step. Monte Carlo simulations with pre-defined spectral properties and pre-defined timescale error models may shed more light on the performance of the REDFITmc2 algorithm for timescale errors of arbitrary form and further support our conjecture. Such simulations may also help to quantify the effects of misspecification of the timescale error model.
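The kind of simulation study envisaged here can be sketched for the simplest timescale error model, independent Gaussian jitter. The sketch below uses a plain Lomb-Scargle periodogram as a stand-in for the full REDFITmc2 procedure, and all numerical values (signal frequency, jitter size, number of runs) are illustrative.

```python
import numpy as np
from scipy.signal import lombscargle

rng = np.random.default_rng(0)

# pre-defined spectral properties: one sinusoid plus white noise
n, dt = 500, 1.0
t_true = np.arange(n) * dt
f0 = 0.08                                  # cycles per time unit
y = np.sin(2 * np.pi * f0 * t_true) + rng.normal(0.0, 0.5, n)

freqs = np.linspace(0.01, 0.4, 400)
w = 2 * np.pi * freqs                      # lombscargle expects angular frequencies

def peak_freq(t):
    """Frequency of the highest Lomb-Scargle periodogram ordinate."""
    return freqs[np.argmax(lombscargle(t, y, w))]

# pre-defined timescale error model: independent Gaussian jitter,
# sorted to preserve the order of the time points
sigma_t = 2.0
peaks = []
for _ in range(200):                       # B Monte Carlo runs
    t_star = np.sort(t_true + rng.normal(0.0, sigma_t, n))
    peaks.append(peak_freq(t_star))
peaks = np.array(peaks)

print("true frequency:", f0)
print("mean estimated peak frequency:", peaks.mean().round(4))
print("se of peak frequency due to jitter:", peaks.std().round(4))
```

Varying sigma_t and f0 in such a setup would quantify how detectability and frequency uncertainty degrade with frequency, and comparing jitter against other error models would probe the misspecification question raised above.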
Regarding timescale construction, we conclude that not only the fixpoints, dating errors and the functional form of the age-depth model play a role. Also the joint distribution of all time points, that is, serial correlation (jitter) and constraints such as the stratigraphic order, determines spectrum estimation.
Application of REDFITmc2 to ice core and stalagmite records of palaeoclimate allowed a more realistic evaluation of spectral peaks than when ignoring this source of uncertainty.