A general theory on frequency and time – frequency analysis of irregularly sampled time series based on projection methods – Part 2 : Extension to time – frequency analysis

Geophysical time series are sometimes sampled irregularly along the time axis. The situation is particularly frequent in palaeoclimatology. Yet, there is so far no general framework for handling the continuous wavelet transform when the time sampling is irregular. Here we provide such a framework. To this end, we define the scalogram as the continuous-wavelet-transform equivalent of the extended Lomb–Scargle periodogram defined in Part 1 of this study (Lenoir and Crucifix, 2018). The signal being analysed is modelled as the sum of a locally periodic component in the time–frequency plane, a polynomial trend, and a background noise. The mother wavelet adopted here is the Morlet wavelet classically used in geophysical applications. The background noise model is a stationary Gaussian continuous autoregressive-moving-average (CARMA) process, which is more general than the traditional Gaussian white and red noise processes. The scalogram is smoothed by averaging over neighbouring times in order to reduce its variance. The Shannon–Nyquist exclusion zone is however defined as the area corrupted by local aliasing issues. The local amplitude in the time–frequency plane is then estimated with least-squares methods. We also derive an approximate formula linking the squared amplitude and the scalogram. Based on this property, we define a new analysis tool: the weighted smoothed scalogram, which we recommend for most analyses. The estimated signal amplitude also gives access to band and ridge filtering. Finally, we design a test of significance for the weighted smoothed scalogram against the stationary Gaussian CARMA background noise, and provide algorithms for computing confidence levels, either analytically or with Monte Carlo Markov chain methods. All the analysis tools presented in this article are available to the reader in the Python package WAVEPAL.


Introduction
The continuous wavelet transform (CWT) is widely used for the time-frequency analysis of geophysical time series, mainly through its scalogram, which is the squared modulus of the CWT.The CWT relies on a probing function, called the mother wavelet.A common choice for the mother wavelet is the Morlet wavelet (Grossmann and Morlet, 1984), which is well suited for the analysis of signals whose components have a time-varying frequency and/or a time-varying amplitude.The scalogram is then often smoothed to reduce its variance, and significance testing against a stationary Gaussian white or red noise is commonly applied.Stateof-the-art references in climate for the analysis of regularly sampled time series include Torrence and Compo (1998), who provide the basis for the subsequent works; Torrence and Webster (1999), who define a smoothing method for the scalogram (which is a particular case of the wavelet coherency developed in there); and Maraun and Kurths (2004) and Maraun et al. (2007), who build more reliable significance tests for the smoothed scalogram.A non-exhaustive list of the applications, in climatology, of the scalogram of the CWT with the Morlet wavelet, includes the following.
Published by Copernicus Publications on behalf of the European Geosciences Union & the American Geophysical Union.
G. Lenoir and M. Crucifix: A general theory on time-frequency analysis of irregularly sampled time series -Studies in climate and weather: analysis of the El Niño Southern Oscillation in Torrence and Compo (1998), analysis of the Arctic oscillation in Grinsted et al. (2004), or the analysis of daily precipitation in the Alps in Schaefli et al. (2007).
-Studies in palaeoclimate: analysis of the astronomical forcing in Berger et al. (1998), analysis of the mid-Pleistocene transition in Elderfield et al. (2012), or the analysis of the equatorial Pacific thermocline over the last eight glacial periods in Regoli et al. (2015).
Most of these studies use the algorithms provided by the papers cited above or similar algorithms, and all of them require the data to be regularly spaced.However, it may happen that the time series be intrinsically irregularly sampled (this actually happens in some of the above examples) and it is then interpolated on a regularly spaced grid in order to apply the algorithms of the CWT and its scalogram.But the interpolation procedure may seriously affect the analysis with unpredictable consequences for the scientific interpretation, especially when performing significance testing.This is illustrated in Appendix F. A solution to this problem was addressed by Foster in a series of articles which share a common thread with our two papers, in the sense that it first generalises the Lomb-Scargle periodogram, based on orthogonal projection methods, in Foster (1996a) and Foster (1996b), and then extends the formalism to the continuous wavelet transform, in Foster (1996c), allowing the time series not to be interpolated.Foster's formulas were motivated by the astronomical study of the light curves of variable stars, which are unevenly sampled time series with large gaps.The methods presented in this article are influenced by Foster's theory and it is shown in Appendix E2 that most of its formulas can actually be deduced from our general framework.The main limitations of Foster's theory are the following (see Appendix E2 for detailed explanations): significance testing is only performed for the white noise background case, it only deals with the unsmoothed scalogram, and the areas in the time-frequency plane corrupted by aliasing are underestimated.It therefore suffers from a limited interest in geophysical applications1 .Another study tackling the problem of the estimation of the scalogram of irregularly sampled time series, without interpolating the data, is given in Mathias et al. (2004).In there, the authors propose an estimator of the amplitude of the signal, locally in the time-frequency plane, but no significance testing is performed.The other limitations of their algorithm are basically the same as for Foster's formulas.This is detailed in Appendix E3.
In this article, we extend the analysis tools that we derived in the first part of this study (Lenoir and Crucifix, 2018) in the case of the frequency analysis of irregularly sampled time series.They are based on a similar model, which is a locally periodic component in the time-frequency plane, plus a polynomial trend, plus a stationary Gaussian continuous autoregressive-moving-average (CARMA) process.Let us sketch the main points of the article.First, the taper of the periodogram, derived in Lenoir and Crucifix (2018, Sect. 4.4), is chosen here to be a time-dependent Gaussian function with a variance depending on the scale in order to define the Morlet wavelet-based scalogram.This is detailed in Sect.3.2 of this work.Second, the scalogram is smoothed in order to reduce its variance, by averaging over neighbouring times.To this end, we apply the same formula as in Cohen and Walden (2010).This is explained in Sect. 3.4. Third,in Sect. 3.5, we estimate the amplitude of the locally periodic component, extending the results obtained in Sect.6 of Part 1, and define, in Sect.3.6 of this article, the weighted smoothed scalogram as the time-frequency analogue of the weighted WOSA periodogram defined in the first part of this study (Lenoir and Crucifix, 2018).Fourth, we define in Sect.3.8 the Shannon-Nyquist exclusion zone (SNEZ) as the area of the time-frequency plane which must be excluded from the analysis because of the local aliasing issues.Fifth, we design a test of significance for the weighted smoothed scalogram, against the stationary Gaussian CARMA background noise.This is based on the theory developed in Sect. 5 of Part 1.More specifically, we define a null and an alternative hypothesis, and estimate the distribution of the weighted smoothed scalogram under the null hypothesis, either analytically, conserving the first moments of the distribution, or with Markov Chain Monte Carlo (MCMC) methods.The latter approach allows the uncertainty in the parameters of the CARMA background process to be fully taken into account.This is presented in Sect. 4. Sixth, we provide, in Sect.5, formulas for filtering the signal in a band delimited by two scales, or with the ridges, which are the lines going through the maxima of the estimated amplitude, in the time-frequency plane.Ridge filtering is based on state-of-the-art algorithms provided in Lilly and Olhede (2010) and (https://github.com/jonathanlilly/jLab). Seventh, we define in Sect.6 the global scalogram as the time-averaged weighted smoothed scalogram, resulting in a periodogram-like analysis tool with a frequency-varying bandwidth.Eighth, we illustrate, in Sect.7, the theory on the same palaeoclimate data set as in our first article (Lenoir and Crucifix, 2018).Finally, a Python package named WAVEPAL is available to the reader and is presented in Sect.8. Before tackling the problem of irregularly sampled time series, the paper starts with the theory of the CWT applied to continuous-time signals.This gives the bases for the subsequent developments.
Most of the mathematical concepts and notations are introduced in the first part of this study (Lenoir and Crucifix, 2018), and the reader is invited to revise them.Throughout this article, we will denote the equations of the preceding paper by, for example, "Eq.(I,30)", meaning "the equation (30) of Part 1", and will refer to the paper itself by "Part 1".
2 The continuous wavelet transform of continuous-time processes

The continuous wavelet transform and its scalogram
The mathematical background to Fourier analysis is given in Appendix A. Let S denote the Schwartz space.The continuous wavelet transform of a function x ∈ S(R) is where ψ τ,a ∈ S(R) is defined by Here, ψ is called the mother wavelet, τ ∈ R is the translation time, a ∈ R + 0 is the scale, and c(a) ∼ a m with m ∈ Q.We can write the CWT as a convolution product, where in which • denotes the complex conjugate.From the convolution theorem, and S x (τ, a) is then obtained by taking the inverse Fourier transform.
|S x (τ, a)|2 gives the local power in the time-scale plane, and is called the scalogram by analogy with the periodogram.

The wavelet power spectrum
The wavelet power spectrum (WPS) of a continuous-time stochastic process {x(t)} t∈R is defined by the following (see Li and Oh, 2002;Maraun and Kurths, 2004): where the expectation is taken over the samples of the stochastic process.A simple example is the WPS of a realvalued stationary white noise.Define {η(t)} t∈R satisfying the following covariance property: Its WPS is then

The Morlet wavelet as the mother wavelet
In this article, we choose the mother wavelet ψ to be the Morlet wavelet (Grossmann and Morlet, 1984): This mother wavelet is a complex plane wave weighted by a Gaussian, to which is added a correction term to make it admissible 2 , i.e. satisfying This correction term is negligible 3 for σ 0 ω 0 ≥ 5.5.If this inequality is satisfied, and with the variable change a = a/ω 0 , the CWT with the Morlet wavelet is where c(a ) ∼ (a ) m , m ∈ Q, and c(a ) holds all the multiplicative constants.Without loss of generality, we impose σ 0 = 1 and assume that is fulfilled in the following of this article.Therefore, where Under this form, interpreting Eq. ( 12) is straightforward: the CWT is the inner product between the signal x and a Gaussian wave packet centred in τ = t, of period 2π a, and with numerical support 4 of length 6ω 0 a.As the scale increases (decreases), the support becomes wider ( narrower).

On the parameter c(a)
There are two common choices for c(a) (see Maraun and Kurths, 2004, Sect. 3).The first one is c(a G. Lenoir and M. Crucifix: A general theory on time-frequency analysis of irregularly sampled time series and gives a constant L 2 norm for ψ τ,a , namely ||ψ τ,a || = ||ψ||.This implies that the wavelet power spectrum of a white noise is flat, as we can see in Eq. ( 8).The second choice is which gives a constant L 1 norm for ψ τ,a and, most importantly, gives the same maximal power for sines of the same amplitude and with different frequencies.Indeed, from the Fourier transform of ψ a , and applying Eq. ( 5), we must require c(a)a to be constant to have the maxima of the scalogram of a sum sine waves (all with the same amplitude but with different frequencies) invariant with the scale.

The parameter ω 0 and the time-frequency resolution
The parameter ω 0 controls the time-frequency resolution, as it can be seen from the standard deviations of the Gaussian weights in ψ a , Eq. ( 13), and in its Fourier transform, Eq. ( 16).The standard deviations are equal to ω 0 a and 1/ω 0 a respectively.Consequently, for a fixed scale, increasing (decreasing) the value of ω 0 will generate a CWT with a better (worse) frequency resolution and a worse (better) time resolution.This property is of primary importance for the applications to time series, as illustrated in Sect.7. Note that, for any time-frequency transform, there is always a tradeoff between time and frequency localisation according to the Fourier uncertainty principle.The Morlet wavelet exhibits the best trade-off, thanks to its Gaussian shape.We provide further details on this topic in Appendix B.

Scale-to-period conversion
The Morlet wavelet is often used to detect the periodicities in a signal, and it is therefore suitable to convert scales a into periods T (Meyers et al., 1993).In practice, take a signal and is independent of τ .Scale-to-period conversion is performed with the value of the scale for which |S(τ, a)| 2 is at its maximum (as a function of a).We find the following: For a fixed scale, and while ω 0 ≥ 5.5, the difference between both never exceeds 2 %.

Reconstruction with the amplitude ridges
Reconstruction of a signal can be performed with the CWT along the amplitude ridges5 (Carmona et al., 1997), which are the lines going through the maxima of the scalogram.Indeed, take the signal x(t) = A exp(iωt) and c(a) ∼ 1/a.Its scalogram is maximum at a = 1/ω (from Eq. 17) and we can therefore easily recover the amplitude A at each time τ , going through the ridge a(τ ) = 1/ω in the scalogram, on which we have |S(τ, 1/ω)| = αA∀τ , where α ∈ R is a multiplicative constant.Jointly with the amplitude, the full signal x(t) can be exactly recovered from the CWT along the ridge.This can be extended to signals with slowly varying amplitude and phase (see Delprat et al., 1992;Carmona et al., 1997), namely, for which the CWT taken along the ridge, i.e. at the maxima of its modulus, can approximately reconstruct x(t).
The inequality in Eq. ( 19), called the asymptoticity condition, means that the instantaneous frequency inside the wave packet must be much smaller than the frequency of the amplitude of the wave packet.The analysis can be further extended to a sum of asymptotic signals plus noise, which can be detected by multiple ridges (Carmona et al., 1999).When considering a real signal like x(t) = A(t) cos(φ(t)), we have to work with its analytic counterpart, which is built from the Fourier transform of x, x, for which we impose x(ω < 0) = 0 and then take the inverse Fourier transform.Analyticity ensures that the phase and amplitude of a signal are uniquely determined (see Lilly and Olhede, 2010, and the references therein for more details).State-of-the-art algorithms for ridge detection are developed in Lilly and Olhede (2010) and are available for use in the package jLab (https://github.com/jonathanlilly/jLab), in which the ridgefinding algorithm is general enough to be applied to various mother wavelets, such as the Morlet wavelet.By construction, ridge filtering is well-adapted for filtering a multi-periodic signal, even if it is plunged in a noisy environment (Lilly and Olhede, 2010).In such conditions, it outperforms the techniques based on the Hilbert transform.As mentioned in Lilly and Olhede (2010, p. 4135), "[...] the Hilbert transform can lead to disastrous results as the amplitude and phase will then reflect the aggregate properties of the multi-component signal".

Writing the scalogram under the formalism of orthogonal projections
Finally, we mention that the scalogram can be written under the formalism of orthonormal projections.Indeed, defining which has a unit norm, the scalogram can be formulated as where 3 The continuous wavelet transform of irregularly sampled time series

The model for the data
We consider the same model as in Part 1: where |X = [X 1 , . .., X N ] and is real, |t = [t 1 , . .., t N ] , A τ,a = E τ,a cos(φ τ,a ), B τ,a = −E τ,a sin(φ τ,a ), E 2 τ,a = A 2 τ,a +B 2 τ,a , |c = [cos( t 1 ), . .., cos( t N )] and |s = [sin( t 1 ), . .., sin( t N )] .We have added subscripts (τ, a) since all the subsequent analyses will be done in the time-scale plane.The trend is a polynomial of degree m, and the background noise term, |Noise , is a zero-mean stationary Gaussian CARMA process sampled at the times of |t , as defined in Sect.3.2 of Part 1.As stated in Part 1, considering or not the presence of a trend in the model for the data is left to the user, given that we can always interpret a polynomial trend of low order as a very low-frequency oscillation.

The scalogram
When applying the CWT to finite discrete time series, a choice for the discretisation must be made.In the influential paper of Torrence and Compo (1998), which deals with regularly sampled time series, the expression under the form of a convolution product in the Fourier space, Eq. ( 5), is conserved, and computed with the discrete Fourier transform (DFT) of the data.The CWT is then the inverse DFT of the convolution product.Unfortunately, we cannot extend the convolution theorem6 to irregularly spaced time series and we cannot therefore follow the same computational procedure as in Torrence and Compo (1998).Alternatively, we can conserve the squared norm of the orthogonal projection, Eq. ( 21).The advantage of such a formalism is that it can be applied to irregularly sampled time series, as shown in Part 1.
Similarly to Part 1, we work with cosines and sines instead of working with complex exponentials.Very little difference is observed between both choices.Based on the results of Sect.2.8, our Morlet wavelet scalogram for irregularly sampled time series is therefore where G τ,a is a diagonal matrix with diagonal elements and |c τ,a = cos ((|t − τ )/a) , |s τ,a = sin ((|t − τ )/a) , (26) are vectors of length N. We can impose τ = 0 into the cosine and sine terms, since sp{|G τ,a c τ,a , |G τ,a s τ,a } is invariant with respect to the variable τ appearing in the cosine and sine, and the scalogram becomes In the following, the notations |G τ,a c a and G τ,a |c a refer to the same vector.Our wavelet scalogram is similar to the tapered periodogram defined in Sect.4.4 of Part 1, and its properties and generalisations will therefore be similar as well.In particular, the variables a and τ are considered as continuous variables, similarly to the continuous frequency variable of Part 1.
When the time series is regularly sampled, the scalogram, given by Eq. ( 27), is extremely close to what is obtained with the traditional approach based on the convolution theorem, e.g. in Torrence and Compo (1998), Maraun and Kurths (2004) or Cohen and Walden (2010).This is illustrated with a regularly sampled time series representing a noisy version of the caloric summer insolation 7 at 65 • N, in Fig. 1b and c.We also show the effect of randomly removing a large amount of data points, with the resulting time series being irregularly sampled.This is illustrated in Fig. 1d and we observe that our algorithms still do a good job at estimating significant regions, although there are some artefacts and the power tends to be overestimated.Equation ( 27) reduces to the Lomb-Scargle periodogram, defined in Eq. (I,40), if the weight G τ,a is replaced by the identity matrix.Similarly to the Lomb-Scargle periodogram, we rescale |G τ,a c a and |G τ,a s a such that they are orthonormal.This can be done by defining , where β τ,a is the solution of The scalogram is then

Scalogram and trend
Analogously to Sect.4.3 of Part 1, we extend the scalogram to take into account the presence of a polynomial trend of degree m in the data.Indeed, the scalogram defined in Sect.3.2 applies well to data which can be modelled as If we want to work with the full model, Eq. ( 22), holding a polynomial trend of degree m, we define a new scalogram as which is invariant with respect to the parameters of the trend.This is the analogue of Eq. (I,57).

Smoothing the scalogram
The scalogram suffers from the same inconsistency issue as the periodogram, in the sense that it remains very noisy regardless of the number of data points we have at our disposal8 .Smoothing techniques must therefore be applied, and we proceed like in Part 1, extending the formulas used with regularly sampled time series.Note that the disadvantage of any smoothing procedure is that the resolution (in time, frequency or both, depending on the smoothing choice) is reduced.Consequently, there is always a trade-off between variance reduction and resolution.Smoothing is traditionally performed by averaging the scalogram over neighbouring points in the time-scale plane, either by averaging over times followed by averaging over scales (Torrence and Webster, 1999;Grinsted et al., 2004), or simply by averaging over time (Cohen and Walden, 2010).In this work, we apply the latter technique because, even for very simple signals like |X = sin(ω|t ), the correlations in the scalogram between neighbouring scales, for a fixed time, are highly irregular when the time series is irregularly sampled, unlike the correlations between neighbouring times, for a fixed scale, which are driven by the Gaussian shape of the wave packets |G τ,a c a and |G τ,a s a .Smoothing over time must be carried out in accordance with the length of the support of the wave packets, which is proportional to the scale and to parameter ω 0 (Eq. 25).This choice also implies that the number of oscillations over which smoothing is performed is constant throughout the time-scale plane.This results from Eq. ( 26).
We adopt here the smoothing procedure of Cohen and Walden (2010) for which they derived analytical asymptotic results in the case of regularly sampled time series.The averaging window is a square window with a length proportional to the scale.Our smoothed scalogram is in which γ is called the smoothing coefficient.Appendix D provides further details on the practical implementation of the bounds of integration.

Definition
We want to estimate the amplitude E τ,a = A 2 τ,a + B 2 τ,a of our model, Eq. ( 22), at a given point (τ, a) in the time-scale plane.The estimation of E 2 τ,a is called the amplitude scalogram and is denoted by E 2 τ,a .We start with a trendless signal and derive an approximate formula linking the amplitude scalogram and the scalogram.

Trendless signal
Formula (I,118) is applied with the left-hand-side term changed to encompass wavelet formalism.A and B are determined by projecting the data onto the tapered cosine and sine: where the taper G τ,a is defined in Eq. ( 25), Conversion from the angular frequency ω to the scale a is performed with the formula ω = 1/a (justification is given in Sect.3.9).Using the same development as in Sect.6.2.2 of Part 1, we obtain The amplitude scalogram is then The approximations made in Sect.6.2.2 of Part 1 are valid in this work, and applying Eq. (I,128) to our case gives an approximate formula linking the scalogram and the amplitude scalogram, namely Let us compare this equation with its continuous counterpart, Eq. ( 21), in which the weight must be γ (a) ∼ 1/a to get an estimation of the local squared amplitude, as explained in Sect. .The comparison is made by analysing the weight of the right-hand-side term of Eq. (37) in the continuous limit: where t is the average time step.This is proportional to 1/a and it is therefore consistent with the continuous case.

Signal with a trend
Formula (I,129) is applied with the left-hand-side term changed to encompass wavelet formalism: where and Conversion from the angular frequency ω to the scale a is performed with the formula ω = 1/a (justification is given in Sect.3.9).Using the same development as in Sect.6.3 of Part 1, we obtain where W τ,a m+3 is identical to V a m+3 except in the last two columns, where the cosine and sine are tapered by G τ,a .This gives where τ,a (m + 2) and τ,a (m + 3) are the two last components of vector | τ,a .The amplitude scalogram is then

With smoothing
Like in Part 1, estimating the amplitude is more robust against noise when a smoothing procedure is performed.We apply to the squared amplitude, Eq. ( 44), the same kind of smoothing as for the scalogram (see Eq. 32), giving Appendix D provides further details on the practical implementation of the bounds of integration.

The weighted smoothed scalogram
The weighted smoothed scalogram is the analogue of the weighted WOSA periodogram, defined in Sect.7 of Part 1, and its objectives are the same, i.e. to keep the advantages of both the amplitude scalogram and the scalogram, namely, provide an estimation of the squared amplitude of a signal, locally in the time-frequency plane, by weighting the scalogram like in Eq. (37); conserve the advantage of the formalism of orthogonal projections in order to avoid the matrix inversions required for the computation of the amplitude scalogram (see, for example, Eq. 45, relying on Eq. 42 which requires a matrix inversion).
The last item is useful for building confidence levels when performing a test of significance (see Sect. 4).The disadvantage of weighting the smoothed scalogram is that it no longer provides a flat pseudo-wavelet power spectrum for a white noise signal (see Sect. 4.2), analogously to its frequency counterpart (see Sect. 7 of Part 1).The weighted smoothed scalogram is derived from Eq. ( 32), in which the integrand is weighted by the right-hand-side weight of Eq. ( 37), namely, Appendix D provides further details on the practical implementation of the bounds of integration.We recommend the use of the weighted smoothed scalogram in most timefrequency analyses under irregular sampling.

Cone of influence
When the wave packets |G τ,a c a and |G τ,a s a intersect the borders of the time series, a part of their support can stand after the last point of the time series, or before the first point of the time series.Consequently, one has to remove two halfcones from the area under analysis.From Eq. ( 25), the support of the wave packets is approximately equal to 2βω 0 a, so that the excluded areas are given by {τ, a} such that |τ − t 1 | ≤ βω 0 a and |τ − t N | ≤ βω 0 a, (47) Torrence and Compo, 1998).We recommend the conservative choice.When smoothing is performed, Eq. ( 47) becomes {τ, a} such that (48) where γ controls the smoothing length; see Eqs. ( 32), ( 45) and ( 46).This has another implication: the maximal scale available by the analysis is

Aliasing and Shannon-Nyquist exclusion zone (SNEZ)
When probing the irregularly sampled time series with the wavelet packet, it may happen that the period of the oscillation inside the packet, 2π a, is too low compared to the local time step in the time series, therefore causing aliasing issues according to the Shannon-Nyquist theorem, locally in the time-scale plane.As stated in Part 1, this issue also happens with the WOSA periodogram.We adapt formulas (I,72), (I,73) and (I,74) and define the local time step as where and H τ,a is a diagonal matrix with We then apply the Shannon-Nyquist theorem to this local time step, namely: where We call Shannon-Nyquist exclusion zone, the area in the scalogram that does not satisfy Eq. ( 54) and which is therefore delimited by a SNEZ .Note that matrix H τ,a is similar to matrix G τ,a , defined in Eq. ( 25), but with elements taken at (t k +t k+1 )/2 instead of t k .Quantity t τ,a is equal to the maximum between the average weighted time step and the average weighted central time step.We now justify Formula (50) with an example.Consider the function X(t) = sin(2π t/0.01), sampled on an irregular grid.This is drawn in Fig. 2a.The time step is represented in Fig. 2b.These two figures show that the time series exhibits intervals where it is more or less regularly sampled, separated by large gaps.The weighted (unsmoothed) scalogram is drawn in Fig. 3a.We remind the reader that the weighted scalogram is supposed to estimate the local squared amplitude in the time-frequency plane.Since X(t) has an amplitude equal to 1, we expect that the maximal power of the scalogram be equal to 1, along a scale corresponding to the period of x, for all τ .Because of the large gaps in the time series, extended regions corrupted by aliasing occur in Fig. 3a, resulting in a maximal power for the scalogram which is much greater than 1. Figure 3b, c and d present the weighted scalogram corrected by the SNEZ.In Fig. 3b the SNEZ is computed with t τ,a = t G τ,a .We observe that it does a good job of rejecting the areas where aliasing occur, although it is desirable that the black areas peak on higher scales.In Fig. 3c, the SNEZ is computed with t τ,a = t H τ,a .We observe that most of the aliasing-related areas are rejected, although we would rather the black areas be wider.Finally, the SNEZ computed with t τ,a = max{ t G τ,a , t H τ,a } is drawn in Fig. 3d and we observe that it does a very satisfactory job at rejecting the areas where aliasing occur.
The SNEZ is applied to all the analysis tools defined above.When smoothing is to be applied, it is performed on the areas outside of the SNEZ, since the scalogram is not computed in the SNEZ.In the neighbourhood of the SNEZ, adjustments of the smoothing procedure are therefore necessary, as explained in Appendix D.
G. Lenoir and M. Crucifix: A general theory on time-frequency analysis of irregularly sampled time series outside of the SNEZ, since the scalogram is not computed in the SNEZ.In the neighborhood of the SNEZ, adjustments of the smoothing procedure are therefore necessary, as explained in appendix D.  3.9 From the scale to the period Scale-to-period conversion is performed in the continuous limit, with Eq. ( 18).The first case of Eq. ( 18), with c(a) ∼ 1/a, corresponds to estimators of the amplitude, and is then used for scale-to-period conversion with the amplitude scalogram (all the formulas of Sect.3.5) and for the weighted smoothed scalogram, Eq. ( 46).The second case of Eq. ( 18), with c(a) ∼ 1/ √ a, is used for scale-to-period conversion with the unweighted scalogram, that is, the formulas appearing in Sect.3.2, 3.3 and 3.4.

Refining the Shannon-Nyquist exclusion zone
As illustrated in Fig. 4, the Shannon-Nyquist exclusion zone may not to be sufficient to avoid all the patches due to aliasing, because of the correlations between neighbouring scales in the scalogram.We therefore extend the Shannon-Nyquist exclusion zone by considering the continuous limit case for the simple periodic signal x(t) = exp(i2π t/T SNEZ (τ )), where T SNEZ (τ ) is the period at the border of the SNEZ, determined by Eqs. ( 55) and ( 18).Its scalogram is given in Eq. ( 17).In order to make the correspondence with all the above formulas, three cases are considered.

c(a
in which the unknown is a.The border of the extended Shannon-Nyquist exclusion zone at time τ is therefore on scale a = a SNEZ (τ ) + βσ a,3 (τ ).
Case 1 is used with formulas giving the squared amplitude E 2 τ,a in Sect.3.5 and with the weighted smoothed scalogram, Eq. ( 46).The (unsquared) amplitude E τ,a can also be of interest, and case 2 is therefore used.Case 3 is used with formulas arising in Sect.3.2, 3.3 and 3.4.Finally, note that the refinement of the SNEZ is performed after the smoothing procedure, because an extension of the SNEZ may result from the smoothing, as explained in Appendix D. 3.9 From the scale to the period Scale to period conversion is performed in the continuous limit, with Eq. ( 18).The first case of Eq. ( 18), with c(a) ⇠ 1/a, corresponds to estimators of the amplitude, and is then used for scale to period conversion with the amplitude scalogram (all the formulas of Sect.3.5) and for the weighted smoothed scalogram, Eq. ( 46).The second case of Eq. ( 18), with c(a) ⇠ 1/ p a, is used for scale to period conversion with the unweighted scalogram, that is the formulas appearing in Sect.3.2, 3.3 and 3.4.

Discretising τ and a
With regularly sampled data, the discretised variable τ is usually equal to |t (like in Torrence and Compo, 1998), or a subset of |t with regularly spaced elements.For irregularly spaced time series, we opt for the same type of grid as in the regularly sampled case, i.e. a linear regular grid, namely The scales are commonly discretised as fractional powers of two (Torrence and Compo, 1998), namely a j = a min 2 j δj , j = 0, . .., J, where Here, a min is the minimum over τ of a SNEZ (defined in Eq. 55), and a max is defined in Eq. ( 49).Discretisation as a power law comes from the geometry of the wavelet transform, and is justified in Appendix C. The integrals in Eqs. ( 32), ( 45) and ( 46) are discretised with the rectangle method.In particular, the discretisation of the integrals in Eqs. ( 32) and ( 46) allows these formulas to be written as finite-size matrices.To this end, we apply a Gram-Schmidt orthonormalisation to the orthogonal projec-

Discretizing ⌧ and a
With regularly sampled data, the discretized variable ⌧ is usually equal to |ti (like in Torrence and Compo, 1998), or a subset of |ti with regularly spaced elements.For irregularly spaced time series, we opt for the same type of grid as in the regularly tions, like in Eq. (I,67).This gives which is the analogue of Eq. (I,68).M τ,a is a matrix of size (N, 2n col (τ, a)), n col (τ, a) ≥ 1, where n col is a non-trivial function depending on the scale and on the closeness of (τ, a) with the SNEZ and with edges of the time-frequency plane.
4 Significance testing with the scalogram

Hypothesis testing
We test for the presence of periodic components, locally in the time-frequency plane.Significance testing is mathematically expressed as a hypothesis testing.Taking our model, Eq. ( 22), the null hypothesis is Therefore, |X = |Trend + |Noise .The alternative hypothesis is H 1 : A τ,a and B τ,a are not both zero. (61) The decision of accepting or rejecting the null hypothesis is based on the scalogram (Eq.46), independently for each couple (τ, a) (this is called pointwise testing).Concretely, for each couple (τ, a), we compute the distribution of the scalogram under the null hypothesis, and then see if the data scalogram at (τ, a) is above or below a given percentile of that distribution.The percentile is called level of confidence.
If the data scalogram is above the Xth percentile of the reference distribution, we reject the null hypothesis with X % of confidence.The level of significance is equal to (100−X) %, e.g. a 95 % confidence level is equivalent to a 5 % significance level.
To perform significance testing, we thus need 1. to estimate the parameters of the process under the null hypothesis (this is studied in Sect.5.2 of Part 1); 2. to estimate the distribution of the scalogram under the null hypothesis (this is studied in Sect.4.2 below).
Finally, we mention that, for regularly sampled time series, the pointwise significance test can be supplemented with an area-wise significance test, which takes into account the correlations between neighbouring points in the time-frequency plane.This is introduced in Maraun and Kurths (2004) and studied in detail in Maraun et al. (2007).Applying this method to irregularly sampled series is way beyond the scope of this work, since the correlations between neighbouring points in the time-frequency plane are highly irregular.

Estimation of the distribution of the scalogram under the null hypothesis
The results obtained for the periodogram in Sect.5.3 of Part 1 are valid for the scalogram, with minor changes that we detail below.
1. Monte Carlo approach: the same procedure as in Part 1 is applied to the (weighted) smoothed scalogram, Eq. ( 32) or ( 46).We can thus estimate the confidence levels for the (weighted) smoothed scalogram taking into account the uncertainty in the parameters of the background noise.
2. Analytical approach (with a unique set of CARMA parameters): -Theorem 1 of Part 1 can be applied to the (weighted) smoothed scalogram, as follows.
-The pseudo-wavelet power spectrum, WPS, is the analogue of the pseudo-spectrum defined in Eq. (I,92).It is defined as the expected value of the (weighted) smoothed scalogram distribution, namely -For a Gaussian white noise background with variance σ 2 , the unweighted pseudo-wavelet power spectrum is flat, and is equal to 2σ 2 , for all (τ, a).Moreover, if the scalogram is not smoothed, it is exactly chi-square distributed with 2 degrees of freedom: where |Z is a standard Gaussian white noise.
-The variance of the distribution of the (weighted) -We approximate the linear combination of the independent chi-square distributions, appearing in Eq. ( 62), by a gamma-polynomial distribution conserving its first d moments, based on the theory developed in Provost et al. (2009).The formulas are given in Sect.5.3.3 of Part 1.We observe, however, that the convergence of the percentiles (as the number of conserved moments grows) strongly depends on the smoothing coefficient γ , defined in Eqs. ( 32) and ( 46).As a general rule, the larger γ is, the faster the convergence.Moreover, it turns out that the gamma-polynomial G. Lenoir and M. Crucifix: A general theory on time-frequency analysis of irregularly sampled time series where the discretized scale is defined in Eq. ( 57).Such a filtering procedure is a generalization of the scale-averaged wavelet power of Torrence and Compo (1998) which deals with trendless regularly sampled signals.Note that we use the formulas for which there is no smoothing.Indeed, the smoothing procedure in Eq. ( 45) does not give access to b A ⌧,a and b B ⌧,a (only the sum of their squared values is available).An example of band filtering is shown in Fig. 9 and 10. approximation becomes numerically unstable at large numbers of conserved moments, because the matrix in Eq. (I,100) becomes singular.Consequently, for relatively small values of γ , convergence cannot be numerically guaranteed.This is illustrated in Fig. 5.In such cases, a simple 2moment approximation is therefore a reasonable choice since it is always numerically stable, it is much quicker than with higher numbers of conserved moments from a computational point of view, and it provides a satisfactory approximation.

A comparison between the computing times of the
Monte Carlo approach and the analytical approach is presented in Appendix G.
5 Filtering with the amplitude scalogram

Band filtering
From Sect.3.5.3,Eq. ( 43) gives A τ,a and B τ,a .We can therefore reconstruct the signal A τ,a |c a + B τ,a |s a over the whole time-scale plane, i.e. for all (τ, a).Band filtering is performed by averaging the reconstructed signal between scales a j min and a j max , namely A τ,a j |c a j + B τ,a j |s a j , (65) where the discretised scale is defined in Eq. ( 57).Such a filtering procedure is a generalisation of the scale-averaged wavelet power of Torrence and Compo (1998) which deals with trendless regularly sampled signals.Note that we use the formulas for which there is no smoothing.Indeed, the smoothing procedure in Eq. ( 45) does not give access to A τ,a and B τ,a (only the sum of their squared values is available).An example of band filtering is shown in Figs. 9 and 10.

Ridge filtering
Consider a signal |X = E cos(ω|t + φ).We can easily reconstruct the signal from the estimated amplitudes A τ,a and B τ,a , given by Eq. ( 43), taken at the maximum of the scalogram, in this case at a = 1/ω.More generally, we can reconstruct more complex signals relying on the theory of the amplitude ridges, developed for continuous-time signals (Sect.2.7) and which can approximately be applied to irregularly sampled time series.An example of ridge filtering is shown in Figs. 9 and 10.
Nonlin.Processes Geophys., 25, 175-200, 2018 www.nonlin-processes-geophys.net/25/175/2018/ G. Lenoir and M. Crucifix: A general theory on time-frequency analysis of irregularly sampled time series 189 method has the advantage of representing the signal for which the amplitude scalogram is locally maximal, and also allows to reconstruct the time-varying amplitude of the filtered signal (in red on Fig. 10b).The drawback is that it rarely delivers a continuous reconstruction with climate data.7 Application on palaeoceanographic data

Preliminary analysis
The time series we use to illustrate the theoretical results is the benthic foraminiferal δ 18 O record from Jian et al. (2003), which holds 608 data points with distinct ages and covers the last 6 million years.The choice of a CARMA(1,0) process as the background stationary noise, as well as the choice of m = 7 for the degree of the polynomial trend, are justified in Sect.9 of Part 1, in which the same data set is used as an example of frequency analysis.The time series, its trend and its time step are drawn in Fig. 6.We remind the reader that the time series is not detrended before computing the scalogram of the data, but it is detrended before estimating the confidence levels.

Time-frequency analysis
The weighted smoothed scalogram (Sect.3.6) and its 95 % analytical and MCMC confidence levels presented in Fig. 7 with parameter ω 0 = 5.5, and in Fig. 8 with parameter ω 0 = 15.As explained in Sect.2, increasing ω 0 results in a better frequency resolution and a worse time resolution.
In our example, the scalogram with ω 0 = 15 exhibits more clearly the period band around 40 kyr and the changes in amplitude along that band.The form of the SNEZ, which is the black region at the bottom in Figs.7 and 8, follows from the sampling of the time series presented in Fig. 6b.The parameters are γ = 0.5 (smoothing coefficient), 2 is the number of conserved moments in the gamma-polynomial approximation (see the discussion in Sect.4.2), a fixedlength smoothing per scale (see Appendix D), β = 3 (halfsupport of a standard Gaussian function exp(−x 2 /2)) and δj = 0.05 (coefficient for the scale resolution).

Filtering
As explained in Sect.5, band and ridge filtering are performed on the unsmoothed amplitude scalogram.This is illustrated in Fig. 9, with a filtering band in the interval [35,45] kyr and with the ridges.From the whole set of the ridges, we select those in the band [35,45] kyr in order to make a comparison with the band filtering.Band and ridge filtered signals are shown in Fig. 10.We can see that the amplitude modulations in Fig. 10a and b are consistent.Compared to band filtering, the ridge filtering method has the advantage of representing the signal for which the amplitude scalogram is locally maximal, and also allows the time-varying amplitude of the filtered signal to be reconstructed (in red in Fig. 10b).The drawback is that it rarely delivers a continuous reconstruction with climate data.and 95 % MCMC confidence levels (magenta), against a red noise background, are also drawn.Note that the green and magenta contours are almost superposed.The two lateral shaded areas are the half-cones of influence, the bottom black area is the SNEZ, and the shaded area above the SNEZ is the refinement of the SNEZ.There are also two lateral black areas, where the scalogram is not computed, because of the fixed-length smoothing per scale.The bounds of the colour scale are the extrema of the scalogram over the non-shaded area.As we work with the weighted scalogram, the power is an estimation of the local squared amplitude.Dashed lines at usual palaeoclimate periods are also drawn.

WAVEPAL Python package
WAVEPAL is a package, written in Python 2.X, that performs frequency and time-frequency analyses of irregularly sampled time series, significance testing against a stationary Gaussian CARMA(p, q) process, and filtering.Frequency analysis is based on the theory developed in Part 1, and timefrequency analysis relies on the theory developed in this article.It is available at https://github.com/guillaumelenoir/WAVEPAL.

WAVEPAL Python package
WAVEPAL is a package, written in Python 2.X, that performs frequency and time-frequency analyses of irregularly sampled time series, significance testing against a stationary Gaussian CARMA(p,q) process, and filtering.Frequency analysis is based on the theory developed in paper I, and time-frequency analysis relies on the theory developed in this article.It is available at https://github.com/guillaumelenoir/WAVEPAL.

9 Conclusions
We defined the scalogram as an extension of the generalized Lomb-Scargle periodgram developed in paper I.This analysis tool is well-suited for irregularly sampled time series which can be modeled as a locally periodic component in the time-frequency plane, plus a polynomial trend, plus a Gaussian CARMA stochastic process.In the particular case of trendless regularly sampled times series, we shown that the unsmoothed scalogram gives the same results as with the traditional algorithms 10 such as in Torrence and Compo (1998).A smoothing procedure, by averaging over neighboring points in time, was then

Conclusions
We defined the scalogram as an extension of the generalised Lomb-Scargle periodogram developed in Part 1.This analysis tool is well suited for irregularly sampled time series which can be modelled as a locally periodic component in the time-frequency plane, plus a polynomial trend, plus a Gaussian CARMA stochastic process.In the particular case of trendless regularly sampled times series, we showed that the unsmoothed scalogram gives the same results as with the traditional algorithms such as in Torrence and Compo (1998).A smoothing procedure, by averaging over neighbouring points in time, was then applied to the scalogram in order to reduce its variance.Besides, we derived estimators of the am-plitude of the locally periodic component, based on the general results of Part 1, and proposed an approximate formula linking the scalogram and the squared amplitude.The latter result is at the basis of the weighted smoothed scalogram, which is the analysis tool that we recommend for most timefrequency analyses.We then showed that local aliasing issues may occur in the analysis tools previously derived, implying the delimitation of a forbidden area for the analyses, called the Shannon-Nyquist exclusion zone.Moreover, a test of significance for the scalogram was designed, similarly to its counterpart for the frequency analysis developed in Part 1. Finally, the classical filtering procedures, namely band and ridge filtering, were made available for use with our operator of the estimated amplitude.

C2 Scale discretisation
Scale discretisation is naturally based on the geometry of the boxes.We can, for example, require that the frequency component of the centre of mass of the box corresponding to scale a j be at the frequency of the border of the box corresponding to scale a j +1 .This is illustrated in Fig. C2.We obtain 1 where β is defined in Fig. C1, giving and by recurrence, Multiplying β by a positive factor, γ , allows to control the density of the discretised scales.With variable change δj = log 2 [(2 + βγ )/2], we obtain Association of Variable Star Observers (AAVSO); see https: //www.aavso.org/sites/default/files/software/wwz.tar.gz.

E2.2 Foster's approximation and weighted inner products
Let us start with the approximation made in Foster (1996b) and used in Foster (1996c).Define U as equal to a full rank real matrix, whose columns are the vectors generating the vector space on which we project the data vector |X , the latter belonging to R N .Define G as equal to a real diagonal square matrix of size N with positive elements.Foster's approximation (Foster, 1996b, Eq. 7.9) writes12 Note that, when U is a 2-column matrix holding a cosine vector and a sine vector, the above approximation can also be obtained from Eq. (I,127).The orthogonal projection on the span of GU thus becomes and, for any pair of vectors |Y and |W in R N , we have where N eff = tr(G) 2 tr(G 2 ) is defined in Foster (1996c, Eq. 7.7) and called the effective number of data points.We can actually rewrite the right-hand side of Eq. (E3) as N eff Y |P sp{U} |W Weighted , where the weighted inner product is defined by: for any |Y and |W in R N .The term •| • Weighted satisfies the requirements of an inner product since the elements of G are positive (see Brockwell and Davis, 1991, p. 43).Foster's theory is developed in a vector space provided with this weighted inner product.

E2.3 WWT
Now, we derive Foster's scalogram from our theory.The diagonal elements of the weight matrix G are as follows (Foster, 1996c, Eq. 5-3): Correspondence with our weight matrix, defined in Eq. ( 25), is performed with the variable changes ω = 1/a and c = 1/2ω 2 0 .Next, consider the formula of the unsmoothed scalogram, Eq. ( 31), with a = 1/ω, and transformed to accommodate for a trend given by |G τ,ω t 0 .This results in We then make use of the approximation of Eq. ( E2) with U = [|t 0 |c ω |s ω ] for the first projection and U = |t 0 for the second projection, resulting in the following formula: for which we now work in a vector space provided with the weighted inner product.If |X is a zero-mean Gaussian white noise, Formula (E6) follows exactly a chi-square distribution with 2 degrees of freedom multiplied by the variance of the white noise, namely σ 2 χ 2 2 .Consequently, under the null hypothesis that the process is a Gaussian white noise, the following expression, WWT = (E8) approximately follows a chi-square distribution with 2 degrees of freedom and expected value 1. Formula (E8) is rigorously the same13 as the weighted wavelet transform (WWT) of Foster (1996c), in which the author estimates σ 2 as Significance testing against a Gaussian white noise can be therefore be performed with the WWT.Below, we comment on the WWT and make a comparison with our formulas.
-The WWT is built on the assumption that the time series holds a Gaussian-shaped trend centred at the probed translation time τ , the support of which varies with the probed frequency.This is equivalent to a constant trend in the vector space provided with the weighted inner product.This contrasts with our choice for the trend, Eq. ( 23), which is independent of the analysis function.
-The WWT under the null hypothesis is only approximately chi-square distributed, compared to Formula (E6) which is exactly chi-squared-distributed.
G. Lenoir and M. Crucifix: A general theory on time-frequency analysis of irregularly sampled time series -The estimation of the variance of the white noise, σ 2 , which is part of the WWT formula, depends on the sampling.However, two realisations of a white noise are uncorrelated regardless of the time step separating them, and the estimation of its variance should thus be independent of the sampling, like in Sect.5.2.2 of Part 1.
-To our point of view, working with weighted inner products, approximations like in Eq. (E1) and complicated tensor notations (see Foster, 1996b) does not bring a simple and unified view of the problematic.

E2.4 WWA
The weighted wavelet amplitude (WWA), defined in Foster (1996c, Eq. 5-14), is similar to our amplitude scalogram defined in Eq. ( 44).The former is obtained from the latter taking the trend to be |G τ,ω t 0 , where G τ,ω is defined in Sect.E2.3.For practical applications, we note that computing the inverse of a matrix is needed for the computation of the WWA (this is also the case for our amplitude scalogram).But Foster's theory lacks an in-depth consideration of aliasing issues, and the WWA at some points of the time-frequency plane may be numerically infinite due to the occurrence of singular matrices caused by aliasing.

E2.5 WWZ
Under the null hypothesis that the data |X is a Gaussian white noise, its squared norm in the vector space provided with the weighted inner product is approximately chi-square distributed with N eff degrees of freedom, as this follows from the 2-moment approximation of Sect.5.3.3 of Part 1, in which formula (I,98) is applied to matrix G. Consequently, under the null hypothesis, the following formula, is approximately equal to the Fisher-Snedecor distribution with 2 and N eff − 3 degrees of freedom.Formula (E10) is defined in Foster (1996c, Eq. 5-12) and called the weighted wavelet Z-transform (WWZ).It generalises the F periodogram that we defined in Sect.5.4 of Part 1.
E3 Mathias et al. (2004)'s theory Mathias et al. (2004) present a formula similar to our amplitude scalogram in the case of a trendless signal.The difference with our Eq.( 35) is that they work with a complex exponential, exp(iω|t ), instead of sine and cosine.Switching these terms in our Eq.( 35) and taking a = 1/ω gives the Eq. ( 17) of Mathias et al. (2004).The authors then approximate the Gaussian shape of the Morlet wavelet by a function with a finite support.Based on that approximation, they develop a fast algorithm for the computation of the scalogram.
Apart from this advantage, this study is in fact more restrictive than Foster's theory, since it does not perform significance testing, a zero trend is assumed, no smoothing procedures are considered, and it does not tackle the problem of aliasing issues explained in Sect.3.8 and 3.10.

Appendix F: Warning about interpolating the time series
This appendix compares the scalograms and their confidence levels in the case of interpolated and non-interpolated time series.The time series we consider is the δ 18 O signal from the GISP2 ice core (Grootes and Stuiver, 1997), for which the first 11 kyr are removed in order to facilitate the detrending procedure.The time series is drawn in Fig. F1a and b, and its time step is given in Fig. F2.The interpolated time series is built on a time grid with t = 30 yr (this is the smallest time step of the raw time series) in Fig. F1a, or t = 300 yr in Fig. F1b.The (unsmoothed) scalograms with ω 0 = 15 of the raw and interpolated time series are shown in Fig. F3, jointly with the 95 % analytical confidence levels against a red noise.We observe that significance testing is strongly dependent on the interpolation procedure.This is because the parameters of the red noise are badly estimated when the time series is interpolated.Consequently, in general, we cannot rely on interpolated time series to perform significance testing.In particular, we draw the attention on the geological stacks (such as in Lisiecki andRaymo, 2005, or Huybers, 2007), which are composed of multiple interpolated time series and averaged together.Significance testing or analysis of the background noise for such time series may therefore be strongly biased.
Finally, we observe that, in this example, the power of the scalogram of the data is weakly affected by the interpolation.

Appendix G: Computing time: analytical versus Monte Carlo significance levels
A comparison between the computing times, for generating the scalogram, with the analytical and with the MCMC confidence levels, based on the hypothesis of a red noise background, is presented in Fig. G1.The computing times are expressed in function of the number of data points, which are disposed on a regular time grid in order to make a meaningful comparison.Confidence levels with the analytical approach are estimated with a 2-moment approximation.The number of samples for the MCMC approach is 10 000 for the 95th percentiles and 100 000 for the 99th percentiles.The smoothing coefficient is γ = 0.5, and the other parameters are default parameters of WAVEPAL.All the runs were performed on the same computer 14 .
Nonlin.Processes Geophys., 25,2018 www.nonlin-processes-geophys.net/25/175/2018/ G. Lenoir and M. Crucifix: A general theory on time-frequency analysis of irregularly sampled time series 197 2005) or Huybers (2007), which are composed of multiple interpolated time series and averaged together.Significance testing or analysis of the background noise for such time series may therefore be strongly biased.
Finally, we observe that, in this example, the power of the scalogram of the data is weakly affected by the interpolation.With this parametrisation, and within this interval of the number of data points, we see that the analytical approach is faster than the MCMC approach.The analytical approach delivers computing times of the same order of magnitude regardless of the percentile (the two blue curves in Fig. G1a and b are of the same order of magnitude), unlike the MCMC approach, which must require more samples as the level of confidence increases in order to keep a sufficient accuracy.The difference between both computing times therefore increases as the level of confidence increases.Note, however, that the 2-moment approximation, for the estimation of the analytical confidence levels, is very fast from a computational point of view.Increasing the number of conserved moments may considerably increase the computational cost associated with the analytical approach.But this configuration is rarely used in practice because it often results in numerical instabilities and badly estimated percentiles, as explained in Sect.4.2.
With this parametrization, and within this interval of the number of data points, we see that the analytical approach is faster than the MCMC approach.The analytical approach delivers computing times of the same order of magnitude whatever is the percentile (the two blue curves in Fig. G1a and G1b are in the same order of magnitude), unlike the MCMC approach, which must require more samples as the level of confidence increases, in order to keep a sufficient accuracy.The difference 5 between both computing times therefore increases as the level of confidence increases.Note, however, that the 2-moment approximation, for the estimation of the analytical confidence levels, is very fast from a computational point of view.Increasing the number of conserved moments may considerably increase the computational cost associated to the analytical approach.But this configuration is rarely used in practice because it often results in numerical instabilities and badly estimated percentiles, as explained in Sect.4.2.Competing interests.The authors declare that they have no conflict of interest.
No. of data points No. of data points

Figure 1 .
Figure 1.(a) The regularly sampled (RS) time series, with t = 1 ka, which represents the caloric summer insolation at 65 • N to which is added a realisation of a Gaussian red noise process with α = 0.1 and σ equal to half of the standard deviation of the original time series (these parameters are defined in Sect.3.2.3 of Part 1).The red dots are obtained from randomly removing 75 % of the data points of the RS time series, resulting in an irregularly sampled (IS) time series with 500 data points.Panels (b), (c) and (d) compare of the scalograms with ω 0 = 10, jointly with their 95 % analytical confidence levels against a red noise background.(b) Scalogram of the RS time series computed with the classical approach.(c) Scalogram of the RS time series computed with WAVEPAL.(d) Scalogram of the IS time series computed with WAVEPAL.The black zone, called the Shannon-Nyquist exclusion zone (SNEZ) and defined in Sect.3.8, is the area where the sampling is not sufficient to probe the lowest periods.In panels (b), (c) and (d), the two lateral shaded areas are the half-cones of influence (see Sect. 3.7), and the bottom shaded area is the refinement of the SNEZ (defined in Sect.3.10).The bounds of the colour scale in the panels (b), (c) and (d) are the extrema of the scalogram in panel (c) over the non-shaded area in order to make a meaningful visual comparison.Technical details about the computation of the scalogram and its confidence levels are given in Sects.3 and 4.

Figure 2 .
Figure 2. (a) The time series |Xi = sin(2⇡|ti/0.01),and (b) its time step, with the vertical axis in log-scale.The time vector |ti is taken from a real paleoclimate time series (Giosan, 2017).

16Figure 2
Figure 2. (a) The time series |X = sin(2π|t /0.01) and (b) its time step, with the vertical axis in log-scale.The time vector |t is taken from a real palaeoclimate time series (Liviu Giosan, WHOI, personal communication, 2017).

Figure 4 .
Figure 4. Scalogram of the time series |Xi = sin(2⇡|ti/10), where |ti has a piecewise constant time step.From t = 0 to t = 200, t = 4. From t = 200 to t = 400, t = 3.From t = 400 to t = 600, t = 2. (a) Weighted (unsmoothed) scalogram.The black area is the SNEZ.(b) Same as (a) with the addition of the refinement of the SNEZ, which is the shaded area on the top of the SNEZ.(c) Amplitude scalogram (unsmoothed).The black area is the SNEZ.(d) Same as (c) with the addition of refinement of the SNEZ, which is the shaded area on the top of the SNEZ.In the 4 panels, the bounds of the color scale are the extrema of the scalogram over the non-shaded area.Thanks to the refinement of the SNEZ, the upper bound of the color scale is close to 1, which is the value of the squared amplitude of the signal |Xi.

Figure 4 .
Figure 4. Scalogram of the time series |X = sin(2π|t /10), where |t has a piecewise constant time step.From t = 0 to t = 200, t = 4. From t = 200 to t = 400, t = 3.From t = 400 to t = 600, t = 2. (a) Weighted (unsmoothed) scalogram.The black area is the SNEZ.(b) Same as panel (a) with the addition of the refinement of the SNEZ, which is the shaded area on the top of the SNEZ.(c) Amplitude scalogram (unsmoothed).The black area is the SNEZ.(d) Same as panel (c) with the addition of refinement of the SNEZ, which is the shaded area on the top of the SNEZ.In the four panels, the bounds of the colour scale are the extrema of the scalogram over the non-shaded area.Thanks to the refinement of the SNEZ, the upper bound of the colour scale is close to 1, which is the value of the squared amplitude of the signal |X .

Figure 5 .
Figure 5. Analytical confidence levels in function of the number of conserved moments, at six particular couples (⌧, a), for the scalogram of the time series presented in Sect.7. Parameter is equal to 0.5.(a) 95 th percentile.(b) 99.9 th percentile.Slow convergence as well as numerical instabilities (spurious peaks) at high numbers of conserved moments are observed.Convergence cannot therefore be numerically guaranteed.

Figure 5 .
Figure 5. Analytical confidence levels in function of the number of conserved moments, at six particular couples (τ, a), for the scalogram of the time series presented in Sect.7. Parameter γ is equal to 0.5.(a) 95th percentile.(b) 99.9th percentile.Slow convergence as well as numerical instabilities (spurious peaks) at high numbers of conserved moments are observed.Convergence cannot therefore be numerically guaranteed.

Figure 7 .
Figure7.Weighted smoothed scalogram (left) and its global scalogram (right) with ω 0 = 5.5.The 95 % analytical confidence levels (green) and 95 % MCMC confidence levels (magenta), against a red noise background, are also drawn.Note that the green and magenta contours are almost superposed.The two lateral shaded areas are the half-cones of influence, the bottom black area is the SNEZ, and the shaded area above the SNEZ is the refinement of the SNEZ.There are also two lateral black areas, where the scalogram is not computed, because of the fixed-length smoothing per scale.The bounds of the colour scale are the extrema of the scalogram over the non-shaded area.As we work with the weighted scalogram, the power is an estimation of the local squared amplitude.Dashed lines at usual palaeoclimate periods are also drawn.

Figure 8 .
Figure 8. Weighted smoothed scalogram (left) and its global scalogram (right) with ω 0 = 15.The 95 % analytical confidence levels (green) and 95 % MCMC confidence levels (magenta), against a red noise background, are also drawn.Note that the green and magenta contours are almost superposed.

Figure 9 .Figure 10 .
Figure 9.The unsmoothed estimated amplitude (which is the square root of the amplitude scalogram, Eq. 44), jointly with the filtering band in the interval [35, 45] kyr (shaded).Black and white curves are the ridges.They go through the local maxima of the amplitude scalogram.The white ones are the ridges in the band [35, 45] kyr.Parameters are ω 0 = 15, β = 3 and δj = 0.01.

Figure 10 .
Figure 10.Filtered signal in the band [35, 45] kyr.(a) Band filtering.(b) Ridge filtering.The red curve is the amplitude of the filtered signal, which is only available with ridge filtering.

Figure
Figure F1.18 O signal from the GISP2 ice core(Grootes and Stuiver, 1997), for which the first 11 kyr are removed.Raw (red dots) and interpolated (blue line) time series, with (a) t = 30 yr and (b) t = 300 yr.

Figure F2 .Figure F2 .
Figure F2.Time step of the 18 O signal from the GISP2 ice core.

Figure F3 .
Figure F3.Scalogram of the time series presented in Fig. F1 and the 95 % analytical confidence levels against a red noise.(a) Raw time series.(b) Interpolated time series with t = 30 yr.(c) Interpolated time series with t = 300 yr.

Figure G1 .
Figure G1.Computing times for generating the scalogram with analytical (blue) and MCMC (green) confidence levels, in function of the number of data points (disposed on a regular time grid).Log-log scale.(a) 95 th percentiles.(b) 99 th percentiles.

Figure G1 .
Figure G1.Computing times for generating the scalogram with analytical (blue) and MCMC (green) confidence levels, in function of the number of data points (disposed on a regular time grid).Log-log scale.(a) 95th percentiles.(b) 99th percentiles.