Singular spectrum analysis (SSA) is a powerful technique for
time series analysis. Based on the property that the original time series
can be reproduced from its principal components, this contribution develops
an improved SSA (ISSA) for processing incomplete time series; the
modified SSA (SSAM) of Schoellhamer (2001) is a special case of it. The approach
is evaluated with synthetic and real incomplete time series of
suspended-sediment concentration from San Francisco Bay. The result from the
synthetic time series with missing data shows that the relative errors of
the principal components reconstructed by ISSA are much smaller than those
reconstructed by SSAM. Moreover, when the percentage of the missing data
over the whole time series reaches 60 %, the improvements of relative
errors are up to 19.64, 41.34, 23.27 and 50.30 % for the first four
principal components, respectively. Both the mean absolute error
and mean root mean squared error of the reconstructed time series by ISSA
are also smaller than those by SSAM. The respective improvements are 34.45
and 33.91 % when the missing data accounts for 60 %. The results from
real incomplete time series also show that the standard deviation (SD)
derived by ISSA is 12.27 mg L

Singular spectrum analysis (SSA), introduced by Broomhead and King (1986) for studying dynamical systems, is a powerful toolkit for extracting information from short, noisy and chaotic signals (Vautard et al., 1992). SSA first transforms a time series into a trajectory matrix and then carries out a principal component analysis to pick out the dominant components of the trajectory matrix. The time series is then reconstructed from these dominant components, so the reconstructed time series has an improved signal-to-noise ratio and reveals the characteristics of the original time series. SSA has been widely used in geosciences to analyse a variety of time series, such as stream flow and sea-surface temperature (Robertson and Mechoso, 1998; Kondrashov and Ghil, 2006), seismic tomography (Oropeza and Sacchi, 2011) and the monthly gravity field (Zotova and Shum, 2010). Schoellhamer (2001) developed a modified SSA for time series with missing data (SSAM), which was successfully applied to analyse the time series of suspended-sediment concentration (SSC) in San Francisco Bay (Schoellhamer, 2002). This SSAM approach does not need to fill in missing data. Instead, it computes each principal component (PC) from the observed data and a scale factor related to the number of missing data. Shen et al. (2014) developed a new principal component analysis approach for extracting common mode errors from the time series with missing data of a regional station network. Another class of SSA approaches processes time series with missing data by filling the data gaps recursively or iteratively, such as the “Caterpillar” SSA method (Golyandina and Osipov, 2007), the imputation method (Rodrigues and Carvalho, 2013) or the iterative method (Kondrashov and Ghil, 2006).
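The three steps described above (embedding into a trajectory matrix, principal component analysis, reconstruction from the dominant components) can be illustrated for a complete time series with the following minimal sketch. It is not the FORTRAN program used in this paper. The function name `ssa_reconstruct`, the test signal and the choice of the trajectory-matrix covariance (rather than a Toeplitz lagged correlation matrix) are assumptions made for illustration only.

```python
import numpy as np

def ssa_reconstruct(x, window, n_components):
    """Basic SSA for a complete series: embed, decompose, reconstruct."""
    x = np.asarray(x, dtype=float)
    n = x.size
    k = n - window + 1
    # Trajectory matrix: row i holds the lagged window x[i : i + window].
    traj = np.array([x[i:i + window] for i in range(k)])   # shape (k, window)
    # Covariance of the lagged windows and its eigen-decomposition.
    cov = traj.T @ traj / k
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]                      # descending variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Principal components: projection of each window onto the eigenvectors.
    pcs = traj @ eigvecs                                   # shape (k, window)
    # Approximate the trajectory matrix with the leading modes only.
    approx = pcs[:, :n_components] @ eigvecs[:, :n_components].T
    # Diagonal averaging maps the matrix back to a series of length n.
    recon = np.zeros(n)
    counts = np.zeros(n)
    for i in range(k):
        recon[i:i + window] += approx[i]
        counts[i:i + window] += 1
    return recon / counts, eigvals

# Hypothetical test signal: one sinusoid plus white noise.
t = np.arange(400)
rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * t / 25)
noisy = clean + 0.5 * rng.standard_normal(t.size)
smooth, eigvals = ssa_reconstruct(noisy, window=50, n_components=2)
```

When all modes are kept (`n_components` equal to `window`), the sketch returns the input series unchanged, which is the reproduction property that ISSA builds on.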

This paper is motivated by Schoellhamer (2001) and Shen et al. (2014) and develops an improved SSA (ISSA) approach. In our ISSA, the lagged correlation matrix is computed in the same way as by Schoellhamer (2001), but the PCs are computed directly with both the eigenvalues and eigenvectors of the lagged correlation matrix, whereas the PCs in Schoellhamer (2001) were calculated with the eigenvectors and a scale factor to compensate for the missing values. Moreover, we do not need to fill in the missing data recursively or iteratively as in Golyandina and Osipov (2007). The rest of this paper is organized as follows: the improvement of SSA for time series with missing data is presented in Sect. 2, synthetic and real numerical examples are presented in Sects. 3 and 4 respectively, and conclusions are given in Sect. 5.
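To make the SSAM ingredients mentioned above more concrete, the sketch below computes a lagged covariance matrix from observed pairs only and then forms PCs from the observed samples in each window, rescaled by a factor that depends on how many samples are present. This is our reading of the scale-factor idea, not a transcription of the equations in Schoellhamer (2001). The function names, the NaN convention for missing data, the zero-mean assumption and the specific factor `window / observed count` are all assumptions. The sketch does not implement the ISSA formula for the PCs.

```python
import numpy as np

def lagged_covariance_missing(x, window):
    """Toeplitz lagged covariance using only pairs where both samples exist.

    Assumes x is zero-mean, marks gaps with NaN, and has at least one
    valid pair at every lag below `window`.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    c = np.zeros(window)
    for lag in range(window):
        prod = x[:n - lag] * x[lag:]        # NaN if either sample is missing
        c[lag] = prod[~np.isnan(prod)].mean()
    idx = np.arange(window)
    return c[np.abs(idx[:, None] - idx[None, :])]

def scaled_pcs(x, eigvecs):
    """PCs from observed samples, rescaled by window / (observed count)."""
    x = np.asarray(x, dtype=float)
    window = eigvecs.shape[0]
    k = x.size - window + 1
    pcs = np.full((k, eigvecs.shape[1]), np.nan)
    for i in range(k):
        seg = x[i:i + window]
        obs = ~np.isnan(seg)
        if obs.any():                        # leave NaN if the window is empty
            pcs[i] = (window / obs.sum()) * (seg[obs] @ eigvecs[obs])
    return pcs

# Hypothetical example: a noisy sinusoid with 30 % of samples removed.
rng = np.random.default_rng(1)
t = np.arange(500)
x = np.sin(2 * np.pi * t / 24) + 0.1 * rng.standard_normal(t.size)
x -= x.mean()
x_gappy = x.copy()
x_gappy[rng.random(t.size) < 0.3] = np.nan

cov = lagged_covariance_missing(x_gappy, window=48)
eigvals, eigvecs = np.linalg.eigh(cov)
eigvecs = eigvecs[:, np.argsort(eigvals)[::-1]]   # descending eigenvalue order
pcs = scaled_pcs(x_gappy, eigvecs)
```

With no missing samples the scale factor equals one and `scaled_pcs` reduces to the ordinary projection of each lagged window onto the eigenvectors.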

For a stationary time series

The SSAM approach developed by Schoellhamer (2001) computes the elements

In order to derive the expression of computing PCs for the time series with
missing data, Eq. (3) is reformulated as

If the non-diagonal elements of

The main difference between our ISSA approach and the SSAM approach of
Schoellhamer (2001) lies in the calculation of the PCs. We produce the PCs from
observed data with Eq. (14) according to the power spectrum (eigenvalues)
and eigenvectors of the PCs, while Schoellhamer (2001) calculates the PCs
from observed data with Eq. (6) according to the eigenvectors only and uses
the scale factor

The same synthetic time series as in Schoellhamer (2001) are used to analyse
the performance of ISSA compared to SSAM. The synthetic SSC time series is
expressed as

Periodic signal

Although the selection of window length is an important issue for SSA
(Hassani et al., 2012; Hassani and Mahmoudvand, 2013), this paper chooses the same window length (

Relative errors of first four PCs (ISSA: red line; SSAM: black line).

RMSE of the 50 experiments; (1)–(6) denote percentages of missing data ranging from 10 to 60 % in 10 % increments.

We reconstruct the time series

As can be seen from Fig. 3, the RMSEs of ISSA are much smaller than those of SSAM for the same experimental scenarios. Table 1 presents the mean absolute reconstruction error (MARE) and mean root mean squared error (MRMSE) of the 50 experiments for different percentages of missing data.
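As an illustration of how such summary statistics can be formed, the sketch below averages the per-experiment mean absolute error and RMSE over a set of repeated experiments, and expresses the gain of one method over another as a percentage. The exact definitions used for Table 1 are not restated here, so the formulas, the function names and the toy numbers are assumptions.

```python
import numpy as np

def mare_and_mrmse(truth, reconstructions):
    """Average the per-experiment mean absolute error and RMSE.

    truth           : array of shape (n_samples,)
    reconstructions : array of shape (n_experiments, n_samples)
    """
    truth = np.asarray(truth, dtype=float)
    recs = np.asarray(reconstructions, dtype=float)
    err = recs - truth                           # broadcast over experiments
    mae_per_exp = np.mean(np.abs(err), axis=1)
    rmse_per_exp = np.sqrt(np.mean(err ** 2, axis=1))
    return mae_per_exp.mean(), rmse_per_exp.mean()

def relative_improvement(baseline, improved):
    """Percentage reduction of an error measure relative to a baseline."""
    return 100.0 * (baseline - improved) / baseline

# Toy data (hypothetical): two experiments with constant errors of 1 and 3.
truth = np.zeros(4)
recs = np.array([[1.0, 1.0, 1.0, 1.0],
                 [3.0, 3.0, 3.0, 3.0]])
mare, mrmse = mare_and_mrmse(truth, recs)
gain = relative_improvement(4.0, 3.0)
```

For the toy data both statistics equal 2, and an error that drops from 4 to 3 corresponds to a 25 % improvement.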

Mean absolute reconstruction error and mean root mean squared error
of simulated time series with different percentage of missing data (mg L

Mid-depth SSC time series at San Mateo Bridge during water year 1997.

Obviously, if there are no missing data, ISSA coincides with SSAM. As the
percentage of missing data increases, both MARE and MRMSE become
larger. In Table 1, all the MARE and MRMSE values of ISSA are smaller than those of
SSAM. When the percentage of missing data reaches 50 %, the MARE and MRMSE
are 3.17 and 4.14 mg L

The mid-depth SSC time series at San Mateo Bridge is presented in Fig. 4,
which contains about 61 % missing data. This time series was reported by
Buchanan and Schoellhamer (1999) and Buchanan and Ruhl (2000), and analysed
by Schoellhamer (2001) using SSAM. We analyse this time series using our
ISSA with the window size of 30 h (

The residual time series, i.e. the differences between the observed and reconstructed data, are presented in Fig. 5. The maximum, minimum and mean absolute residuals as well as the SD are presented in Table 2. It is clear that both the maximum and minimum residuals are significantly reduced by using the ISSA approach. Compared with SSAM, the SD of our ISSA is reduced by 8.6 %. The squared correlation coefficients between the observations and the reconstructed data from ISSA and SSAM are 0.9178 and 0.9046, respectively, which indicates that the time series reconstructed with our ISSA can indeed represent the real time series to a very large extent.
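The residual statistics discussed above can be computed along the following lines. This is a minimal sketch that assumes missing observations are stored as NaN and that the SD is the sample standard deviation of the residuals (`ddof=1`). The function name and the toy values are hypothetical and are not taken from the San Mateo Bridge data.

```python
import numpy as np

def residual_summary(observed, reconstructed):
    """Residual statistics evaluated only where observations exist."""
    observed = np.asarray(observed, dtype=float)
    reconstructed = np.asarray(reconstructed, dtype=float)
    mask = ~np.isnan(observed)                   # skip the data gaps
    res = observed[mask] - reconstructed[mask]   # observed minus reconstructed
    r = np.corrcoef(observed[mask], reconstructed[mask])[0, 1]
    return {
        "max": res.max(),
        "min": res.min(),
        "mean_abs": np.abs(res).mean(),
        "sd": res.std(ddof=1),                   # sample SD (assumption)
        "r_squared": r ** 2,
    }

# Toy values (hypothetical), with one missing observation.
obs = np.array([1.0, 2.0, np.nan, 4.0, 5.0])
rec = np.array([1.5, 1.5, 3.0, 4.5, 4.5])
stats = residual_summary(obs, rec)
```

Restricting the comparison to observed samples matters here because the reconstruction is defined at every epoch, whereas residuals exist only where data were recorded.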

Maximum, minimum and mean absolute residuals of SSAM and ISSA.

Residual series after removing reconstructed signals from the first 10 modes (top panel: SSAM; bottom panel: ISSA).

We have developed the ISSA approach in this paper for processing the
incomplete time series by using the principle that a time series can be
reproduced using its principal components. We prove that the SSAM developed
by Schoellhamer (2001) is a special case of our ISSA. The performances of
ISSA and SSAM are demonstrated with a synthetic time series, and the results
show that the relative errors of the first four principal components by ISSA
are significantly smaller than those by SSAM. As the fraction of missing
data increases, the improvement of the relative error becomes greater. When
the percentage of missing data reaches 60 %, the improvements of the first
four principal components are up to 19.64, 41.34, 23.27 and 50.30 %,
respectively. Moreover, when the missing data account for 60 %, the MARE
and MRMSE derived by ISSA are 3.52 and 4.60 mg L

Y. Shen proposed the improved singular spectrum analysis and F. Peng wrote the FORTRAN program and performed the simulations. Y. Shen, F. Peng and B. Li prepared the paper.

This work is sponsored by the Natural Science Foundation of China (Projects: 41274035, 41474017) and partly supported by the State Key Laboratory of Geodesy and Earth's Dynamics (SKLGED2013-3-2-Z).

Edited by: I. Zaliapin

Reviewed by: two anonymous referees