Estimation and calibration of observation impact signals using the Lanczos method in NOAA/NCEP data assimilation system

Despite the tremendous progress that has been made in data assimilation (DA) methodology, observing systems that reduce observation errors, and model improvements that reduce background errors, the analyses produced by the best available DA systems are still different from the truth. Analysis error and error covariance are important since they describe the accuracy of the analyses, and are directly related to the future forecast errors, i.e., the forecast quality. In addition, analysis error covariance is critically important in building an efficient ensemble forecast system (EFS). Estimating analysis error covariance in an ensemble-based Kaiman filter DA is straightforward, but it is challenging in variational DA systems, which have been in operation at most NWP (Numerical Weather Prediction) centers. In this study, we use the Lanczos method in the NCEP (the National Centers for Environmental Prediction) Gridpoint Statistical Interpolation (GSI) DA system to look into other important aspects and properties of this method that were not exploited before. We apply this method to estimate the observation impact signals (OIS), which are directly related to the analysis error variances. It is found that the smallest eigenvalue of the transformed Hessian matrix converges to one as the number of minimization iterations increases. When more observations are assimilated, the convergence becomes slower and more eigenvectors are needed to retrieve the observation impacts. It is also found that the OIS over data-rich regions can be represented by the eigenvectors with dominant eigenvalues. Since only a limited number of eigenvectors can be computed due to computational expense, the OIS is severely underestimated, and the analysis error variance is consequently overestimated. It is found that the mean OIS values for temperature and wind components at typical model levels are increased by about 1.5 times when the number of eigenvectors is doubled. We have proposed four different calibration schemes to compensate for the missing trailing eigenvectors. Results show that the method with calibration for a small number of eigenvectors cannot pick up the observation impacts over the regions with fewer observations as well as a benchmark with a large number of eigenvectors, but proper calibrations do enhance and improve the impact signals over regions with more data. When compared with the observation locations, the method generally captures the OIS over regions with more observation data, including satellite data over the southern oceans. Over the tropics, some observation impacts may be missed due to the smaller background errors specified in the GSI, which is not related to the method. It is found that a large number of eigenvectors are needed to retrieve impact signals that resemble the banded structures from satellite observations, particularly over the tropics. Another benefit from the Lanczos method is that the dominant eigenvectors can be used in preconditioning the conjugate gradient algorithm in the GSI to speed up the convergence.

made in data assimilation (DA) methodology, observing systems that reduce observation errors, and model improvements that reduce background errors, the analyses produced by the best available DA systems are still different from the truth. Analysis error and error covariance are important since they describe the accuracy of the analyses, and are directly related to the future forecast errors, i.e., the forecast quality. In addition, analysis error covariance is critically important in building an efficient ensemble forecast system (EFS).
Estimating analysis error covariance in an ensemble-based Kaiman filter DA is straightforward, but it is challenging in variational DA systems, which have been in operation at most NWP (Numerical Weather Prediction) centers. In this study, we use the Lanczos method in the NCEP (the National Centers for Environmental Prediction) Gridpoint Statistical Interpolation (GSI) DA system to look into other important aspects and properties of this method that were not exploited before. We apply this method to estimate the observation impact signals (OIS), which are directly related to the analysis error variances. It is found that the smallest eigenvalue of the transformed Hessian matrix converges to one as the number of minimization iterations increases. When more observations are assimilated, the convergence becomes slower and more eigenvectors are needed to retrieve the observation impacts. It is also found that the OIS over data-rich regions can be represented by the eigenvectors with dominant eigenvalues.
Since only a limited number of eigenvectors can be computed due to computational expense, the OIS is severely underestimated, and the analysis error variance is consequently overestimated. It is found that the mean OIS values for tem-perature and wind components at typical model levels are increased by about 1.5 times when the number of eigenvectors is doubled. We have proposed four different calibration schemes to compensate for the missing trailing eigenvectors. Results show that the method with calibration for a small number of eigenvectors cannot pick up the observation impacts over the regions with fewer observations as well as a benchmark with a large number of eigenvectors, but proper calibrations do enhance and improve the impact signals over regions with more data.
When compared with the observation locations, the method generally captures the OIS over regions with more observation data, including satellite data over the southern oceans. Over the tropics, some observation impacts may be missed due to the smaller background errors specified in the GSI, which is not related to the method. It is found that a large number of eigenvectors are needed to retrieve impact signals that resemble the banded structures from satellite observations, particularly over the tropics. Another benefit from the Lanczos method is that the dominant eigenvectors can be used in preconditioning the conjugate gradient algorithm in the GSI to speed up the convergence. Tables 1 and 2 in Wei et al. (2008).

at NWP centers. A recent description and comparison about how analysis error covariance is being used in ensemble initial perturbation techniques are given in
There have been several efforts to estimate analysis error variance and covariance outside EnKF. For example, Buizza et al. (2005) suggested that the spread of initial states of three centers (NCEP, ECMWF and CMC) could be considered as a crude lower-bound estimate of the analysis error variance. Swanson and Roebber (2008) used the NCEP and ECMWF reanalysis data, and suggested that the reanalysis difference could be considered as a "shadow" of the analysis error. They found that the analysis difference contains certain aspects of the true flow-dependent analysis error and has significant impact on the short-time forecast skills in downstream regions. Similarly, Langland et al. (2008) looked at the differences between the NCEP and FNMOC analyses from 1 January to 30 June 2007. The authors found that the differences and root mean of the squared daily differences in 500 hPa temperature are closely related to the distribution of radiosonde observations. The large differences between the two analyses were found to be associated with the regions with mostly satellite observations. Park et al. (2008) studied the ensemble performance from TIGGE (the THORPEX 1 Interactive Grand Global Ensemble) data. They argued that the mean analysis from different centers will probably be the best to be used as a reference analysis in comparing the performance of an ensemble from each center. The analysis error could be estimated from the deviation between that analysis and the mean of centers. Bowler et al. (2008) also argued that the mean of analyses from multi-centers is generally better than the analysis from any one center.
By using the analysis data from NCEP, ECMWF, UKMO, CMC and FNMOC, Wei et al. (2010) introduced a new method for estimating the analysis error variance. The method computes the anomaly of each center's analysis by removing the long-term mean using a recursive filter. The spread over the average anomaly (SPA) from different centers is then computed. These authors found that the time averaged distribution of SPA is even more related to the observation network density, compared with the spread around the center mean analysis. Furthermore, the typical systematic errors that appear in the spread around the center mean over high altitude regions are completely removed. The instantaneous values of SPA at any cycle for various variables bear a strong resemblance to the elusive analysis error variance.
While analysis error covariance in an ensemble based Kaiman filter is readily available (Bishop et al., 2001;Anderson, 2001;Whitaker and Hamill, 2002;Tippett et al., 2003;Zupanski, 2005;Whitaker et al., 2007;Kalnay et al., 2007;Szunyogh et al., 2008), it is not straightforward to obtain in 3-D/4-D-Var systems, which have been in operation at most major NWP centers. In the NCEP global EFS, an ensemble transform with rescaling (ETR) has been used to generate the initial perturbations as described in detail in Wei et al. (2005Wei et al. ( , 2008. In the ETR method, the initial perturbations depend on the accurate analysis error covariance, which should be provided by the DA system in operation. At NCEP, the operational DA system is the Gridpoint Statistical Interpolation (GSI), a three-dimensional variational analysis (3-D-Var) system (Derber et al., 1991;Parrish and Derber, 1992;Wu et al., 2002;Derber et al., 2003;Kleist et al., 2009). In the variational analysis system, the analysis is found by minimizing the cost function, written in terms of the background fields, the observations, and their respective error covariance matrices. The analysis error covariance matrix in 3-D/4-D-Var is determined by the background and observation error covariance matrices, and it can not be computed directly due to its huge memory demand.
Unlike the DA systems at NCEP and ECMWF, which are formulated in the model space, the Naval Research Laboratory Atmospheric Variational Data Assimilation System (NAVDAS) system at the US Naval Research Laboratory (NRL) is formulated on the observation space ( Manikin and Pondeca, 2009). In the RTMA application, only the observations near the surface are assimilated and only some of the surface variables are estimated, such as temperature, dew point, surface humidity at 2 m, in addition to the 10-m wind, etc. Another advantage of the RTMA, in comparison to the global 3-D-Var GSI presented in this paper, is that the analysis variables in the regional GSI are directly temperature and wind components, and therefore their analysis errors can be estimated directly, whereas the global GSI analysis variables are the stream function, unbalanced velocity potential, etc., which make the estimates of variances of wind components complicated. Furthermore, the RTMA focuses on small regions; this avoids the pole problem during the transformations between different variables.
In this paper, we use the Lanczos method in the NCEP global 3-D-Var GSI DA system to study some of its aspects and properties that were not exploited in Fisher and Courtier (1995) and Pondeca and Manikin (2009). The properties in question are very important not only in understanding the method but also in practical applications. In particular, we apply this method to estimate the observation impact signals (OIS) which are the square root of the difference between the background and analysis error variances. In this method, only a small number of eigenvectors can be computed due to the computational expenses. Thus, the error reduction is severely underestimated, and the analysis error variance is overestimated. In our study, we propose and compare four different calibration schemes to compensate for the missing trailing singular vectors. Without proper calibrations, the observation impacts computed using this method may be very inaccurate. In addition, we study the sensitivity of the OIS to the number of observations employed in the GSI system. Also studied in this paper are the correlations between the observation locations and the OIS for different variables in an operational environment Section 2 provides a brief description and formulation of the analysis error variance and OIS. The dominant eigenvectors and eigenvalues of the transformed Hessian matrix are analyzed in detail in Sect. 3. Section 4 presents four different calibration schemes and their results, while the correlations between the observations and the OIS are exploited in Sect. 5. Finally, discussion and conclusions are given in Sect. 6.

Introduction of basic formulation
The NCEP GSI DA system is a unified global/regional threedimensional (1) where x = x a -Xb is the analysis increment, B is the background error covariance matrix, y 0 = y -Hxi, is the innovation vector, R is the observation error covariance matrix, H is the linearized observation operator, and Xb is the background state field. It is well known that the analysis increment x can be solved through the minimization of Eq. (1) (Daley and Barker, 2001) (2) Let A be the analysis error covariance of x a with respect to the truth. Then in the incremental 3-D-Var, A can be described as where BH r (HBH r -I-R) _1 in Eqs. (2) and (3) is commonly called the Kaiman gain matrix. Fisher and Courtier (1995) proposed three approximate methods to estimate analysis error variance in 3-D/4-D-Var framework. The most promising method among these three is the Lanczos method, which uses the linkage between the Lanczos algorithm and conjugate gradient minimization that is widely used in 3-D/4-D-Var system. The Lanczos method uses the relationship between the Hessian matrix and the analysis error covariance. Thus, a limited number of dominant eigenvectors can be estimated and used to approximate the second right term in Eq. (3). This method was implemented in the ECMWF 4-D-Var system to estimate the analysis error covariance (M. Fisher, personal communication, 2007).
The basic formulations of Lanczos algorithm is described in Appendix A. The final analysis error covariance matrix can be expressed as in Eq. (A 10). Let where A.* and e k = Q k v k are the dominant eigenvalues and eigenvectors, respectively, of the transformed Hessian matrix of the cost function in Eq. (1). It is clear that C is the analysis error covariance reduction from the background due to the observations. Since it is impossible to compute all the eigenvectors of the transformed Hessian matrix in a real application such as GSI due to computing resource limit, we can only compute a very limited number (K) of dominant eigenvalues and their corresponding eigenvectors. As a result, many less dominant eigenvectors are ignored. The error reduction of C and analysis error covariance A would be underestimated and overestimated, respectively.
In order to compensate for the loss of trailing eigenvectors, we introduce a calibration factor p(k) in this paper so that

C *£/>(*)(! -X; l )(Q n k v k KQ n k v k ) T .
(5) k=\ Eq. (5) can be applied to estimate the analysis errors for any model variables at any levels. Different calibration schemes are discussed in the following sections.

Eigenvalues and eigenvectors
The diagonal part of C in Eq. (5) represents the reduction of error variance due to observations in DA. The following experiments were carried out with GSI at T62L64 resolution. The number of dominant eigenvectors K computed depends on the number of inner loops in the GSI minimization.
The values of K that we have tested are K = 30,60,100, and 116. The observations and background fields are for the 00:00 Z cycle on 10 April 2007. The observations that we chose were based on the data used in the NCEP operational GSI. There were a total of ndat = 60 data sets, which covered conventional, aircraft, GPS observations as well as radiances from different satellites. All the operational data were used in our experiments. To study the sensitivity of error reductions to the observations, we also experimented with only conventional observations. In this case, ndat = 6. This includes surface pressure, temperature, specific humidity, winds, sea-surface temperature, and precipitable water from rawinsonde. The conventional data also contain the satellite derived winds such as those below 850 mb from satellites JMA IR (Japan Meteorological Agency Infrared) and EUMETSAT (European Organisation for the Exploitation of Meteorological Satellites). Figure 1 shows the eigenvalue distribution as a function of the eigenvalue number for different numbers of observation data sets. Shown in the left panels are the eigenvalue distributions with 6 observation data sets for K = 30,60,100, and 116, respectively. On the right hand panels, the same eigenvalue distributions are displayed for the 60 observation data sets. Since all the eigenvalues X k are larger than one and diag[(ß£v*)(ßt»*) r ] > 0.0, it follows that the diagonal elements of C are always positive, which means This indicates that the analysis error variance is always less than the background error variance due to the observations assimilated. From Eq. (5), it is easy to see that the larger the eigenvalue, the greater the observation impact (and more reduction of error variance). Thus, more dominant eigenvectors with larger eigenvalues have greater impacts than the trailing eigenvectors with smaller eigenvalues. As the trailing eigenvalue is converging to one, the contribution to error reduction from the eigenvector is approaching zero, and the impact is negligible. The smallest eigenvalues for ndat = 6 shown on the left panel of Fig. 1 for K = 30,60,100, and 116 are displayed in Fig. 2a, while the smallest eigenvalues for ndat = 60 are displayed in Fig. 2b. The minimum eigenvalues in both cases should approach one (shown in dotted lines) as K increases. It is clear that the convergence of the eigenvalue with increasing number of inner loops in the GSI is much slower in the case with more observation data than in the case with a smaller amount of observation data. Thus, one conclusion from this result is that the more observations we have, the larger the number of eigenvectors should be included in order to minimize the loss of information from those observations. When only 6 observation data sets are used, as in Fig  As an example of eigenvector structure, we plotted in Figs. 3 and 4 the top five normalized eigenvectors (ßj»t) for temperature t and zonal wind component u for K = 30,60,100, and 116 at 500 hPa. The number of data sets is ndat = 6. From left to right, Fig. 3 shows the largest eigenvectors oft with different values of K, while the top 5 eigenvectors oft with the same value of K are shown from the top to the bottom, respectively. On the top panel, the 1st eigenvector for t has a similar dipole structure over North America for K = 30 and 60, while for K = 100 and 116, their structures are also similar, but different from when K = 30 or 60. This is also true for the other dominant eigenvectors 2,3,4, and 5 shown in the following rows. When K = 30, eigenvectors 1,2, and 3 have high values over North America, but in slightly shifted positions, while eigenvectors 4 and 5 have larger values over the European region. All of these reflect the fact that there is relatively more conventional data coverage in North America and European regions. When K = 60, the top 5 eigenvectors in column 2 show similar structures as  when K = 30 in column 1. When the number of inner loops K is increased to 100 and 116 as shown in columns 4 and 5, the larger amplitudes of the top 5 eigenvectors are mostly over the North America area. This is consistent with the conventional observations at this level, which will be shown later in the paper. Figure 4 shows the same as Fig. 3, but for u. Overall, the top 5 eigenvectors of u also demonstrate the larger presence of conventional observations over North America. Like the eigenvectors of t, eigenvectors 4 and 5 show the largest amplitudes over the European region when K = 30 or 60.
Next, we look at the overall observation impacts due to observations in the GSI. To see the sensitivity of error reduction to the number of observations, we have computed the error reduction using two different numbers of observation data sets: ndat = 6 and ndat = 60. From Eq. (4) DA systems. In the NCEP GSI, they are pre-computed using the NMC (National Meteorological Center) method (Parrish and Derber, 1992). The diagonal components of C are the direct impacts from the observations assimilated by the DA. Therefore, unless we have sensible situation-dependent background error variances, it is more meaningful to look at the diagonal components of C (instead of A) in order to assess the direct impacts from the observations. In the following, we will show the square root of error variance reduction, i.e., ydiag(C) (referred to as observation impact signal or 01S in the following) for different variables at different levels. Similar quantities, such as information content, relative entropy, degrees of freedom for signal, and mutual information, for quantifying the impacts of observations in a DA system have been introduced and studied by Rodgers (2000), Xu (2007), Zupanski et al. (2007) and Fowler and van Leeuwen (2012).
First, we consider the OIS without calibration, i.e., p(k) = 1.0 in Eq. (5). From the left to the right, Fig. 5 shows the OIS for «500, '500i and 91000 (the subscripts indicate the level in hPa) with different numbers of inner loops, K = 30,60,100, or 116. The results show that the OIS for all variables increase as the number of eigenvectors increases. Rows 1 to 3 shows the OIS for «500, fsoo. and 91000 with ndat = 6, while rows 4 to 6 show the OIS for the same variables but for Observation impact signal (OIS) for U500, (500. and </iooo-F rom me ,e ft t0 the "8"' on eacn row > K = 30,60,100, and 116. Rows 1 to 3 for «500-'500-and 91000 with ndat = 6, and rows 4 to 6 forndat=60. ndat = 60. The top two panels show that as K increases, the impacts of observations for U500 and /500 increase, particularly over the conventional data-rich regions such as North America. For cases with smaller numbers of eigenvectors (K = 30,60), the observation impacts over Europe, Asia and areas near Australia are less clear. The impact signals are much more pronounced for K = 100 and 116. Panels in row 3 show the OIS for relative humidity near the surface «71000-Again, as more eigenvectors are included, stronger impact signals are observed in North America, Europe, Asia and ar-eas near Australia, where there are more observations. The observation locations will be shown in Figs. 9-12. Panels in rows 4 to 6 show the OIS values for the same variables as the top 3 rows, but with ndat = 60. The above conclusions for ndat = 6 still hold when many more observations are included in the analysis. In this case, there are a lot more observations in the Southern Hemisphere (SH) that impact the OIS for the three variables. In ocean areas in the tropics, there are many moisture observations in the lower atmosphere, thus also resulting in very strong OIS. Fig. 5 are the values of OIS for some variables at specific levels without calibration, i.e., p(k) = 1.0. As demonstrated in the above section, the OIS is underestimated due to the missing trailing eigenvectors. However, to what extent the OIS is underestimated is not clear from With the smaller number of observations (ndat = 6), Fig. 6a and b shows that the mean OIS values of u and t at all three levels for K -60,100, and 116 are about 1.5,2.0, and 2.2 times larger than the mean OIS values for K = 30, respectively. This is equivalent to about 1.5 times increase of mean OIS when K is doubled. These ratios for q are even larger, especially at the highest model level (L = 64) where values reach over 4.5 for K = 100 and 116, as shown in Fig. 6c. This means that the OIS for q is even more severely underestimated compared with the other two variables when only a limited number of eigenvectors are included.

Results shown in
When more observations are assimilated with ndat = 60, as shown in Fig. 6d, e and f, the conclusions are similar, except that the ratio for u at the top model level is much larger than for the other two levels (Fig. 6d). In addition, the ratio for q at the top model level is also much larger than at the other two levels (Fig. 6f), but not as large as in the case of ndat = 6 (Fig. 6c). All these results clearly show that the mean OIS values for all variables at these three typical model levels increase as the number of loops K increases, in these two cases with two very different observation data sets. The maximum value of K that we have used is 116, which is too small in both cases. The OIS values have not converged when K = 116, even with the smaller amount of conventional observations (ndat = 6). Since the minimization algorithm in GSI has its own built-in convergence criterion, it will stop once this criterion is satisfied. To test larger K was computationally prohibitive; it would be possible, but it is hard to bypass the GSI criterion. In addition, computational cost is always a concern in an operational system like GSI. To account for the impacts from the missing eigenvectors, calibration is inevitable.
The calibration schemes we have tested are termed C\, C2, C3, and C4, and are defined as follows:

. Four calibration factors as functions of number of eigenvalues for Cl (solid) and Cl (dashed) in (a) and Ci (solid) and C4 (dashed) in (b) for K = 116 and ndat = 60.
shows similar impact patterns and magnitudes as scheme C1. Both of these schemes can not achieve the ideal results we have hoped for. The results from scheme Ci (column 3 of Fig. 8) show somewhat larger maximum values than the benchmark for all the variables, while the magnitudes of OIS from C4 (column 4 of Fig. 8) are similar to those in the benchmark. However, like C1 and C2, both schemes Ci and C4 do not seem to pick up the impact patterns over regions in Europe, Asia and Australia as well as the benchmark. Overall, scheme Ci is probably the best among these given the fact that the benchmark with K = 116 is far from converging, as seen from Fig. 6. From the results of these calibration experiments, one can conclude that none of the schemes tested can produce OIS distribution that are as good as the ones from the benchmark which uses more eigenvectors. There are some regions where the observations are less dense. These regions will certainly need more eigenvectors to yield reasonable OIS values. Our results indicate that it is hard to use calibration factors which only increase the magnitudes of some 30 dominant eigenvectors to recover those impact signals. However, two of die calibration schemes do enhance the OIS magnitudes in the regions with dense conventional data network coverage, which in general can be greatly underestimated due to the missing eigenvectors.
Unlike these two schemes, the calibration factors defined in schemes Ci and C4 increase as a function of k. This means that greater weight is given to the more trailing eigenvectors in compensating the missing eigenvectors. Note, however, that the calibration factors Ci and C4 do not depend on the eigenvalues. The calibration factors in these four schemes are shown in Fig. 7 as functions of number of eigenvalues for K = 116 and ndat = 60. To test these calibration schemes, we choose the smallest number of loops from our experiments, i.e., K = 30. The original OIS values without calibration for u500, '5oo. and 91000 are displayed in the first column of Fig. 5. Since we are unable to run an ideal case with a very large number of loops to assess the effectiveness of different calibration schemes, we assume that the benchmark to compare with is the one with K = 116, which is shown in the last column of Fig. 5. From the left to the right, columns in Fig. 8 show the OIS values from schemes Cl, C2, Ci, and C4, respectively. Similar to Fig. 5, rows 1 to 3 and rows 4 to 6 in Fig. 8 are the OIS for ndat = 6 and ndat = 60, respectively.
By comparison with the results without calibration (first column of Fig. 5), the OIS values from calibration C1 in first column of Fig. 8 show a very similar pattern and magnitude for all the variables. Clearly, the magnitudes are generally smaller than those in the "ideal case". Further more, the results from C1 with ndat = 6 fail to pick up much impact over regions in Europe, Asia and Australia as in the benchmark (last column of Fig. 5). Scheme C2 (column 2 of Fig. 8) also

Correlations between observations and observation impact signals
In order to further assess the impacts from observations in the GSI, we will study the OIS distributions at certain model levels and their correlations with the observation locations.
In the following figures, we will plot the horizontal locations of observations that are located between the vertical levels of p -50 mb and p + 50 mb, and compare them with the OIS values computed from Eq. (5) at level p. As an example, we choose K -100 and conventional data set only with ndat = 6 in the following. Figure 9 shows the locations of temperature observations between 500 mb plus and minus 50 mb, and the OIS for temperature at 500 mb. It is clear that the region with most conventional data is North America, followed by Europe, Asia and regions near Australia. The method indeed produces larger OIS values over these data dense areas. However, it is also noticeable that some sparsely distributed observations around the tropics did not generate much OIS. It is expected that more eigenvectors are needed in these areas to recover some impact from observations. Figure 10 shows the same as Fig. 9, but for u. Similarly the OIS values are generally larger over denser observation areas. As for f, some observations around the tropics do not produce much OIS. In Fig. 11, we display the same as Fig. 10 scatterometer-type wind observations (such as from JMA IR and EUMETSAT) in SH and the tropics as shown in Fig. 11. Our method does produce strong OIS in the SH, but not much signal can be seen in the tropics. In addition, the method does not produce a distribution that resembles the satellite data coverage across the tropics. It is expected that it will take a lot more eigenvectors to cover these satellite data, particularly in the tropics. Figure 12 is the same as Fig. 11, but for relative humidity 91000-The observations over Europe are equally as dense as over North America. Our method does generate some strong OIS impacts over these regions. Similarly, in Asia and Australia, the dense observations are correlated well with the strong OIS signals. Interestingly, a few sparse observations in the SH oceans are also captured by OIS. However, there are also some sparse observations around the tropics near Africa and South America (SA) that are missed by the OIS.  Fig. 9. Circles represent the locations of temperature observations between 500 mb -50 mb and 500 mb + 50 mb, and contour lines represent the OIS for temperature at 500 mb. u obs (500mb + /-50mb) and OIS at 500mb(ndot-6,k-100) Fig. 10. The same as Fig. 9, but for u.
As described in previous sections, OIS is able to capture some of the observation impacts, especially over the regions with dense conventional observations. Results in this section indicate that most satellite observations in the southern oceans can be captured. However, to capture the satellite band structures in the tropics, the number of eigenvectors we have tested is clearly not enough. All the results shown in this section indicate that many of the sparse observations in the tropics can not be captured, while the sparse observations in the southern oceans can be reflected in our OIS. This is related to the fact that the background error variances the tropics reduce the impact of the observations, and pull the analysis closer to the background. As a result, the analysis increment is also reduced. This can also be confirmed by looking at the analysis increments in Eq. (2). Corresponding to the observation locations and the OIS in Figs. 9-12, Fig. 13a-d shows the analysis increments for '500> «500. «looo. and ^IOOO-Ifwe compare Figs. 9 and 10 with Fig. 13a and b, we can clearly see that there are a reasonable number of observations in the tropics, but the increments are very small. As explained, this is due to the smaller background error values used in the GSI. Thus, the OIS values are also very small around the tropics. When Figs. 11 and 12 are compared with Fig. 13a and d, respectively, we see that areas around the tropics which do not show much OIS are also approximately the areas where the analysis increments are small. 6 Discussion and conclusions This paper represents another effort in estimating the analysis error variance, using the Lanczos method proposed by Fisher and Courtier (1995), in the NCEP global 3-D-Var GSI DA system. We have applied this method to the global 3-D-Var GSI and studied other different aspects of this method that were not exploited in Fisher and Courtier (1995) and Pondeca and . The properties of convergence in different calibration schemes discovered in this paper have greatly improved our understanding of the method and its implications in practical applications in an operational environment. Our focus is on estimating the observation impact signals (OIS) which are the square root of the difference between the background and analysis error variances. This quantity is a direct measure of the error reduction due to the observations assimilated.
The OIS values for different variables at typical model levels are computed for various numbers of inner loops K in the GSI with different numbers of observation sets. As expected, the smallest eigenvalue of the transformed Hessian matrix converges to one as K increases. However, the rate of convergence depends on the number of observations assimilated.
Our results show that the convergence is faster when smaller numbers of observations are used. If more observations are used, the converging speed is slower and a larger number of eigenvectors should be included in order to minimize the loss of information from the missing eigenvectors.
The top five corresponding normalized eigenvectors are also studied. In general, the structures for the largest eigenvectors when K is small show larger impacts in the regions where conventional data are dominant. If K is increased, the OIS can represent other areas with fewer observations. For the same number of AT, the less dominant eigenvectors may convey the impact signals from the less dominant observation regions.
When the OIS values are computed with a different number of data sets, the results show that the impact signals in the data rich regions are stronger with larger K. At the same time, more signals in the regions with fewer observations start to emerge as the number of inner loops increases. When the number of observations is increased, the method can clearly pick up the impact signals from the observations. As only a limited number of eigenvectors can be computed due to the computational constraint, the error reduction is severely underestimated. To estimate to what extent we are losing the information from the missing eigenvectors, we computed the mean OIS values for u, f, and q at three typical model levels (i.e., top, middle and bottom) for different values of K with two different numbers of observation data sets. It is found that the OIS values at K = 60,100, and 116 are about 1.5,2.0, and 2.2 times the value of OIS at K = 30, respectively. This is roughly 1.5 times the increase of the mean OIS when K is doubled. These ratios are much larger for the relative humidity at the top model level.
All of our results indicate that without proper calibrations, the estimates of observation impacts are not accurate. To overcome this difficulty, we have proposed and investigated four different calibration schemes to compensate for the missing trailing eigenvectors. Different schemes give different weights on a different number of eigenvectors. Our results show that the first two schemes cannot pick up the impact signals over the regions with less conventional data in comparison with the "ideal case", which has the largest number of inner loops. It is found that scheme C3 performs better than other schemes and can boost the OIS values in the data rich regions to the level in the "ideal case". However, it seems that none of them can pick up the impacts in the regions with less observation data as well as the "ideal case". The benefit of calibrations lies in the fact that they do enhance the OIS magnitudes in the regions with more traditional data coverage, which would be missed without calibrations.
We also studied the correlations between the observation locations and the OIS distributions for various variables at different levels. It is found that the method generally picks up the impact signals over the regions with conventional observations, particularly over the data dense areas. It even picks up the satellite observation impacts over the southern oceans. However, with the number of inner loops we have used, the method cannot show the satellite band structure over the tropics. A lot more eigenvectors are required to recover the whole satellite observation impacts. The area in which the method performs worst is the tropics. This is found to be due to the fact that the background errors produced by the NMC method are generally smaller over the tropics than over the extra-tropics, and the observation errors do not change with latitude. As a result, the observation impacts over the tropics are reduced. This also leads to the smaller analysis increments over the tropics.
In conclusion, the method presented in this paper with proper calibration is capable and effective in estimating the major observation impacts from the observations assimilated in the GSI, especially over those regions with more conventional data coverage. Since those gradient vectors can be generated by the operational global GSI almost at no cost, the computational expense in estimating the dominant eigenvectors is completely manageable with the current NCEP computing resources.
Another benefit of using this method is that the eigenvectors can be used in preconditioning the conjugate gradient algorithm in minimization to speed up the convergence. For example, an explicit or implicit preconditioner based on an approximation to the Hessian matrix can be chosen (Fisher, 1998). In this case, the time spent on computing the dominant eigenvectors can be offset by the time saved from this preconditioning. Therefore, this method is very suitable for an operational 3-D/4-D-Var system to estimate the observation impacts, and it can be used as part of a routine verification package. It can be shown that k k and e k = Q" k v k are the dominant eigenvalues and eigenvectors of the transformed Hessian matrix.