Global terrestrial water storage connectivity revealed using complex climate network analyses

Terrestrial water storage (TWS) exerts a key control in global water, energy, and biogeochemical cycles. Although certain causal relationship exists between precipitation and TWS, the latter quantity also reflects impacts of anthropogenic activities. Thus, quantification of the spatial patterns of TWS will not only help to understand feedbacks between climate dynamics and the hydrologic cycle, but also provide new insights and model calibration constraints for improving the current land surface models. This work is the first attempt to quantify the spatial connectivity of TWS using the complex network theory, which has received broad attention in the climate modeling community in recent years. Complex networks of TWS anomalies are built using two global TWS data sets, a remote sensing product that is obtained from the Gravity Recovery and Climate Experiment (GRACE) satellite mission, and a model-generated data set from the global land data assimilation system’s NOAH model (GLDAS-NOAH). Both data sets have 1× 1 grid resolutions and cover most global land areas except for permafrost regions. TWS networks are built by first quantifying pairwise correlation among all valid TWS anomaly time series, and then applying a cutoff threshold derived from the edge-density function to retain only the most important features in the network. Basinwise network connectivity maps are used to illuminate connectivity of individual river basins with other regions. The constructed network degree centrality maps show the TWS anomaly hotspots around the globe and the patterns are consistent with recent GRACE studies. Parallel analyses of networks constructed using the two data sets reveal that the GLDAS-NOAH model captures many of the spatial patterns shown by GRACE, although significant discrepancies exist in some regions. Thus, our results provide further measures for constraining the current land surface models, especially in data sparse regions.


Introduction
Terrestrial water storage (TWS) is defined as vertically integrated water of all forms above and below the Earth's surface (e.g., surface water, soil moisture, groundwater, and snow and ice) (Famiglietti, 2004).It is not only a key control of global water, energy, and biogeochemical cycles but also provides an integrated indicator of water availability and uses (Houborg et al., 2012;Lettenmaier and Famiglietti, 2006;Long et al., 2013;Voss et al., 2013;Guentner et al., 2007).Global TWS has been the subject of modeling studies for decades; however, validation of modeling results has been challenging historically because of limited availability of in situ data.Since its launch in 2002, the Gravity Recovery and Climate Experiment (GRACE) satellite mission has provided an unprecedented opportunity to study TWS remotely.GRACE detects temporal variations of the Earth's gravity field which, over land, are mainly caused by short-term variations or TWS anomalies (TWSA).Numerous studies conducted in the past decade have confirmed the remarkable capability of GRACE in tracking continental-and regionalscale TWS changes (e.g., Famiglietti et al., 2011;Sun et al., Published by Copernicus Publications on behalf of the European Geosciences Union & the American Geophysical Union.2010; Yeh et al., 2006;Long et al., 2013Long et al., , 2014;;Rodell et al., 2009;Swenson and Wahr, 2003;Han et al., 2005).So far, the monthly TWSA grids derived from GRACE have been used as an independent source of information for hydrologic model validation (Ramillien et al., 2008;Syed et al., 2008;Chen et al., 2005), calibration (Sun et al., 2012;Werth et al., 2009;Lo et al., 2010;Sun et al., 2010;Döll et al., 2014), and data fusion (Zaitchik et al., 2010;Houborg et al., 2012;Sun, 2013;Forman et al., 2012;Li and Rodell, 2015).
The global GRACE data set accumulated over the last decade is an important type of big data that can be mined for discovering information of global water/energy dynamics, and for helping to illuminate connections among major river basins and within the river basins themselves.Such information will be complementary to existing physically based TWS modeling efforts and will potentially provide calibration constraints (e.g., Guentner et al., 2007;Rodell et al., 2004).In this study, the complex network theory is adopted to construct a global TWSA network using GRACE data.The interannual spatial patterns of TWSA are then quantified through analyses of network topologies.
Complex network theory has long been used by scientists in various disciplines to study intricate connections in natural and social phenomena (Jackson, 2008;Newman and Girvan, 2004;Rubinov and Sporns, 2010).In recent years, the field of complex climate networks (CCN), which involves applications of traditional complex network analyses to climate systems (Tsonis and Roebber, 2004;Tsonis et al., 2006), has attracted significant attention.In typical CCN applications, cells of a gridded data set are deemed as nodes of a complex network, and links (or edges) between nodes are established on the basis of statistical similarity of the time series associated with the cells.After a climate network is constructed, various descriptive measures derived from the classical complex network theory are then applied to quantify network topologies (Donges et al., 2009b;Tsonis et al., 2006;Steinhaeuser et al., 2011).One of the main findings from the previous CCN studies is that climate networks manifest a "small-world" network property, akin to networks appear in many other fields (e.g., social networks).In CCN, this can be contributed to the existence of long-range connections that stabilize the climate system and enhance energy transfers within it (Donges et al., 2009a(Donges et al., , b, 2011)).TWS is closely intertwined with soil-vegetation-atmosphere interactions and is thus expected to show similar spatiotemporal patterns as observed from climate networks (e.g., precipitation network); however, it is well known that climate only plays a partial role in TWS changes.Land use changes and other anthropogenic activities (e.g., deforestation, aquifer mining, and water structures) increasingly stress water availability in many parts of the world and have been shown to produce global-scale impacts on the terrestrial water cycle (Vörösmarty and Sahagian, 2000).Such aspects are usually difficult to be fully captured and quantified without extensive mon-itoring data.The global coverage of GRACE TWSA, thus, becomes especially important.
Different from the global circulation model outputs analyzed by many previous CCN studies, GRACE TWSA is a remote sensing product, subjected to errors and uncertainties caused by instrumentation and data processing.As a result, the actual spatial resolution of GRACE TWSA is not 1 • × 1 • , but much coarser (Houborg et al., 2012).In other words, the intrinsic degrees of freedom of the GRACE TWS are less than its grid dimension.An important question is then how well a complex network constructed using the GRACE TWSA can represent the salient features of the global terrestrial water cycle.Importantly, how these patterns can be corroborated, at least partially, using other existing information.Toward this end, we use the TWS data set (1 • × 1 • ) simulated by global land data assimilation system (GLDAS) for comparison.GLDAS is a global terrestrial modeling system jointly developed by US National Aeronautics and Space Administration's (NASA) Goddard Space Flight Center and US National Oceanic and Atmospheric Administration's National Centers for Environmental Prediction.GLDAS incorporates satellite and in situ observations to produce optimal fields of land surface states and fluxes in near real time (Rodell et al., 2004).Although GLDAS is only a surrogate of in situ observations that are ultimately required to validate the GRACE results, previous studies have shown that GLDAS represents the magnitudes and variability of TWS sufficiently well (Syed et al., 2008).Thus, GLDAS represents a valuable independent source of information for validating GRACE results and has been used by a number of globalscale GRACE studies (e.g., Syed et al., 2008;Landerer and Swenson, 2012;Chen et al., 2005).In this study, the network measures inferred from GRACE data are compared to those built from the GLDAS outputs to cross-examine the two products.Note that GLDAS does not have an explicit representation of groundwater storage, an aspect that needs to be kept in mind when performing comparisons.

Network construction
A network is commonly represented by a graph G(V, E), which is specified by its node set V = {1, . . ., N} and edge set E, with N the number of nodes.Thus, the number of possible edges in an undirected graph (meaning the links are non-directional) is N (N − 1)/2.In the current context, each node corresponds to a grid cell at which a valid monthly time series is available and N is the total number of such cells in a gridded data set.Construction of a network generally proceeds in two steps, network growth and pruning.In the network growth step, similarity between all potential node pairs (i.e., edges) in graph G is quantified.Common measures of similarity are statistical correlation (either Pearson or Spear-man), mutual information, and synchronization (Boers et al., 2013;Donges et al., 2009a).In the pruning step, an appropriate similarity threshold (τ ) is imposed to the edge set to retain only those connections that exceed the threshold.The main purpose of network pruning is to improve network analysis efficiency.If the correlation between two time series is used as a measure of statistical similarity, then τ represents the minimum correlation coefficient (R) above which a pair of nodes is considered connected.The absolute value of correlation is used such that both strongly positive and negative correlations are counted.
Several methods have been used in the CCN literature to determine τ .In the significance testing method (Tsonis et al., 2006), τ is based on the two-sided Student's t test.The critical t value, t c , for a given sample size n s and user-defined significance level α are determined using the Student's t cumulative distribution function (CDF), from which the value of τ can be solved: A similar method uses the probability value (i.e., p value) of test statistics directly: a pair of nodes is considered connected if the p value is less than a critical value; for instance, Steinhaeuser et al. (2011) set the critical value to 10 −10 .Yet another method defines τ from an edge-density function ρ(τ ) defined as where n c is the number of active edges retained in a network when the threshold is set to τ .Thus, edge density is closely related to the CDF of R.
Obviously, all methods involve a certain degree of subjectivity.The selection of τ thus incurs a tradeoff between network maneuverability and preservation of network features: if too many edges are included, the main network features will be obscured, not to mention a significant increase in computational effort required to characterize a large network.In this work, the edge-density method is used because it allows for a direct comparison of network properties computed from different data sets (Donges et al., 2009a).Additional statistical analyses (see Sect. 4) are performed to ensure that all statistically significant features are retained in the constructed networks.

Network measures
The outcome of the network construction process is a Boolean-valued, symmetric N × N matrix, referred to as the adjacency matrix and denoted by A. Elements of A, a ij , are set according to the following rule: in which |R ij | is the absolute value of correlation between time series at nodes i and j .A number of network measures can then be applied to A to quantify network topology.The main metrics adopted in this work include the degree of centrality and connection length.
The degree of centrality of a node, k i , is defined as the number of first neighbors of node i and reflects the importance of node i in a network.Regions having high k i values are referred to as "supernodes" in network theory because these nodes tend to have not only local connections but also long-range connections or teleconnections.However, k i itself does not reveal the actual type of connections.Because of nonuniformity of cell areas at different latitudes, the degree of centrality k i is usually weighted by cell areas, leading to the area-weighted connectivity, AC i (Tsonis et al., 2006;Heitzig et al., 2012): where n i is the set of all first neighbors of the node i, and λ j is the latitude of its j th first neighbor.Thus, AC i is a normalized value representing the fraction of the Earth's surface area that a node is connected to.
A classic measure of network integration is the average distance between node i and all other nodes, D i , and is defined as (Rubinov and Sporns, 2010) where d ij is the number of edges traversed along the shortest path between the node pair (i, j ).If (i, j ) is not connected, d ij is defined as infinity.The characteristic path length of the network is obtained by taking an average of all D i and it represents the average number of edges to be traversed along the distance between two randomly selected nodes in a network.Calculation of pairwise shortest path lengths becomes computationally expensive when the number of node pairs is large.In this work, the average distance between node i and all other nodes, L i , is approximated according to where only the first neighbors of node i are included in the calculation, and l ij is the physical distance between node pair (i, j ) measured by using the respective cell-center latitudes and longitudes, (λ i , φ i ) and (λ j , φ j ).The physicalbased characteristic path length of the network, L, is simply the average of all L i (i = 1, . . ., N).The probability distribution of L i provides a sense of the average edge lengths in a network and L provides a measure of network integration.
The GRACE TWSA data set used in this study was downloaded from the Tellus site of the Jet Propulsion Laboratory (JPL; http://grace.jpl.nasa.gov/index.cfm).The data set is based on RL05 GRACE solutions (in the form of spherical harmonics) released by the Center for Space Studies at the University of Texas Austin.It includes 121 epochs from January 2003 to July 2013 at approximately monthly intervals.The 6 missing months, which are not contiguous, were reconstructed using linear interpolation (temporal only).The grid dimensions are 360 × 180 and ocean area is masked out, resulting in ∼ 25 000 cells in each TWSA grid.In generating the gridded TWSA product, a number of postprocessing algorithms have been applied, as documented in details in Landerer and Swenson (2012).In particular, a destriping filter is applied to minimize the effect of north-south-oriented stripes in GRACE monthly solutions, and a 300 km Gaussian filter is then used to reduce random errors in high-degree spherical harmonic coefficients not removed by destriping.
The GRACE gravity field solutions are typically truncated at a spectral degree less than 60.To restore signal attenuation caused by truncation and filtering, the JPL data set also includes a spatially distributed and temporally invariant scaling factor field.This scaling factor field is not used in this study because it does not affect pairwise correlations.
Outputs from GLDAS's NOAH model were obtained from NASA (http://disc.sci.gsfc.nasa.gov/services/grads-gds/gldas).GLDAS covers latitudes between −60 and 90 • , and does not model permafrost regions such as Greenland and Antarctica (Rodell et al., 2004).Its grid dimensions are 360 × 150 and the temporal span is from January 1979 to the present (GLDAS V1).The number of cells in each GLDAS monthly grid is N = 14 540.The GLDAS TWS is defined as the sum of water mass from all four soil layers represented by NOAH (up to 2 m depth) and snow water equivalent.Thus, GLDAS TWS mainly includes surface and root zone storages, but not the deeper groundwater storage.The GRACE grids are masked using the smaller GLDAS coverage during network construction.To ensure a consistent comparison, the GLDAS data set was processed using the same truncation and filtering techniques applied to the GRACE data, which has been a standard practice in the literature (e.g., Chen et al., 2010;Rodell et al., 2009).
Monthly time series contain high-frequency noise.Because the main interest in this study is on interannual correlations of TWSA, the high-frequency noise in each TWSA time series is removed.Several methods have been used for such a purpose; the z-score method has been employed in the CCN literature to remove seasonal variability (Donges et al., 2009b;Steinbach et al., 2003;Tsonis et al., 2006).It entails normalizing each monthly data point using the mean and standard deviation calculated for the corresponding month and over the entire record length.The least squares method, which is extensively used in the GRACE literature (e.g., Yeh  et al., 2006;Crowley et al., 2006), models the intra-annual variability using Fourier series (two annual sine/cosine terms and two semi-annual sine/cosine terms) and then removes the variability, together with a slowly moving trend.Our numerical tests show the two methods give very similar results.Lags existing between time series may weaken linear correlation.Thus, to examine the effect of temporal lags, the same interannual correlation analysis is repeated using a temporal window of 36 months (i.e., the maximum correlation observed within ±1.5 years of the zero lag).

Edge density
The number of possible edges represented by the TWS data sets is more than 100 million for N = 14 540.After removing seasonal trends from GRACE and GLDAS and calculating the correlation coefficient R for all node pairs, the edgedensity method is applied to determine a similarity threshold τ .Note in the discussion below, R is calculated at zero lag unless otherwise specified.
Figure 1a shows edge-density functions constructed using GRACE and GLDAS TWS data, both are monotonically decreasing (i.e., fewer connected edges at higher τ values) and are similar in shape.As mentioned in Sect.2, edge density provides an indicator of the fraction of connected edges at different threshold values.Figure 1b plots the maximum correlation coefficient as a function of edge length, which is defined as the shortest physical distance between a pair of nodes in this work.To arrive at Fig. 1b, all R values are first sorted according to nodal separation distances, a bin width of 250 km is applied to the resulting distribution, and the maximum R value within each bin is recorded.Figure 1b suggests that the maximum correlation stays relatively high (> 0.7) for most distances.Recall that the main purpose of network pruning is to improve the computational efficiency of network characterization while preserving the most significant network features.In this study, we set the threshold τ to 0.57 because (a) the corresponding fraction of connected edges is relatively small (0.036); i.e., at this level more than 96 % of edges is removed; (b) the edge densities of GRACE and GLDAS happen to be the same at that level; and importantly (c) the cutoff τ threshold is still below the maximum correlation exhibited at all separation distances, as suggested by Fig. 1b.Thus, the selected τ value ensures that all statistically significant network features are retained in the constructed networks.

Basin analyses
A basin analysis is useful for helping visualize the TWSA connection patterns at the basin level.As some examples, Figure 2 shows the results for six river basins around the world.To generate a plot in Fig. 2, a cell is first fixed, and all its edges are colored according to the actual R (not the absolute values).For our purpose, the centroid of each basin is used.While the basin centroid may not be representative of the connection patterns of an entire basin (especially when the basin spans several climatic regions), it serves as a basis for comparing multiple basins at a qualitative level.Figure 3 applies the cutoff threshold τ defined in Fig. 1 to all plots in Fig. 2. Results suggest that interannual TWSA connections in the Amazon and Congo basins are dominated by local connections.The mid-latitude basins (Ganges, Mississippi, and Tigris) generally show more teleconnections, although Yangtze is an exception.In the case of the Tigris Basin, a large number of strongly positive and negative correlations are observed and the local connections extend far beyond the basin boundary.A detailed interpretation of this observation will be given in the next section.
Extensive teleconnection is an advantage from a forecasting perspective because climate indices, such as El Niño-Southern Oscillation (ENSO) and North Atlantic Oscillation (NAO), can be used as possible indicators of future changes.For those basins without strong teleconnection, water resources planning must rely mainly on regional data.Such distinction sheds light on the significance of GRACE data to long-term basin planning and natural hazard mitigation strategies, as we will elaborate on in the following sections.As a sensitivity study, Fig. 4 (left column panels) shows the results of basin analysis for the Mississippi Basin, the largest basin in North America, using different thresholds corresponding to τ values of 0.41, 0.57 (the base case), and 0.76.The corresponding edge density is labeled in the figure .Because the cutoff threshold increases as ρ decreases, a significant reduction in the number of edges can be observed.For comparison, the modeled TWS connections obtained from GLDAS are provided in the second column of Fig. 4. In general, the connections modeled by GLDAS are much weaker (i.e., smaller in spatial extent) than those obtained from GRACE.In some cases, the locations of connections are also different.For example, the negative correlation obtained by GLDAS in northern Africa for ρ = 0.1 is not seen by GRACE.The complex networks thus provide a useful tool for examining the agreement, or the lack of it, between GLDAS and GRACE.

Connectivity
Using the selected cutoff τ , a network adjacency matrix A is formed and various network measures described under Sect. 2 are applied to quantify network topology.Figure 5a shows the area-weighted connectivity map constructed us-ing GRACE data.On the map, red colors highlight regions of high connectivity.Recall that a high degree of connectivity indicates that a node interacts strongly with the rest of the nodes in a network (i.e., a supernode); however, the connectivity map itself does not explain the type of connections per se, and needs to be analyzed jointly with the connection length map to be shown in the next section.The largest cluster of supernodes appears in the Middle East region, where the connected neighbors account for more than 0.16 of the global area.To a lesser extent, the Pacific northwest and east coast of the USA, southern Africa, southern South America, and eastern Australia all show smaller supernode regions.In contrast, most of Asia, central USA, and Europe exhibit little or no connectivity (blue color).These observations are consistent with patterns observed during basin analyses (see Figs. 3 and 4).
The supernode regions shown in Fig. 5a reflect the superposed effects of climate variations and anthropogenic activities.These can be explained in terms of global precipitation and atmospheric circulation patterns.In general, the poorly connected regions have stronger precipitation variations over shorter spatial scales, leading to the emergence of highprecipitation gradients which, in turn, are responsible for re- gional extreme events that are more localized in time and space (Scarsoglio et al., 2013).Those with high connectivity tend to be directly influenced by ocean and climatic oscillations (e.g., ENSO and NAO).Kahya and Dracup (1993) studied streamflow variations in the contiguous USA and identified northeast, northcentral, Pacific northwest, and Gulf of Mexico states as regions with potentially significant streamflow responses to ENSO forcing.These four regions can be easily identified on Fig. 5a, among which the Gulf of Mexico region shows the weakest connection.Similarly, Chiew et al. (1998) reported that the ENSO can be used to help forecast spring runoff in southeast Australia and summer runoff in the northeast and east coasts of Australia.This teleconnection pattern is also indicated clearly by Fig. 5a.
At the global scale, Dai et al. (2009) studied the monthly streamflow records of the world's 925 largest ocean-reaching rivers from 1948 to 2004.They concluded that (a) the interannual variations of streamflows are correlated with the ENSO events for discharge into the Atlantic, Pacific, Indian, and the global oceans as a whole and (b) the effects of anthropogenic activities on annual streamflow are likely to be small compared to those of climate variations; however, anthropogenic activities can create more disturbances in arid and semi-arid regions, where the discharge magnitudes are low (e.g., Indus, Yellow, and Tigris-Euphrates river basins).To elaborate the latter point further, Fig. A1 in Appendix A plots the proportion of total renewable water resources withdrawn by country for human uses in the agricultural, municipal, and industrial sectors, using long-term data compiled by the Food Agricultural Organization of United Nations.  in those regions is not coincidental and can be corroborated using multiple sources.Because interannual TWS anomalies are well connected in clustered supernode regions, these regions tend to exhibit more vulnerability to both climate and human-induced disturbances.
Having elaborated the close relationship between GRACE TWSA and climate patterns, it is important to point out that the TWS also includes effects of soil moisture and groundwater storage (mostly unconfined aquifers) changes that may not synchronize with climate patterns.
Figure 5b shows the same area-weighted connectivity map, but constructed using the GLDAS-NOAH outputs.Although GLDAS-NOAH shows many of the similar patterns detected by GRACE, it also indicates stronger connectivity in the Arabian Peninsula, northern Africa, and in middle South America, and much weaker connectivity in southern Africa.These discrepancies may be caused by GLDAS-NOAH's parameterization and other errors.The other main reason is the lack of representation of the deeper groundwater storage in GLDAS.The discrepancies highlighted here provide additional spatial calibration constraints for land surface models.In areas dominated by shallow TWS components, GLDAS needs to show similar patterns as those derived from GRACE, whereas discrepancies are only expected in areas dominated by deep TWS components and/or impacted by significant anthropogenic activities.We emphasize here the connectivity maps shown in Fig. 5 are for TWSA.Thus, the high-precipitation areas (e.g., Amazon Basin) do not necessarily exhibit high anomaly connectivity after removing the intra-annual variability.So far, all results have been based on zero-lag correlations.The effect of temporal lag on connectivity is examined in Fig. 6, in which the connectivity map is built using the maximum (absolute) correlation found between −18 and +18 monthly lags of each node pair.The figure suggests that incorporation of lagged correlation further strengthens connectivity.The supernode regions are more expanded in space, notably in eastern Australia and in the Colorado River basin and Gulf Coast states in the USA.Further, Appendix B shows the maximum correlation and phase lags for the six basins studied in Fig. 2, which suggest that each river basin is in phase with most cells in itself and the immediate surroundings.However, significant phase lags exist between each river basin and other river basins.

Connection length
Figure 7a shows maps of the physical-based average nodal connection length L i (i = 1, . . ., N). Nodes that exhibit the longest connection lengths are mostly located in the southern part of South America (∼ 12 000 km).Other regions with relatively long connections are found in Pacific northwest, northcentral, Colorado River, and northeastern regions of the USA, southern Africa, and eastern Australia.Interestingly, the Middle East region is mostly characterized by connection lengths less than 5000 km; thus, the supernodes in that region are dominated by local connections.The connection length patterns observed here support the previous discussions in the context of teleconnection and forecasting potential.Importantly, the connection length map can help evaluate the influence of teleconnection on TWS for a particular region.
The average nodal connection length map constructed using GLDAS data suggests much wider connections, although most are local.Again this can be attributed to model parameterization schemes, forcing resolution, and spatial correlation constraints, as discussed before.
The probability distribution of the average connection length, L i , is shown in Fig. 8.Most nodes in the GRACE network are dominated by short-range edges with lengths less than 2000 km, although several other smaller modes appear in the 4000-6000, 6000-8000, and 8000-10 000 km ranges.In contrast, the GLDAS network shows a weaker local connection mode in < 2000 km range, but a wider and more persistent second mode in 4000-6000 km.Interestingly, the two modes of GLDAS coincide with those of GRACE.The characteristic path length (L) is 2300 km for GRACE and 4000 km for GLDAS, respectively.

Summary and conclusions
In this work, the complex network theory is applied to analyzing spatial connection patterns in TWS.A comparative study is conducted using two global TWS data sets derived from GRACE and GLDAS, respectively, with an emphasis on interannual variability.Both data sets are large and have more than 100 million potential connections.An edgedensity method is adopted to define an appropriate network pruning threshold.The constructed networks are further analyzed using the degree of centrality and connection length measures.
Our results show that complex networks and GRACE TWSA can be used to identify global TWSA hotspots or supernode regions.The area-weighted connectivity is a local measure that reveals nodes with a large number of connections (edges), whereas the connection length helps identify the dominating type of connections (i.e., local connections vs. teleconnections).In terms of connectivity, the largest cluster of supernodes appears in the Middle East region, while other prominent ones are found in Pacific northwest and eastern USA, southern Africa, southern South America, and eastern Australia.In terms of connection lengths, the Middle East region is dominated by local connections, whereas regions such as Pacific northwest, northcentral, Colorado River, and northeastern regions of the USA, southern Africa, and eastern Australia all have strong bimodal connections.
While many of the TWSA network features found here can be explained by established climate teleconnection theories, the TWS, as an integrated indicator of global water storage, is unique in its own way.It shows the impact of both climate and anthropogenic activities.Knowledge of both the strength and type of TWS connectivity can help identify useful TWS predictors and provide insight to further improve current land surface models.
GLDAS outputs have been used extensively in validating GRACE results at various scales.Less focused is the consistency of spatial correlation represented by GLDAS and GRACE data.Results from this study statistically quantify the similarity and discrepancies between the two data sets.In this case, the observed discrepancies may be attributed to missing surface and groundwater components in the GLDAS model, or to GRACE uncertainties (Syed et al., 2008;Li and Rodell, 2015).Although data assimilation has been used to reduce discrepancies in land surface models, the geometrical, spatial connection patterns have not been used before.A main conclusion from this work is that network connectivity measures should be incorporated as an additional model calibration and validation criterion when developing the future generation of GLDAS models.

Figure 1 .
Figure 1.(a) Edge-density function ρ(τ ) of GRACE and GLDAS (the value of τ selected for network pruning is 0.57, corresponding to an edge density 0.036); (b) maximum correlation as a function of edge lengths.

Figure 2 .
Figure 2. Patterns of connection inferred from GRACE TWSA for six river basins, in which connection pattern is based on correlation between the basin centroid and all other cells in the grid.

Figure 3 .
Figure 3. GRACE connection patterns after cutoff threshold τ = 0.57 is applied (the green solid line delineates basin boundaries).

Figure 4 .
Figure 4. Sensitivity of connection patterns to cutoff threshold, demonstrated using Mississippi River basin's centroid.Left column panels: GRACE results; right column panels: GLDAS results.
Figure A1 indicates that the Middle East and northern African countries show the highest withdraw proportions.In a recent GRACE study focusing on northcentral Middle East, Voss et al. (2013) reported that GRACE data show an "alarming rate" of decrease in TWS of approximately 143.6 km 3 during 2003-2009.Thus, the resemblance between Figs. 5a and A1

Figure 6 .
Figure 6.Effect of lagged correlation on GRACE area-weighted connectivity, where the window of lagged correlation is [−18, 18] months.

Figure 7 .
Figure 7. Map of average node connection lengths derived based on (a) GRACE and (b) GLDAS.

Figure 8 .
Figure 8. Distribution of average edge lengths in GRACE and GLDAS networks, where L i denotes the average distance between node i and its neighbors.

Figure B2 .
Figure B2.Phase lag of maximum correlations obtained for the six river basins shown in Fig. B1 (normalized by 18).