© Author(s) 2009. This work is distributed under the Creative Commons Attribution 3.0 License. Nonlinear Processes in Geophysics

Abstract. A basic task of exploratory data analysis is the characterisation of "structure" in multivariate datasets. For bivariate Gaussian distributions, natural measures of dependence (the predictive relationship between individual variables) and compactness (the degree of concentration of the probability density function (pdf) around a low-dimensional axis) are respectively provided by ordinary least-squares regression and Principal Component Analysis. This study considers general measures of structure for non-Gaussian distributions and demonstrates that these can be defined in terms of the information theoretic "distance" (as measured by relative entropy) between the given pdf and an appropriate "unstructured" pdf. The measure of dependence, mutual information, is well-known; it is shown that this is not a useful measure of compactness because it is not invariant under an orthogonal rotation of the variables. An appropriate rotationally invariant compactness measure is defined and shown to reduce to the equivalent PCA measure for bivariate Gaussian distributions. This compactness measure is shown to be naturally related to a standard information theoretic measure of non-Gaussianity. Finally, straightforward geometric interpretations of each of these measures in terms of "effective volume" of the pdf are presented.


Introduction
Correspondence to: A. H. Monahan (monahana@uvic.ca)

A fundamental question in exploratory data analysis is: given observations of two variables $x_1$ and $x_2$, to what extent is the joint distribution of these variables "interesting", in the sense that it is "structured"? Different kinds of structure can be considered, among which some of the most important are:
I Dependence: to what extent does knowledge of $x_1$ imply knowledge about $x_2$?
II Compactness: to what extent is variance shared between $x_1$ and $x_2$; that is, how tightly concentrated around a lower-dimensional surface is the joint probability density function (pdf) $p(x_1, x_2)$?
That these are distinct measures of structure is illustrated by the pdfs displayed in Fig. 1, all of which by construction have the same total variance $\mathrm{var}(x_1)+\mathrm{var}(x_2)$. The Gaussian pdf (a) describes uncorrelated variables $x_1$ and $x_2$ without any clustering around a lower-dimensional surface; it possesses no structure in either of the senses described above. In contrast, the Gaussian pdf (b) is characterised by more variance along one axis than the other. While $x_1$ and $x_2$ are independent, their joint pdf possesses compactness. The Gaussian pdf (c) is as concentrated around a single axis as is (b), but in such a way that the variables $x_1$ and $x_2$ are correlated and thus also characterised by dependence.
For bivariate Gaussian distributions with pdf
$$p(x) = \frac{1}{2\pi\sqrt{\det\Sigma}} \exp\left(-\frac{1}{2} x^T \Sigma^{-1} x\right) \quad (1)$$
(assumed without loss of generality to be mean zero), where $x^T = (x_1, x_2)$ and the covariance matrix is defined as
$$\Sigma = \begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix}, \quad (2)$$
measures of structure associated with dependence and compactness are associated respectively with ordinary least-squares regression (OLS) and principal component analysis (PCA) (the second of which is closely related to orthogonal least-squares regression). For OLS, the natural measure of structure is $\rho^2$, the fraction of variance "explained" by the regression; for PCA it is $F$, the fraction of variance explained by the first PCA mode:
$$F = \frac{\lambda_1}{\mathrm{Tr}\,\Sigma}, \quad (3)$$
where $\lambda_1$ is the larger eigenvalue of $\Sigma$. Values of $\rho^2$ and $F$ for pdfs (a)-(c) are given in Table 1.

Fig. 1. Contoured pdfs (a)-(e). The variables $x_1$ and $x_2$ are uncorrelated in (d) and correlated in (e). Pdfs (b) and (d) have the same covariance matrix, and would not be distinguished by traditional linear measures of dependence and compactness; the same is true of (c) and (e). All pdfs have the same total variance $\mathrm{var}(x_1)+\mathrm{var}(x_2) = \mathrm{Tr}\,\Sigma$.

Table 1. Measures of dependence, compactness, and non-Gaussianity for the distributions in Fig. 1. $\rho^2$ is the fraction of variance explained by ordinary least-squares regression, $F(p)$ is the fraction of variance explained by the first PCA mode, $M(p)$ is the generalised measure of dependence (Eq. 16), $C(p)$ is the generalised measure of compactness (Eq. 22), $S(p)$ is the compactness measure transformed to correspond to $F(p)$ for a bivariate Gaussian (Eq. 27), and $\nu(p)$ is the measure of non-Gaussianity (Eq. 31).

Published by Copernicus Publications on behalf of the European Geosciences Union and the American Geophysical Union.

When the joint distribution $p(x_1, x_2)$ is not Gaussian, the issue of characterising structure is more subtle. Panel (d) in Fig. 1 contours a non-Gaussian pdf (for which $\mathrm{mean}(x_1) = \mathrm{mean}(x_2) = 0$, $\mathrm{var}(x_1) = \sigma_1^2$, $\mathrm{var}(x_2) = 2a^2\sigma_1^4 + \sigma_2^2$), where the parameters $(a, \sigma_1, \sigma_2)$ have been chosen so that the pdfs in (b) and (d) have the same covariance matrices. It is evident that the pdf (d) is concentrated around a low-dimensional (nonlinear) curve, and is therefore also characterised by compactness. In fact, visual inspection suggests that the degree of concentration of the pdf (and therefore of compactness) is greater for (d) than for (b), but the traditional linear measure of compactness $F$ would not distinguish between them. Furthermore, $x_1$ and $x_2$ in (d) are dependent, despite being uncorrelated: strongly positive and negative values of $x_1$ are associated with strongly positive values of $x_2$. The traditional linear measure of dependence $\rho^2$ does not characterise this structure. The compactness of pdf (d) has contributions from both the anisotropy of the covariance matrix (shared with pdf (b)) and from the degree of non-Gaussianity. In order to tease these apart, it is desirable to also define a third measure of "interesting" structure:
III Non-Gaussianity: to what extent does the joint distribution differ from a bivariate Gaussian?
Such a measure would allow the determination for a given pdf of the relative contribution to compactness of non-Gaussianity and covariance anisotropy. The pdf (e) (constructed by rotating the pdf (d) through 45°) has the same covariance matrix as (c); again, these pdfs would not be distinguished by the measures $\rho^2$ and $F$. By inspection, the degree of compactness of pdf (e) is the same as that of pdf (d): the degree of concentration of a pdf around a lower-dimensional curve should not depend on its orientation. However, the degree of dependence between $x_1$ and $x_2$ has changed relative to (d) (for example, the conditional pdf $p(x_2|x_1)$ is much tighter for $x_1>0$ than for $x_1<0$). This example further illustrates the fact that the ideas of compactness and dependence are distinct. Finally, the fact that the discussion of dependence, compactness, and non-Gaussianity can be framed in terms of the plots in Fig. 1 suggests that measures of each of these should have straightforward geometrical interpretations.
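The behaviour of the linear measures under rotation can be illustrated with a minimal numerical sketch. The covariance values below are illustrative assumptions only (they are not the parameters used for Fig. 1), and the helper names `rho_squared`, `pca_fraction` and `rotate` are hypothetical. The sketch evaluates $\rho^2$ and $F$ for an axis-aligned covariance matrix and for the same matrix rotated through 45°.

```python
import math

def rho_squared(cov):
    """Squared correlation coefficient from a 2x2 covariance matrix."""
    (s11, s12), (_, s22) = cov
    return s12 * s12 / (s11 * s22)

def pca_fraction(cov):
    """F = lambda_1 / Tr(Sigma): variance fraction of the leading PCA mode."""
    (s11, s12), (_, s22) = cov
    trace = s11 + s22
    det = s11 * s22 - s12 * s12
    lam1 = 0.5 * (trace + math.sqrt(trace * trace - 4.0 * det))
    return lam1 / trace

def rotate(cov, angle):
    """Covariance of x' = R x, with R a rotation through `angle` radians."""
    (s11, s12), (_, s22) = cov
    c, s = math.cos(angle), math.sin(angle)
    r11 = c * c * s11 - 2.0 * c * s * s12 + s * s * s22
    r22 = s * s * s11 + 2.0 * c * s * s12 + c * c * s22
    r12 = c * s * (s11 - s22) + (c * c - s * s) * s12
    return ((r11, r12), (r12, r22))

# Assumed, illustrative variances with total variance 2.
cov_aligned = ((0.4, 0.0), (0.0, 1.6))
cov_rotated = rotate(cov_aligned, math.pi / 4.0)

for label, cov in (("aligned", cov_aligned), ("rotated", cov_rotated)):
    print(label, round(rho_squared(cov), 6), round(pca_fraction(cov), 6))
```

For these assumed values $\rho^2$ changes from 0 to 0.36 under the rotation while $F$ remains 0.8, which is the rotational behaviour that a compactness measure should share with $F$ and a dependence measure should share with $\rho^2$.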
The above discussion motivates the consideration of measures of compactness that are invariant under orthogonal rotations, and which reduce to PCA for the case of a bivariate Gaussian; of measures of dependence which are not invariant under rotation and which reduce to ordinary least-squares regression for the case of a bivariate Gaussian; and of measures of non-Gaussianity. The notion of "interesting" structure is of course a relative concept, and can only be measured relative to specified "uninteresting" distributions. In the construction of these measures, we are thus confronted with the need to measure the difference between two pdfs: one data-driven, the other some specified background reference. A natural framework for measuring the difference between two pdfs is provided by information theory (e.g. Cover and Thomas, 1991; Majda et al., 2005), through which such differences can be related to the new "information" provided by the data-driven pdf relative to the background pdf.
In fact, a well-known general measure of dependence is provided by information theory: this is multiinformation (which in two dimensions is also known as mutual information). Similarly, information theory provides a natural measure of non-Gaussianity known as negentropy. Less well established is a general measure of compactness. Previous approaches to this problem, using tools such as Nonlinear Principal Component Analysis (e.g. Monahan et al., 2003), have been hampered by the lack of a rigorous theoretical framework and by the methodological difficulties of nonlinear nonparametric function estimation.
The goal of the present study is to further develop measures of "interesting" structure for general non-Gaussian pdfs that can provide a rigorous basis for non-Gaussian exploratory data analysis. In particular, we will propose a general measure of compactness with firm foundations in information theory and which reduces to PCA for bivariate Gaussians. This measure will be contrasted with the well-established measures of dependence and non-Gaussianity provided by mutual information and negentropy. The measure of compactness will be seen to be a combined measure of Gaussianity and covariance isotropy, and therefore to have a natural connection to the standard information theoretic measure of non-Gaussianity. This discussion presents a unifying notion of "structure" in probability distributions: each of the measures of dependence, compactness, and non-Gaussianity is defined in terms of the information theoretic "distance" (as measured by relative entropy) between the given pdf and the appropriate "unstructured" pdf. Finally, it will be shown that these measures have natural geometrical interpretations in terms of the "effective volumes" of the associated probability distributions. A similar measure of compactness was introduced in Peña and van der Linde (2007); the present study demonstrates the connection of the compactness measure to PCA, emphasises the fundamental difference between it and mutual information as measures of structure, and illustrates how all of these measures of structure can be expressed as relative entropies. There has been considerable recent interest in information theoretic measures of predictability in geophysical systems (e.g. DelSole, 2004; Kleeman and Majda, 2005; DelSole and Tippett, 2007); the present study considers the applicability of these ideas to exploratory data analysis.
This study does not address the problem of estimating these measures from finite datasets: since the proposed measures apply to non-Gaussian data the estimation problem is considerably more difficult than those of the corresponding Gaussian measures, as the underlying pdfs are not known a priori to be parameterised with a finite set of coefficients. Nevertheless, one must have a clear idea of what constitutes "interesting structure" without regard to estimation questions before complexities due to finite data can be addressed.

Information theoretic entropy
A natural starting point for the characterisation of the structure of the pdf $p(x)$ of an $N$-dimensional random variable $x$ is the entropy
$$H(p) = -\int p(x) \ln p(x)\, dx \quad (5)$$
(e.g. Cover and Thomas, 1991). This quantity arises naturally as the measure of the "information content" of a pdf, and is characterised by the following properties relevant to the discussion of "structure":
1. Under a diffeomorphic coordinate transformation $x \to x' = G(x)$,
$$H(p(x')) = H(p(x)) - \int p(x) \ln\left|\det\frac{\partial x}{\partial x'}\right| dx \quad (6)$$
(Majda et al., 2002). In particular, under a linear rescaling of each variable, $x_i' = c_i x_i$:
$$H(p(x')) = H(p(x)) + \sum_{i=1}^{N} \ln|c_i|, \quad (7)$$
and under a unitary transformation $x' = Ux$, $H$ is invariant:
$$H(p(x')) = H(p(x)). \quad (8)$$
2. We have the inequality:
$$H(p) \le H(p_G), \quad (9)$$
where $H(p_G)$ is the entropy of a Gaussian random variable with the same covariance matrix as $p(x)$ (e.g. Cover and Thomas, 1991). Thus, of all distributions with the same covariance matrix, the Gaussian has the largest entropy (and so is minimally "informative"). This point deserves further comment. The three univariate pdfs illustrated in Fig. 2 (a Gaussian $p_1(x)$, a boxcar $p_2(x)$, and an exponential $p_3(x)$) are each of variance $\sigma^2$ but have different entropies. The entropy of the Gaussian distribution $p_1(x)$ is larger than those of the other two distributions, and so it is less "informative" than either: the boxcar distribution $p_2(x)$ because it does not display long tails, and the exponential distribution $p_3(x)$ because it is sharply peaked around $x=0$. The Gaussian distribution combines sufficient flatness around its median value with sufficiently thick tails to be maximally entropic (that is, minimally informative).
3. A pdf is said to be sphered if all of the eigenvalues of its covariance matrix are equal; that is, if its covariance matrix is proportional to the identity matrix (note that while the covariance matrix of a sphered pdf is invariant under rotation, the pdf itself is not necessarily isotropic). Given a pdf $p(x')$ with covariance $\Sigma$ obtained from the sphered pdf $p_S(x)$ by a linear rescaling of the coordinate axes such that the total variance $\mathrm{Tr}\,\Sigma$ is fixed, then $H(p_S) \ge H(p)$. That is, the entropy is maximised by the sphered pdf among all pdfs related by linear rescalings of the axes such that the total variance is maintained. To see this result, fix the matrix $\Sigma$ (with eigenvalues $\lambda_i$) and consider the sphered pdf $p_S(x)$ with covariance matrix $(\mathrm{Tr}\,\Sigma/N) I_N$, where $I_N$ is the identity matrix. The pdf $p(x')$ with covariance matrix $\Sigma$ is obtained through a linear transformation $x_i' = \sqrt{\gamma_i}\, x_i$, where $\gamma_i = N\lambda_i/\mathrm{Tr}\,\Sigma$ and the $x_i$ are aligned along the eigenvectors of $\Sigma$. From Eq. (7), it follows that
$$H(p) = H(p_S) + \frac{1}{2}\sum_{i=1}^{N} \ln\gamma_i = H(p_S) + \frac{N}{2}\ln\left[\frac{\left(\prod_{i=1}^{N}\lambda_i\right)^{1/N}}{\frac{1}{N}\sum_{i=1}^{N}\lambda_i}\right] \le H(p_S), \quad (13)$$
where the desired result (given by the final inequality) follows from the arithmetic-geometric inequality:
$$\left(\prod_{i=1}^{N}\lambda_i\right)^{1/N} \le \frac{1}{N}\sum_{i=1}^{N}\lambda_i \quad (14)$$
(where equality holds when all $\lambda_i$ are equal). Thus, if a pdf is stretched along some axes and compressed along others such that the total variance is unchanged, then the pdf with maximum entropy arises when all axes carry equal variance.
Note that it follows from this and the previous property that of all pdfs with the same total variance, the entropy is maximised by a sphered Gaussian. This fact can be proved directly using standard maximum entropy methods (e.g. Cover and Thomas, 1991); properties 2 and 3 have been presented separately in order to highlight the distinction between spheredness and Gaussianity in the context of maximum entropy distributions.
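Properties 2 and 3 can be checked numerically with a short sketch. The specific functional forms below are assumptions made for illustration: the boxcar is taken to be symmetric about zero and the exponential is taken to be the two-sided (double) exponential, both scaled to unit variance; the eigenvalues 1.6 and 0.4 are likewise arbitrary. Entropies of the univariate pdfs are estimated with a simple midpoint rule, and the Gaussian entropy is evaluated in closed form from the covariance eigenvalues.

```python
import math

def entropy_numeric(pdf, lo, hi, n=100_000):
    """Midpoint-rule estimate of H = -integral of p ln p over [lo, hi]."""
    dx = (hi - lo) / n
    total = 0.0
    for k in range(n):
        p = pdf(lo + (k + 0.5) * dx)
        if p > 0.0:  # treat 0 ln 0 as 0
            total -= p * math.log(p) * dx
    return total

SIGMA = 1.0  # common standard deviation (assumed value)

def gaussian(x):
    return math.exp(-x * x / (2.0 * SIGMA**2)) / math.sqrt(2.0 * math.pi * SIGMA**2)

def boxcar(x):
    half_width = math.sqrt(3.0) * SIGMA  # gives variance SIGMA**2
    return 1.0 / (2.0 * half_width) if abs(x) <= half_width else 0.0

def double_exponential(x):
    scale = SIGMA / math.sqrt(2.0)  # gives variance SIGMA**2
    return math.exp(-abs(x) / scale) / (2.0 * scale)

def gaussian_entropy(eigenvalues):
    """Entropy of a Gaussian whose covariance has the given eigenvalues."""
    n = len(eigenvalues)
    log_det = sum(math.log(v) for v in eigenvalues)
    return 0.5 * n * math.log(2.0 * math.pi * math.e) + 0.5 * log_det

def sphered(eigenvalues):
    """Equal eigenvalues carrying the same total variance."""
    mean = sum(eigenvalues) / len(eigenvalues)
    return [mean] * len(eigenvalues)

# Property 2: with equal variance, the Gaussian has the largest entropy.
h_gauss = entropy_numeric(gaussian, -30.0, 30.0)
h_box = entropy_numeric(boxcar, -30.0, 30.0)
h_dexp = entropy_numeric(double_exponential, -30.0, 30.0)
print(h_gauss, h_box, h_dexp)

# Property 3: with equal total variance, sphering increases the entropy.
eigs = [1.6, 0.4]
print(gaussian_entropy(eigs), gaussian_entropy(sphered(eigs)))
```

Under these assumptions the numerical estimates should agree with the standard closed forms $\frac{1}{2}\ln(2\pi e\sigma^2)$, $\ln(2\sqrt{3}\sigma)$ and $1+\ln(\sqrt{2}\sigma)$ for the Gaussian, boxcar and double exponential respectively.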
A natural measure of the difference between two pdfs $p(x)$ and $q(x)$ is the relative entropy:
$$D(p\|q) = \int p(x) \ln\frac{p(x)}{q(x)}\, dx$$
(Cover and Thomas, 1991). This quantity is non-negative (taking the value of zero only if $p=q$) and is invariant under an arbitrary invertible coordinate transformation $x \to x' = G(x)$. While relative entropy is not a Euclidean distance measure (in particular, it is not symmetric: $D(p\|q) \ne D(q\|p)$), it is a useful measure of the difference between two pdfs. The measures of dependence, compactness, and non-Gaussianity to which we now turn will each be defined in terms of the relative entropy between the given pdf and an appropriate "unstructured" pdf.
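The stated properties of relative entropy (non-negativity, asymmetry, and invariance under an invertible change of coordinates) can be seen in a minimal sketch using the standard closed form for two univariate Gaussians. The function name `kl_gaussian_1d` and the parameter values are illustrative assumptions.

```python
import math

def kl_gaussian_1d(mu_p, sigma_p, mu_q, sigma_q):
    """Relative entropy D(p||q) between two univariate Gaussians (closed form)."""
    return (math.log(sigma_q / sigma_p)
            + (sigma_p**2 + (mu_p - mu_q)**2) / (2.0 * sigma_q**2)
            - 0.5)

d_pq = kl_gaussian_1d(0.0, 1.0, 0.0, 2.0)   # D(p||q)
d_qp = kl_gaussian_1d(0.0, 2.0, 0.0, 1.0)   # D(q||p): differs from D(p||q)
d_pp = kl_gaussian_1d(0.0, 1.0, 0.0, 1.0)   # vanishes when p = q

# Rescaling x -> 5x multiplies both standard deviations by 5 and leaves D unchanged.
d_pq_rescaled = kl_gaussian_1d(0.0, 5.0, 0.0, 10.0)

print(d_pq, d_qp, d_pp, d_pq_rescaled)
```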

Measures of structure: dependence
By definition, the components of the random variable $x$ are independent if and only if their joint distribution factors as the product of the marginals: $p(x) = \prod_{i=1}^{N} p_{x_i}(x_i)$. The well-known result follows that a natural measure of dependence in a multivariate pdf is the multiinformation (Schneidman et al., 2003)
$$I(x) = \int p(x) \ln\frac{p(x)}{\prod_{i=1}^{N} p_{x_i}(x_i)}\, dx = \sum_{i=1}^{N} H(p_{x_i}) - H(p),$$
where the second equality follows from the definitions of marginal distributions and of entropy. It follows that the quantity
$$M(x) = 1 - \exp(-2I(x)) \quad (16)$$
is a measure taking values between 0 and 1, with $M(x)=0$ when the $x_i$ are mutually independent and $M(x)=1$ when at least two variables are fully dependent (that is, $x_j = f(x_1, ..., x_{j-1}, x_{j+1}, ..., x_N)$ for some $j$). For the measure of dependence, the "unstructured" pdf against which the given pdf is compared is given by the product of the marginals along each $x_i$, which by construction has no dependence among any of the variables. For the bivariate case ($N=2$), $I(x)$ is known as the mutual information (e.g. Cover and Thomas, 1991). For a bivariate Gaussian, it is well known that
$$I(x) = -\frac{1}{2}\ln(1-\rho^2),$$
where $\rho$ is the correlation coefficient between $x_1$ and $x_2$, from which it follows that
$$M(p) = \rho^2.$$
In the limit that $p(x_1, x_2)$ is bivariate Gaussian, then, $M(p)$ corresponds to the fraction of variance accounted for by an ordinary least-squares regression between $x_1$ and $x_2$.

The integral defining mutual information is invariant under an arbitrary coordinate transformation, and therefore might be considered to also be a natural general measure of compactness. In fact, the mutual information is not invariant under an orthogonal rotation of $x$. This is most easily seen in the context of a Gaussian distribution, for which the squared correlation coefficient $\rho^2$ is not invariant under rotations: in particular, under a rotation of $(x_1, x_2)$ such that the coordinate axes are aligned with the principal component axes, the correlation coefficient vanishes. The resolution of this apparent paradox is that while the integral defining $I(x)$ is invariant under the unitary transformation $x \to x' = Ux$, the integral does not retain its identity as mutual information. This is because under the rotation the product of the marginal distributions of the original variables, $p_{x_1}(x_1)p_{x_2}(x_2)$, is not transformed into the product of the marginals of the rotated variables, $p_{x_1'}(x_1')p_{x_2'}(x_2')$ (a detailed discussion of this point in the context of a bivariate Gaussian distribution is presented in Appendix A). Like $\rho^2$, mutual information is not invariant under a unitary transformation that mixes the two variables: in general, $M(p(x')) \ne M(p(x))$. Mutual information (and more generally multiinformation) therefore does not provide the desired compactness measure, to which we now turn.
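The lack of rotational invariance of the dependence measure can be made concrete for a bivariate Gaussian. The sketch below assumes the form $M = 1 - \exp(-2I)$, which is the form consistent with $M = \rho^2$ for a bivariate Gaussian, and evaluates $I$ as the sum of the marginal entropies minus the joint entropy. The covariance values and function names are illustrative assumptions.

```python
import math

def gaussian_mutual_information(cov):
    """I = H(p_x1) + H(p_x2) - H(p) for a bivariate Gaussian with covariance `cov`."""
    (s11, s12), (_, s22) = cov
    det = s11 * s22 - s12 * s12
    # The (2 pi e) factors of the marginal and joint entropies cancel.
    return 0.5 * math.log(s11 * s22 / det)

def dependence(cov):
    """M = 1 - exp(-2 I); assumed form, equal to rho^2 for a bivariate Gaussian."""
    return 1.0 - math.exp(-2.0 * gaussian_mutual_information(cov))

# The same Gaussian expressed in two coordinate systems related by a 45 degree rotation:
cov_correlated = ((1.0, -0.6), (-0.6, 1.0))  # rho = -0.6
cov_principal = ((1.6, 0.0), (0.0, 0.4))     # axes aligned with the principal components

print(dependence(cov_correlated), dependence(cov_principal))
```

In the principal-axis coordinates the variables are independent and the measure vanishes, although the pdf itself has only been rotated.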

Measures of structure: compactness
As was discussed in the Introduction, we seek a measure of compactness of multivariate distributions; that is, a measure of the extent to which the full distribution is concentrated around a lower-dimensional surface. Such a measure should be invariant under unitary transformations (the degree of concentration should not depend on the orientation of the distribution in state space). The dependence measure M(p) is not such a measure, as it is not invariant under unitary transformations.
62 A. H. Monahan and T. DelSole: Measures of dependence, compactness, non-gaussianity

We suggest measuring compactness based on the degree to which $p(x)$ differs from a sphered Gaussian with the same total variance $\mathrm{Tr}\,\Sigma$. The pdf of such an equivalent sphered Gaussian is
$$p_{SG}(x) = \left(\frac{N}{2\pi\,\mathrm{Tr}\,\Sigma}\right)^{N/2} \exp\left(-\frac{N}{2\,\mathrm{Tr}\,\Sigma}\, x^T x\right), \quad (20)$$
from which it follows that
$$D(p\|p_{SG}) = H(p_{SG}) - H(p) = \frac{N}{2}\ln\left(\frac{2\pi e\,\mathrm{Tr}\,\Sigma}{N}\right) - H(p)$$
(note that the relative entropy can be expressed as a difference between two entropies as a consequence of the special form of Eq. 20). This measure vanishes for a sphered Gaussian and is never negative. In analogy with Eq. (16), we define the compactness of $p(x)$ as
$$C(p) = 1 - \exp\left(-2D(p\|p_{SG})\right), \quad (22)$$
which is bounded between 0 and 1, vanishes for a sphered Gaussian, and is invariant under unitary transformations. The compactness measure can be factored as
$$C(p) = 1 - \left(\frac{\exp H(p)}{\exp H(p_G)}\right)^2 \left(\frac{(\det\Sigma)^{1/N}}{\mathrm{Tr}\,\Sigma/N}\right)^N.$$
The first factor in parentheses is the ratio of the exponential of the entropy of $p(x)$ to that of a Gaussian distribution with the same covariance matrix; by inequality Eq. (9), this ratio is bounded between zero and one. The second factor in parentheses is bounded between zero and one by the inequality Eq. (14) and the fact that the eigenvalues of $\Sigma$ are all non-negative, such that the ratio achieves its maximum value when all eigenvalues of $\Sigma$ are equal (i.e. when the distribution is sphered). This factorisation illustrates that our proposed measure of compactness is fundamentally a combined measure of Gaussianity and covariance isotropy.

For a bivariate Gaussian distribution, $C(p)$ reduces to:
$$C(p) = 1 - \frac{4\det\Sigma}{(\mathrm{Tr}\,\Sigma)^2}.$$
For such a distribution, the classical measure of compactness is the fraction of variance accounted for by the first principal component. This measure is expressed mathematically as:
$$F(p) = \frac{\lambda_1}{\lambda_1+\lambda_2}.$$
The measure
$$S(p) = \frac{1}{2}\left(1 + \sqrt{C(p)}\right) \quad (27)$$
is a general measure of compactness that in the limit of $p(x)$ Gaussian reduces to $F(p)$. To see this, note that the quadratic equation for $\lambda_1$ following from the facts that $\det\Sigma = \lambda_1\lambda_2$ and $\mathrm{Tr}\,\Sigma = \lambda_1+\lambda_2$ can be solved to yield
$$\lambda_1 = \frac{\mathrm{Tr}\,\Sigma}{2}\left(1 + \sqrt{1 - \frac{4\det\Sigma}{(\mathrm{Tr}\,\Sigma)^2}}\right),$$
so that $\lambda_1/\mathrm{Tr}\,\Sigma = (1+\sqrt{C(p)})/2$. In the same way that for a bivariate Gaussian $M(p)$ had a straightforward relationship to the fraction of variance explained by an ordinary least-squares regression, for a bivariate Gaussian $C(p)$ is naturally related to the fraction of variance explained by the first PCA mode.
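The relationship between the compactness measure and PCA for a bivariate Gaussian can be verified with a brief sketch. It assumes the forms $C = 1 - \exp(-2D(p\|p_{SG}))$ and $S = (1+\sqrt{C})/2$ discussed above, with $D$ evaluated as a difference of closed-form Gaussian entropies; the covariance values and function names are illustrative assumptions.

```python
import math

TWO_PI_E = 2.0 * math.pi * math.e

def gaussian_compactness(cov):
    """C = 1 - exp(-2 D(p || p_SG)) for a bivariate Gaussian (N = 2)."""
    (s11, s12), (_, s22) = cov
    trace = s11 + s22
    det = s11 * s22 - s12 * s12
    h_gauss = math.log(TWO_PI_E) + 0.5 * math.log(det)  # entropy of p
    h_sphered = math.log(TWO_PI_E * trace / 2.0)        # entropy of p_SG
    return 1.0 - math.exp(-2.0 * (h_sphered - h_gauss))

def transformed_compactness(cov):
    """S = (1 + sqrt(C)) / 2."""
    c = max(gaussian_compactness(cov), 0.0)  # guard against rounding below zero
    return 0.5 * (1.0 + math.sqrt(c))

def pca_fraction(cov):
    """F = lambda_1 / (lambda_1 + lambda_2)."""
    (s11, s12), (_, s22) = cov
    trace = s11 + s22
    det = s11 * s22 - s12 * s12
    lam1 = 0.5 * (trace + math.sqrt(trace * trace - 4.0 * det))
    return lam1 / trace

# The same Gaussian in two coordinate systems related by a 45 degree rotation.
for cov in (((1.0, -0.6), (-0.6, 1.0)), ((1.6, 0.0), (0.0, 0.4))):
    print(gaussian_compactness(cov), transformed_compactness(cov), pca_fraction(cov))
```

Both coordinate systems give the same value of $C$, and $S$ coincides with $F$, in contrast to the dependence measure, which differs between the two.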

Measures of structure: non-gaussianity
The compactness measure $C(p)$ combines measures of covariance isotropy and Gaussianity and therefore cannot distinguish between situations in which the measure is large because (a) the pdf is Gaussian (or nearly so), such that the variance of the first principal component is much larger than that of the second, or because (b) the pdf is narrowly distributed around a nonlinear curve. For this, a direct measure of non-Gaussianity is needed; such a measure is the negentropy (Lee et al., 2000), defined as the relative entropy between $p(x)$ and the Gaussian pdf with the same covariance matrix:
$$J(p) = D(p\|p_G) = \int p(x)\ln\frac{p(x)}{p_G(x)}\, dx$$
$$\qquad = H(p_G) - H(p), \quad (28)$$
where
$$H(p_G) = \frac{1}{2}\ln\left[(2\pi e)^N \det\Sigma\right]$$
and Eq. (28) follows because $p_G(x)$ is Gaussian. Defining
$$\nu(p) = 1 - \exp(-2J(p)), \quad (31)$$
we obtain a measure taking values between 0 and 1, with $\nu(p)=0$ if and only if $p(x)$ is Gaussian (as by construction both $p(x)$ and $p_G(x)$ have the same covariance matrix) and $\nu(p)$ increasing as $p(x)$ becomes increasingly non-Gaussian. Note that $\nu(p)$ contains contributions from both the compactness of the pdf and the degree of covariance anisotropy; for a sphered distribution $\mathrm{Tr}\,\Sigma/N = (\det\Sigma)^{1/N}$ and $\nu(p)=C(p)$. Furthermore, for $\Sigma$ fixed, $\nu(p)$ increases as $C(p)$ increases: among all distributions with the same covariance, the more compact distributions are the more non-Gaussian.
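The connection between $\nu(p)$ and $C(p)$ can be illustrated with a hypothetical example that does not appear in the study: a product of two independent uniform (boxcar) distributions, for which the entropy is known in closed form. The sketch assumes the forms $\nu = 1 - \exp(-2J)$ and $C = 1 - \exp(-2D(p\|p_{SG}))$; the variances and the function name are illustrative.

```python
import math

TWO_PI_E = 2.0 * math.pi * math.e

def uniform_product_measures(variances):
    """Return (nu, C) for a product of independent uniforms with the given variances."""
    n = len(variances)
    # A uniform pdf of variance v has width 2*sqrt(3 v) and entropy ln(width).
    h_p = sum(math.log(2.0 * math.sqrt(3.0 * v)) for v in variances)
    # Gaussian with the same (diagonal) covariance matrix.
    h_gauss = 0.5 * n * math.log(TWO_PI_E) + 0.5 * sum(math.log(v) for v in variances)
    # Sphered Gaussian with the same total variance.
    h_sphered = 0.5 * n * math.log(TWO_PI_E * sum(variances) / n)
    nu = 1.0 - math.exp(-2.0 * (h_gauss - h_p))
    compactness = 1.0 - math.exp(-2.0 * (h_sphered - h_p))
    return nu, compactness

nu_s, c_s = uniform_product_measures([1.0, 1.0])  # sphered covariance
nu_a, c_a = uniform_product_measures([1.6, 0.4])  # anisotropic, same total variance
print(nu_s, c_s)
print(nu_a, c_a)
```

In the sphered case the two measures coincide. Stretching one axis at fixed total variance leaves $\nu$ unchanged (the shape relative to the matching Gaussian is the same) while $C$ increases, reflecting the added covariance anisotropy.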

Measures of structure: geometric interpretation
The quantity
$$V(p) = \exp(H(p)) \quad (32)$$
is an extensive variable which can be interpreted as a measure of the "effective volume" of a pdf. For instance, for a Gaussian distribution $p_G(x)$ with covariance matrix $\Sigma$,
$$V(p_G) = (2\pi e)^{N/2}\, |\Sigma|^{1/2}.$$
The volume enclosed by a surface of constant probability density $x^T\Sigma^{-1}x = \alpha$ for the same distribution is
$$\frac{\alpha^{N/2}\, |\Sigma|^{1/2}\, \pi^{N/2}}{\Gamma(N/2 + 1)}.$$
Comparing these two expressions shows that, aside from factors that depend only on the dimension of the space, $V(p)$ is related to the geometric volume of the isoprobability ellipsoid. More generally, $V(p)$ is the volume of a "typical set", as reviewed in Cover and Thomas (1991). Because of inequalities Eqs. (9) and (13), the pdf with maximum volume for a given covariance matrix is Gaussian, and the pdf with maximum volume for given total variance is a sphered Gaussian. The measures of dependence, compactness, and non-Gaussianity introduced above have natural interpretations in terms of effective volumes (in the sense of Eq. 32):
$$M(p) = 1 - \left(\frac{V(p)}{V\left(\prod_i p_{x_i}\right)}\right)^2, \qquad C(p) = 1 - \left(\frac{V(p)}{V(p_{SG})}\right)^2, \qquad \nu(p) = 1 - \left(\frac{V(p)}{V(p_G)}\right)^2.$$
That is:
-$M(p)$ is one minus the square of the ratio of two effective volumes: that of the full pdf, and that of the pdf produced by the product of the marginals. Dependence among the variables $x_i$ implies a concentration of probability around some lower-dimensional surface, with an associated reduction in $V(p)$ and an increase in $M(p)$.
-$C(p)$ is one minus the square of the ratio of two volumes: that of the full pdf, and that of the equivalent sphered Gaussian. Similarly to $M(p)$, $C(p)$ is a measure of the degree to which the pdf $p(x)$ clusters around a low-dimensional surface; but unlike $M(p)$, $C(p)$ is rotationally invariant as the effective volume of $p_{SG}(x)$ does not change under a coordinate rotation (in contrast to the effective volume of the product of the marginals).
-$\nu(p)$ is one minus the square of the ratio of the effective volume of the full pdf to that of the Gaussian with the same covariance matrix.
In general, the degree of structure in a pdf increases as the effective volume decreases relative to that of the "unstructured" pdf against which it is compared. This result provides a useful geometrical interpretation of the measures of structure.
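The link between effective volume and the geometric volume of an isoprobability ellipsoid can be made explicit with a small sketch. It assumes the standard expressions $V(p_G) = (2\pi e)^{N/2}|\Sigma|^{1/2}$ for the effective volume of a Gaussian and $(\alpha\pi)^{N/2}|\Sigma|^{1/2}/\Gamma(N/2+1)$ for the volume enclosed by $x^T\Sigma^{-1}x = \alpha$; the function names and the value of $\det\Sigma$ are illustrative.

```python
import math

def effective_volume_gaussian(det_cov, n):
    """V(p_G) = exp(H(p_G)) = (2 pi e)^(n/2) * sqrt(det Sigma)."""
    return (2.0 * math.pi * math.e) ** (n / 2.0) * math.sqrt(det_cov)

def ellipsoid_volume(det_cov, n, alpha):
    """Volume enclosed by the surface x^T Sigma^{-1} x = alpha."""
    return (alpha * math.pi) ** (n / 2.0) * math.sqrt(det_cov) / math.gamma(n / 2.0 + 1.0)

def matching_alpha(n):
    """Value of alpha at which the two volumes coincide; depends only on n."""
    return 2.0 * math.e * math.gamma(n / 2.0 + 1.0) ** (2.0 / n)

det_cov = 0.64  # assumed determinant of the covariance matrix
for n in (1, 2, 3):
    alpha = matching_alpha(n)
    print(n, alpha, effective_volume_gaussian(det_cov, n), ellipsoid_volume(det_cov, n, alpha))
```

For $N=2$ the matching contour is $x^T\Sigma^{-1}x = 2e$: the effective volume of a bivariate Gaussian equals the area of that ellipse, whatever the covariance matrix.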

Conclusions
This study has considered three measures of structure for multivariate datasets, all defined in terms of the relative entropy (the information-theoretic distance) between a given pdf p(x) and an appropriate "unstructured" pdf.
-Dependence is measured in terms of the relative entropy between $p(x)$ and the pdf $q(x) = \prod_{i=1}^{N} p_{x_i}(x_i)$ consisting of the product of the marginal distributions along each individual component $x_i$ of $x$.
-Compactness is measured in terms of the relative entropy between $p(x)$ and the equivalent sphered Gaussian $p_{SG}(x)$ (the Gaussian with the same total variance but equal variance along each coordinate direction). This is a combined measure of Gaussianity and covariance isotropy, and is invariant under an orthogonal rotation of the variables.
-Non-Gaussianity is measured in terms of the relative entropy between p(x) and the equivalent Gaussian (the Gaussian with the same covariance matrix). This measure has a natural connection with the measure of compactness.
All of these measures admit useful geometrical interpretations in terms of the ratio of the "effective volume" of the pdf to that of the associated "unstructured" pdf against which it is compared. The dependence measure $M(p)$ is not invariant under an orthogonal rotation of the variable vector $x$, despite the fact that the integral defining it is in fact invariant. This study has demonstrated that this apparent paradox is resolved by the fact that under the rotation the integral no longer retains the identity of the dependence measure (as the rotated product of marginal distributions is not the product of the rotated marginals). Table 1 presents values of the various measures of dependence, compactness, and non-Gaussianity considered in this study for the distributions in Fig. 1. For the Gaussian distributions, the dependence measure $M(p)$ (Eq. 16) and compactness measure $C(p)$ (Eq. 22) coincide with the corresponding measures from ordinary and orthogonal least-squares regression, as expected. For the non-Gaussian distributions, the new measures are larger, demonstrating their better characterisation of dependence and compactness relative to that of their Gaussian counterparts. Note that the compactness of (b)-(e) is measured through comparison with (a); visual inspection demonstrates that (b)-(e) are all more tightly concentrated around a lower-dimensional curve (and therefore have smaller "effective volume") than is (a). Non-Gaussianity of (d) or (e) is measured through comparison with (b) or (c), respectively; it is evident from inspection of (d) that the same probability mass is concentrated in a smaller volume in (d) than in (b) [and similarly for (e) and (c)], consistent with the geometric interpretation of our measure of non-Gaussianity.
The measures of dependence, compactness, and non-Gaussianity considered in this study are defined by the distance between the given pdf and an appropriate reference pdf, as measured by the relative entropy. Many other distance measures between pdfs have been proposed, such as Bregman's distance, Bhattacharyya distance, the chi-squared statistic, and the Kolmogorov-Smirnov distance (e.g. Pardo, 2006). Despite the availability of a wide class of measures, we feel that the measures that we have proposed are especially attractive because they connect to more traditional measures used in geophysics (e.g. the fraction of variance explained by least-squares regression or PCA).
For bivariate distributions, the information theoretic measures of dependence and compactness considered in this study are generalisations of the corresponding measures of covariability obtained from the classical linear measures provided with ordinary least-squares regression and Principal Component Analysis. A fundamental challenge with the use of these information theoretic measures in exploratory data analysis is their estimation from a finite sample. Estimators of the measures of structure themselves, as well as the associated sampling error, are required for their practical application. Classical hypothesis testing (e.g. determining if one of the proposed measures is significantly different from zero) will require the development of parametric or non-parametric techniques for computing confidence intervals which is beyond the scope of the present study. The estimation problem for information theoretic measures is an active field of research (e.g. Kleeman and Majda, 2005;Haven et al., 2005); we are confident that as robust estimators become available, the measures of dependence, compactness, and non-Gaussianity discussed in this study will demonstrate their utility as practical tools for exploratory data analysis in geophysical data sets.
The product of the marginal distributions in the untransformed coordinates is
$$q(x) = p_x(x)\,p_y(y) = \frac{1}{2\pi\sigma_x\sigma_y}\exp\left(-\frac{1}{2}x^T C^{-1} x\right), \quad (A6)$$
which is Gaussian with covariance matrix
$$C = \begin{pmatrix} \sigma_x^2 & 0 \\ 0 & \sigma_y^2 \end{pmatrix}. \quad (A7)$$
Under the coordinate transformation, $q(x)$ remains Gaussian with new covariance matrix
$$C' = \begin{pmatrix} \cos^2\phi\,\sigma_x^2 + \sin^2\phi\,\sigma_y^2 & \cos\phi\sin\phi\,(\sigma_x^2 - \sigma_y^2) \\ \cos\phi\sin\phi\,(\sigma_x^2 - \sigma_y^2) & \sin^2\phi\,\sigma_x^2 + \cos^2\phi\,\sigma_y^2 \end{pmatrix}. \quad (A8)$$
Clearly, $C'$ is not the covariance matrix of the product of the marginals in the transformed coordinate system:
$$C' \ne \begin{pmatrix} \sigma_{x'}^2 & 0 \\ 0 & \sigma_{y'}^2 \end{pmatrix},$$
with $\sigma_{x'}$ and $\sigma_{y'}$ given by Eqs. (A3) and (A4). That is, the transformed product of the marginal distributions is not equal to the product of the transformed marginal distributions. While the integral defining mutual information is invariant under an orthogonal rotation mixing variables, its identity as the mutual information is lost.
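The final statement can be checked directly at the level of covariance matrices. In the sketch below, the covariance matrix of the original Gaussian, the rotation angle, and the helper names are all illustrative assumptions; the rotation convention $x' = Rx$ with $R = \begin{pmatrix}\cos\phi & -\sin\phi\\ \sin\phi & \cos\phi\end{pmatrix}$ is chosen because it reproduces the off-diagonal sign in Eq. (A8).

```python
import math

def rotate(cov, phi):
    """Covariance of x' = R x for a rotation R through angle phi (radians)."""
    (s11, s12), (_, s22) = cov
    c, s = math.cos(phi), math.sin(phi)
    r11 = c * c * s11 - 2.0 * c * s * s12 + s * s * s22
    r22 = s * s * s11 + 2.0 * c * s * s12 + c * c * s22
    r12 = c * s * (s11 - s22) + (c * c - s * s) * s12
    return ((r11, r12), (r12, r22))

def diagonal_part(cov):
    """Covariance of the product of the marginals: off-diagonal terms removed."""
    return ((cov[0][0], 0.0), (0.0, cov[1][1]))

cov = ((2.0, 0.5), (0.5, 1.0))  # assumed covariance of the original Gaussian
phi = math.radians(30.0)        # assumed rotation angle

rotated_product = rotate(diagonal_part(cov), phi)     # C' of Eq. (A8)
product_of_rotated = diagonal_part(rotate(cov, phi))  # marginals of rotated variables

print(rotated_product)
print(product_of_rotated)
```

The first matrix has a non-zero off-diagonal element whereas the second is diagonal by construction, so the two reference pdfs differ.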