Climate projections simulated by Global Climate Models (GCMs) are often used for assessing the impacts of climate change. However, the relatively coarse resolution of GCM outputs often precludes their application to accurately assessing the effects of climate change on finer regional-scale phenomena. Downscaling of climate variables from coarser to finer regional scales using statistical methods is often performed for regional climate projections. Statistical downscaling (SD) is based on the understanding that the regional climate is influenced by two factors – the large-scale climatic state and the regional or local features. The transfer function approach to SD involves learning a regression model that relates these features (predictors) to a climatic variable of interest (predictand) based on past observations. However, a single regression model is often not sufficient to describe the complex dynamic relationships between the predictors and predictand. We focus on the covariate selection part of the transfer function approach and propose a nonparametric Bayesian mixture of sparse regression models based on the Dirichlet process (DP) for simultaneous clustering and discovery of covariates within the clusters, while automatically finding the number of clusters. Sparse linear models are parsimonious and hence more generalizable than non-sparse alternatives, and they lend themselves to domain-relevant interpretation. Applications to synthetic data demonstrate the value of the new approach, and preliminary results related to feature selection for statistical downscaling show that our method can lead to new insights.

Climate change is one of the most challenging problems facing humankind. Its
impacts are expected to influence policy decisions on critical
infrastructures, management of natural resources, humanitarian aid, and emergency
preparedness, along with numerous regional-scale human economic and social
activities. Therefore, it is imperative to accurately assess the impacts of
climate change at the regional scale in order to inform stakeholders for
appropriate decision making related to mitigation policies. Global climate
models (GCMs) are at present the most credible tools for projecting future
climate while accounting for the effects of greenhouse gas emissions under
different socio-economic scenarios. Although GCMs perform reasonably well in
projecting climate variables at a larger spatial scale (

A complementary approach for regional projection is statistical downscaling
that uses statistical models to learn empirical statistical relationships
between large-scale GCM features (predictors) and regional-scale climate
variable(s) (predictands) to be projected. The statistical approaches of
downscaling can be categorized into three broad classes – weather typing,
weather generators, and the transfer function approaches

In this paper, however, we are interested in transfer function based
regression models that learn a linear or nonlinear mapping between large
scale predictors and regional scale predictand variables. Regression models
are conceptually the simplest of the three classes since they provide a direct mapping
between the predictor and predictand values. However, the success of the
regression models depends on the accurate choice of predictors. Sparse
regressions based on constrained L1-norm

However, large complex climate data sets often exhibit dynamic behavior

Although the number of different components may not be known, prior knowledge
often exists about whether a pair of observations belong to the same
component. For example, it is reasonable to assume that two observations
from the same location that are close in time may exhibit similar behavior. We allow
soft “must link” constraints between pairs of data-points that encourage
the pair to belong to the same mixture component. Such constraints are
incorporated in our Bayesian model with the help of a Markov random field
(MRF) prior over the cluster indicator variables

Variational Bayesian (VB) inference has been shown to be much faster than
stochastic alternatives for nonparametric Bayesian models

We have extensively demonstrated the performance of our algorithm on synthetic data. We have also applied our method to the feature selection problem for statistical downscaling of annual average rainfall over two regions on the west coast of the USA. Preliminary results from the application of our algorithm to select features for regression based statistical downscaling show that our method may lead to improved prediction and discovery of new insights.

In this section, we provide brief descriptions of the methods in the context they were used to build our model.

Let us assume that we are given a data set

However, in a Bayesian setting, the sparsity can be imposed by a Laplace
prior (also known as double exponential distribution) on
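As a minimal, self-contained illustration (not the inference scheme used in this paper), the MAP estimate of a coefficient under a Laplace prior with an orthonormal design reduces to soft thresholding, which shrinks large coefficients and sets small ones exactly to zero:

```python
import numpy as np

def soft_threshold(z, lam):
    """MAP estimate of a coefficient under a Laplace prior
    (orthonormal design): shrink toward zero, zeroing small values."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

# Ordinary least-squares estimates for three coefficients.
ols = np.array([2.5, -0.3, 0.05])
lam = 0.5  # strength of the Laplace prior (illustrative value)

sparse = soft_threshold(ols, lam)
# Large coefficients survive (shrunk by lam); small ones become exactly zero.
```

This zeroing behavior is exactly why the Laplace prior yields sparse, interpretable models, in contrast to a Gaussian prior, which only shrinks coefficients without ever setting them to zero.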

An MRF is represented by an undirected graphical model
in which the nodes represent variables or groups of variables and the edges
indicate dependence relationships. An important property of MRFs is that
a collection of variables is conditionally independent of all others in the
field given the variables in their Markov blanket. The Hammersley–Clifford
theorem states that the distribution,
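To make the role of such a prior concrete, here is a toy sketch (our own notation, not the paper's exact potential functions) of a Potts-style unnormalized log-prior over cluster indicators, where each satisfied "must link" edge contributes a reward `eta`:

```python
import numpy as np

def mrf_log_prior(z, must_link_edges, eta=1.0):
    """Unnormalized log MRF prior over cluster indicators z.
    Each satisfied must-link edge (i, j) contributes +eta,
    encouraging (but not forcing) z[i] == z[j]."""
    score = 0.0
    for i, j in must_link_edges:
        if z[i] == z[j]:
            score += eta
    return score

z = np.array([0, 0, 1, 1])
edges = [(0, 1), (1, 2)]            # soft "must link" pairs
print(mrf_log_prior(z, edges))      # edge (0, 1) satisfied, (1, 2) not
```

Because the constraint enters only through the log-prior, violating an edge merely costs probability mass rather than being forbidden, which is what makes the constraints "soft".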

The Dirichlet process (DP) was first introduced in statistics literature as
a measure on measures

The stick-breaking construction of the DP proceeds as follows:

Draw the stick-breaking proportions v_k ~ Beta(1, alpha), for k = 1, 2, ...

Draw the component parameters theta_k from the base measure G_0.

Generate the mixture weights pi_k = v_k * prod_{j < k} (1 - v_j).

For each data-point i:

Draw the cluster indicator z_i from Categorical(pi_1, pi_2, ...).

Draw the observation from the component distribution with parameters theta_{z_i}.
We can truncate the construction process at
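A truncated stick-breaking sampler can be sketched as follows (the truncation level `K` and concentration `alpha` are illustrative choices, not the values used in our experiments):

```python
import numpy as np

rng = np.random.default_rng(0)

def truncated_stick_breaking(alpha, K):
    """Mixture weights from a stick-breaking construction truncated
    at K components (v_K is set to 1 so the weights sum to 1)."""
    v = rng.beta(1.0, alpha, size=K)
    v[-1] = 1.0
    # prod_{j<k} (1 - v_j), with an empty product of 1 for k = 1.
    remaining = np.concatenate([[1.0], np.cumprod(1.0 - v[:-1])])
    return v * remaining

pi = truncated_stick_breaking(alpha=2.0, K=20)
# Draw cluster indicators for N = 100 data-points from the weights.
z = rng.choice(len(pi), size=100, p=pi)
```

Smaller values of `alpha` concentrate mass on the first few sticks, so in practice only a handful of the `K` truncated components receive appreciable weight.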

Now, let us assume that we are given a data set

We introduce

For this mixture of sparse regressions model, each component has a separate
parameter set

Graphical representation of the complete Bayesian hierarchical model.

The graphical model that represents the dependence relationships between all
the parameters involved in this current mixture model is shown in
Fig.

Prior knowledge about must link constraints between pairs of data-points can
be enforced via an MRF prior on the indicator variables

Let us consider all the unknown parameters in our model as latent variables
and denote all the latent variables by

Once we apply Eq. (

1. Distribution of

The first part of the variational posterior of

In order to automatically generate a sparse constraints set, we first
implemented all the constraints in the form of edges and then used a graph
partitioning algorithm
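As one simple illustration of such sparsification (the paper's actual graph partitioning algorithm is not reproduced here), a spanning forest of the constraint graph keeps every constrained group connected while discarding redundant edges:

```python
def sparsify_constraints(pairs):
    """Keep a spanning forest of the constraint graph: the transitive
    "same cluster" information is preserved with far fewer edges."""
    parent = {}

    def find(x):
        # Union-find with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    kept = []
    for i, j in pairs:
        ri, rj = find(i), find(j)
        if ri != rj:          # edge connects two separate groups: keep it
            parent[ri] = rj
            kept.append((i, j))
    return kept

dense = [(0, 1), (0, 2), (1, 2), (3, 4)]
sparse = sparsify_constraints(dense)   # drops the redundant (1, 2) edge
```

For a group of m mutually linked points this reduces the m(m-1)/2 pairwise edges to m-1, which keeps the MRF neighborhood structure small.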

Left panel: ability of nonparametric unregularized and sparse regressions (unconstrained and constrained) to correctly identify clusters in presence of increased number of actual components in the data. Right panel: ability of nonparametric unregularized and sparse regressions (unconstrained and constrained) to correctly retrieve the sparse structure within each cluster.

The parameters of each of these distributions depend on moments of one
or more of the other variables. We therefore find a locally optimal solution
via an iterative process that starts with random initial values of the
relevant moments and stops when the indicator variables
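The overall structure of such a coordinate-ascent loop can be sketched on a toy two-component example (the update formulas below are illustrative stand-ins, not the model's actual variational updates):

```python
import numpy as np

def vb_style_loop(x, mu_init, tol=1e-6, max_iter=100):
    """Toy coordinate-ascent loop: alternate soft assignments
    (responsibilities) and component means until the assignments
    stop changing. This mirrors the structure of the VB iteration,
    not its exact update equations."""
    mu = np.array(mu_init, dtype=float)
    resp_old = None
    for _ in range(max_iter):
        # Update responsibilities given the current component parameters.
        logp = -0.5 * (x[:, None] - mu[None, :]) ** 2
        resp = np.exp(logp - logp.max(axis=1, keepdims=True))
        resp /= resp.sum(axis=1, keepdims=True)
        # Update parameters given the current responsibilities.
        mu = (resp * x[:, None]).sum(axis=0) / resp.sum(axis=0)
        # Stop when the indicator posteriors stabilize.
        if resp_old is not None and np.abs(resp - resp_old).max() < tol:
            break
        resp_old = resp
    return mu, resp

x = np.concatenate([np.full(50, -3.0), np.full(50, 3.0)])
mu, resp = vb_style_loop(x, mu_init=[-1.0, 1.0])
```

Because each update only guarantees a non-decreasing variational objective, the loop converges to a local optimum that depends on the random initialization, which is why restarts (or constraints) matter in practice.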

One computational bottleneck of the proposed VB algorithm is the inversion of
the
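As a general remedy, independent of the specific matrix in our model, an explicit inverse can be replaced by a single Cholesky factorization followed by two triangular solves, which is both faster and numerically more stable:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 5
A = rng.standard_normal((d, d))
S = A @ A.T + d * np.eye(d)     # symmetric positive definite matrix
b = rng.standard_normal(d)

# Instead of forming inv(S) explicitly, factor once and solve
# for each product inv(S) @ b that the updates require.
L = np.linalg.cholesky(S)       # S = L @ L.T
y = np.linalg.solve(L, b)       # forward substitution
x = np.linalg.solve(L.T, y)     # back substitution

# x now equals inv(S) @ b without ever computing inv(S).
```

When many right-hand sides share the same matrix, the factorization is reused, so the per-solve cost drops from cubic to quadratic in the matrix dimension.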

We have truncated the infinite DP at

We have evaluated our method on both synthetic and climate data sets. Typical
values used for the hyper-parameters were

We compared the performance of both constrained and unconstrained versions of our method with the non-parametric mixture of linear regression (NPMLR) model without any regularization. We set up three experiments: (1) to test whether or not our algorithm can learn the correct number of clusters; (2) to evaluate the effect of constraints; and (3) to check the sensitivity of our approach to noise.

For all our experiments involving synthetic data, we used

The second experiment was performed to evaluate the effect of number of
“must link” constraints on the performance of the constrained version of
the algorithm. Here, the actual number of clusters was fixed at

In our third experiment, we evaluated the effect of noise on the performance
of our algorithm. Again, we kept the number of clusters fixed at

We measured two aspects of the performance of our algorithm. First, we measured whether
it can cluster the data-points correctly. We put a data-point into one of the
20 possible components (since we truncated the infinite DP at
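Normalized mutual information (NMI) between the inferred and true labelings can be computed as follows (a standard formulation; several normalization conventions exist, and we show the arithmetic-mean variant here):

```python
import numpy as np
from collections import Counter

def nmi(labels_a, labels_b):
    """Normalized mutual information I(A;B) / ((H(A) + H(B)) / 2)
    between two labelings. 1 means identical partitions (up to
    relabeling); 0 means the partitions are independent."""
    n = len(labels_a)

    def entropy(labels):
        p = np.array(list(Counter(labels).values())) / n
        return -np.sum(p * np.log(p))

    pa, pb = Counter(labels_a), Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    mi = sum(c / n * np.log((c / n) / ((pa[a] / n) * (pb[b] / n)))
             for (a, b), c in joint.items())
    return mi / ((entropy(labels_a) + entropy(labels_b)) / 2)

print(round(nmi([0, 0, 1, 1], [1, 1, 0, 0]), 3))  # → 1.0 (same partition)
```

Because NMI is invariant to label permutations, it sidesteps the label-matching problem that would arise from comparing cluster indices directly.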

Performance of the constrained version of the algorithm, in terms of NMI (higher is better), as a function of the number of “must link” constraints.

A second metric is used to evaluate the quality of the sparse regression
model estimated within each discovered cluster. Here we are only interested
in finding whether our algorithm picks the non-zero coefficients correctly.
We use
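One common way to score support recovery, shown here purely for illustration, is the F1 score (harmonic mean of precision and recall) computed over the estimated and true nonzero patterns:

```python
import numpy as np

def support_f1(beta_true, beta_est, tol=1e-8):
    """F1 score for recovering the nonzero pattern (support)
    of a sparse coefficient vector."""
    s_true = np.abs(beta_true) > tol
    s_est = np.abs(beta_est) > tol
    tp = np.sum(s_true & s_est)          # correctly recovered nonzeros
    if tp == 0:
        return 0.0
    precision = tp / np.sum(s_est)       # fraction of picks that are real
    recall = tp / np.sum(s_true)         # fraction of real nonzeros found
    return 2 * precision * recall / (precision + recall)

beta_true = np.array([1.5, 0.0, -2.0, 0.0, 0.0])
beta_est  = np.array([1.2, 0.0, -1.8, 0.3, 0.0])   # one spurious pick
print(support_f1(beta_true, beta_est))             # → 0.8
```

Note that this metric ignores the coefficient magnitudes entirely; it only asks whether the right covariates were selected.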

We can see that the performances of all three algorithms are comparable in terms of
identifying the clusters correctly, although the NMI value of NPMLR degrades
significantly for

Left panel: ability of nonparametric unregularized and sparse
regressions (unconstrained and constrained) to correctly identify
clusters (indicated by NMI) with increasing noise. Right panel: ability of
nonparametric unregularized and sparse regressions (unconstrained
and constrained) to correctly retrieve the sparse structure within
each cluster (indicated by average

The increased flexibility of nonparametric methods comes at a cost: local optima become more likely, and the solutions found may not be interpretable. Adding more constraints may reduce this risk, but at the same time it restricts the variational method from finding solutions with a larger lower bound, especially when the data contain more components. Therefore, increasing the number of constraints may yield more interpretable solutions, but not necessarily improved accuracy. It is also encouraging that our method is relatively robust to added noise, a major challenge with real data sets, especially in terms of correctly identifying the sparse structure.

A grand challenge in climate science relevant for adaptation and policy
remains our inability to provide credible stakeholder-relevant “statistical
downscaling”, or to develop statistical techniques for more accurate,
precise and interpretable high-resolution projections with lower-resolution
climate model data

Existence of multiple states or patterns is acknowledged in regression-based
statistical downscaling literature for rainfall

Since rainfall follows a log-normal distribution
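A minimal sketch of the implied preprocessing, assuming the transformation is a simple logarithm: taking logs of (approximately) log-normal rainfall yields a roughly normal variable, as can be checked with the sample skewness:

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic "rainfall": log-normal, hence strongly right-skewed.
# (Parameters are illustrative, not fitted to the USHCN data.)
rainfall = rng.lognormal(mean=3.0, sigma=0.5, size=10_000)

# Log-transforming recovers an approximately normal variable, which
# better suits the Gaussian assumptions of the regression model.
log_rainfall = np.log(rainfall)

def skewness(v):
    return ((v - v.mean()) ** 3).mean() / v.std() ** 3

skew_before = skewness(rainfall)      # clearly positive (right-skewed)
skew_after = skewness(log_rainfall)   # close to zero (roughly symmetric)
```

In practice one must also handle zero-rainfall records before taking logs (e.g. by adding a small offset), a detail omitted in this sketch.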

Map showing climatologically homogeneous regions over continental US.

Left panel: distribution of average rainfall over all sites in the western US. Right panel: distribution of average rainfall after transformation.

Potential features used can fall in one of two broad categories – local
atmospheric variables and large-scale climate indices. Local covariates
originate from each station and exhibit both spatial and temporal
variability. Annual and seasonal averages of maximum temperature fall in
this category along with sea level pressure (SLP), and convective
available potential energy (CAPE). A dependence on any of these variables
roughly indicates dominance of local convective rainfall in the region. Daily
rainfall station data were obtained from US Historical Climatology
Network (USHCN)

Potential features used for statistical downscaling of rainfall.

Climate indices are global variables that represent large-scale signals in
climate variables. A list of covariates used for each category is given in
Table

Left panel: location of stations and their cluster membership in the western region. Right panel: location of stations and their cluster membership in the northwestern region.

We could only use covariates from 1979 to 2011, as SLP and CAPE are
available only for that period. Also, if more than 50 % of the daily
observations in a year were found to be missing for any covariate at
a specific location, we simply discarded all covariates for that year and for
that specific location. We averaged monthly climate indices and daily local
variables over each year. Finally, the annual/seasonal average time series of
predictors for each station were merged for the homogeneous region under
consideration. The west (CA, NV) and northwest (WA, OR, ID) regions are shown as
gray shaded areas on the US map in Fig.

We applied spatial “must-link” constraints among pairs of data-points
from the same location. Ideally, if there are
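Such same-location constraints might be enumerated as follows (the station IDs and data layout are illustrative):

```python
from itertools import combinations
from collections import defaultdict

def must_link_pairs(station_ids):
    """Enumerate all pairs of data-point indices that share a station,
    i.e. candidate spatial "must link" constraints."""
    by_station = defaultdict(list)
    for idx, sid in enumerate(station_ids):
        by_station[sid].append(idx)
    pairs = []
    for members in by_station.values():
        pairs.extend(combinations(members, 2))
    return pairs

# Data-points: yearly observations tagged with their station.
stations = ["A", "A", "B", "A", "B"]
pairs = must_link_pairs(stations)
# → [(0, 1), (0, 3), (1, 3), (2, 4)]
```

The number of such pairs grows quadratically with the number of years per station, which is precisely why a sparsified subset of constraints is used in practice.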

A quick look at the histogram of the target variable (right panel in
Fig.

DPMs automatically find the number of clusters

In this paper, we propose a nonparametric Bayesian mixture of sparse regression models for simultaneous clustering and discovery of covariates within each cluster using a DP mixture model. Moreover, our model can accommodate prior knowledge about “must link” constraints between pairs of data-points using a Markov random field prior on the cluster membership variables. Our major contribution is an efficient and scalable variational algorithm for inference in the fully Bayesian model. We applied our method to both synthetic and real climate data and successfully discovered multiple underlying behaviors in the data. Preliminary results of applying our method to feature selection for statistical downscaling of rainfall show promise, with appropriate caveats, toward finding new climate insights. Going forward, we would like to incorporate priors for diversity among the clusters in order to discourage merging of close but dissimilar clusters. We also intend to extend our model for predictive analysis and build a full-scale statistical downscaling method using the features selected by the current model.

This work was funded by the NSF Expeditions in Computing grant “Understanding Climate Change: A Data Driven Approach”, award number 1029166. We thank the anonymous referees for their valuable suggestions and comments. Edited by: V. Kumar Reviewed by: three anonymous referees