Introduction
Empirical evidence of past climatic conditions are fundamentally based on
reliably constructed proxy records. Proxies are measurable quantities, such
as pollen, isotope fractions, grain size, etc., that correlate with past
physical variables that cannot be directly measured, such as temperature and
precipitation , and these are obtained from natural archives
like lakes, stalagmites, and peats. In this paper, we use the term “proxy
record” to denote exclusively the set of proxy values (or probabilities of
proxy values) over time.
The interpretation of a proxy record is often not straightforward due to
unknown factors (e.g., anthropogenic impacts, internal dynamics, level of
correlation with the climatic variable – cf. ) that
might have influenced the proxy. The situation is further complicated by the
fact that every proxy record has, within it by construction, errors related
to the dating of the archive from which it was obtained. Clearly, proxies
cannot be measured directly along time. They are measured along the depth of
an archive, and the archive depths must then themselves be dated in a
separate set of observations. In this sense, the proxy record as a function
of time is a derived estimate, and it is one in which the time axis is not
error free, because the radiometric ages of the archive depths have
non-negligible errors of measurement. As a consequence of the latter, it is
challenging to estimate the uncertainty of proxies as well as to carry out
proxy record intercomparisons.
Quantifying the uncertainty of the proxy at a given time in the past is of
paramount importance. Even though the proxy record is a derived measurement,
it is still, in its essence, a measurement, and the representation of any
measured quantity without an uncertainty (error) of measurement can lead to
misleading conclusions. For instance, the assessment of correlations between
data sets can be dramatically influenced by whether or not the uncertainty of
the data is properly represented seeIntro..
Recent studies have raised this issue along with the fact that – although
there are various approaches to estimating age–depth relationships for a
given archive and constraining their uncertainties – relatively little focus
has been directed towards the propagation of the age uncertainties in the
proxy record from the age model . Notably, in Fig. 2 of
, we find one of the first ever representations of the
uncertainty of a proxy record (shown as a grayscale of possible proxy
values), which quickly draws attention to the parts of the record that are
more reliable and those parts that are not. This idea was further extended
and applied in later studies
. More recently,
estimated the uncertainty of the proxy
record with regard to trend and periodicity estimation of the proxy.
have put forth COPRA – Constructing Proxy Records
from Age – models a heuristic numerical method that treats age modeling as
an intermediate step in the construction of the proxy record, and thereafter
presents the final proxy records along with their uncertainties of estimation.
However, a general, mathematical, and non-numerical framework
(based on thorough estimation-theoretic principles) that can guide the
estimation of the uncertainty of proxy record estimations is still lacking.
The main objective of this paper is to provide mathematical expressions for
the probabilities of possible proxy values at a given point in time that can
then be used to estimate the proxy record and its uncertainties. We achieve
this by using a Bayesian analysis that tells us which depths are more likely
than others for every calendar age – and not, as is the case in many
studies, the other way round. Using conditional probabilities, we represent
the proxies on an error-free timescale with the intent of allowing easier
intercomparison of proxies. We test our approach using synthetic examples and
then apply it to reconstruct Holocene proxy records of groundwater inflow and
surface erosion from India.
Our approach is generic in the sense that it is valid for all radiometric
dating information. Furthermore, to make it more widely applicable, we do not
use any sedimentation model. In cases where a convincing and more specific
sedimentation model is available, it can be included in our Bayesian
framework as prior distributions, as is typically done in several
age-modeling approaches . In this
regard, the approach presented in this study also has the potential to be
incorporated into existing age modeling frameworks such as COPRA
, StalAge , clam
, or OxCal . Moreover, it could also be
modified and extended to be applicable to other dating methods such as
dendrochronlogy, luminescence dating and tephrochronology.
Lastly, we stress that the fundamental result of our analysis is the
probability distribution of proxy values at individual time points in the
past (we return to this point in more detail in
Sect. ). As such, the two distributions derived
for the proxy values at two distinct time points tell us almost nothing about
the relationship between these two proxy values, for which one would rather
need to estimate the joint distribution of the two – a task that is not
attempted here, but that could be dealt with with similar methods. A similar
situation occurs in the estimation of the radiocarbon calibration curve, and
is discussed in . Still, in visualizing the distributions
(and mean/median estimates derived from them) obtained at numerous time
points over a contiguous time period, the observer might “see” trends and
patterns of variability, and may be tempted to interpret these as trends and
variations in the actual paleoclimatic conditions. In order to quantify
trends and variations in the proxies, the analysis presented in the following
sections has to be modified with the specific goal of arriving at, let us say
for example, the probability distributions of the amount of change in the
proxy values at individual time points in the past. Conclusions about
different aspects of the proxy record based on the current analysis, on
anything other than the probability distribution of the proxy at an
individual point in time, can at best be qualitative, and is strictly limited
by the level of uncertainty involved. The greater the uncertainty, the less
we can infer.
Outline of posterior proxy probability derivation. The box presents
a rough sketch of the derivation of the posterior proxy probability
densities. We denote the proxy, depth, time and radiometric age with the
random variables X, Z, T, and R, respectively. The essential quantity
of interest to be estimated is the posterior conditional density P(x|t),
which is finally expressed in terms of the measured/given quantities using
the above steps. For a detailed explanation, please refer to
Sects. and .
Theory
Preliminary considerations
We distinguish between two types of radiometric dating methods: (i) those
that do not require calibration, e.g., U/Th dating, and (ii) those that
require radiometric calibration, e.g., 14C dating. In the following
subsections, we restrict ourselves to the more complicated case of 14C
dating, since the first type can formally be considered to be a special case
of the second one, i.e., with a trivial diagonal calibration curve without
uncertainty.
The information that is given is (i) a set of radiometric ages against the
depth of an archive, viz., a dating table, (ii) a much larger set of
proxy values against the depth of the archive, and, (iii) for the case of
radiocarbon dated archives, a 14C calibration curve, viz., IntCal13
. All three data sets have errors of measurement, although
the errors in proxy measurements are negligible in several cases.
Using the above data, we wish to answer the question what were the
most likely climatic conditions at a given point in time in the past? We can
rephrase this in terms of proxies as at a given time point in the
past, which proxy values are more likely to have occurred? The answer is
obtained using a probabilistic framework; i.e., we want to arrive at
probability distributions for the proxy value at all past time
points.
We use a Bayesian framework, since this allows us quite naturally to
transform partial knowledge about one type of information (here, about the
errors in measurement, dating, and calibration) into partial knowledge about
other types of information (here, the age–depth and age–proxy relationships
we are interested in). The relevance of a Bayesian approach is also
exemplified by its use in most analytical investigations into radiocarbon
dating and radiocarbon calibration e.g.,.
Readers may also refer to the introductory sections of and
for further explanatory notes on the use of Bayesian
statistics in age-modeling approaches.
The goal of this analysis is to arrive at the posterior distributions of the
given proxy at all points in time. A gist of the derivation of the posterior
proxy probability distributions is presented in
Fig. . A detailed explanation follows in the
subsequent sections and also in the Appendix.
Necessary assumptions
Before we proceed, it is important that we state and review the assumptions
inherent in our approach. We assume that
The 14C ages, as well as the proxy measurements, are
sufficiently well described by a normal distribution, with the mean at the
measured value and a standard deviation equal to measurement error. This is
motivated by the fact that, in most cases, observations (and associated
errors) are adequately described by the mean (and standard deviation) of a
Gaussian process. Still, in principle, our method can be used for any
probability distribution.
The errors in depth measurements are negligible. The precision of
depth measurements motivate this second assumption, and it is made for the
sake of analytical simplicity. We note that most of the existing age-modeling
techniques involve this assumption, and also that our approach would have to
be modified in cases where depth uncertainties cannot be neglected.
All radiocarbon ages and depths are assumed
to be equally likely a priori. This prior belief is then updated within the
analysis using the set of radiocarbon age–depth measurements.
The radiocarbon ages provided do not contain outlying values (age
reversals) caused, e.g., by contaminated material. Dealing with outliers is
beyond the scope of this analysis.
All obtainable growth-related information about the archive is
contained in the given radiocarbon dating points. We note that, although at
times, additional sources of growth-related information are available, we do
not consider such situations in this preliminary exposition.
The radiocarbon age of a certain position in the archive is an
unknown, smooth, continuous function of its depth, denoted here as the
radiometric (RM) age model. The presence of a hiatus is a crucial
issue regarding this. We advise that if a hiatus is known to have occurred at
a particular depth (from sources other than the radiometric ages), the RM age
model can be split at the hiatus depth into two smaller, independent RM age
models – an approach used in .
Depth-spanning weight functions
In his discussion on deposition models for chronological records of
paleo-archives, has aptly articulated the fundamental idea
behind the construction of age models as “What we are aiming to do
mathematically is […] to find a representative set of possible ages for
each depth point in a sedimentary sequence.” Mathematically speaking, the
fundamental idea behind our approach is similar, as we too seek to establish
a probabilistic relation between the depth and age as a first step. However,
our approach differs from the idea above in that we obtain this relation
in the opposite direction; i.e., we aim to find an ensemble of depths
for each calendar age.
Depth-spanning weight functions (DWFs). (a) Radiocarbon
calibration curve representing the mean (red) and ±2 standard deviations
(blue) of the distribution P(r|t). Inset: calibration curve around 5.1 kBP
detailing non-monotonicities. (b) (i) 14C measurements from a
simulated archive (circles with error bars representing the distribution
P(r|zkr)) and the RM age model obtained from them: mean (red)
and ±2 standard deviations (blue) of the distribution P(r|zjx)
that results from the used regression method. (ii) DWFs for the calendar ages
2.2 kBP (purple) and 9.5 kBP (orange). The corresponding colored dashed
lines with arrows indicate how a given calendar age is related to a
distribution over probable depths via the DWF. All uncertainty bounds
correspond to ≈ 95 % confidence. (Color
online.)
Let Z, R, and T be three variables that denote depth, radiocarbon age,
and calendar age, respectively. Also, let the unknown proxy variable be
denoted by X. We begin the task of finding such an ensemble of depths for
each calendar age by asking: what is the probability that a depth
Z = z corresponds to a given calendar age T = t? By the law
of total probability, this is proportional to the probability of finding a
radiocarbon age R = r at T = t multiplied by the probability
that the depth Z corresponds to r, integrated over all possible
values r of R. Formally,
P(z|t)∝∫drP(r|t)P(z|r).
Now, using Bayes' theorem, we see that the second term on the right-hand side
of Eq. () is a form of a posterior distribution, and is thus
proportional to the product of some prior probability P(z) and the
likelihood P(r|z) (for a more detailed discussion, see
Sect. ). Assuming a priori that all depths are equally
likely, we use the “flat” prior P(z)∝ const. Thus, if the proxy is
measured at depths Z = zjx for j = 1, 2, …, N,
using the example of Eq. (), we can define a
depth-spanning weight function (DWF) at all proxy measurement depths
zjx as
wtzjx=∫drP(r|t)Pr|zjx.
The term P(r|t) is simply the radiocarbon calibration information that
gives the probabilities of possible radiocarbon ages r given a particular
calendar age t. The second term in Eq. (), P(r|zjx), is
the RM age model that gives the probability of the radiocarbon age r given
a particular depth zjx.
In combining these two terms under the integral, the DWF wt(zjx)
constructed is thus proportional to the probability that a depth zjx
will correspond to a given calendar age t in the archive. For each
different value t of the calendar age T, a new DWF has to be constructed
over the depths zjx. A schematic of two representative DWFs for a
simulated archive is shown in Fig. along with the
information used in their construction. A couple of points to note from
Fig. are (i) that the shape of the DWF can be quite
different for different calendar ages, and (ii) that the peak of the DWF
denotes the most probable value of Z for a given RM age model and a given
value of T. This approximately corresponds to the value of Z at which the
expected value of R will equal the expected value of R for the given
value of T. We illustrate this point with the help of the arrowed dashed
lines in Fig. .
Note: conventionally, the radiocarbon calibration involves estimating the
probabilities of all possible calendar ages for a given radiocarbon age along
with the uncertainty in estimation. This is analogous to proceeding in an
counterclockwise direction in Fig. . In our analysis, we
avoid “calibration” in this sense by choosing to estimate a relationship
between calendar ages and corresponding depths in the other direction – as
shown by the clockwise sense of the arrows in Fig. .
At this stage, it becomes important that we elaborate on the construction of
the RM age model and its relevance.
The RM age model
The construction of the DWFs involves the term P(r|zjx), which has to
be well defined at each depth where the proxy is measured. However, RM age
observations are limited to significantly fewer numbers of depth points. Let
us say that the number of radiocarbon age measurements is M, where
M ≪ N, and the corresponding depths are denoted by
zkr, k = 1, 2, …, M. Then, from this set of
measurements, we get M conditional probability distributions
P(r|zkr) for R, k = 1, 2, …, M. Since M is
much less than N, we need to be able to use the set of RM age–depth
observations to construct a data set that gives us the N probability
distributions P(r|zjx), j = 1, 2, …, N. This is achieved
by estimating an RM age–depth relation, i.e., the RM age model.
In principle, the role of the RM age model can be seen as
Pr|zkr⟶RM age modelPr|zjx,
where k = 1, …, M, and j = 1, …, N. This resonates
with the conventional framework of age modeling, except that we use
radiocarbon ages instead of calendar ages.
In this study, we use a non-parametric Taylor-polynomial-based regression
method given by , a data-driven approach that uses a
combination of Bayesian updating and Taylor expansion about a point of
interest to provide an estimate of the smooth curve from which the
observations have been sampled. However, any regression method that estimates
a posterior probability distribution could be used equally, as well as any
Monte Carlo based method that generates ensembles of interpolated RM
age–depth relations from the set of observations to arrive at a mean/median
estimate along with a standard deviation.
Note: in the case of dating methods that do not require radiometric
calibration (U/Th dating being one such example), the first term of the
right-hand side in Eq. () is equal to 1 if r = t, and 0 if
r ≠ t. In such cases, even though the fundamental idea behind
Eq. () remains intact, the age model involved in it is no
longer the RM age model, but is, in fact, the (calendar) age model – or what
is simply known as the age model of the archive.
Estimating the proxy and its uncertainty at an individual time point
Once we have estimated the set of wf functions, it is
straightforward to estimate the proxy and its uncertainty. For this, we need
to consider the probability encoded in wtf(zjx), for
each depth zjx, as a weight for the corresponding proxy measured at
that depth. Since X denotes the unknown proxy at a given calendar age t,
we thus estimate the probability P(x|t) as
P(x|t)=Weighted mean ofPx|zjx,
where the weights are derived from the corresponding DWFs at T = t
(see Appendix for details).
We now have a probability distribution for the proxy values at any given
time t and, using this, we can estimate the mean/median, as well as
uncertainty bounds constructed using percentiles or variance. In this study,
we restrict ourselves to median proxy values and represent the associated
uncertainty of estimation with (i) a 95 % confidence band constructed from
the region lying between the 2.5th and 97.5th percentiles, and (ii) a 50 %
confidence band constructed from the region lying between the 25th and
75th percentiles of the distribution P(x|t). Appendix
provides a more thorough treatment of the above. Also, in the following
sections, the mean/median curves are represented as dotted lines rather than
as continuous curves, to emphasize that they are derived from the probability
distribution at individual time points and do not have any relation to values
at other time points.
Incorporating monotonic growth
The DWFs constructed in Sect. relates any given calendar age
probabilistically to different depths, based on how likely they are to
correspond to that age. This relation does not, however, include one specific
feature of sedimentary records: the constraint of stratigraphically
ordered growth. In other words, a stratigraphically deeper layer cannot be
younger than any layer above it. We thus need to incorporate this constraint
into the initial DWFs derived in Eq. (), henceforth
denoted as wi, and obtain a final set of DWFs
(henceforth wf) that takes it into account.
The DWFs are essentially a set of probability distribution functions, and to
impose an unambiguous monotonicity constraint on such a set is non-trivial.
To overcome this, we consider the cumulative probabilities of each DWF. A
cumulative depth-spanning weight function (CDWF) is the probability
that a given calendar age t corresponds to any depth less than or equal to
a depth djx. Formally, after ordering depths such that
d1x ≤ d2x ≤ ⋯ ≤ dNx, we put
Wtizjx=∑l=1jwtizlp.
The CDWF, Wti(zjx), by definition, increases along the
depth axis zjx from 0 to 1. (Note: we denote the cumulative
probability distribution with an uppercase letter “W” to distinguish it
from the corresponding probability distribution “w” denoted by the
lowercase letter.) Our task now is to ensure that it is monotonically
decreasing along the age axis T. This means that if we take a depth
zjx and two ages t1 and t2 such that t2 is greater
(i.e., older) than t1, the total probability that t2 will
correspond to a depth ≤ zjx cannot be more than the total
probability that t1 will correspond to a depth ≤ zjx –
which is the condition of monotonic growth. Formally: we would like that
Wt2(zjx) ≤ Wt1(zjx).
The above condition for monotonic growth along the age axis is violated
slightly but noticeably on many occasions. This is shown in
Fig. a, where the wiggles in the white grid lines
parallel to the calendar age axis illustrate the non-monotonicity. Since it
would be quite difficult to enforce the desired monotonicity already in the
step where P(r|zjx) is estimated from P(r|zkr), we
instead fix the slight non-monotonicities after having derived
Wi. This results in a final CDWF Wf that can then be
transformed into the final DWF wf via
wtfzjx=Wtfzjx-Wtfzj-1x.
To estimate the final Wf that adheres to monotonicity, we use the
principles of relaxation dynamics, and the details of this estimation process
are discussed in Sect. . In short, we start with a suitably
chosen set of CDWFs that are already monotonic in T, and then iteratively
drag (pull and push) this function in order to minimize its distance from the
initial CDWF, Wi, as far as it is possible to do so without
violating its monotonic nature. The final equilibrium set of CDWFs is that
which cannot be moved any closer to Wi by any form of
dragging without compromising its monotonicity. We denote this set as
Wf (shown in Fig. b).
The age–depth sea cliff. (a) The initial CDWF
Wi obtained using Eq. (), and (b) the final
CDWF Wf obtained after imposing monotonic growth, for the same
example as used in Fig. . Values of W = 0 are shown
in blue and, for W = 1, they are shown in green. Intermediate values
of W are shown with increasing darkness from light brown to dark brown, as
the value increases from 0 to 1. The overlay of white lines over the CDWF
surface highlights the differences between the pre-monotonicity and
post-monotonicity scenarios. The purple and orange lines are the CDWFs for
the ages 2.2 and 9.5 kBP as used in Fig. , respectively.
The effects of imposing the condition of monotonic growth on the CDWFs is
seen more prominently in the white grid lines within the marked yellow
circles. (Color online.)
The age–depth sea cliff
The visualization of the CDWFs in Fig. is likened to
a sea cliff where W = 0 is shown as the blue sea, and W = 1 as
the green highland. All intermediate values of W are contained in the
sudden rise of the brown cliffs.
As stated earlier, the fundamental idea behind age modeling is to arrive at a
relationship between the calendar ages and the depths of a paleo-archive. A
set of functions that perform this function can be thought of, in a broader
sense, as an “age model”. The CDWFs visualized in
Fig. are thus analogous to an age model in our
analysis. However, we wish to emphasize that the construction of the CDWF
relations did not involve assumptions about the growth and/or sediment
accumulation of the archive, and was entirely a data-driven estimation, with
the sole input of the principle of monotonic growth of the core.
In this sense, the age–depth sea cliff is a formal age model that, in future
studies, could be developed further to incorporate specific growth
conditions, leading to a better estimation of the (calendar) age–depth
relation of the archive.
A review of the steps involved in proxy estimation
Before we present the applications and results, it is useful briefly to
summarize the salient steps involved in estimating the proxy records and
their associated uncertainty using our approach.
We construct the RM age model using an appropriate regression method
that provides the posterior distributions of radiometric ages at the proxy
measurement depths.
We estimate the DWF that relates any given calendar age to the proxy measurement depths in terms of the likelihood that
they correspond to the chosen calendar age.
Next, we construct cumulative weight functions (CDWFs) from the initial
set of DWFs obtained in the previous step. The CDWFs are used to impose the
constraint of stratigraphically ordered growth of the archive.
We thus obtain a final set of CDWFs that are consistent with such
monotonic growth over time, and we derive a final set of DWFs from them.
For each chosen calendar age, we use the corresponding
stratigraphically ordered DWF to weight the proxy measurements over depth and
thus obtain a posterior proxy probability distribution.
We use the posterior proxy probability distribution at each calendar
age to estimate quantities of interest such as the mean or median proxy
values for that age. Furthermore, we also estimate uncertainty estimates such
as quantile ranges or variance.
Applications
We first consider two synthetic examples in which we know the actual proxy
record as well as the age model, and test the performance of our approach in
estimating proxy records. Next, we estimate the groundwater inflow and
surface erosion proxies from the Lonar lake in central India and compare our
results with those obtained by using an age model generated using OxCal. A
discussion of the results follows in the next section.
U/Th-dated synthetic stalagmite. Legend as in
Fig. . (a) The straight line of slope 1 used in
place of a calibration curve. Inset: unlike a real calibration curve, this
line has no error. (b) U/Th measurements from a synthetic
stalagmite (circles with error bars) and the estimated radiometric age model
obtained from them by regression (red line with blue ≈ 95 %
confidence band), along with the actual age model (black dashes).
(c) The proxy curve along stalagmite depth obtained from noise-free
proxy measurements. (d) The actual proxy record (black), shown
alongside the estimated median proxy record (red, dotted), along with the
associated uncertainty of estimation (sky blue denotes the interquartile
range, i.e., 50 % confidence, whereas light blue denotes the region between
the 97.5th and 2.5th percentiles, i.e., 95 % confidence). The estimated
record in D is demarcated into three distinct regions in terms of the
frequencies it resolves: (a) when it resolves both frequencies of the true
sinusoidal proxy record (green), (b) when it resolves only the lower
frequency (orange), and (c) when it is unable to resolve either of the two
frequencies (purple). (Color online.)
Synthetic examples
To illustrate our method as well as to test its efficacy, we consider two
types of paleo-archives: (i) a stalagmite extending over 0–28 kBP dated
with U/Th, and (ii) a lake sediment core extending over 0–11 kBP dated
with 14C. From our perspective, the crucial difference between the two
is that for the lake sediment, the radiocarbon ages have to be calibrated
using IntCal13, whereas this is not needed for the U/Th ages. To simulate
sediment growth, we follow , such that the sediment
accumulates with an initial growth rate of 20 yr cm-1. At subsequent
depths, a non-negative growth rate is chosen from a normal distribution that
has the growth rate of the previous year as its mean and a fixed standard
deviation of 7 yr cm-1. In both cases, the proxy values are simulated
as a sinusoidal signal consisting of two components with different time
periods. Also, the proxy data sets were generated annually, i.e., with a
proxy value for every year. We simulate a few noisy radiometric age
measurements and a much higher number of almost perfect proxy measurements
(error of 0.001). These “observations” are then used to estimate the proxy
record with our method.
U/Th-dated archives
The results for the synthetic stalagmite are shown in
Fig. . In this case, the calibration curve (as shown
earlier in Fig. ) is replaced by a straight line of slope
1, without any error (Fig. a). This is possible because
the U/Th radiometric ages can be identified with the calendar ages. The
observational noise for the U/Th age measurements increases with the depth
of the stalagmite to a maximum of 5 % (Fig. b). The
proxy signal has two components with time periods of 2000 and 400 years
(Fig. d). Note that the proxy signal can be distorted in
the depth domain, depending on the nature of the actual age–depth relation
(Fig. c and d).
14C-dated synthetic lake sediment. Legend as in
Fig. . (Color online.)
14C-dated archives
Figure shows the results of our method as applied to
the synthetic lake sediment core. In Fig. a, we see
the irregularities of the radiocarbon calibration curve and its estimation
uncertainty. The error in radiocarbon age measurements in
Fig. b increases with depth, as in the previous case.
The proxy signal used in this case has two time periods of 1000 and
200 years. Here too, one can see that the proxy signal is distorted in
Fig. c, when compared to the one in
Fig. d; however, the distortion in this case is
mediated not only by the irregular RM age–depth relation, but also by the
calibration curve.
Holocene proxies from central India
As an application to a real-world scenario, we consider the set of age–depth
14C measurements from the Lonar lake in central India
. The radiocarbon ages used for the analysis are
tabulated in Appendix and labeled in
Fig. b. This included two 14C ages after 1950, L21
and L20a, for which we use the Northern Hemisphere 3 (NH3) “post-bomb”
calibration curve , and 17 pre-1950 ages, L19–L1, for which
we use IntCal13 .
For the proxy records, we take the Ca-area proxy for groundwater inflow and
the Al-area proxy for surface erosion from the same archive at Lonar. The
links of both the Ca to groundwater inflow (evaporitic carbonate (CaCO3)
formed during periods of low lake levels) and that of the Al to surface
erosion (lithogenics brought on by rain events) have been validated in
.
Holocene proxy records from central India. (a) The IntCal09
calibration curve. Inset: the post-bomb calibration curve that was used.
(b) The RM age model for the Lonar lake sediment core. For details
of the radiocarbon age–depth measurements, cf. Sect. .
(c, e) The groundwater inflow (Ca-area) and surface erosion
(Al-area) proxies as measured along the depth of the core.
(d, f) The corresponding proxy record estimates as obtained using
the Bayesian approach detailed in the text. The legend in these panels is the
same as that of Fig. d; “kcps” denotes “kilo counts
per seconds”. (Color online.)
These were obtained from a continuous down-core X-ray fluorescence (XRF)
(Avaatech XRF Core Scanner III) scanning of the Lonar lake sediment core
surface. The relative abundances of the elements (Ca, Al, Ti, Si, and K) were
recorded every 5 mm with the X-PIPS SXP5C-200-1500 detector from Canberra,
while the tube voltage was kept at 10 kV . The Al counts
were found to be strongly correlated with the Ti, Si and K counts obtained
from the XRF scanning (see
Appendix ). Due to this,
and combined with the findings of , where they show the
relation of the Al abundance to catchment erosion as well as the lithogenic
contents, we choose this as a representative proxy for the Lonar lake surface
erosion. We note that due to the difficulties in representing errors in XRF
measurements, we consider the proxy observations along depth to be error
free. This, however, does not change the fundamental objective of our
analysis, which is to estimate the final proxy uncertainties in an analytical
fashion and to investigate how they are impacted by proxy–depthvariability.
If the proxy measurements were to have errors, these would simply be added to
the final errors, as is indicated by Eq. ().
The final proxy records estimated are shown in Fig. .
We compare our results with proxy records obtained from a typical mean age
model of the archive. The age model involved OxCal P-sequence modeling with
three sedimentological boundaries imposed a priori.
Figure a and b compares the final proxy estimates
obtained using the OxCal P-sequence age model with those obtained using our
approach – for both Ca area and Al area.
Discussion
Proof of concept
The synthetic examples shown in Sect. illustrate the
validity of our approach. In Figs. d
and d, the first finding to note is that the 95 %
confidence band consistently contains well over 95 % of the black curve,
and the 50 % band consistently contains about half of the black curve. In
addition to this general fit between the true record and the estimated
confidence bands, one can compare the median estimate (red dotted curve) with
the true record (black curve), and distinguish three broad regions: (a) the
youngest portion of the proxy records (green region), where the median
estimates follow the true proxy series closely, and reproduce even faster
oscillations of 1/400 yr-1 (for the stalagmite) and
1/200 yr-1 (for the lake sediment core) accurately; (b) the
intermediate portion of the proxy records (orange region), where, for the
most part, the median estimates show only the slower sinusoidal component due
to larger dating uncertainties, but follow the lower frequencies of the true
proxy curves (1/2000 yr-1 for the stalagmite, and 1/1000 yr-1
for the lake sediment core) closely; and (c) the oldest portion of the
records (purple region), where the median estimates are almost flat curves,
due to the high uncertainties. The differences between these three regions
are due to the associated uncertainty in estimation of the proxy record,
which increases (as seen from the confidence bands) progressively from the
youngest to the oldest portions of the record, becoming as large as the range
of values in the end. The proxy uncertainty depends strongly on the errors of
the corresponding RM age models (Figs.
and b), which increase towards the oldest portion of
the cores as well, and the errors of the RM age models are themselves
influenced by the errors of the radiometric age measurements. We
would like to emphasize that the objective of our method is not to estimate
frequencies or variability, but to represent the available knowledge about
the proxy value itself at each given point in time in the best possible way.
The seeming inability of the proxy estimates to reproduce oscillations is a
necessary consequence of the posed research question when dating
uncertainties are large. Still, in regions where the errors in measurement
are small, the estimates capture the oscillations at both frequencies.
Comparison of results with OxCal. The proxy records estimated by the
present approach (red, dotted) compared to proxy records obtained by using an
OxCal P-sequence age model (dark gray) for the groundwater inflow (Ca-area,
a) and surface erosion (Al-area, b) proxies from the Lonar
lake. The legend for the confidence bounds to the Bayesian proxy estimate is
the same as in Fig. d and f. (Color
online.)
This discussion highlights two crucial factors: (i) proxy estimation errors
depend on contingent errors of age measurements (and, to a large extent, the
errors of calibration and proxy measurements), and (ii) the interrelation
between the estimated proxy uncertainty and the variations that are resolved
in the record. Both these issues are discussed in the subsequent sections.
Uncertainty of proxy estimations
From the proxy estimates shown in
Figs. d–d, it is apparent that
even though the proxies are measured to high precision along the depth of the
respective archives, the uncertainty in the proxy value for any given
time point is not negligible. This is in agreement with the results of
earlier studies, e.g., and . Also,
the final uncertainty is not the same as the error of the proxy–depth
measurements. The confidence bands span the whole range of values of the
proxy for error levels of ≈ 5–10 % of the radiometric age
measurements.
Uncertainty in proxy estimations (schematic). The proxy curve has
high variations in region A (purple) and low variations in region B (orange).
Two time points t1 and t2 are chosen such that their corresponding
DWFs cover most parts of regions A and B, respectively. The estimation
uncertainty is far greater for the proxy estimate at t1 than for t2
because of the corresponding high variability of the proxy in region A. Note:
the final uncertainty in the proxy estimate is not equal to the
corresponding variability in the depth domain, but is only proportional to
it. Hence, we denote the impact of proxy variability over depth (marked with
the dashed lines) on the final proxy uncertainty with an open arc. (Color
online.)
At a first glance at Figs. –, it
is obvious that the final proxy uncertainty is influenced by the calibration
uncertainty, RM age model uncertainty, and the proxy measurement error (if
any). However, a closer inspection of Fig. d and f
around 3–4.5 kBP reveals an additional factor. At around 3–4.5 kBP, we
find that Al area has much higher uncertainty in comparison to Ca area, even
though both have the same calibration curve and age model. Moreover, even the
proxy measurement errors for both were considered negligible in the analysis.
Then why is the uncertainty much higher for Al area?
To answer this, we proceed clockwise in Fig. from the
interval around 3–4.5 kBP to find that this range of calendar ages would
roughly correspond to the depth range of around 500–800 cm, using the given
calibration curve and RM age model. The critical difference between the two
proxies in this depth range is that the Al area has a much larger variance in
comparison to the Ca area. Thus, given the same DWF, the Al-area proxy
estimates would have a much larger spread relative to the Ca area.
To understand this in more detail, consider the schematic in
Fig. . The proxy curve shown in the figure has
distinctively high fluctuations in the purple portion of the curve, and is
then confined within a narrow band of values in the orange portion. On the
left side of the figure, we consider two points t1 and t2 on the
calendar age axis, such that the DWF wt1f(djx) of
t1 covers mainly the high-variability region A in
Fig. and, similarly, the DWF of t2 covers the
low-variability region B in Fig. . Thus, we get a
smaller uncertainty for the proxy at t2 than at t1 when we weight
the proxy values with the height of the respective DWFs. Hence, the
variability of the proxy measured along the depth also contributes to the
final uncertainty of its estimation.
The final uncertainty thus depends on four contingent factors:
calibration uncertainty (if any),
uncertainty of the RM age model,
errors in measuring the proxy along depth, and
variability of the proxy signal along the depth domain.
Variability of the proxy record
From Sect. , it is evident that the uncertainty
in proxy estimation is influenced by its own variability in the depth domain.
However, the proxy uncertainty is also closely related to the variability of
the median proxy estimate as well. This is seen in
Figs. –, where regions of high
estimation uncertainty are associated with low variability of the
median estimates along time and less resolution of higher frequencies, and
vice versa. This does not mean that the proxy itself does not have
faster variations. It simply implies that, in regions with high proxy
uncertainty, we cannot reliably comment on the fast variations of
the proxy. In order to quantify analytically the various fast/slow varying
components of the proxy and their uncertainties in a thorough fashion, we
should, in principle, as stated before, proceed with a separate analysis.
This is because the knowledge of the probability distributions of the proxy
at a given time are not sufficient to comment on the variations, especially
in the presence of non-negligible uncertainty in estimation.
The best one could say about the variability of the proxy-vs.-time in regions
of high dating uncertainty would be to estimate some aggregate measures of
variability such as the slope or curvature of the proxy curve or the
momentary amplitude of a certain sinusoidal component (at each calendar age).
For example, a simple way of obtaining a “central” estimate of the slope
dx/dt would be to use MoTaBaR to find the mean estimates
R = r(t), D = d(r), and X = x(d) that correspond to
given values T = t, R = r, or D = d, and then to use
the chain rule to calculate
dx/dt ≈ x′(d(r(t)))d′(r(t))r′(t). Equivalently, in
graphical terms, follow the dashed lines from t via r and d to x and
multiply the corresponding slopes of the calibration curve r(t), the RM age
model d(r), and the proxy-vs.-depth curve x(d) that you encounter on the
way (the calibration curve might have to be smoothened for this). In a
similar fashion, the second derivative can be estimated by applying the
product rule:
d2x/dt2 ≈ x′′(d(r(t)))d′(r(t))2r′(t)2+x′(d(r(t)))d′′(r(t))r′(t)2+x′(d(r(t)))d′(r(t))r′′(t).
Finally, if the proxy-vs.-depth curve shows a sinusoidal component of
amplitude ξ and period length Δd around depth
D = d(r(t)) (as could be seen, e.g., from a wavelet analysis), one
can conclude that the true climate-vs.-time curve contains a sinusoidal
component around time T = t of the same amplitude ξ and a period
length that can be estimated as
Δt ≈ Δd/d′(r(t))r′(t).
Interpreting the posterior probabilities: a note on proxy variations
A critical point arising out of the previous subsections is that the final
proxy estimate – such as the mean/median – when visualized over a period of
time, may not reveal short-time variations. For paleoclimatic studies
focussed on transitions taking place over short timescales, this can be a
major hurdle. Even in studies that wish to address climatic patterns
operating in the higher-frequency region, a proxy estimate that does not
resolve such frequencies is of little practical utility.
We stress that it is misleading to conclude that the proxy record does not
contain high-frequency components based on figures such as
Figs. –. As already stated in the
previous subsection, the fast varying components of the proxy are not ruled
out by the probability distributions. Rather, only in estimating the mean or
median might we be unable to say anything about them with confidence.
To understand how this is possible, note that the primary and foremost result
of our approach is a probability distribution of the proxy values at
each value of calendar age (shown in Fig. a, b as a color
map). Such a visualization is in principle similar to Fig. 2 of
– only that we obtain the visualization from mathematical
expressions, and not as a histogram of ensemble members.
Posterior probability distribution and proxy ensemble members.
(a, b) The posterior proxy distributions for Ca area (in a)
and Al area (in b) obtained by our approach. At every chosen value
of the calendar age on the vertical axis, our method provides a probability
distribution for the proxy along the horizontal proxy axis. The probabilities
are indicated by the color bar, with white representing zero probability.
(c, d) Two randomly chosen ensemble members (blue and green curves)
for Ca area (in c) and Al area (in d) out of all possible
proxy records given the probability distributions in (a) and
(b), respectively. Such individual ensemble members retain the
high-frequency components as well, indicating that the high-frequency
information is still contained in the posterior probabilities. (Color
online.)
We interpret the distributions as representing the probability densities of
an ensemble of possible proxy records; i.e., each member of this ensemble is
a record of one of many possible past climatic histories that fit the
available set of measurements and data. This is shown in
Fig. c and d, in which two such members of the ensemble are
shown for each of the two proxies from Lonar lake. They are constructed
by drawing random numbers from each proxy probability distribution at every
calendar age. It is immediately clear that the individual members of the
ensemble retain the high-frequency components as well. However, since we have
no way of knowing which of the infinite possible ensemble members actually
constituted the actual climatic history, we estimate the mean/median climatic
history and our confidence in it. The uncertainty bounds shown in this study
represent the impossibility (given a set of measurements) of determining
precisely the mean proxy value and hence, by extension, the mean
paleoclimatic condition that it would represent.
In order to be able to have a very narrow uncertainty range, efforts must be
made to reduce the various sources of error that contribute to the final
proxy error. We discuss the possibilities and limitations of this in the next
section.
Reduction of uncertainty
Let us take the example of the Lonar lake observations and ask: how
can we reduce the final proxy uncertainty? For this, we have to look at the
four factors that determine it. Among these, the calibration uncertainty
cannot be reduced until a more tightly constrained calibration curve is
released, the proxy–depth variance is beyond our control, and the proxy
measurement error is already set to zero in our analysis. Thus, we are left
with the sole option of reducing the RM age model error. This can be achieved
with additional radiometric dating of the archive, or by incorporating
layer-counted segments of the record that have relatively less error.
However, since we do not consider layer-counted data in our approach
presently, we will consider below the effect of adding more radiocarbon
dating points.
We might plan to make a few more measurements, especially around those depths
where the RM age model is not very precise, e.g., at around 700 cm
(Fig. b). Still, a significant portion of the final
uncertainty might also be due to the intrinsic variance of the proxy along
depth, and we thus need to understand fully exactly how much of the final
uncertainty is contributed by the age measurement errors. The highly non-trivial way in
which the final uncertainty is related to the RM age model uncertainty (via the
DWFs) makes it almost impossible to find a precise analytical answer to
questions of the type: if we make two 14C age measurements at depths d1
and d2 with a maximum error of ε, by what fraction z will the
uncertainty at calendar age t going to go down?
We can nevertheless get some insight into how much error is contributed by
the age uncertainty by considering a simple thought experiment. Let us assume
that we are able to reduce the RM age model uncertainty to zero by taking N
error-free radiocarbon age measurements at the precise depths of proxy
measurements. The variance of the DWF will then depend solely on the
calibration uncertainty and, in conjunction with the proxy's intrinsic
variance, this will determine the final proxy uncertainty. We can compare the
uncertainty levels of the proxy before and after setting the age model error
to zero. This is shown in Fig. a and b for the Ca-area
and Al-area proxies (Fig. c and d). We can see from the
figure that the final uncertainty of the proxy is not reduced by a great
amount (Fig. b and d) – even when the uncertainty of the
RM age model had been set to zero. Among the two proxies, the reduction of
uncertainty in the Ca-area record is more than that of the Al-area record.
This is because the Al-area signal has relatively higher variability than the
Ca-area signal (cf. Fig. c and e), and so the relative
contribution of the age uncertainties to the final proxy uncertainty is less
for the Al-area record than the Ca-area record. For both records, a reduction
in age uncertainty resolves more higher-frequency variations than before, but
not by a great amount. Furthermore, even if age uncertainty is reduced to
zero, the proxy records still differ a great deal from the records
constructed by using the OxCal P-sequence model as shown in
Fig. .
Contribution of age uncertainty to proxy estimation uncertainty.
(a, c) The proxy records for Ca area (in a) and Al area (in
c) for the original set of observations (red curves for the median,
light blue area for 95 % confidence bands) along with the proxy records
after making the RM age model uncertainty identical to zero (dark gray curves
for the median, orange area for 95 % confidence bands). (b, d) The
uncertainties for each time point (95 % confidence bands shown
in a and c) and for the two cases: original observations
(light blue), and after setting the RM age model error to zero (orange).
(Color online.)
Coming back to the issue of improving the Lonar proxy records with the help
of additional measurements at around 700 cm, we first note that these depths
would roughly correspond to calendar ages of around 4–5 kBP (see
Fig. B, starting at around 700 cm, and going
counterclockwise from the depth axis to the RM age model curve, to the
calibration curve and finally to the calendar age axis). From
Fig. , we find that the region around 4–5 kBP shows
almost no improvement from setting the age uncertainty to zero! This
indicates that the Lonar lake radiocarbon age measurements are not the
primary source of the final proxy uncertainties. Rather, it is more likely
that the major part of the proxy uncertainties in the Lonar records are due
to the proxy fluctuations in the depth domain and the calibration
uncertainties.
The thought experiment illustrates several points:
Even though the final uncertainty of proxy estimation is linked to
the age uncertainty in a complicated manner, it is possible to understand the
relative contributions of the age errors by setting them to zero.
The final proxy uncertainty can, in some cases, be determined more by
its own variability in the depth domain, rather than the age uncertainty.
Given a set of observations and an RM age model, it is possible to
obtain a limit to the precision with which the proxy can be estimated.
The variability of a proxy signal is inherently linked to the kind of
paleoclimate variations that it will allow to be investigated, and also to
the level of precision with which such studies can be carried out.
Precise, error-free timescale
One important consequence of having an age-uncertain timescale for
representing proxy records is that the intercomparison of records from
different archives becomes difficult and ambiguous. In the approach outlined
in this study, we overcome this difficulty because of the use of conditional
probabilities. The use of conditional probability implies that every time we
consider a particular calendar age and then estimate the DWF for it, we
“fix” the calendar age precisely and then obtain the likelihoods of the
proxy depths for that age. This means that the final proxy estimate we obtain
is represented on a timescale that is error free; i.e., it is without any
uncertainty. Such a notion of a precise timescale has already been
introduced for speleothems in , where it is termed an
“absolute” timescale that corresponds effectively to the time of deposition
of the proxy material on the archive. They illustrate the utility of an
error-free timescale with the help of Monte Carlo age modeling approaches. In
the present paper, we generalize this idea for all radiometrically dated
archives, and also provide an analytical framework for it.
Typically, if we represent paleoclimate proxies on an age-uncertain
timescale, the interpretations of paleoclimate events are constrained by not
being able to know when exactly an event took place in the past. This is
overcome if the uncertainty is transferred from the age axis to the proxy
axis, ensuring that the timescale of representation is always error free. An
analogy to visualize this process is to think of the uncertainty as a bag of
errors that can be carried either by the time axis or by the proxy axis. In
paleo-investigations till now, this bag of errors had been left on the
shoulders of the time axis, but we choose to transfer it to the proxy axis
instead. This comes at a price, because in doing so, we are not certain any
longer about the high-frequency variability of the proxy. Thus, it is not a
question of whether this particular representation of proxies is more correct
than the conventional age-uncertain one – the choice of representation is
context dependent and is determined by the goals of the paleoclimatic
investigation. One can conceive a study in which the interest in the
high-frequency variability is outweighed by the problems of vagueness induced
by an age-uncertain time axis. In such a scenario, it is reasonable to use an
existing framework of age modeling that establishes a representative set of
ages for each depth level of the core, provided the approximations used for
dealing with the irregularly calibrated age distributions are reasonable.
A methodological advantage of having an error-free time axis is that time
series analysis methods are more readily applicable to them. Although there
has been recent work that was able to extract the climate spectrum
as well as timescale-dependent trends with errors
, interpretations of time series analyses such as the
construction of paleoclimate networks have to be cautious
of the time uncertainties that were inherent in the original data and that
were left unresolved. Our approach provides a clear platform to carry out
such analyses and interpret the results.
General comments
We note here a few additional points relevant to our approach that deserve
attention.
The RM age model need not be monotonic if a non-monotonic
radiometric calibration curve is involved. This is apparent in
Fig. , but we note that the true RM age model shown as
a black curve in Fig. is itself not monotonic. This
is due to the fact that the up–down wiggles of the calibration curve carry
over to the RM age–depth curve. This however does not imply that the true
age model is non-monotonic.
In Fig. , the general trends of both curves
are similar, with the only difference that the proxy curve obtained with the
OxCal age model contains high-frequency variations. This indicates that the
age–depth interrelations underlying these curves are fairly similar. We
highlight this point as we do not expressly use an age model, nor do we
involve assumptions about the sedimentation process. The MoTaBaR regression
method used to get the RM age model is a data-driven regression method that
is not specific to any kind of sedimentation process.
It is also important to note that our approach in its present form
however presents several limitations. Firstly, it cannot deal with discrete
proxy variables and with data sets in which the depth measurement errors are
not negligible. Moreover, it assumes a perfect sedimentation record without
hiatuses or age reversals, and is not equipped to deal with data sets
possessing these features. It also does not consider the incorporation of
additional dating information such as layer counting of well-laminated
sections. It thus needs to be placed within a larger framework of proxy
record construction, involving approaches such as COPRA, StalAge, clam, or
OxCal, that can deal with such issues as well.
Conclusions
We present a Bayesian approach for the estimation of sedimentary proxy
records as well as their associated uncertainty. This is a novel approach
that, for the first time, details analytical relations between the different
uncertainties involved in paleoclimatic proxy record estimation. We
circumvent approximations typically required to deal with the irregularities
of the age distributions of calibrated radiocarbon ages by finding the
likelihood of depths for each calendar age rather than the other way
round. We establish an RM age model to obtain the relation between a time point and its related
set of depths, and avoid constructing an age model as is understood
conventionally. We provide analytical expressions for posterior proxy
probability distributions for any given calendar age that can be used for
further analyses.
We test our approach on two synthetic examples designed to mimic U/Th and
14C dating methods, and demonstrate its validity. We then reconstruct
groundwater inflow and surface erosion proxy records from the Lonar lake in
central India.
Our analysis shows how the variability of the proxy in the depth domain is an
intrinsic factor determining proxy uncertainty and how the final proxy
uncertainty in turn reduces the resultant variability of the median proxy
estimate. We extricate the contribution of the age uncertainty alone to the
final proxy uncertainty from the rest of the involved errors. We show that
for the Lonar lake set of observations, the age measurements are already
quite close to the limit of precision in terms of the final proxy estimates.
Such an exercise could be used to test the efficacy of a set of age
measurements or that of the proxy itself for being a suitable record for the
kinds of questions that the investigator wishes to pursue. Lastly, we
successfully extend the notion of a precise timescale for representing proxy
records to radiocarbon-dated archives.
Keeping in mind that the uncertainty (error) of any measurement is critical
to the proper use of that measurement, our analysis provides a way of
deriving the uncertainty of a proxy measurement at chosen points in time.
This allows for a more critical, and still more valuable, understanding of
how much, and how precisely, we know about the paleoclimate via proxy
measurements. Even though our analysis does not focus explicitly on
estimating other characteristics of the proxy such as its variability, power
spectra, transitions, etc., we have sketched how it can provide a general
direction in which such aspects can be estimated and the corresponding
uncertainties of estimation quantified.
This study is not without its limitations. It needs to be part of a more
expansive agenda of paleoclimate proxy record construction to be of practical
utility. Moreover, it has the potential to be extended to estimate the proxy
records for archives dated with procedures such as dendrochronological
methods and luminescence techniques. In the case of dendrochronology, it
would be possible to consider the time axis as the control variable (without
error) and the depth assigned to the time increments as the source of error,
and in the case of luminescence dating, a reasonable approximation of the
error distribution around the reported luminescence ages would suffice to
adapt the method suitably.
The RM age model involved in our approach is a critical step where further
information could possibly be included. For instance, deposition models that
are available for different types of paleo-archives, such as the P-, V-, and
U-sequence models of OxCal, could be combined with the radiocarbon age–depth
measurements to constrain the RM age model uncertainties further. This can
have a significant impact in reducing the final proxy uncertainty, especially
in cases where the dating measurements are poor.