Is there robust evidence of solar variability in palaeoclimate proxy data?

This is my EGU 2015 poster which I am presenting this evening. Poster B25 if any readers are at EGU and want to see it nailed to the board.

With my coauthors Kira Rehfeld and Scott St George, I have done a systematic review of high-resolution proxy data to detect possible solar signals. It is an attempt to avoid the publication bias and methodological problems in the existing literature on solar-palaeoproxy relationships. A manuscript is in preparation.

There is no prize for finding any typos.

Posted in solar variability | 7 Comments

A deliberately misleading title?

How many readers at WUWT will read today’s headline

Strong evidence for ‘rapid climate change’ found in past millenia

as

Strong evidence for ‘rapid climate change’ found in past millennium,

a subtle difference with a very different meaning? (Yes, there is a typo in his title – I only mention this so certain readers do not think it mine.)

Watts’ introduction to the press release

From the University of South Carolina, comes this paper that offers strong evidence of ‘rapid climate change’ occurring within less than a thousand years, with some occurring over just decades to centuries, near the same scale that proponents of man-made climate change worry so greatly about today.

doesn’t give much away.

The paper Watts is referring to investigates δ15N in the Cariaco Basin, off the coast of Venezuela, during Marine Isotope Stage 3, over 36 thousand years ago. Not surprisingly, the record responds to Dansgaard-Oeschger events, which have been known about for thirty years and are absolutely not analogous to current warming, as Watts strives to imply.

Posted in Uncategorized | 3 Comments

All age-depth models are wrong, but getting better

Today at EGU, Mathias Trachsel presented an update to my paper “All age-depth models are wrong: but how badly?”. He looked at the performance of the Bayesian age-depth models that have been developed over the last decade. Generally, they perform better than the classical age-depth models, but there are some problems setting parameters.

His presentation can be downloaded here.

A manuscript based on the same analyses is almost ready for submission.

Posted in Uncategorized | 3 Comments

Limits to transfer function precision

Transfer functions are widely used to reconstruct past environmental conditions from fossil assemblages using the relationship between species and the environment in a modern calibration set. Naturally, palaeoecologists want to generate reconstructions that are as precise as possible, and take steps to achieve this:

  • taxonomic resolution can be improved in the hope that the new taxa will have narrower ecological niches than the aggregate taxa they replaced
  • larger calibration sets can be generated, which can improve precision but can also worsen it if the new observations are not comparable with the old
  • maximising the environmental gradient of interest while minimising nuisance environmental variables will usually improve calibration set performance (but not necessarily the reconstructions)
  • developing and using new transfer function methods
  • increasing the spatial density of observations in an autocorrelated environment (and using transfer function methods, such as the modern analogue technique, that are not robust to autocorrelation)

I want to suggest that there are limits to the precision that can be achieved in practice due to the inherent noise in species-environment relationships and that papers that report transfer functions with exceptionally good performance should be treated with caution. Temperature is one of the most commonly reconstructed environmental variables as it is a key climatic variable and is ecologically important, so I am going to focus on this.

With all the certainty of a hunch, I am going to place my threshold for dubious precision at a root mean squared error of prediction (RMSEP) of 1°C for transfer functions with long temperature gradients (i.e. equator to pole), and somewhat lower if the temperature gradient is shorter.
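
For concreteness, RMSEP is just the root mean squared difference between cross-validated predictions and observations. A minimal sketch in Python (purely illustrative; not the code behind any of the numbers discussed below):

```python
import numpy as np

def rmsep(predicted, observed):
    """Root mean squared error of prediction.

    `predicted` must be cross-validated predictions (e.g. leave-one-out);
    computed on within-sample predictions this would be the apparent
    error, which flatters the model.
    """
    predicted = np.asarray(predicted, dtype=float)
    observed = np.asarray(observed, dtype=float)
    return np.sqrt(np.mean((predicted - observed) ** 2))
```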

Several transfer functions have been declared to have performance better than this threshold. I’m going to focus on the planktonic-foraminifera sea-surface temperature (SST) transfer functions as I know these fairly well; the system is relatively simple (compared with diatoms in lakes at least); and there are some interesting issues to explore.

Pflaumann et al (2003) reported a planktonic foraminifera-SST transfer function with a standard deviation of residuals (similar to RMSEP if bias is low) of 0.75°C for winter and 0.82°C for summer using the SIMMAX method. SIMMAX was (hopefully I am correct in using the past tense) a version of the modern analogue technique (MAT) that weighted analogues by their geographic proximity to the test site during cross-validation. Since SST is spatially autocorrelated, giving high weights to close analogues will tend to make the predictions appear more precise. But this is a spurious precision, bought at the expense of the independence of the test observation, otherwise known as cheating. Since Telford et al (2004) described the problem with SIMMAX, it has been little used.
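
For reference, here is a sketch of plain MAT under leave-one-out cross-validation (Python, with hypothetical array inputs; the comment marks the step where SIMMAX’s geographic weighting lets autocorrelation leak in):

```python
import numpy as np

def mat_loo_predictions(spp, env, k=5):
    """Leave-one-out MAT predictions.

    spp : (n_sites, n_taxa) array of taxon proportions
    env : (n_sites,) environmental variable, e.g. winter SST
    k   : number of closest analogues averaged per prediction
    """
    spp = np.asarray(spp, dtype=float)
    env = np.asarray(env, dtype=float)
    preds = np.empty(len(env))
    for i in range(len(env)):
        # squared-chord distance from the test site to every calibration site
        d = np.sum((np.sqrt(spp[i]) - np.sqrt(spp)) ** 2, axis=1)
        d[i] = np.inf  # leave the test observation out
        analogues = np.argsort(d)[:k]
        # Plain MAT: unweighted mean of the k closest analogues.
        # SIMMAX instead up-weighted geographically close analogues here,
        # letting spatial autocorrelation deflate the apparent RMSEP.
        preds[i] = env[analogues].mean()
    return preds
```

Cross-validated performance then follows as `rmsep(mat_loo_predictions(spp, env), env)`.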

Waelbroeck et al (1998) introduced the revised analogue method (RAM), another version of MAT that attempted to merge the properties of MAT and response surfaces. Unfortunately, the response surface was only calculated once rather than recalculated during cross-validation. This means that the impressive performance of their planktonic foraminifera-SST transfer function, with a standard deviation of residuals of 0.7°C for winter and 0.91°C for summer, is biased by the failure to ensure that the test observation is independent of the calibration set during cross-validation. I’ve not seen RAM used much since Telford et al (2004) described the problem with it.

Artificial neural networks (ANNs) were used by Malmgren et al (2001), with a reported RMSEP of 0.99°C for winter and 1.07°C for summer. ANNs learn by iteratively adjusting a large set of parameters, initially set to random values, to minimise the error between the predicted and actual output. If trained for too long, ANNs can over-fit the data, learning particular features of the modelling set rather than the general rules. This is normally controlled by splitting the data, training the models on one portion, testing them against a second portion, and stopping training when the RMSEP of the second portion stops falling. Typically, many ANN models are generated from different random initial conditions and configurations, and the best model is selected using the second portion. By selecting models that give the lowest RMSEP for the second data partition, the RMSEP is biased low. A third data partition is needed to give an unbiased estimate of model performance (again, see Telford et al (2004)). Malmgren et al did not use this independent test set, so their results are biased low.
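
The size of this selection bias is easy to demonstrate with a toy simulation (this is not Malmgren et al’s procedure, just an illustration of why a third partition is needed): even when every candidate model is pure noise, the best validation score looks impressive.

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy setup: targets and predictions are independent unit-variance noise,
# so no model has any skill and the true RMSEP of every model is
# sqrt(1 + 1) ~ 1.41.
n_val, n_test, n_models = 50, 50, 100
y_val = rng.normal(size=n_val)
y_test = rng.normal(size=n_test)

val_rmsep, test_rmsep = [], []
for _ in range(n_models):  # stand-ins for ANNs from different random starts
    val_rmsep.append(np.sqrt(np.mean((rng.normal(size=n_val) - y_val) ** 2)))
    test_rmsep.append(np.sqrt(np.mean((rng.normal(size=n_test) - y_test) ** 2)))

best = int(np.argmin(val_rmsep))
print(f"chosen model, selection partition: {val_rmsep[best]:.2f}")  # biased low
print(f"chosen model, held-out partition:  {test_rmsep[best]:.2f}")  # ~ 1.41
```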

MAT is perhaps the most widely used transfer function method for reconstructing SST from planktonic foraminifera. Telford & Birks (2005) report an RMSEP of 0.89°C for winter SST in the North Atlantic (Kucera et al (2005) report a larger RMSEP of 1.32°C for winter and 1.42°C for summer – I don’t know what causes the difference). As Telford & Birks (2005) show, this low RMSEP is biased by spatial autocorrelation in the calibration set which means that the test observation is not independent of the calibration set during cross-validation.

All of these low RMSEPs are demonstrably biased. To have an RMSEP of 1°C, species need to have very clean responses to temperature. Nuisance variables and noise make this unlikely. With short gradients, the magnitude of error that is possible decreases, so lower RMSEPs are expected (but also lower r²). So, for example, the Norwegian pollen-July temperature RMSEP of just over 1°C is plausible. This model has none of the problems outlined above and uses methods that are reasonably robust to autocorrelation.

In reality, different thresholds are needed for different proxies. When the relationship between the organisms and the environmental variable being reconstructed is less direct (for example, between chironomids and air temperature) or there are large nuisance gradients (again, e.g. chironomids), the threshold at which I start to wonder is raised.

The same logic holds for transfer functions for reconstructing other variables – if the results look too good to be true, there might be problems. For example, there is at least one transfer function where I suspect the authors forgot to cross-validate their model, so good is the reported performance. Unfortunately, short of acquiring the data and re-running the analyses, there is little that can be done to check such cases.

A question for readers: do you know of any transfer functions with suspiciously good performance that ought to be examined?

Posted in transfer function | Tagged | Leave a comment

In eight-dimensional space, no one can hear your data scream

Mauri et al (2015) present a gridded climate reconstruction for Europe over the Holocene based on almost 900 pollen stratigraphies. The hope is that the reconstruction will be useful for evaluating climate models. This is a worthwhile goal – if the climate models can generate realistic Holocene climates, our confidence in their predictions for the future should increase. But this hope will only be realised if the reconstructions are reliable, unbiased, and come with realistic estimates of reconstruction uncertainty.

Mauri et al reconstruct eight climatic variables: mean summer (JJA), winter (DJF) and annual temperature and precipitation, mean annual GDD5 (growing degree days over 5 °C) and mean annual P–E (precipitation minus evaporation).

Mauri et al are not the first to try to reconstruct many environmental variables simultaneously, nor do they hold the record for the most variables (I seem to recall one paper reconstructing over 20 variables from pollen, but cannot find the reference). But like so many papers, Mauri et al do not consider whether they really can reconstruct so many variables. I do not dispute that all the variables reconstructed by Mauri et al are important to plants (actually I do – I don’t think that plants care about annual mean temperature per se, but are instead sensitive to seasonal temperatures and their combination), but I am sceptical that all can be reconstructed at all sites. It would seem more plausible to me if some variables could be satisfactorily reconstructed at some sites, and other variables at others, depending on which variables are limiting species abundances.

Mauri et al have this to say about their choice of environmental variables:

Here we do not attempt to justify our choice of parameters, other than to point to the extensive peer reviewed literature in which these parameters have already been applied.

Not an entirely satisfactory justification. Would it not have been better to try to determine which variables can be reconstructed, and where, before providing gridded climate reconstructions? One can hardly expect the climate modellers to do this.

Mauri et al use the modern analogue technique (MAT, aka k-nearest neighbours) based on 4700 modern pollen and climate observations. MAT is sensitive to spatial autocorrelation, which makes the transfer function model appear more precise than can be justified. Mauri et al are aware of this, and describe the problem accurately:

The performance of a training-set is often estimated using cross-validation techniques, but performance can be over estimated as a result of spatial autocorrelation from geographically close analogues.

But then comes this:

The extent of this problem has not generally been considered to be significant enough to limit the application of the MAT technique, and indeed the spatial structure in the data may still be an important function of the climatic response, especially at regional scales (Bartlein et al., 2010).

It is perhaps true that spatial autocorrelation has not “been considered … enough”. Telford and Birks (2009) demonstrate that spatial autocorrelation can greatly bias the performance estimates of pollen transfer functions that use MAT. If the true uncertainty is, say, 50% higher than it appears, we risk finding that climate models do not agree with the data when the agreement is actually reasonably good. I’ve no idea what Mauri et al mean by the second part of this sentence – time to look at Bartlein et al (2011), who have a paragraph about spatial autocorrelation.

Standard goodness-of-fit statistics such as R2 may overestimate the predictive power of climate reconstructions (especially those made with the modern analogue technique) due to unaccounted-for spatial autocorrelation in the response variables (e.g. Telford and Birks 2005).

True.

The extent of this effect in published pollen-based climate reconstructions cannot easily be quantified.

Mainly because no one has tried.

However, it should be noted that the spatial autocorrelation of vegetation composition at a regional scale derives almost entirely from its causal relation to climate, provided that attention is confined to variables that influence the growth, establishment and regeneration of plants (Harrison et al. 2009).

Harrison et al do not appear to have tested this conjecture, which can only be true if non-climate factors such as soil are unimportant and there is no dispersal limitation. Evidence for dispersal limitation in European trees can be found in Nogués-Bravo et al (2014) and in the ability of European trees to naturalise far outside their native range (e.g., Abies alba in Denmark).

Spatial pattern in pollen data thus constitutes valuable information for the reconstruction, to be retained rather than rejected (Legendre 1993; Legendre and Legendre 1998).

This is true only if you make the assumption that the spatial structure of climate did not change in the past. This assumption is very unlikely to be valid.

In any case, spatial autocorrelation in pollen data becomes non-significant at length scales of 200–300 km, and is slight at any scale when full taxon lists are used (Sawada et al. 2004).

Sawada et al did not examine the autocorrelation in the pollen data but the autocorrelation in the MAT residuals. Since MAT uses the spatial structure in the data to artificially improve its fit, the autocorrelation in the residuals will be smaller than that in the data. In any case, a radius of 200-300 km is large enough to contain many potential analogues during cross-validation and so bias the performance estimates.
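
Anyone with the data to hand can check this distinction directly, for example with Moran’s I computed on both the raw variable and the cross-validation residuals. A sketch with simple binary distance weights (coords are assumed to be projected site coordinates in the same units as max_dist):

```python
import numpy as np

def morans_i(x, coords, max_dist):
    """Moran's I with binary weights: site pairs (i != j) within
    max_dist of each other count as neighbours."""
    x = np.asarray(x, dtype=float)
    coords = np.asarray(coords, dtype=float)
    n = len(x)
    # pairwise distances between all sites
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    w = ((d <= max_dist) & ~np.eye(n, dtype=bool)).astype(float)
    z = x - x.mean()
    return (n / w.sum()) * (z @ w @ z) / (z @ z)
```

Comparing `morans_i(env, coords, 300.0)` with `morans_i(env - loo_predictions, coords, 300.0)` would show how much of the spatial structure MAT has absorbed into its fit.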

Guiot and de Vernal (2007) showed that the goodness-of-fit (as measured by the R2 statistic) is an appropriate measure when spatial autocorrelation in the pollen data arises from the underlying climate and not from processes internal to the vegetation system.

Guiot and de Vernal (2007) is simply wrong. The authors were equally wrong in 2011.

Bartlein et al. appears not to give any robust support to the argument in Mauri et al.

Back to Mauri et al. They attempt a solution to spatial autocorrelation:

In evaluating our transfer function, we have tried to take account of the auto-correlation problem by adopting an n-fold-leave-one-out cross validation which provides a more reliable estimate of the model performance than simple leave-one-out cross-validation (Barrows and Juggins, 2005).

[n-fold cross-validation provides] a more reliable estimate of the model performance than is provided by simple leave-one-out cross-validation, especially using MAT where the effect of spatial autocorrelation can otherwise cause uncertainty to be under-estimated (Barrows and Juggins, 2005).

Barrows and Juggins (2005) contains nothing about spatial autocorrelation; it cannot be used to justify n-fold cross-validation as a solution to spatial autocorrelation.

With MAT, spatially close observations in the calibration set are often selected as analogues. If they were being selected just because they are similar in the environmental variable of interest, there would be no problem. However, they may be being selected as analogues because they are similar for many environmental variables. One way to deal with spatial autocorrelation is to exclude observations that are spatially close to the test observation during cross-validation – h-block cross-validation (Telford and Birks 2009). n-fold cross-validation will remove some spatially close observations, on average 1/n of them, but leaves (n-1)/n of them to affect the analysis. If you think n-fold cross-validation is a solution to spatial autocorrelation, I have a sieve you can use as an umbrella.
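
A sketch of h-block leave-one-out for MAT makes the difference concrete (again Python with hypothetical inputs; the only change from ordinary leave-one-out is that the whole spatial neighbourhood of the test site is excluded, not just the site itself):

```python
import numpy as np

def mat_hblock_predictions(spp, env, coords, h, k=5):
    """h-block cross-validated MAT predictions (after Telford and Birks 2009).

    coords : (n_sites, 2) projected site coordinates, same units as h
    h      : radius within which calibration sites are excluded
    """
    spp = np.asarray(spp, dtype=float)
    env = np.asarray(env, dtype=float)
    coords = np.asarray(coords, dtype=float)
    preds = np.empty(len(env))
    for i in range(len(env)):
        d_taxa = np.sum((np.sqrt(spp[i]) - np.sqrt(spp)) ** 2, axis=1)
        d_geo = np.linalg.norm(coords - coords[i], axis=1)
        # exclude the test site itself (d_geo == 0) and everything within h
        d_taxa[d_geo <= h] = np.inf
        analogues = np.argsort(d_taxa)[:k]
        preds[i] = env[analogues].mean()
    return preds
```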

Mauri et al plot their reconstruction uncertainties, which is commendable; they just don’t tell the reader how they calculated the uncertainties, nor how many analogues were used. The uncertainties resulting from the calibration set being dominated by moss polsters, while the reconstructions are from lakes and bogs, are also not discussed. Perhaps these are terribly tedious methodological details, but they are needed to properly evaluate the paper.

Mauri et al is certainly a better analysis than Davis et al (2003), which it supersedes, but it falls short of what could have been achieved.

Posted in Peer reviewed literature, transfer function | Tagged , | 2 Comments

Significance tests of transfer-function reconstructions

The presentation I gave at the seminar in honour of John Birks can be found here. I discussed how reconstructions from transfer functions can be evaluated, focusing on the significance tests of transfer function reconstructions that I developed in Telford and Birks (2011).
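
For readers who have not met the method: the test asks whether the reconstruction explains more of the variance in the fossil data than reconstructions from transfer functions trained on random environmental variables. The real implementation is the randomTF function in the palaeoSig R package; below is a condensed Python sketch of the idea, using MAT and hypothetical array inputs.

```python
import numpy as np

def mat_predict(modern_spp, modern_env, fossil_spp, k=5):
    """MAT reconstruction: mean environment of the k modern samples
    closest (squared-chord distance) to each fossil sample."""
    modern_spp = np.asarray(modern_spp, dtype=float)
    modern_env = np.asarray(modern_env, dtype=float)
    fossil_spp = np.asarray(fossil_spp, dtype=float)
    preds = np.empty(len(fossil_spp))
    for i, sample in enumerate(fossil_spp):
        d = np.sum((np.sqrt(sample) - np.sqrt(modern_spp)) ** 2, axis=1)
        preds[i] = modern_env[np.argsort(d)[:k]].mean()
    return preds

def variance_explained(fossil_spp, recon):
    """Proportion of variance in the fossil data explained by the
    reconstruction as the sole constraint (one-predictor RDA)."""
    Y = np.asarray(fossil_spp, dtype=float)
    Y = Y - Y.mean(axis=0)
    x = recon - recon.mean()
    fitted = np.outer(x, x @ Y) / (x @ x)  # project each taxon onto x
    return np.sum(fitted ** 2) / np.sum(Y ** 2)

def significance_test(modern_spp, modern_env, fossil_spp, n_rand=999, k=5, seed=1):
    """One-sided p-value: does the real reconstruction beat reconstructions
    trained on random 'environmental' variables (drawn uniformly here)?"""
    rng = np.random.default_rng(seed)
    obs = variance_explained(
        fossil_spp, mat_predict(modern_spp, modern_env, fossil_spp, k))
    null = np.array([
        variance_explained(
            fossil_spp,
            mat_predict(modern_spp, rng.uniform(size=len(modern_env)), fossil_spp, k))
        for _ in range(n_rand)])
    return (1 + np.sum(null >= obs)) / (1 + n_rand)
```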

I show that significant results can arise because the variable of interest is an ecologically important determinant of variability in the fossil species data, or because it is correlated with such a determinant in the modern calibration set. This means that the significance tests do not circumvent Steve Juggins’ “sick science” problem.

Non-significant results can arise because the variability in the environmental variable of interest is small relative to the transfer function model uncertainty, other environmental variables are varying, or in some circumstances, because the tests have low power.

Whether the reconstruction is significant or not, low amplitude variability should be interpreted with caution as it may represent noise.

In questions, Cajo ter Braak suggested an alternative strategy for testing reconstruction significance. Initial tests (only two lines of code needed changing) suggest this alternative might be more powerful.

Exceedingly brief summaries of most of the other presentations can be found on twitter under the tag #HJBB2015.


Posted in transfer function | Tagged | Leave a comment

John Birks – At the frontiers of palaeoecology

This week, the University of Bergen is holding a seminar in honour of John Birks and his academic career so far. He retired earlier this year, at least from teaching and administration duties.

Several well-known palaeo/ecologists who were either supervised by John or have otherwise worked with him will be presenting – see the programme for details.

@alistairseddon, @richardjtelford and probably others will be live-tweeting the seminar: follow #HJBB2015.

This seminar follows a special issue (branch?) of The Holocene published in January, which we managed to keep a secret until it came out. We didn’t do so well in keeping the seminar a secret (someone apologised to John for not being able to come – whoops).

Posted in climate, transfer function | Tagged | 1 Comment