Pollen from the garden of forking paths

Most transfer functions for reconstructing past environmental changes are based on a calibration-in-space approach, with a modern calibration set of paired microfossil assemblages and environmental data. The alternative approach is calibration-in-time, with well-dated fossil assemblages and contemporaneous environmental data.

I’ve previously shown that the three chironomid calibration-in-time transfer functions all misreport the performance statistics. All report the apparent performance as if it were the cross-validated performance. But what of the single calibration-in-time model developed for pollen assemblages that I am aware of? (Please let me know of other calibration-in-time models for microfossil assemblages.)

Kamenik et al. (2008) report a calibration-in-time model for the pollen stratigraphy from the mire Mauntschas in the SW Swiss Alps, dated with 29 14C dates and other chronological information. The paper reports that the r2 between predicted and modelled April-November temperature is 0.44. In ideal circumstances this would represent a very respectable performance. However, I have a couple of concerns about the analysis in this paper.

Firstly, the pollen and climate data are smoothed with a three-year triangular filter prior to any analysis. This induces temporal autocorrelation in the data, which violates the independence assumption of the statistical methods used.
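To see how much autocorrelation such a filter induces, here is a minimal sketch. It assumes the three-year triangular filter has weights 1/4, 1/2, 1/4 (the paper's exact weights are not given here), and the function names are my own. For white noise passed through these weights, the theoretical lag-1 autocorrelation is 2/3. A long series is used only to make the estimate stable.

```python
import random
import statistics


def triangular3(x):
    """Three-point triangular filter, assumed weights 1/4, 1/2, 1/4.

    The two end points are dropped, so the output is two values shorter.
    """
    return [0.25 * x[i - 1] + 0.5 * x[i] + 0.25 * x[i + 1]
            for i in range(1, len(x) - 1)]


def lag1_autocorr(x):
    """Sample lag-1 autocorrelation of a sequence."""
    m = statistics.fmean(x)
    num = sum((a - m) * (b - m) for a, b in zip(x[:-1], x[1:]))
    den = sum((a - m) ** 2 for a in x)
    return num / den


random.seed(1)
raw = [random.gauss(0.0, 1.0) for _ in range(10_000)]  # independent noise
smooth = triangular3(raw)

print(f"lag-1 autocorrelation, raw:      {lag1_autocorr(raw):.2f}")
print(f"lag-1 autocorrelation, smoothed: {lag1_autocorr(smooth):.2f}")
```

Data that started out independent end up with each value sharing most of its information with its neighbours, so 47 smoothed years carry far fewer than 47 independent observations.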

Secondly, a large number of choices are made on the route to the selected model. Choices include:

  • The climate variable reconstructed (mean temperature or precipitation over 1-12 months with a lag of 0-11 months – a total of 2 × 12 × 12 = 288 responses). May to August air temperature is the best predictor in an RDA, but April-November air temperature is reconstructed because it reduced the transfer function error.
  • The predictor variables (6 out of 11 pollen taxa are used in the final model)
  • The transformation of the pollen data (accumulation rate or percent, detrended or not)
  • The statistical model (ordinary least squares regressions (OLSR), time series regressions (TSR), ridge regression (RidgeR), principal components regressions (PCR) and partial least squares regressions (PLSR))

The impact of these choices – forks in Andrew Gelman’s garden of forking paths – will be to inflate the estimate of model performance.
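As a rough illustration of how quickly the forks multiply, the counts in the list above can be combined. This is a back-of-envelope sketch under loud assumptions: that there are 12 lag values (so that the product matches the 288 responses), that any non-empty subset of the 11 taxa could have been chosen, and that every choice could be combined with every other. Many of these paths would give nearly identical results, so the product is not an effective number of independent tests.

```python
from math import comb

climate_variables = 2   # temperature or precipitation
window_lengths = 12     # means over 1-12 months
lags = 12               # assumption: 12 lag values, matching the 288 total
responses = climate_variables * window_lengths * lags

n_taxa = 11
six_taxon_subsets = comb(n_taxa, 6)   # ways to pick exactly 6 of 11 taxa
any_size_subsets = 2 ** n_taxa - 1    # all non-empty subsets of 11 taxa

transformations = 2 * 2  # accumulation rate or percent, detrended or not
models = 5               # OLSR, TSR, RidgeR, PCR, PLSR

# Assumption: the choices are fully crossed.
paths = responses * any_size_subsets * transformations * models

print(f"candidate responses:      {responses}")
print(f"six-taxon subsets:        {six_taxon_subsets}")
print(f"non-empty taxon subsets:  {any_size_subsets}")
print(f"fully crossed paths:      {paths:,}")
```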

It is possible to explore the impact of some of these choices by simulation, which can help gauge how impressed we should be with an r2 of 0.44. To simplify the simulation, I’m only going to investigate the importance of the induced autocorrelation and the selection of pollen taxa.

For each case below, I simulated 49 years of Gaussian-distributed climate and pollen data 10,000 times (code on GitHub).

The upper panel in the figure below shows the distribution of leave-one-out cross-validation r2 of OLS models fitted to six simulated pollen spectra (data from two years are removed to make the data set comparable with the smoothed data in the next step). The reported r2 is far above the 95th percentile of the distribution.
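The null simulation for the unsmoothed case can be sketched as follows. This is not the code from the repository mentioned above: it assumes that the cross-validated r2 is the squared correlation between the observed values and the leave-one-out predictions, it uses 2,000 rather than 10,000 simulations, and the function names are hypothetical. Leave-one-out predictions are obtained from a single fit via the hat matrix rather than by refitting 47 times.

```python
import numpy as np


def loo_r2(X, y):
    """Squared correlation between y and its leave-one-out OLS predictions."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])   # design matrix with intercept
    H = A @ np.linalg.pinv(A)              # hat matrix A (A'A)^-1 A'
    resid = y - H @ y                      # ordinary residuals
    # Leave-one-out residual is resid_i / (1 - h_ii).
    loo_pred = y - resid / (1.0 - np.diag(H))
    return np.corrcoef(y, loo_pred)[0, 1] ** 2


def null_loo_r2(n_years=47, n_taxa=6, n_sims=2000, seed=1):
    """LOO r2 for independent Gaussian 'pollen' and 'climate' series."""
    rng = np.random.default_rng(seed)
    out = np.empty(n_sims)
    for i in range(n_sims):
        X = rng.standard_normal((n_years, n_taxa))  # six unrelated taxa
        y = rng.standard_normal(n_years)            # unrelated climate
        out[i] = loo_r2(X, y)
    return out


r2 = null_loo_r2()
print(f"95th percentile of null LOO r2: {np.quantile(r2, 0.95):.3f}")
```

Because the predictors and the response are generated independently, any r2 here arises by chance alone, which is what makes the distribution a useful yardstick for the reported value.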

The middle panel shows the distribution of r2 when the six pollen spectra and the climate variable are smoothed with a three-year triangular filter, as used by the authors. The 95th percentile of the distribution has moved towards the reported r2.

In the lower panel, I show the distribution of the r2 of the OLS model when the best subset (1–11 variables) of smoothed pollen spectra is chosen by BIC. The 95th percentile of the distribution has moved beyond the reported r2. No claim to statistical significance can be supported.
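A sketch of this last case, combining smoothing with best-subset selection, might look like the following. It is an illustration under stated assumptions, not the code from the repository: the filter weights are assumed to be 1/4, 1/2, 1/4, the r2 is assumed to be the leave-one-out r2 of the selected model (with the selection itself not cross-validated), BIC is computed as n ln(RSS/n) + k ln(n), and only 50 simulations are run to keep the exhaustive search over 2,047 subsets quick. All function names are hypothetical.

```python
from itertools import combinations

import numpy as np


def triangular3(a):
    """Three-point triangular filter along axis 0; drops the two end rows."""
    return 0.25 * a[:-2] + 0.5 * a[1:-1] + 0.25 * a[2:]


def loo_r2(X, y):
    """Squared correlation between y and its leave-one-out OLS predictions."""
    A = np.column_stack([np.ones(len(y)), X])
    H = A @ np.linalg.pinv(A)
    loo_pred = y - (y - H @ y) / (1.0 - np.diag(H))
    return np.corrcoef(y, loo_pred)[0, 1] ** 2


def best_subset_bic(X, y):
    """Exhaustive search over non-empty column subsets; returns (cols, bic)."""
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])
    AtA, Aty, yty = A.T @ A, A.T @ y, y @ y
    best_cols, best_bic = None, np.inf
    for k in range(1, p + 1):
        for cols in combinations(range(1, p + 1), k):
            idx = [0, *cols]                      # always keep the intercept
            beta = np.linalg.solve(AtA[np.ix_(idx, idx)], Aty[idx])
            rss = yty - beta @ Aty[idx]           # residual sum of squares
            bic = n * np.log(rss / n) + (k + 1) * np.log(n)
            if bic < best_bic:
                best_cols, best_bic = [c - 1 for c in cols], bic
    return best_cols, best_bic


def simulate(n_sims=50, n_years=49, n_taxa=11, seed=1):
    """r2 of the BIC-selected model on smoothed, unrelated Gaussian data."""
    rng = np.random.default_rng(seed)
    out = np.empty(n_sims)
    for i in range(n_sims):
        X = triangular3(rng.standard_normal((n_years, n_taxa)))
        y = triangular3(rng.standard_normal(n_years))
        cols, _ = best_subset_bic(X, y)
        out[i] = loo_r2(X[:, cols], y)
    return out


r2 = simulate()
print(f"95th percentile after smoothing and selection: {np.quantile(r2, 0.95):.2f}")
```

With so few simulations the percentile is only a rough estimate; raising n_sims towards the 10,000 used in the post would stabilise it at the cost of run time.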


Distribution of r2 values from 10,000 simulations with different treatments. The red line shows the reported r2.

Allowing for the choice of statistical model, the choice of data transformation, and the selection of the climate variable to reconstruct would move the simulated r2 even further to the right.

None of the choices made by the authors are bad. All can be defended, except perhaps the inclusion of autumn temperatures in their climate mean when most plants have flowered months earlier. The problem is that the authors have sought the path which yields the best performance. Pollen data from another core from the same mire would be unlikely to give the same performance with the selected model.

As with the three chironomid calibration-in-time models, the Mauntschas Mire model cannot be relied upon. A critical difference, though, is that the performance of the chironomid models is misrepresented.

About richard telford

Ecologist with interests in quantitative methods and palaeoenvironments