Extrapolating with transfer functions

Reconstructions of past environmental conditions can be made using transfer functions based on the modern relationship between paired observations of species assemblages and the environmental variables of interest in a calibration set. One of the requirements of calibration sets is that they span the likely range of past environmental conditions being reconstructed.

Birks et al. (2010) note, though, that some transfer-function methods can extrapolate. ter Braak (1995) explores this with simulated data. His calibration set has two environmental variables; observations are taken from an “L”-shaped region of environmental space. The test set fits into part of the space within the “L”. Thus, although the test set requires extrapolation into environmental space not covered by the calibration set, it does not require extrapolation to extreme values of either environmental variable. In ter Braak’s (1995) example, a one-dimensional multinomial logit model (related to the more common maximum likelihood method) and the modern analogue technique performed poorly. Weighted averaging partial least squares, which is designed to incorporate information from secondary environmental variables, performed well, almost as well as a two-dimensional multinomial logit model, which explicitly incorporates secondary environmental gradients.

I want to test something slightly different: how well transfer function models behave when extrapolated to extreme values of the environmental variable of interest.

The test I’m going to use is simple. I’m going to develop transfer function models on the SWAP diatom-pH calibration set truncated at either the 25th or 75th percentile and make predictions for the remaining 25% of the data. The predictions are compared with the observed values, and for context, I also examine the leave-one-out cross-validation predictions for this part of the environmental gradient from transfer function models trained on the entire calibration set.

Here is the code for testing weighted averaging.

library(rioja)
data(SWAP)
summary(SWAP$pH)  # the quartiles give the truncation points

# truncate at the 75th percentile (pH 6.225) to test extrapolation to high pH
keep <- SWAP$pH < 6.225

# WA on the full and on the truncated calibration set
# (species absent from the truncated set are dropped)
mod0 <- crossval(WA(SWAP$spec, SWAP$pH))
mod1 <- crossval(WA(SWAP$spec[keep, colSums(SWAP$spec[keep, ]) > 0], SWAP$pH[keep]))

# predict the withheld quarter of the data from the truncated model
p1 <- predict(mod1, SWAP$spec[!keep, ])$fit[, 1]

plot(SWAP$pH, mod0$predicted[, 1], xlab = "Measured pH", ylab = "Predicted pH")
title(main = "WAinv")
points(SWAP$pH[keep], mod1$predicted[, 1], col = 2)
points(SWAP$pH[!keep], p1, pch = 16, col = 4)
abline(h = 6.225, v = 6.225)  # truncation point
abline(0, 1)                  # 1:1 line

I ran this test for weighted averaging with inverse deshrinking (WAinv), weighted averaging with monotonic spline deshrinking (WAmono), a two-component weighted averaging partial least squares model (WAPLS2), maximum likelihood (ML), and the modern analogue technique (MAT) with five analogues.
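The other models can be fitted in much the same way as the WA code above; a sketch, assuming rioja’s argument names (and that MLRC expects proportions rather than percentages):

```r
# Sketch of fitting the remaining methods with rioja; each model is then
# truncated and used for prediction exactly as in the WA code above
mod_mono  <- crossval(WA(SWAP$spec, SWAP$pH, mono = TRUE))  # WAmono
mod_wapls <- crossval(WAPLS(SWAP$spec, SWAP$pH, npls = 2))  # WAPLS2
mod_ml    <- crossval(MLRC(SWAP$spec / 100, SWAP$pH))       # ML
mod_mat   <- MAT(SWAP$spec, SWAP$pH, k = 5)                 # MAT, five analogues
```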

Predicted against measured pH for the entire calibration set (black), the truncated calibration set (red) and the extrapolation test set (blue) for different transfer function methods. The plots on the left show extrapolation to low pH, those on the right to high pH.


Mean bias and r² of the lower 25% of the calibration set

Method    Leave-one-out cross-validation    Extrapolation
          Mean bias    r²                   Mean bias    r²
WAinv        0.17     0.24                     0.12     0.28
WAmono       0.17     0.23                     0.50     0.28
WAPLS2       0.11     0.26                     0.37     0.28
ML           0.02     0.16                     0.32     0.11
MAT          0.10     0.20                     0.53     0.16
Mean bias and r² of the upper 25% of the calibration set

Method    Leave-one-out cross-validation    Extrapolation
          Mean bias    r²                   Mean bias    r²
WAinv       -0.15     0.32                    -0.56     0.22
WAmono      -0.15     0.30                    -0.61     0.21
WAPLS2      -0.13     0.28                    -0.40     0.17
ML          -0.07     0.19                    -0.50     0.09
MAT         -0.18     0.29                    -0.83     0.14

As expected, with leave-one-out cross-validation of the entire calibration set, mean bias is positive at the low end of the gradient (pH is over-estimated) and negative at the high end (pH is under-estimated). The r² for these short portions of the gradient is lower than that for the full gradient.

With extrapolation from the truncated calibration set, absolute mean bias increases in all but one case, and the r² decreases in most cases, but surprisingly increases at the acid end of the pH gradient with some methods.

Weighted averaging works by calculating the pH optimum of each species as the abundance-weighted average of the pH of the sites at which it occurs. The weighted average of the optima of the species in a test observation is then taken as the raw estimate. Because averages are taken twice, the estimates span a smaller range than the original observations, so a deshrinking step is used to stretch the estimates to best match the observations. It is this deshrinking step that can allow weighted averaging methods to extrapolate. The rioja package offers a choice of deshrinking methods: inverse, classical, and monotonic spline deshrinking.
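The two averaging steps and the inverse deshrinking regression can be written out in a few lines of base R; a minimal sketch, with an invented species matrix and pH gradient standing in for a real calibration set:

```r
# Simulate a toy calibration set: 50 sites, 20 species with unimodal
# responses along an invented pH gradient
set.seed(1)
ph  <- runif(50, 4.5, 7)
opt <- runif(20, 4.5, 7)
spp <- sapply(opt, function(o) exp(-(ph - o)^2 / 0.5) + rnorm(50, sd = 0.05))
spp[spp < 0] <- 0

# step 1: species optima = abundance-weighted mean of the site pH values
optima <- colSums(spp * ph) / colSums(spp)

# step 2: raw estimates = abundance-weighted mean of the species optima;
# averaging twice shrinks these towards the middle of the gradient
raw <- as.vector(spp %*% optima) / rowSums(spp)

# step 3: inverse deshrinking - regress observed pH on the raw estimates
# and use the fitted line to stretch them back out
est <- fitted(lm(ph ~ raw))
```

Because the deshrinking line is unbounded, a test sample whose raw estimate falls beyond the calibration range can still be stretched past the ends of the gradient, which is what permits (limited) extrapolation.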

Strangely, at the acid end of the gradient, WAinv performs better by extrapolation than by leave-one-out cross-validation. At the alkaline end, performance is worse. WAmono has a larger mean bias than WAinv, especially at the acid end, but a similar r².

WAPLS is designed to cope with secondary gradients in the calibration set, but can also work by correcting edge effects in WA. With the SWAP calibration set, WAPLS has only marginally better cross-validation performance than WAinv; its extrapolation performance has a lower mean bias at the alkaline end, but it otherwise does not outperform WAinv.

I thought ML would perform well by extrapolation as it fits a curve to each species which can be extrapolated. However, with these examples ML does not perform well, perhaps because there are many species with poorly defined optima and tolerances.

MAT is predictably hopeless at extrapolating. The predictions are the mean pH of the five taxonomically most similar observations in the calibration set, so the maximum possible prediction is the mean of the five most alkaline observations; the method simply cannot extrapolate.
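This hard ceiling is easy to see from a bare-bones version of MAT; a sketch using squared chord distance (a common choice for percentage data), with invented data:

```r
# Toy calibration set of percentage data and an invented pH gradient
set.seed(1)
ph  <- runif(60, 4.5, 7)
spp <- matrix(runif(60 * 10), 60, 10)
spp <- spp / rowSums(spp) * 100  # convert to percentages

# MAT prediction: mean pH of the k training observations with the
# smallest squared chord distance to the sample
mat_predict <- function(train, train_ph, sample, k = 5) {
  d <- colSums((sqrt(t(train)) - sqrt(sample))^2)
  mean(train_ph[order(d)[1:k]])
}

pred <- mat_predict(spp[-1, ], ph[-1], spp[1, ])
```

Whatever the sample, the prediction is an average of k calibration-set pH values, so it can never exceed the mean of the k most extreme values in the training data.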

These tests show that in some circumstances extrapolation with WA can be good, in others it is poor. WAmono and ML performed worse than I thought they would.

The difference in the extrapolation performance at the acid and alkaline ends of the gradient is curious. If it is possible to work out why this occurs, it may be possible to predict when it is safe to extrapolate (slightly), and when extrapolations are less trustworthy.

Thanks to Sakari for prompting this post.


Birks, H.J.B., Heiri, O., Seppä, H., Bjune, A.E., 2010. Strengths and weaknesses of quantitative climate reconstructions based on late-Quaternary biological proxies. The Open Ecology Journal 3, 68–110.

ter Braak, C.J.F., 1995. Non-linear methods for multivariate statistical calibration and their use in palaeoecology: a comparison of inverse (k-nearest neighbours, partial least squares and weighted averaging partial least squares) and classical approaches. Chemometrics and Intelligent Laboratory Systems 28, 165–180.


About richard telford

Ecologist with interests in quantitative methods and palaeoenvironments

2 Responses to Extrapolating with transfer functions

  1. The simplest explanation for the difference in extrapolation performance at the acid and alkaline ends of the gradient is that the first quartile of the data spans 0.61 pH units, whereas the fourth quartile spans 1.0 pH units. It might have been fairer to run the test for a fixed length of gradient rather than a fixed proportion of the observations.

    When testing extrapolation to the extreme 0.5 pH units of the calibration set, performance at the acid end is qualitatively similar to the test above. At the high end of the gradient, the cross-validation performance is very poor (r² all below 0.1). The extrapolations have larger mean bias, but also a higher r².

    Perhaps the simplest test of whether a transfer function can extrapolate is how well it performs at the end of the gradient. If it does not perform well at the end of the gradient under cross-validation it is very unlikely to extrapolate well.

  2. Pingback: Effect of incomplete sampling of environmental space on transfer functions | Musings on Quantitative Palaeoecology
