When I wrote a requiem for the dinocyst-salinity transfer function, one of the issues I raised was the effect of incomplete sampling of environmental space, which forces transfer functions to interpolate across the unsampled regions (I have examined extrapolation elsewhere). I raised the same issue in a comment on Klein et al. at Climate of the Past Discussions, but was worried that I could not cite any paper to support the claim. Hence, I resolved to investigate the problem as soon as my postdoc Mathias had compiled an R package for simulating species abundances along environmental variables; it would also be an excuse to test the code.
The problem is that the dinocyst-salinity calibration set is patchily sampled in environmental space; there are clumps and lacunae.
The immediate effect of this is to make some taxa appear to have bimodal relationships with salinity. Bimodality could also occur if multiple taxa with distinct environmental niches are lumped together because they cannot be distinguished morphologically. The transfer functions have to interpolate from the clumps into the lacunae – how well do they do it?
My plan was to simulate species abundances for calibration sets with lacunae forming either zebra stripes or a chessboard pattern in environmental space and then, for different transfer-function methods, compare cross-validation performance with the performance estimated from a uniformly distributed validation set.
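To make the two patterns concrete, here is a rough sketch of how such calibration sets might be generated. It is written in Python with NumPy and is not the R package mentioned above. Everything in it is an assumption for illustration: the function names, a two-variable environmental space scaled to the unit square, symmetric Gaussian species responses, and multinomial counts of 300 per sample.

```python
import numpy as np

def sample_with_lacunae(n, pattern="stripes", n_bands=4, seed=0):
    """Draw n sites in the unit square, rejecting any that fall in a lacuna.

    "stripes": alternate bands along the first variable are left empty.
    "chessboard": alternate cells of an n_bands x n_bands grid are left empty.
    """
    rng = np.random.default_rng(seed)
    kept = []
    while len(kept) < n:
        point = rng.uniform(0.0, 1.0, size=2)
        i, j = np.floor(point * n_bands).astype(int)  # grid cell of the site
        if pattern == "stripes":
            sampled = i % 2 == 0
        elif pattern == "chessboard":
            sampled = (i + j) % 2 == 0
        else:
            raise ValueError(f"unknown pattern: {pattern}")
        if sampled:
            kept.append(point)
    return np.array(kept)

def simulate_abundances(env, n_species=30, tolerance=0.15, count=300, seed=1):
    """Relative abundances from Gaussian responses (an assumed response model)."""
    rng = np.random.default_rng(seed)
    # Optima extend slightly beyond the sampled space so edge sites have taxa too.
    optima = rng.uniform(-0.1, 1.1, size=(n_species, 2))
    dist2 = ((env[:, None, :] - optima[None, :, :]) ** 2).sum(axis=2)
    expected = np.exp(-dist2 / (2.0 * tolerance**2))
    probs = expected / expected.sum(axis=1, keepdims=True)
    # Counting noise: each sample is a multinomial draw of `count` individuals.
    counts = np.array([rng.multinomial(count, p) for p in probs])
    return counts / count

stripes = sample_with_lacunae(200, "stripes")
chess = sample_with_lacunae(200, "chessboard")
abund = simulate_abundances(chess)
print(stripes.shape, chess.shape, abund.shape)
```

The rejection sampling keeps the density uniform within the sampled cells, so any difference from an evenly sampled calibration set comes from the lacunae alone.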
But then I realised that this is essentially the same problem as I explored in Telford and Birks (2011), “Effect of uneven sampling along an environmental gradient on transfer-function performance”. Lacunae are simply the extreme case of uneven sampling, and it makes no obvious difference whether a single environmental variable is unevenly sampled or environmental space as a whole is unevenly sampled. The effect will be the same: transfer-function performance estimated by cross-validation will be optimistic, with the modern analogue technique much more optimistic than weighted averaging.
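For anyone who wants to poke at this, the comparison can be sketched in a few lines. This is a toy version in Python with NumPy, not the analysis of Telford and Birks (2011) and not the R simulation package. The settings are all assumptions chosen for illustration: one gradient scaled to 0–1 with a lacuna between 0.3 and 0.7, 40 taxa with Gaussian responses, weighted averaging (WA) with inverse deshrinking, and the modern analogue technique (MAT) with squared chord distance and five analogues.

```python
import numpy as np

def simulate(env, optima, rng, tol=0.1, count=300):
    """Relative abundances from Gaussian responses plus multinomial counting noise."""
    expected = np.exp(-((env[:, None] - optima[None, :]) ** 2) / (2.0 * tol**2))
    probs = expected / expected.sum(axis=1, keepdims=True)
    return np.array([rng.multinomial(count, p) for p in probs]) / count

def wa_fit(Y, x):
    """Weighted averaging with inverse deshrinking; taxa absent from Y are ignored."""
    present = Y.sum(axis=0) > 0
    optima = (Y[:, present] * x[:, None]).sum(axis=0) / Y[:, present].sum(axis=0)
    initial = Y[:, present] @ optima / Y[:, present].sum(axis=1)
    slope, intercept = np.polyfit(initial, x, 1)  # regress observed on initial
    return present, optima, slope, intercept

def wa_predict(model, Y_new):
    present, optima, slope, intercept = model
    initial = Y_new[:, present] @ optima / Y_new[:, present].sum(axis=1)
    return intercept + slope * initial

def wa_loo(Y, x):
    """Leave-one-out cross-validation predictions for WA."""
    pred = np.empty_like(x)
    for i in range(len(x)):
        keep = np.arange(len(x)) != i
        pred[i] = wa_predict(wa_fit(Y[keep], x[keep]), Y[i:i + 1])[0]
    return pred

def mat_predict(Y_train, x_train, Y_new, k=5, exclude_self=False):
    """Mean of the k closest analogues by squared chord distance."""
    d = ((np.sqrt(Y_new)[:, None, :] - np.sqrt(Y_train)[None, :, :]) ** 2).sum(axis=2)
    if exclude_self:  # leave-one-out: a sample may not be its own analogue
        np.fill_diagonal(d, np.inf)
    nearest = np.argsort(d, axis=1)[:, :k]
    return x_train[nearest].mean(axis=1)

def rmsep(pred, obs):
    return np.sqrt(np.mean((pred - obs) ** 2))

rng = np.random.default_rng(42)
optima = rng.uniform(-0.2, 1.2, size=40)
# Calibration set: two clumps with a lacuna between 0.3 and 0.7.
x_train = np.concatenate([rng.uniform(0.0, 0.3, 60), rng.uniform(0.7, 1.0, 60)])
# Independent validation set spread uniformly over the whole gradient.
x_test = rng.uniform(0.0, 1.0, 500)
Y_train = simulate(x_train, optima, rng)
Y_test = simulate(x_test, optima, rng)

results = {
    "WA": (rmsep(wa_loo(Y_train, x_train), x_train),
           rmsep(wa_predict(wa_fit(Y_train, x_train), Y_test), x_test)),
    "MAT": (rmsep(mat_predict(Y_train, x_train, Y_train, exclude_self=True), x_train),
            rmsep(mat_predict(Y_train, x_train, Y_test), x_test)),
}
for method, (cv, independent) in results.items():
    print(f"{method}: cross-validation RMSEP {cv:.3f}, independent RMSEP {independent:.3f}")
```

If the argument above is right, the gap between the cross-validation and independent RMSEP should be wider for MAT than for WA, but the actual numbers depend on the arbitrary settings, so treat any single run as an illustration rather than a result.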
I don’t think I need to repeat the analyses of Telford and Birks (2011) to demonstrate this. Persuade me otherwise and I can run through an analysis and submit it somewhere.
I have some other plans for testing the species simulation code (it will be submitted to CRAN once tested; until then it is available on request).