It seems reasonable to assume that reconstructions based on fossil assemblages that are unlike any of the observations in the modern calibration set will be less reliable than reconstructions based on fossil assemblages that have good analogues in the calibration set. I think analogue quality, defined as the taxonomic distance from each fossil assemblage to the most similar modern observation, is a key diagnostic of reconstruction quality and really ought to be included in papers presenting new reconstructions.
I’m going to show how the best analogue distance can be calculated with the rioja package; it can also be calculated with the analogue package.
I’ll start by loading some data. This time I’m going to use the Imbrie and Kipp planktonic foraminifera sea-surface temperature calibration set and fossil assemblage data from a Caribbean core.
library(rioja)
data(IK)
spp <- IK$spec / 100
fos <- IK$core / 100
env <- IK$env$SumSST
sppfos <- Merge(spp, fos, split = TRUE)
spp <- sppfos$spp
fos <- sppfos$fos
depths <- as.numeric(rownames(fos))
I divided the assemblage data by 100 to convert it from percent to proportion data.
I now fit a MAT( ) (modern analogue technique) model to the data. MAT is used here for its analogue distances, regardless of which transfer function method we intend to use for the reconstruction itself. The function predict( ) not only makes the palaeoenvironmental predictions, but also calculates some diagnostics. We are interested in the dist.n component, which is a matrix of the taxonomic distances from each fossil assemblage to the k most similar modern assemblages. k is 5 by default. We want the first column of this matrix: the distances to the best analogue.
There are many distance metrics that could be used to find the taxonomic distance from the fossil assemblages to the modern assemblages. The default in rioja is the squared chord distance.
mod <- MAT(spp, env)
pred <- predict(mod, fos)
plot(depths, pred$dist.n[, 1], ylab = "Squared chord distance", xlab = "Depth")
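To make it concrete what the best analogue distance is, here is a minimal base R sketch that computes it by hand for the squared chord distance. It is an illustration of the idea, not rioja's actual code: the helper sq_chord( ) and the tiny modern and fossil matrices are made up for this example.

```r
# Hypothetical helper: squared chord distance from every row of `fossil`
# to every row of `modern`. Both must be proportions with the same taxa
# in the same column order.
sq_chord <- function(fossil, modern) {
  root_f <- sqrt(as.matrix(fossil))
  root_m <- sqrt(as.matrix(modern))
  # For each fossil row, subtract it from every modern row, then sum the squares
  t(apply(root_f, 1, function(f) rowSums(sweep(root_m, 2, f)^2)))
}

# Made-up toy data: three modern and two fossil assemblages, three taxa
modern <- rbind(m1 = c(0.5, 0.5, 0), m2 = c(0, 0.5, 0.5), m3 = c(1, 0, 0))
fossil <- rbind(f1 = c(0.5, 0.5, 0), f2 = c(0, 0, 1))

d <- sq_chord(fossil, modern)  # rows are fossil, columns are modern assemblages
best <- apply(d, 1, min)       # distance to the closest modern assemblage
best
```

Here f1 is identical to m1, so its best analogue distance is zero, while f2 shares only one taxon with its closest match, m2.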
Fossil assemblages that are identical to a modern assemblage will have a best analogue distance of zero (in practice this only happens with mono-specific assemblages). This is independent of the distance metric used. The maximum possible distance, reached when the two assemblages have no taxa in common, depends on the metric: it is two for the squared chord distance on proportion data and one for the Bray-Curtis distance. We therefore need to report which distance metric was used.
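For anyone wondering where these bounds come from, the standard definitions can be written out as follows. The notation is mine: $p_{ik}$ is the proportion of taxon $k$ in assemblage $i$, and each assemblage's proportions are assumed to sum to one.

```latex
% Squared chord distance between assemblages i and j
d^{\mathrm{SC}}_{ij} = \sum_k \left(\sqrt{p_{ik}} - \sqrt{p_{jk}}\right)^2
                     = \sum_k p_{ik} + \sum_k p_{jk} - 2\sum_k \sqrt{p_{ik}\,p_{jk}}
                     = 2 - 2\sum_k \sqrt{p_{ik}\,p_{jk}}

% Bray-Curtis distance on proportions
d^{\mathrm{BC}}_{ij} = \frac{\sum_k \lvert p_{ik} - p_{jk} \rvert}{\sum_k \left(p_{ik} + p_{jk}\right)}
                     = 1 - \sum_k \min\left(p_{ik},\, p_{jk}\right)
```

With no taxa in common, every product $p_{ik}\,p_{jk}$ and every minimum is zero, giving distances of two and one respectively. With identical assemblages, both sums equal one and both distances are zero.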
With the code above we can identify which fossil assemblages have better analogues and which worse, and any distances that approach the upper bound should cause alarm, but we cannot tell if a squared chord distance of 0.25 is good or bad.
If the shortest distance between a fossil assemblage and the modern assemblages is typical of distances between similar assemblages in the calibration set, then we can declare that the analogue match is good. The usual rule-of-thumb is that distances shorter than the 5th percentile of all distances between calibration set assemblages represent good analogues, and distances greater than the 10th percentile represent no-analogue assemblages. Some people use a Monte Carlo approach for estimating these thresholds (this is implemented in the analogue package), but I don’t think this gains much.
The distances between all modern assemblages can be calculated with paldist( ), which is more convenient than dist( ) as it uses the same distance metrics as MAT( ).
goodpoorbad <- quantile(paldist(spp), probs = c(0.05, 0.1))
abline(h = goodpoorbad, col = c("orange", "red"))
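The rule-of-thumb can also be applied as a classification rather than read off the plot. The sketch below uses base R only and simulated data, so every number in it is made up. The label "poor" for distances between the two percentiles is my assumption, and the squared chord distance is obtained from dist( ) on square-rooted proportions rather than from paldist( ).

```r
set.seed(1)

# Simulated calibration set: 30 assemblages of 6 taxa, scaled to proportions
counts <- matrix(rpois(30 * 6, lambda = 10), nrow = 30)
modern <- counts / rowSums(counts)

# Squared chord distance = squared Euclidean distance on square-rooted proportions
d_modern <- as.vector(dist(sqrt(modern)))^2

# 5th and 10th percentiles of all pairwise calibration set distances
thresholds <- quantile(d_modern, probs = c(0.05, 0.10))

# Made-up best analogue distances for four fossil assemblages
best <- c(0, thresholds[[1]] / 2, mean(thresholds), thresholds[[2]] * 2)

# Classify each distance against the two thresholds
quality <- cut(best, breaks = c(-Inf, thresholds, Inf),
               labels = c("good", "poor", "no analogue"))
table(quality)
```

With real data, best would be the first column of dist.n and the thresholds would come from the calibration set distances computed with the same metric as the MAT model.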
Most of the assemblages below 50 cm depth lack good modern analogues. Although we can still reconstruct palaeoenvironmental variables for these assemblages, we should be wary about relying on such reconstructions, as they are more uncertain than the transfer function performance statistics would suggest. How much more uncertain? This is difficult to know. Techniques like bootstrapping will calculate observation-specific uncertainties (see a forthcoming post) but do not account for the extra uncertainty due to poor analogue quality.