A transfer function will generate numerical reconstructions provided that at least one taxon is present in both the fossil and modern data sets. Obviously, the reconstruction will be more reliable if most of the fossil taxa are in the modern calibration set and have well defined optima. I find a simple plot useful to diagnose how good the coverage of the fossil taxa is.
First I load the SWAP modern calibration set and the fossil diatom data from the Round Loch of Glenhead (a beautiful loch but rather cold for swimming in) from the rioja package in R.
library(rioja)
data(SWAP)
data(RLGH)
spp <- SWAP$spec
env <- SWAP$pH
fos <- RLGH$spec
sppfos <- Merge(spp, fos, split=TRUE)
names(sppfos) # "spp" "fos"
I’ve renamed the data sets to aid code reuse. The Merge( ) function makes sure the modern and fossil data sets have the same species list (you need to harmonise the taxonomy and spelling yourself). With the argument split=TRUE it returns a list with elements named after the merged data (here "spp" and "fos"); with split=FALSE, it returns a single data frame.
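To make the idea of a shared species list concrete, here is a minimal base R sketch with made-up data. It is not rioja's implementation of Merge( ): the helper pad_taxa( ) and the toy tables are hypothetical, and I am assuming that taxa missing from one data set should simply be filled with zeros.

```r
# Toy abundance tables with partly different taxa (made-up values)
spp_toy <- data.frame(A = c(10, 0), B = c(5, 20))
fos_toy <- data.frame(B = c(30, 1), C = c(2, 0))

# Hypothetical helper: add a zero-filled column for every taxon in all_taxa
# that df lacks, then return the columns in the order given by all_taxa
pad_taxa <- function(df, all_taxa) {
  for (taxon in setdiff(all_taxa, names(df))) {
    df[[taxon]] <- 0
  }
  df[all_taxa]
}

all_taxa <- union(names(spp_toy), names(fos_toy))
spp_pad <- pad_taxa(spp_toy, all_taxa)
fos_pad <- pad_taxa(fos_toy, all_taxa)

names(spp_pad)
names(fos_pad)
```

Both tables now have the columns A, B and C in the same order, so per-taxon summaries from the two data sets line up element by element.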
Next, I calculate the maximum abundance of each taxon in the modern and fossil data. This is done efficiently with sapply( ), which applies the function max( ) to each column in turn. I also calculate the effective number of occurrences of each taxon in the modern calibration set with Hill.N2( ), which I will use to indicate which taxa are likely to have well defined optima. I use Hill’s N2 rather than the number of occurrences (N0) because a taxon with several occurrences at 1% and one occurrence at 90% has its optimum determined mainly by the 90% occurrence, so it effectively has just over one occurrence rather than several. Taxa missing from the calibration set are given an infinite N2 by this function; I replace this with zero.
sppmax <- sapply(sppfos$spp, max)
fosmax <- sapply(sppfos$fos, max)
n2 <- Hill.N2(sppfos$spp)
n2[is.infinite(n2)] <- 0
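The difference between N0 and N2 is easy to check by hand. The sketch below uses a hypothetical helper, hill_n2( ), that computes N2 for one taxon as the inverse of the sum of squared proportions across samples. I am assuming this is the same definition that Hill.N2( ) uses; the abundances are made up.

```r
# Hypothetical helper: Hill's N2 for one taxon across samples,
# i.e. the inverse of the sum of squared proportions
hill_n2 <- function(x) {
  p <- x / sum(x)
  1 / sum(p^2)
}

uneven <- c(1, 1, 1, 1, 90)       # four occurrences at 1%, one at 90%
even   <- c(20, 20, 20, 20, 20)   # five equally abundant occurrences

n0_uneven <- sum(uneven > 0)      # 5 occurrences
n2_uneven <- hill_n2(uneven)      # 94^2 / 8104, about 1.09
n2_even   <- hill_n2(even)        # 5
```

Both taxa occur in five samples, but only the evenly distributed one has an effective number of occurrences of five.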
Now I simply plot the maximum abundance of each taxon in the fossil data against its maximum abundance in the calibration set. Taxa likely to have poorly defined optima because of a low N2 (here set at N2≤5) are shown with filled red symbols. identify( ) can be used to label any taxa of interest interactively.
dev.new(width=4.5, height=4.5) # portable replacement for x11(4.5,4.5)
par(mar=c(3,3,1,1), mgp=c(1.5,.5, 0))
plot(sppmax, fosmax, xlab="Calibration set maximum", ylab="Fossil data maximum", col=ifelse(n2<=5, 2,1), pch=ifelse(n2<=5, 16,1))
abline(0,1, col=2) # add a 1:1 line
identify(sppmax, fosmax, labels=names(sppmax), cex=0.7) # click on points to label them
Any taxa above the 1:1 line have a greater maximum abundance in the fossil data than the modern data. Small discrepancies don’t matter much, but taxa that are much more common in the fossil data are a cause for concern, indicating that there might be bad analogues. Taxa highlighted with filled red symbols, those with a low N2 and so likely to have poorly defined optima, are also a cause for concern if they are abundant in the fossil data. Standard tests (forthcoming post) for analogue quality ignore these taxa. Poorly defined optima (more generally coefficients) are only directly a problem for transfer function methods such as weighted averaging and maximum likelihood, but I would still be wary of using the modern analogue technique if the environmental niches of some important taxa were unclear.
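The same two criteria can be listed in a table rather than picked out by eye. This is a minimal sketch with made-up values standing in for sppmax, fosmax and n2; the taxon names and the *_toy objects are hypothetical.

```r
# Made-up per-taxon summaries standing in for sppmax, fosmax and n2
sppmax_toy <- c(TaxA = 40, TaxB = 12, TaxC = 0,   TaxD = 3)
fosmax_toy <- c(TaxA = 35, TaxB = 30, TaxC = 1.2, TaxD = 15)
n2_toy     <- c(TaxA = 25, TaxB = 12, TaxC = 0,   TaxD = 2.5)

coverage <- data.frame(
  sppmax     = sppmax_toy,
  fosmax     = fosmax_toy,
  n2         = n2_toy,
  above_line = fosmax_toy > sppmax_toy,  # more abundant in fossil than modern data
  low_n2     = n2_toy <= 5,              # same threshold as in the plot
  row.names  = names(sppmax_toy)
)

# Taxa of concern: above the 1:1 line or with a low N2,
# sorted so the most abundant fossil taxa come first
flagged <- coverage[coverage$above_line | coverage$low_n2, ]
flagged <- flagged[order(-flagged$fosmax), ]
flagged
```

Sorting by fossil maximum puts the taxa that matter most for the reconstruction at the top of the table.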
The Round Loch of Glenhead fossil diatom data is very well behaved. Only two species have a higher maximum abundance in the fossil data than in the modern data: one only minimally so, and the other is not found in the calibration set and is not very abundant in the fossil data. No taxa with poorly defined optima are abundant in the fossil data. Overall, there are no signs of major problems with this data set, which is not always the case.
This graph is very useful to look at when checking for potential problems, but I would be unlikely to include it in a publication. Instead, I would include a numerical summary, or simply some text to confirm that there were no problems.
For example, the maximum sum of taxa in the fossil data that are not present in the modern data is
max(rowSums(sppfos$fos[, n2==0, drop=FALSE])) # 1.24875
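The same calculation extends to taxa with a low N2. The sketch below uses made-up fossil percentages and N2 values (fos_toy and n2_toy are hypothetical) to show both sums side by side.

```r
# Made-up fossil percentages (rows are samples) and calibration-set N2 per taxon
fos_toy <- data.frame(TaxA = c(60, 55, 70),
                      TaxB = c(30, 35, 20),
                      TaxC = c(1, 0, 2),
                      TaxD = c(9, 10, 8))
n2_toy <- c(TaxA = 25, TaxB = 12, TaxC = 0, TaxD = 2.5)

# Summed abundance per sample of taxa absent from the calibration set (N2 == 0)
absent_sum <- rowSums(fos_toy[, n2_toy == 0, drop = FALSE])

# Summed abundance per sample of taxa with N2 <= 5, which includes absent taxa
low_n2_sum <- rowSums(fos_toy[, n2_toy <= 5, drop = FALSE])

max(absent_sum)   # 2
max(low_n2_sum)   # 10
```

drop = FALSE keeps the selection as a data frame even when only one taxon matches, so rowSums( ) still works.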
In the next post in this series on transfer function and reconstruction diagnostics, I’ll show why I think taxa in this data set with N2≤5 occurrences are likely to have poorly defined optima.