Overlaying core and calibration set samples in an ordination

One visualisation technique I frequently use is to make an ordination showing the modern calibration set sites, the fossil core data, and contours of an environmental variable.
calibration-core NMDS
Here, I’ve used non-metric multidimensional scaling to ordinate the core and calibration data simultaneously, and then drawn contours of an environmental variable over the ordination.
With this plot it is easy to tell if

  1. The core samples have good analogues in the calibration set. If so they will be within the calibration set cloud.
  2. The environmental variable of interest could plausibly have driven the assemblage changes down core. If so the trajectory of the core will be ~perpendicular to the contours of the environmental variable. If not, the core samples will either be in a compact cloud, or parallel to the contours.

It would be possible to add contours of multiple environmental variables, but this is likely to resemble a plate of spaghetti very quickly. Perhaps more usefully, it is possible to add multiple cores to the figure. If all share the same trajectory, then perhaps each core has the same forcing factors.
It is possible to predict the value of the environmental variable for each fossil sample using the contours (analogous to Goring et al. (2009), who use this technique for a single fossil sample at a time), but I don’t think this is a good idea. The problem is that the fossil samples tend to be similar and so clump together, potentially distorting the space around them. This clumping and distorting will get worse with high resolution fossil records, so it may be necessary to thin these, for example only using every tenth sample. Alternatively, the fossil samples could be added passively to an ordination. This cannot be done with NMDS, but it could be done with an eigenvector based method such as correspondence analysis. There is a different problem here however, if the fossil data have an extreme position on axes one or two, it will be obvious, but it may be difficult to detect if the fossil data have an extreme position on a higher axis, e.g., ordination axis seven.

data(SWAP)#SWAP training set data
data(RLGH)#Core data from the Round Loch of Glenhead

allspp<-Merge(SWAP$spec, RLGH$spec)

par(mar=c(3,3,1,1), mgp=c(1.5,.5,0))
plot(mod, type="n", display="sites")
points(mod, display="sites", pch=20, col="lightblue", select=1:nrow(SWAP$spec))
points(mod, display="sites", type="l", select=-(1:nrow(SWAP$spec)))
ordisurf(mod, c(SWAP$pH, rep(NA, nrow(RLGH$spec))), add=T)
legend("topright",legend=c("Calibration sites", "pH", "RLGH"), col=c("lightblue", "red", "black"), pch=c(20,NA,NA), lty=c(NA,1,1))

Lines 1:4 load the libraries and built-in datasets. Line 6 Merges the calibration and fossil data.frames into a single data.frame that is then used in the NMDS on line 8. The plotting makes use of the plotting functionality of the vegan library, see help(plot.cca) for details (although we are actually using plot.metaMDS). First the code plots an empty graph, then selects just the calibration sites, and plots these, then selects everything but the calibration sites and plots this as a line. Since the environmental values for the fossil data are unknown, they are coded as NA for ordisurf.

With correspondence analysis, the code is a little different, as we need to predict the position of the fossil data.

allspp<-Merge(SWAP$spec, RLGH$spec, split=T)

pred<-predict(CA, newdata=sqrt(allspp$RLGH), type="wa")

par(mar=c(3,3,1,1), mgp=c(1.5,.5,0))
plot(CA, type="n", display="sites")
points(CA, display="sites", pch=20, col="lightblue")
points(pred[,1:2], type="l")
ordisurf(CA, SWAP$pH, add=T)
legend("topright",legend=c("Calibration sites", "pH", "RLGH"), col=c("lightblue", "red", "black"), pch=c(20,NA,NA), lty=c(NA,1,1))

Goring et al. (2009) A new methodology for reconstructing climate and vegetation from modern pollen assemblages: an example from British Columbia. Journal of Biogeography 36 626-638.


About richard telford

Ecologist with interests in quantitative methods and palaeoenvironments
This entry was posted in EDA, R and tagged , , , , . Bookmark the permalink.

4 Responses to Overlaying core and calibration set samples in an ordination

  1. Pingback: Transfer function and palaeoenvironmenal reconstruction diagnostics | Musings on Quantitative Palaeoecology

  2. ucfagls says:

    Just came across this post Richard. A couple of points to note;

    I don’t think you need my analogue package here to do what you wanted to, just vegan. rioja and analogue don’t play nicely together; Steve and I use similarly named functions which have different definitions and which clash. Depending on the order in which the packages are loaded, you’ll either see my functions or Steve’s. We should fix this, but that motivation has hit home just yet…, and
    there is now (probably wasn’t when you wrote this) a function in my analogue package to do the passive overlays of core samples for you; I call this timetrack().

    timetrack() works for unconstrained (PCA, PCoA, & CA) and constrained ordinations (CCA, RDA, capscale()) but not NMDS as I’m not convinced it is a good idea to use that method for this yet:

    The fossil samples influence the ordination you are projecting into; there is no consistent underlying model for the training set anymore but one that has to be regenerated every time you “overlay” a new core into the data
    There is no way to do this for a constrained ordination with NMDS and this is a useful way to look at how well samples in the fossil set are fitted by the model based on the training set.
    The response surface can be fitted to all the data use to build the underlying ordination; as you mention above, you need to include NAs for the fossil samples, which is a bit of a fudge

    • Not sure why analogue got loaded, perhaps there were other things in the analysis I did that used it.

      I would prefer to use an unconstrained/constrained ordination and add the fossil data passively, but the problem is that it only identified non-analogue fossil assemblages if they are non-analogue on axis 1 or 2 of the ordination. If the fossil data are weird on axis 16, this will be overlooked.

      One strategy would be to use goodness-of-fit or nearest neighbour distance to identify problems, but these lack the visual punch of an NMDS where the fossil data is nowhere near the calibration set.

      If you are not interested in analogue quality (because you have checked it another way) then NMDS is probably not optimal. I’ll try timetrack( ) next time I want to overlay fossil and calibration set data.

      • ucfagls says:

        There’s nothing stopping you looking at the later axes ūüôā

        With the constrained analysis you can get more information on the goodness of fit than just a visual look-see at the ordination.

        Agreed that for the unconstrained analysis this is more difficult; one could compute the Euclidean distance between the fitted point (on axes 1 and 2) and the point in the full dimensions, simply by setting the coordinates for the fitted point to zero for all later dimensions. Whilst a point may not be *visually* an outlier in the axis 1 and 2 space (it sits up or down out of the page), large residual distances would indicate samples that don’t project well onto the principal component plane defined by axes i and j.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s