Fossil stratigraphies are highly multidimensional – many species, many samples – which makes it difficult to visualise the main patterns in the data. One way to cope is to use ordinations to find the main axes of variability in the data. I run ordinations on every core I analyse. In this post, I will run a variety of ordinations on the Lake Żabińskie chironomid stratigraphy from Larocque-Tobler et al (2015). I will compare the results with analyses of the chironomid training set and of the Round Loch of Glenhead diatom stratigraphy which shows the response to acid rain over a 140 year period.

This post will inevitably include an alphabet soup of methods that I don’t have space enough or time to fully explain. Apologies.

### Detrended correspondence analysis

I always start my ordination analysis with a detrended correspondence analysis (DCA). I’m interested in the length of the first axis of the DCA in standard deviation (SD) units, a measure of the amount of species turnover along the ecological gradient. An axis length of 4 SD would give almost complete turnover of species (think of a normal distribution, four standard deviations goes from one side to the other).

This axis length helps me choose an appropriate ordination method for further analyses.

library(vegan) library(rioja) data(RLGH)#load Round Loch of Glenhead data zdca <- decorana(sqrt(fos)) rdca <- decorana(sqrt(RLGH$spec)) keep <- colSums(spp > 0) >= 3 cdca <- decorana(sqrt(spp[, keep]))#training set

The length of the first axis of the Round Loch of Glenhead diatom stratigtaphy is 0.68 SD. This short axis length is typical for a core. The length of the first axis of the Żabińskie chironomid stratigraphy is 3.13 SD. This is remarkably long. By comparison, the first axis length of the chironomid training set is 3.23 SD. So although the temperature range spanned by the training set (3 – 27.5°C), is more than three times larger than the 20^{th} century temperature range at Żabińskie (12.9 – 20.7°C estimated from the reconstructions), and the training set covers a diverse range of lake depths and other ecologically important environmental variables, the first axis lengths are practically the same. This, like so much with the Żabińskie record, is remarkable.

### Unconstrained ordination

The first axis of an ordination is always the most important one. We can visualise how much more important than the other axes it is with a screeplot.

library(ggplot2) g1 <- ggscreeplot(rda(sqrt(RLGH$spec)), bstick = FALSE, title = "Round Loch of Glenhead PCA") g2 <- ggscreeplot(cca(sqrt(spp[, keep])), bstick = FALSE, title = "Chironomid training set CA") g3 <- ggscreeplot(cca(sqrt(fos)), bstick = FALSE, title = "Żabińskie chironomids CA")

For the Round Loch of Glenhead, the first axis is very much more important than the remaining axes. The samples form a baguette-shaped cloud. This implies that a single environmental gradient (probably pH) is responsible for the changes in the diatom stratigraphy.

For the chironomid training set, the first axis is much more important than the second, which in turn is much more important than the third. The importance of the second axis is probably inflated by an arch effect (visible in a biplot of the ordination), a common artefact in correspondence analysis which could be corrected with a DCA. The samples form a croissant-shaped cloud. This shape implies that there is one main environmental gradient through the training set – almost certainly temperature or something highly correlated with temperature.

For the Żabińskie chrionomid record, the first axis is only slightly longer than the second, which is only slightly longer than the third. The samples form the shape of a slightly deflated football. This shape, together with the long DCA axis 1, would only be expected if 1) the chironomid community was being driven by multiple strong and uncorrelated environmental variables, or 2) the chironomid count sums are very small so there is lots of counting noise. In either case, a very poor temperature reconstruction would be expected. But yet the reconstruction is near-perfect. Remarkable.

We can ask if the environmental variable of interest is correlated with the first few ordination axes. This cannot really be done for the Round Loch of Glenhead as we only have a reconstruction of pH. For Żabińskie, we should use the measured temperatures, but these have not been archived, so I am going to use the reconstruction instead which will bias the result in favour of the temperature being important.

camodm <-cca(sqrt(spp[,keep])) #plot(camodm)#Arched envfit(camodm, env, choices = 1:2)

## ## ***VECTORS ## ## CA1 CA2 r2 Pr(>r) ## [1,] -0.98340 0.18147 0.4872 0.001 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## Permutation: free ## Number of permutations: 999

With the chironomid training set, temperature is associated with the first axis (CA1) of the ordination, and the r^{2} is relatively good. This is expected from a training set with a large temperature range.

camod <-cca(sqrt(fos)) #plot(camod) envfit(camod, recon$temp, choices = 1:2)

## ## ***VECTORS ## ## CA1 CA2 r2 Pr(>r) ## [1,] -0.8398 0.5429 0.0461 0.104 ## Permutation: free ## Number of permutations: 999

envfit(camod, recon$temp, choices = 1:3)

## ## ***VECTORS ## ## CA1 CA2 CA3 r2 Pr(>r) ## [1,] -0.31427 0.20316 -0.92734 0.3302 0.001 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## Permutation: free ## Number of permutations: 999

With the Żabińskie chrionomid record, reconstructed temperature is not significantly correlated with the first two ordination axes. It is, however, correlated with the third axis and has a moderate r^{2}. This suggests that temperature is not particularly important in driving community composition.

### Constrained ordinations

Constrained ordinations let us know in a more direct way how important an environmental variable is.

mod1 <- cca(sqrt(spp[, keep]) ~ env) mod1

## Call: cca(formula = sqrt(spp[, keep]) ~ env) ## ## Inertia Proportion Rank ## Total 3.29854 1.00000 ## Constrained 0.24295 0.07365 1 ## Unconstrained 3.05560 0.92635 86 ## Inertia is mean squared contingency coefficient ## ## Eigenvalues for constrained axes: ## CCA1 ## 0.24295 ## ## Eigenvalues for unconstrained axes: ## CA1 CA2 CA3 CA4 CA5 CA6 CA7 CA8 ## 0.30349 0.25509 0.16501 0.12440 0.10133 0.09869 0.09055 0.08762 ## (Showed only 8 of all 86 unconstrained eigenvalues)

With the chironomid training set, temperature explains about 7% of the inertia (which is low but not unreasonable), and the ratio of the inertia of the constrained axis to the first unconstrained axis is 0.8. If temperature was (correlated with) the most important ecological gradient in the data set, we would expect this ratio to be greater than one.

mod2 <-cca(sqrt(fos) ~ temperature, recon) mod2

## Call: cca(formula = sqrt(fos) ~ temperature, data = recon) ## ## Inertia Proportion Rank ## Total 3.38291 1.00000 ## Constrained 0.12734 0.03764 1 ## Unconstrained 3.25558 0.96236 49 ## Inertia is mean squared contingency coefficient ## ## Eigenvalues for constrained axes: ## CCA1 ## 0.12734 ## ## Eigenvalues for unconstrained axes: ## CA1 CA2 CA3 CA4 CA5 CA6 CA7 CA8 ## 0.22205 0.20687 0.18918 0.18282 0.16599 0.15822 0.14259 0.13470 ## (Showed only 8 of all 49 unconstrained eigenvalues)

With the Żabińskie chironomid record, reconstructed temperature explains about 4% of the variance, and the ratio of the inertia of the constrained axis to the first unconstrained axis is only 0.57. Even the eighth unconstained axis has more inertia than the axis constrained by temperature. All this suggests, yet again, that temperature is not the most important control on chironomid assemblages in Lake Żabińskie.

### Conclusions

It is very difficult to reconcile the strange ordinations and the poor performance of temperature as a predictor of the chironomid assemblages with the remarkably good performance of the August-air temperature reconstruction.

I’m kinda curious as to where this is going, other than you clearly not trusting the Żabińskie dataset. Actually, I have a question: do you already know the answer, and are teasing it out; or do these posts genuinely reflect each stage of you exploring the dataset?

The RHS Żabińskie PC plot looks odd to me; from what I recall of my days doing PCA,, even random datasets usually have a higher PC1 than that. Doesn’t the scree more usually fall off as a curve than linearly? I may be mis-remembering.

I don’t know the answer, but I am hoping to encourage those who have the authority, if not the enthusiasm, to find out.

Some of these posts (like this one) are genuinely exploratory, others are exploring issues I’ve been aware of for some time.

A fully random dataset would look much like the Żabińskie. Try screeplot(princomp(matrix(rnorm(10000), ncol = 10)))

Even a small amount of correlation between the variables would give a reverse-J shaped curve

Pingback: Replicating the Lake Żabińskie reconstruction | Musings on Quantitative Palaeoecology

Pingback: The Humpty Dumpty theory of palaeoecology | Musings on Quantitative Palaeoecology

Pingback: why would anyone not trust the author???? | Musings on Quantitative Palaeoecology