Sea ice is an important climatic variable, affecting the Earth’s albedo and the transfer of heat and gas between the ocean and the atmosphere. Naturally palaeoclimatologists want to be able to reconstruct past variability in sea ice and several different methods have been developed.

One popular method is to develop a transfer function based on the relationship between sea ice and microfossil assemblages (e.g., dinocysts or diatoms) in modern ocean sediments, and use this to estimate past sea ice cover from microfossils in sediment cores.

After the diatoms are counted, the first step in developing a transfer function is often to show that the environmental variable of interest explains a significant and independent portion of the variance in the modern assemblage data in a constrained ordination such as a constrained correspondent analysis. (This post is going to be slightly technical.)

In this post I want to explore the possibility that the apparent ability of sea-ice to explain an significant independent fraction of the variance may be an artefact of the methods used.

Artefacts, features that appear in the analysis that are not present in the data, are a problem for ordination methods. For example, a principal component analysis can show a horseshoe artefact given species data with a large amount of taxonomic turnover. Correspondence analysis, an ordination method that copes better than PCA when there is a large amount of taxonomic turnover, can show an arch effect if the first dimension of the data is much longer than the second dimension. The arch effect is generally a less problematic artefact than the horseshoe, and is usually absent from constrained correspondence analysis (CCA) which includes environmental predictors in the analysis.

I’m going to start by considering Justwan & Koç (2008) who present a 99-observation calibration set of modern diatom assemblages and paired environmental data. They run a CCA of the diatom data constrained by August sea surface temperature (SST) and May sea ice.

Take a moment to notice the strong arch in their CCA.

Together, SST and sea ice explain 59% of the inertia (think “variance”) in the diatom data, with 30% explained uniquely by SST and 23% uniquely by sea ice. The statistical significance of these values is not given, but they are undoubtedly highly statistically significant under standard tests. To a diatomist used to working with limnological datasets, these values are amazingly high.

Given these impressive numbers, why am I sceptial? I am worried about that arch – almost certainly an artefact – and how it is affecting these results.

The arch can occur in a CCA when the constraining environmental variables provide no real constraint, typically when many predictors are used, and can be encouraged by using quadratic terms as predictors in the model (eg SST-squared). With only two predictors and no quadratic terms, this shouldn’t be a problem. Should it?

But the SST gradient probably lies very close to the main axis of the diatom data. This will mean that the unconstrained and constrained correspondence analysis will look rather similar and that SST is not a strong constraint. Sea ice is, in many datasets, a non-linear transformation of SST. Here is an example from the southern ocean.

My worry is that this non-linear relationship is sufficient to provoke the arch effect, and that the large amount of inertia explained by sea ice is just the inertia of the arch effect rather than a real ecological effect.

It’s not easy possible to test this hypothesis with real data, but it is easy to test it with simulated data. I’ve simulated the abundance of species across a grid of environmental variables. The long axis of the grid, representing SST, is about three times the length of a second variable which we are not interested in (representing salinity or nutrient conditions). There are about 20 species in the simulation, they have unimodal relationships with the environment and a turnover of >3 SD units. Sea ice is represented by inverting the SST gradient, adding the mean SST, and then truncating the sea ice values at zero. Sea ice is **not** used to simulate the species data.

As a sole variable, SST explains 58 % of the inertia in the simulate species data. SST together with sea ice explain 71% of the inertia. This is almost as much as SST and SST-squared explain (72%). Both SST and sea ice are highly significant (p>0.001) predictors of the simulated species data.

With simulated species data that captures some of the characteristics of diatom-sea ice calibration sets, sea ice appears to be an important predictor even though it does not affect the species assemblages. This is an artefact of the non-linear relationship between SST and sea-ice which allows sea-ice to appear to explain the inertia captured by the arch. If this problem can occur in simulated data, it could also occur in real data.

It is not immediately clear to me how the methods used by Justwan & Koç and others could be adapted to demonstrate that sea-ice has a real influence of the biotic assemblages. I wonder if a detrended-CCA would work. I might need to dig out my copy of CANOCO and give it a try.

Justwan, A & Koç, N (2008) A diatom based transfer function for reconstructing sea ice concentrations in the North Atlantic. *Marine Micropaleontology* 66: 264–278.

Richard – what is meant by taxonomic turnover? Diversity?

Beta-diversity – Replacement of species along the environmental gradient.

CCA is often described as a unimodal/non-linear ordination method, but what is overlooked is that this relates only to the response variable. The effects of the constraints are distinctly linear, unless you do something to allow for non-linear (in the English language sense) effects. This is just the same as in a linear model (in fact CCA is just a weighted multivariate linear model). If the envisaged effect of a constraint is nonlinear, there is nothing (beyond the problems of too many constraints, correlations between constraints, etc) in making a CCA with polynomial terms for Sea Ice or SST, or even generating a spline basis as using those basis functions as constraints. This is really easy to do in vegan, but in Canoco only polynomial terms would be trivial to add.

Another, very useful tool would be

`ordisurf()`

, applied over an unconstrained ordination of the data. As`ordisurf()`

fits a smooth surface using splines to the ordination it allows for non-linear “responses” in the ordination space, and that would be a way to check if the linearity assumption implicit in CCA is reasonable.Alternatively, using VGAM or mvabund to fit proper statistical models to each species allowing for non-linear effects of Sea Ice or SST variables alongside the usual unimodal response model for each taxon.