Requiem for a transfer function

Organic walled dinocysts, the resting stage of some dinoflagellate species, were a promising microfossil group for reconstructing past environmental conditions based on the relationship between the modern environment and dinocyst assemblages in modern sediment.

  • The cysts are resistant to the dissolution that affects calcareous microfossils such as foraminifera and siliceous microfossils such as diatoms in some key parts of the ocean.
  • There are reasonably diverse assemblages even at high latitudes, where planktonic foraminifera are monospecific.
  • They appear to be useful for reconstructing several important oceanographic variables.

One fraction of the dinocyst-user community believes that they can generate precise palaeoenvironmental reconstructions from dinocyst assemblages using the modern analogue technique (MAT). For each fossil sample, MAT finds the k most taxonomically similar samples in the modern calibration set (where k is typically 5-10), and calculates the (weighted) mean of the environmental variable of interest at those samples. MAT makes few assumptions about the data, unlike some other transfer function methods, which assume, for example, that the species have a unimodal or linear relationship with the environment.
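As a rough illustration of the procedure just described (not the code behind any published reconstruction), MAT can be sketched in a few lines. The squared chord distance, the inverse-dissimilarity weighting and the toy data are all assumptions made for the example.

```python
import math

def squared_chord(a, b):
    """Squared chord distance between two assemblages given as proportions."""
    return sum((math.sqrt(x) - math.sqrt(y)) ** 2 for x, y in zip(a, b))

def mat_predict(fossil, modern, env, k=5, weighted=False):
    """Mean environment of the k modern samples most similar to the fossil sample."""
    ranked = sorted(
        ((squared_chord(fossil, m), e) for m, e in zip(modern, env)),
        key=lambda pair: pair[0],
    )[:k]
    if weighted:
        # Inverse-dissimilarity weights (an assumption); the floor avoids division by zero.
        weights = [1.0 / max(d, 1e-12) for d, _ in ranked]
    else:
        weights = [1.0] * len(ranked)
    return sum(w * e for w, (_, e) in zip(weights, ranked)) / sum(weights)

# Hypothetical calibration set: rows are relative abundances of three taxa.
modern = [[0.8, 0.2, 0.0], [0.7, 0.3, 0.0], [0.1, 0.2, 0.7], [0.0, 0.3, 0.7]]
salinity = [35.0, 34.0, 20.0, 18.0]

estimate = mat_predict([0.75, 0.25, 0.0], modern, salinity, k=2)
weighted_estimate = mat_predict([0.75, 0.25, 0.0], modern, salinity, k=2, weighted=True)
print(estimate, weighted_estimate)
```

Note that nothing in the function asks why the analogues are similar; it only averages whatever environmental values happen to be attached to them.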

I first became interested in dinocyst reconstructions in 2005, when the MARGO last glacial maximum (LGM) reconstructions were published, showing warmer-than-modern conditions in the middle of the Norwegian Sea. This was difficult to reconcile with other reconstructions and models of LGM climate. On examining the data, it became clear that the dinocyst calibration set had serious problems. For example, all 7 observations from the Celtic Sea found all their analogues within this set of observations. As the environment varies little within the Celtic Sea, cross-validation estimates of any environmental variable, including geomagnetic field strength, will appear to be precise. I published my findings, the “Limitations of dinoflagellate cyst transfer functions”, in which I concluded that

the uncertainty of the summer temperature and ice cover transfer functions has been substantially underestimated because the strong spatial structure in the data set has been ignored, and that there is little evidence that either winter sea-surface temperature or salinity can be independently reconstructed.

Some key members of the dinocyst-user community have not accepted this. Instead, they have authored a fault-filled paper attacking me and my work; continue to generate and publish possibly spurious reconstructions without critical appraisal of the methods; and continue to train PhD students to use these methods.

I wrote about the dinocyst sea-ice transfer function a couple of months ago. Now, having read Balestra et al (2013), who generate dinocyst-inferred salinity reconstructions, and needing to run some analyses on the dinocyst calibration set to support a manuscript on a Baltic-Skagerrak salinity transfer function that I am working on with several collaborators, it is time to revisit the dinocyst-salinity transfer function.

Balestra et al write that the salinity transfer function has a prediction error of 1.7 psu, citing Bonnet et al (2012). I have the data (see this training manual), and I calculate the error to be 2.3 psu (Milzer et al (2013), using a slightly older version of the data, report 2.4 psu). I am confident that this discrepancy is not due to methodological details, as I can tolerably replicate the prediction error for all the other environmental variables in Bonnet et al (2012). Instead, I am convinced that it is a simple transcription error: Bonnet et al report that the summer temperature model has a prediction error of 1.73°C (estimated by leave-one-out cross-validation) and 1.55°C (estimated from a validation dataset), and give the same numbers for the salinity model. To get the exact same answer once would be surprising; twice is implausible.

This prediction error, taking it at face value, is considerable given the salinity variability in the open ocean: most of the open ocean is within the range 35 ± 2.4 psu. This suggests that the model will be of limited utility in the open ocean but, optimistically, might be of more use in coastal systems where larger changes in salinity are possible.

Fig 1. Sea surface salinity

A plot of the predicted against measured salinity suggests this optimism was misplaced. The apparent performance is not uniform along the gradient, but very much worse at the low salinity end. At sites below 25 psu, the r² between predicted and measured salinity is 0.05. Below 22 psu, the correlation is negative! The best that can be hoped for with this transfer function is that low salinity systems are identified as such; the absolute reconstructions given are meaningless.
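The check behind these numbers is simple to reproduce in outline: restrict the cross-validation predictions to sites below a salinity threshold and recompute the correlation. The sketch below uses made-up values, not the dinocyst data, and the function names are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def r_below(measured, predicted, threshold):
    """Correlation between measured and predicted values at sites below a threshold."""
    pairs = [(m, p) for m, p in zip(measured, predicted) if m < threshold]
    if len(pairs) < 3:
        raise ValueError("need at least three sites below the threshold")
    subset_measured, subset_predicted = zip(*pairs)
    return pearson_r(subset_measured, subset_predicted)

# Made-up measured and cross-validated salinities (psu), for illustration only.
measured = [10.0, 15.0, 20.0, 30.0, 33.0, 35.0, 36.0]
predicted = [28.0, 22.0, 25.0, 31.0, 33.0, 34.0, 36.0]

r_all = pearson_r(measured, predicted)
r_low = r_below(measured, predicted, 25.0)
print(round(r_all ** 2, 2), round(r_low, 2))
```

In this toy case the overall correlation looks respectable while the low salinity subset is negatively correlated, which is the pattern a single whole-gradient statistic hides.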

Fig 2. Predicted vs measured salinity

There are at least two reasons for this poor performance in low salinity sites. The first is the uneven distribution of observations along the salinity gradient. There are many more saline sites than brackish sites in the training set, so the saline sites have many potential analogues to choose from, and so are more likely to get a good match (see Telford and Birks 2011 for details).

Fig 3. Histogram showing distribution of sampling effort along the salinity gradient. Red line shows prediction error for sites in each tenth of the gradient. Blue line shows naive model prediction error.
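A per-segment error profile of this kind can be sketched as follows. The sketch assumes the naive model is one that predicts the calibration-set mean for every site, which is one reading of the caption rather than something stated here, and the numbers are invented.

```python
import math

def binned_rmsep(observed, predicted, n_bins=10):
    """Per-bin site count, model RMSEP and naive (mean-only) RMSEP along the gradient."""
    low, high = min(observed), max(observed)
    if high == low:
        raise ValueError("observed values do not span a gradient")
    width = (high - low) / n_bins
    naive_prediction = sum(observed) / len(observed)  # assumed naive model
    bins = [{"n": 0, "model_ss": 0.0, "naive_ss": 0.0} for _ in range(n_bins)]
    for obs, pred in zip(observed, predicted):
        index = min(int((obs - low) / width), n_bins - 1)  # top edge joins the last bin
        bins[index]["n"] += 1
        bins[index]["model_ss"] += (obs - pred) ** 2
        bins[index]["naive_ss"] += (obs - naive_prediction) ** 2
    rows = []
    for i, b in enumerate(bins):
        if b["n"] == 0:
            rows.append((low + i * width, 0, None, None))  # empty segment
        else:
            rows.append((low + i * width, b["n"],
                         math.sqrt(b["model_ss"] / b["n"]),
                         math.sqrt(b["naive_ss"] / b["n"])))
    return rows

# Invented values: few brackish sites, many saline ones.
observed = [10.0, 12.0, 30.0, 34.0, 35.0, 36.0]
predicted = [20.0, 22.0, 31.0, 34.0, 35.0, 35.0]
profile = binned_rmsep(observed, predicted, n_bins=2)
for row in profile:
    print(row)
```

Comparing the two error columns segment by segment shows where the transfer function does better than simply guessing the mean and where it does not.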

The second reason is the lack of specialist low salinity dinocysts in the calibration set: taxa are either restricted to high salinities, or have broad distributions. In principle, the inherently variable salinities of brackish systems could be blurring the distribution of specialist low salinity species to make them appear to have broad distributions. However, it makes no material difference whether the absence of specialist low salinity species is real or only apparent.

Fig 4. Salinity ranges of dinocysts in the calibration range. Black lines mark the total range (first to last occurrence), blue the range with abundances >1%, red the range with abundances >10%. Note that this is not a robust analysis – a single outlier can extend the range.

So if the dinocyst-salinity transfer function is of doubtful utility in low salinity waters, what about high salinity waters? From Fig 3 it would appear that the model has very high precision at normal ocean salinities. Can this be trusted?

I am going to argue that it cannot be trusted.

First, there is the problem that I raised in Telford (2006): during cross-validation, many calibration set observations select observations from the same geographic cluster as analogues. The Celtic Sea, mentioned above, is not the only cluster with 100% specificity and exclusivity: 13 observations off south Morocco find all their analogues within this cluster and are not used as analogues by any other observations. At a larger scale, 97.7% of the analogues selected by observations in the Mediterranean are from the Mediterranean, and these 79 observations are selected as analogues by other calibration set observations only 13 times. Larger scale still, 93.2% of the analogues selected by Pacific observations are from the Pacific.
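These two quantities can be tallied directly from the analogue lists. The sketch below is illustrative only: the cluster labels, the toy assemblages and the use of squared chord distance are assumptions, and the function names are hypothetical.

```python
import math

def squared_chord(a, b):
    """Squared chord distance between two assemblages given as proportions."""
    return sum((math.sqrt(x) - math.sqrt(y)) ** 2 for x, y in zip(a, b))

def nearest_others(assemblages, i, k):
    """Indices of the k observations most similar to observation i, excluding i itself."""
    others = [(squared_chord(assemblages[i], a), j)
              for j, a in enumerate(assemblages) if j != i]
    return [j for _, j in sorted(others)[:k]]

def cluster_isolation(assemblages, labels, cluster, k=5):
    """Return (specificity, times_borrowed) for one cluster.

    specificity: share of analogues chosen by the cluster's members that lie in the cluster.
    times_borrowed: how often outside observations pick a cluster member as an analogue.
    """
    own = total = borrowed = 0
    for i in range(len(assemblages)):
        for j in nearest_others(assemblages, i, k):
            if labels[i] == cluster:
                total += 1
                own += labels[j] == cluster
            elif labels[j] == cluster:
                borrowed += 1
    if total == 0:
        raise ValueError("no observations carry this cluster label")
    return own / total, borrowed

# Two hypothetical geographic clusters with distinct assemblages.
assemblages = [[0.8, 0.2, 0.0], [0.7, 0.3, 0.0], [0.6, 0.4, 0.0],
               [0.1, 0.2, 0.7], [0.0, 0.3, 0.7], [0.1, 0.1, 0.8]]
labels = ["A", "A", "A", "B", "B", "B"]

print(cluster_isolation(assemblages, labels, "A", k=2))
print(cluster_isolation(assemblages, labels, "A", k=3))
```

With k=2 every analogue comes from within the cluster; with k=3 each observation is forced to reach outside, which shows how the statistics depend on cluster size relative to k.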

This is a problem. We can tell there is something ecologically (or taxonomically) special about these clusters with high specificity and exclusivity, but we cannot tell what. Salinity might be important in making these clusters ecologically distinct, but we simply cannot tell. It could be any one of many environmental variables, alone or in combination, that makes each cluster special. To determine which variables are important, we need to find repeatable patterns in the data (i.e. a certain assemblage is characteristic of certain conditions). This requires replication. The current calibration set has little replication. Instead, each cluster samples essentially the same patch of ocean: this is pseudo-replication. As much of the environmental variance is between clusters rather than within clusters, any observation that identifies analogues from the same cluster during cross-validation will have an accurate prediction, regardless of the ecological importance of the environmental variable. This would be a problem for any transfer function method, but it is worst with MAT as MAT does not attempt to find general patterns in the data.
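A toy simulation makes the point concrete. In the sketch below, three hypothetical clusters have near-identical assemblages within each cluster, and the "environmental" variable is an arbitrary number that merely differs between clusters. All values are invented, and the unweighted MAT with squared chord distance is an assumption.

```python
import math
import statistics

def squared_chord(a, b):
    """Squared chord distance between two assemblages given as proportions."""
    return sum((math.sqrt(x) - math.sqrt(y)) ** 2 for x, y in zip(a, b))

def loo_rmsep(assemblages, env, k):
    """Leave-one-out RMSEP of an unweighted k-analogue MAT."""
    squared_errors = []
    for i, target in enumerate(assemblages):
        others = sorted(
            (squared_chord(target, a), j) for j, a in enumerate(assemblages) if j != i
        )[:k]
        prediction = sum(env[j] for _, j in others) / len(others)
        squared_errors.append((env[i] - prediction) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Three clusters of four observations; each cluster is dominated by a different taxon.
assemblages = [
    [0.90, 0.10, 0.00], [0.88, 0.12, 0.00], [0.92, 0.08, 0.00], [0.89, 0.11, 0.00],
    [0.00, 0.90, 0.10], [0.00, 0.88, 0.12], [0.00, 0.92, 0.08], [0.00, 0.89, 0.11],
    [0.10, 0.00, 0.90], [0.12, 0.00, 0.88], [0.08, 0.00, 0.92], [0.11, 0.00, 0.89],
]
# An arbitrary variable with no causal link to the taxa: it varies between clusters only.
nonsense = [10.0, 10.5, 9.5, 10.2, 50.0, 50.5, 49.5, 50.2, 90.0, 90.5, 89.5, 90.2]

cv_error = loo_rmsep(assemblages, nonsense, k=3)
spread = statistics.pstdev(nonsense)
print(round(cv_error, 2), round(spread, 2))
```

The cross-validation error is a tiny fraction of the variable's spread even though the variable means nothing ecologically; the apparent skill comes entirely from every observation finding its analogues in its own cluster.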

The second problem concerns the limited sampling of environmental space. For example, there are no warm brackish sites in the calibration set (Fig 5). This means that if an assemblage is recognised as being characteristic of warm waters, it is automatically classified as being from full marine salinity. This will give the transfer function an artificial performance boost.
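Gaps like this can be found mechanically by gridding the calibration sites in temperature-salinity space and listing the empty cells. The following is only a sketch; the cell edges and site values are hypothetical.

```python
from bisect import bisect_right

def cell_index(value, edges):
    """Index of the half-open interval [edges[i], edges[i + 1]) holding value, or None."""
    if value < edges[0] or value >= edges[-1]:
        return None
    return bisect_right(edges, value) - 1

def empty_cells(temperatures, salinities, temp_edges, sal_edges):
    """List the (temperature range, salinity range) cells containing no calibration sites."""
    occupied = set()
    for t, s in zip(temperatures, salinities):
        ti, si = cell_index(t, temp_edges), cell_index(s, sal_edges)
        if ti is not None and si is not None:
            occupied.add((ti, si))
    return [
        ((temp_edges[ti], temp_edges[ti + 1]), (sal_edges[si], sal_edges[si + 1]))
        for ti in range(len(temp_edges) - 1)
        for si in range(len(sal_edges) - 1)
        if (ti, si) not in occupied
    ]

# Hypothetical sites: summer temperature (°C) and salinity (psu).
temperatures = [2.0, 4.0, 5.0, 18.0, 22.0, 25.0]
salinities = [15.0, 30.0, 34.0, 35.0, 36.0, 37.0]

gaps = empty_cells(temperatures, salinities, temp_edges=[0, 15, 30], sal_edges=[0, 25, 40])
print(gaps)
```

Any fossil assemblage that truly came from an empty cell has no possible analogue, so MAT will silently assign it to the nearest occupied part of environmental space.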

Fig 5. Summer salinity against temperature

The final issue I want to raise concerns the dinocyst-environment relationships. One of the assumptions of transfer functions (see e.g. Birks et al 2010) is that “the mathematical methods used adequately model the species responses” to the environment. Some methods assume a unimodal species-environment response, others a linear response, still others just assume a smooth response. MAT makes no assumptions about the shape of the species-environment relationship. This might seem like an advantage. It might be one when the species-environment relationships are complex, but it makes MAT very sensitive to incomplete sampling of the species’ environmental niches.

Fig 6. Dinocyst relative abundance against salinity

Some of the dinocysts tolerate a broad range of salinities, others are only found over a narrow salinity range. These narrow-ranged taxa would normally be considered to be excellent environmental indicators.

Consider, for example, the taxon coded GYMN (bottom row). It is only common (>5%) in the salinity range 36.1-36.7 psu. If this taxon is common in a fossil assemblage, we would normally expect the palaeosalinity to be in this narrow range, a very precise estimate of a difficult-to-reconstruct parameter. However, this conclusion is only valid if the taxon’s potential niche has been adequately sampled. Consider the following schematic diagram, which represents alternative mechanisms for generating a narrow range for a taxon in a training set, and their consequences.

Fig 7. Schematic showing the taxon’s niche (colours) in environmental space. Ovals with solid outlines represent the sampling clusters in the training set. Oval with dashed outline represents a cluster after environmental change.

The upper panel represents the case where the taxon genuinely has a narrow niche. If the environment changes, this taxon will no longer be found. The lower panel represents the case where the taxon appears to have a narrow environmental range because only part of the taxon’s niche has been sampled (note it is possible that not all of the taxon’s niche is available in the environmental space of the modern ocean). If the environment changes, this taxon will still be present, and reconstructions based on its presence will be erroneous.

Is it possible to tell if the narrow salinity ranges of taxa in the calibration set represent narrow niches (good) or undersampling of environmental space (bad)? I think that the number of taxa with bimodal relationships with the salinity gradient is indicative of undersampling. Bimodal species-environment relationships could indicate a genuinely bimodal relationship, perhaps because of competitive exclusion from part of the range, or because two or more cryptic species with different environmental requirements are being considered together. Alternatively, a bimodal species-environment relationship could indicate that the environmental space has been undersampled. In view of the lacuna in the summer salinity-temperature space (Fig 5), it is probable that the species’ niches have been undersampled, and thus the transfer function will generate erroneous reconstructions.

In view of these problems, the dinocyst-salinity transfer function is dead. Its reconstructions cannot be trusted. It should not be used in its current state.

But perhaps it is just pining for the fjords. Perhaps there are environments where dinocysts could be used to reconstruct salinity, with a more focused transfer function rather than the melange of environments in the current calibration set. This work is in progress (honest).



About richard telford

Ecologist with interests in quantitative methods and palaeoenvironments

3 Responses to Requiem for a transfer function

  1. Jim Bouldin says:

    This looks really interesting Richard, looking forward to reading it.

  2. Pingback: Dinocysts reconstructions vs. climate models | Musings on Quantitative Palaeoecology

  3. Pingback: Effect of incomplete sampling of environmental space on transfer functions | Musings on Quantitative Palaeoecology
