I don’t always comment on papers that use transfer functions but neglect to consider how spatial autocorrelation in the modern calibration set might make the reconstructions spuriously precise. It gets tedious, especially when the same authors make the same mistakes time and again. But sometimes I am asked to review such papers, and I oblige.
One such paper was Wary et al (2017), published yesterday. The paper suggests that when Greenland and the North Atlantic cool during Dansgaard–Oeschger oscillations, the surface of the Norwegian Sea warms, and vice versa. The warmth in the Norwegian Sea is reconstructed from the cysts of dinoflagellates, which live near the surface. Cold subsurface conditions in the Norwegian Sea are reconstructed from planktic foraminifera.
Since Wary et al was published in Climate of the Past, the complete peer-review process – reviews, editor's comments and author replies – is publicly available. This now includes the second round of reviews and editor comments, which were previously hidden.
Lacking the expertise to critique the physical plausibility of this regional see-saw, my review focused on the dinocyst reconstructions. The paper reconstructs summer and winter sea surface temperatures and salinities, together with sea-ice duration. I really doubt that all five variables can be reconstructed independently, especially since most dinoflagellates overwinter in cysts on the sea floor.
I criticised the paper for reporting model performance statistics from a cross-validation scheme (either leave-one-out or k-fold cross-validation – the paper is not clear which) that ignores the considerable spatial autocorrelation in the calibration set, and I suggested that the true uncertainty is severely underestimated. I also criticised the lack of reconstruction diagnostics that would help the reader evaluate the reconstructions.
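The problem is easy to demonstrate with a synthetic example (this is my own illustrative sketch, not the Wary et al data or code). When both the environment and the assemblages vary smoothly in space, leave-one-out cross-validation of a nearest-analogue (MAT-style) model lets each site be predicted from a near-duplicate neighbour, giving a flatteringly small error; withholding all sites within a geographic distance h of the target gives a far less rosy estimate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for a calibration set (NOT real dinocyst data):
# 200 sites along a transect where both the environment ("temp") and
# the assemblage vary smoothly in space, so neighbouring sites are
# near-duplicates -- i.e. strong spatial autocorrelation.
x = np.sort(rng.uniform(0, 100, 200))          # site positions
temp = x / 10 + rng.normal(0, 0.1, x.size)     # smooth "SST" gradient
assemblage = (np.column_stack([x / 100, (x / 100) ** 2])
              + rng.normal(0, 0.01, (x.size, 2)))  # smooth "taxon" scores

def cv_rmse(h=0.0):
    """RMSE of nearest-analogue (MAT, k=1) predictions. Besides the
    target site, all sites within geographic distance h of it are
    withheld from the training set; h=0 is ordinary leave-one-out."""
    errs = []
    for i in range(x.size):
        keep = np.abs(x - x[i]) > h            # h-block exclusion zone
        keep[i] = False                        # always drop the target
        d = np.linalg.norm(assemblage[keep] - assemblage[i], axis=1)
        errs.append(temp[keep][np.argmin(d)] - temp[i])
    return np.sqrt(np.mean(np.square(errs)))

print(f"leave-one-out RMSE: {cv_rmse(0):.2f}")    # flatteringly small
print(f"h-block (h=10) RMSE: {cv_rmse(10):.2f}")  # considerably larger
```

The model has learnt nothing transferable – it simply copies the nearest neighbour – yet leave-one-out rewards it handsomely, which is exactly the trap in the calibration sets at issue here.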
The editor agreed these were important concerns. So how did the authors respond?
The authors added – as they had promised – a plot showing the taxonomic distance of each fossil sample to the nearest analogue in the modern calibration set. They claim this plot will “ensure that one can assess by his own the reliability and robustness of our reconstructions”.
Well, good luck with that. Usually, plots of the distance to the nearest analogue include reference levels (often the 5th and 10th percentiles of all distances in the calibration set) against which the distances can be compared. Wary et al do not, so there is no way to know whether the distances are high or low (a problem exacerbated by the absence of information on which distance metric was used, which also hampers replication). This figure is almost useless.
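For illustration, reference levels of the kind the figure lacks are cheap to compute from the modern calibration set itself. This sketch uses synthetic assemblage data and the squared-chord distance – a common choice for proportion data, though which metric Wary et al actually used is, as noted, unknown:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical proportions of 20 dinocyst taxa at 150 modern sites
# (rows sum to 1); a real calibration set would replace this matrix.
modern = rng.dirichlet(np.ones(20), size=150)

def squared_chord(a, b):
    """Squared-chord distance, a standard dissimilarity for
    percentage/proportion assemblage data (maximum value 2)."""
    return np.sum((np.sqrt(a) - np.sqrt(b)) ** 2, axis=-1)

# All pairwise modern-modern distances (upper triangle only).
i, j = np.triu_indices(modern.shape[0], k=1)
d = squared_chord(modern[i], modern[j])

# Reference levels: fossil samples whose nearest-analogue distance
# exceeds these are conventionally flagged as poor/no analogues.
good, fair = np.percentile(d, [5, 10])
print(f"5th percentile (good-analogue threshold): {good:.3f}")
print(f"10th percentile (fair-analogue threshold): {fair:.3f}")

# Screening one (synthetic) fossil sample against the thresholds:
fossil = rng.dirichlet(np.ones(20))
nearest = squared_chord(modern, fossil).min()
print("nearest-analogue distance:", round(nearest, 3),
      "-- poor analogue" if nearest > fair else "-- analogue ok")
```

With lines like these on the figure, a reader could actually judge whether the fossil samples have acceptable modern analogues; without them, the plot carries no interpretable information.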
Wary et al rely on Guiot and de Vernal (2011a, b) (a paper and their response to a comment) in support of their assertion that
parallel studies equally based on cross-validation schemes showed that this spatial autocorrelation has in fact relatively low impact on the calculation of the error of prediction of the MAT transfer function applied to dinocyst assemblages.
Unfortunately, Guiot and de Vernal (2011a, b) is a strong contender for the title of worst paper ever published in Quaternary Science Reviews, managing simultaneously to demonstrate and deny that autocorrelation is a problem, while using an irrelevant test that proves nothing. It is absolutely not evidence that autocorrelation is not a serious problem for transfer functions.
The authors also cite de Vernal et al (2013a) and de Vernal et al (2013b) as further evidence that autocorrelation is not a problem for dinocyst transfer functions. However, neither paper even attempts to test whether autocorrelation leads to an overestimate of model performance. Both papers use k-fold cross-validation, which is only minutely less sensitive to autocorrelation than leave-one-out cross-validation: it is a solution to autocorrelation to the same extent that a sieve makes a good boat.
The authors graciously cite several of my papers which demonstrate that autocorrelation is a problem and suggest means to identify and deal with it. However, I would much rather that, instead of contributing towards increasing my h-index, the authors had engaged with the h-block cross-validation scheme I proposed. In conclusion, Wary et al is yet another wasted opportunity to determine the true utility of dinocyst-based transfer functions.
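For readers unfamiliar with it, the h-block scheme amounts to a simple fold construction that works with any transfer-function model – a minimal sketch, with made-up coordinates:

```python
import numpy as np

def h_block_folds(coords, h):
    """Yield (test_index, train_indices) pairs for h-block
    cross-validation: for each left-out site, every site within
    geographic distance h of it is also withheld from training."""
    coords = np.asarray(coords, dtype=float)
    for i in range(len(coords)):
        dist = np.linalg.norm(coords - coords[i], axis=1)
        train = np.flatnonzero(dist > h)  # drops the test site itself too
        yield i, train

# Usage: five sites, three clustered near the origin and two far away.
# With h=1.5 the immediate neighbours of each test site are excluded.
coords = [[0, 0], [1, 0], [2, 0], [10, 0], [11, 0]]
for test, train in h_block_folds(coords, h=1.5):
    print(test, train.tolist())
```

For example, when site 1 is the test site, only the two distant sites (indices 3 and 4) remain available for training; its near-duplicate neighbours cannot be used to flatter the model. Choosing h sensibly (e.g. from the range of the spatial autocorrelation in the residuals) is the part that requires thought – but ignoring the issue altogether is not a defensible alternative.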