I was alerted to Guiot and de Vernal (2011; GdV11) by Google Reader one morning while reading in bed and I drafted a rejoinder before I got up. Unfortunately not all mornings are that productive, but fortunately I don’t read many papers as bad as GdV11.
The basis of the rejoinder was that evidence that autocorrelation biases transfer-function performance statistics was incontrovertible, and that the prediction that GdV11 made and tested was not relevant, so their conclusions were unsubstantiated.
Guiot and de Vernal were offered and took the opportunity to write a reply. Their text is below, with my comments in red.
Guiot, J. & de Vernal, A. (2011) QSR Correspondence “Is spatial autocorrelation introducing biases in the apparent accuracy of palaeoclimatic reconstructions?” Reply to Telford and Birks. Quaternary Science Reviews 30, 3214–3216
By this sentence “If the community is content to condone poor statistical practise, it should brace itself for strident criticism from outside the scientific community”, Telford wrote something deeply true. He positioned himself as guardian of the palaeoclimatological temple. Why not? Many scientists produce data and results others produce wealth of advice. Unfortunately the advices of Telford and Birks are restricted to criticisms, which are not all scientifically grounded, and appear sometimes biased and unacceptable.
[I think this text is best described as stretching the boundaries of academic discourse. I confess I didn’t know that there was a palaeoclimatological temple: I do hope it is fashioned after a beautiful diatom, not some freakish organic-walled microfossil. Who is the high-priest? Do they wear a silly hat and chant the creed “We believe in one transfer function, and autocorrelation does not matter, …”? Is “biased and unacceptable” code for “blasphemous and heretical”?
Guiot and de Vernal are apparently unable to use Google to find my empirical papers, and are unaware of the reasons why I can no longer count diatoms. Still, I would rather produce a wealth of advice than the poverty of reason that pervades GdV11.]
They are also misleading in many cases since users of statistical paleoclimate reconstructions do not necessarily possess the background required to assess the pertinence of these criticisms.
[Guiot and de Vernal stand first in line.]
Nevertheless, the aim of Guiot and de Vernal’s (2011), paper under criticism in the above mentioned Telford and Birks’s comment, was not to discuss Telford’s papers nor to prove that modern analogue technique (MAT) was better than weighted averaging partial least-squares technique (WA-PLS), but to show that most of the results published with MAT since three decades, remain statistically sound. Hence, responses to the criticisms of Telford and Birks (2011), appear below.
[For a paper that did not aim to discuss my work, it did surprisingly little else. GdV11 also singularly fails to show that the results of their earlier work are sound: GdV11 removes the effects of autocorrelation, whereas the earlier papers do not.]
1) MAT is a non-parametric method while WA-PLS is a parametric method. The advantage of non-parametric methods is that they do not contain any hidden hypotheses about the structure of the data, but the inconvenience is the difficulty to assess its prediction capability. The robustness of both methods does not rely on the same hypotheses, which makes a big difference. The spatial structure of the variables is important for both types of methods. However, in the case of calibration methods (such as regressions or WA-PLS), an implicit hypothesis is that the residuals are not correlated and have constant variance (Draper and Smith, 1966). The non-parametric methods do not have such requirements. We know that the spatial structure of the variables remains problematic to assessing the predictability of all types of methods and this is recognized in our paper (Guiot and de Vernal, 2011). Therefore, when evaluating the predictability of MAT, the root mean squared error (RMSE) statistics are used to assess on the fact that the proxy assemblages contain climatic information, and the root mean squared error of prediction (RMSEP) is used to check if the method is robust for prediction. This evidently requires that the RMSEP is calculated on truly independent data. However, as spatial autocorrelations are a consequence of the bio-physical significance of the data, we have to cope with them.
[I want to celebrate the first sentence for it is unambiguously correct. The rest of the paragraph makes the first-year undergraduate error of assuming that because a method is non-parametric it is assumption free. Non-parametric methods do avoid the assumption that the data arise from a given parametric distribution, but the other assumptions remain. Consider, for example, the assumptions of the t-test and the Mann–Whitney–Wilcoxon test: the latter drops the normality assumption, but both assume independent observations.]
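[The distinction is easy to demonstrate. The Mann–Whitney–Wilcoxon test is non-parametric, yet it still assumes independent observations: split a single autocorrelated series (which contains no true group difference) into two halves, and the test rejects the null far more often than the nominal 5%. A minimal sketch in Python; the AR(1) series, sample sizes, and the large-sample normal approximation are my illustrative choices, not anything from GdV11.]

```python
import math
import numpy as np

def ar1(n, phi, rng):
    """AR(1) series: a one-dimensional stand-in for spatially
    autocorrelated observations along a transect."""
    x = np.empty(n)
    x[0] = rng.standard_normal()
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.standard_normal()
    return x

def ranksum_p(a, b):
    """Two-sided Mann-Whitney-Wilcoxon p-value via the large-sample
    normal approximation (adequate for ~30 observations per group,
    continuous data, no ties)."""
    n1, n2 = len(a), len(b)
    ranks = np.argsort(np.argsort(np.concatenate([a, b]))) + 1
    w = ranks[:n1].sum()                  # rank sum of the first group
    mu = n1 * (n1 + n2 + 1) / 2.0
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = abs(w - mu) / sigma
    return 2.0 * (1.0 - 0.5 * (1.0 + math.erf(z / math.sqrt(2.0))))

def false_positive_rate(phi, n=60, trials=1000, seed=1):
    """Split one series (no true group difference) into halves and
    count how often the test rejects at the nominal 5% level."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(trials):
        x = ar1(n, phi, rng)
        hits += ranksum_p(x[:n // 2], x[n // 2:]) < 0.05
    return hits / trials

# With independent observations (phi = 0) the rejection rate stays
# near the nominal 5%; with strong autocorrelation (phi = 0.9) the
# same "assumption-free" test rejects a true null far more often.
```

[Being non-parametric frees the test from the distributional assumption, not from the independence assumption, and it is the latter that autocorrelation violates.]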
2) Another point, missed by Telford and Birks (2011), is that MAT does not optimize any fit with the data. MAT simply looks for samples with the same ecological composition. If this composition has a climatic meaning, the fit should be good with a low RMSE, which means that two similar assemblages have the same climatic constraints. It is thus expected that similar assemblages are often geographically closer than dissimilar ones, which makes the spatial structure a necessity, not a problem. Unlike MAT, parametric methods calculate some coefficients to optimize the fit, which is measured by the RMSE. When we exclude geographically close samples as potential analogues, we impose a much more drastic constraint to the verification process than with other methods, because it is not insured to have the good analogues elsewhere. Consequently, it can be expected that the ratio RMSEP/RMSE is lower with MAT than with WA-PLS.
[It is ridiculous to write that MAT does not optimise any fit with the data. That is what the dissimilarity coefficient is doing, making the optimal choice of analogues.
A MAT model having a low RMSE does not guarantee that the variable being reconstructed is ecologically meaningful. Even models trained on simulated data can have a low RMSE. This is the crux of the problem – how to tell if the model is genuinely good or if it only appears to be good because of autocorrelation. If the model still performs well when geographic neighbours are excluded, then it has proven predictive power. If the model has no demonstrated power to make predictions in space, why should any credence be given to its power to make predictions in time?]
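[For concreteness, here is a minimal MAT sketch in Python: squared-chord dissimilarity, predictions as the mean of the k closest analogues, and leave-one-out RMSEP with an optional h-block that also excludes the geographic neighbours of each test site. The dissimilarity metric, the value of k, and the data layout are my illustrative choices, not those of GdV11.]

```python
import numpy as np

def sq_chord(a, b):
    """Squared chord distance between proportion assemblages."""
    return np.sum((np.sqrt(a) - np.sqrt(b)) ** 2, axis=-1)

def mat_loo_rmsep(spp, env, k=3, coords=None, h=0.0):
    """Leave-one-out RMSEP for MAT. If coords and h are given,
    analogues within distance h of the test site are also excluded
    (h-block cross-validation)."""
    n = len(env)
    pred = np.empty(n)
    for i in range(n):
        d = sq_chord(spp[i], spp)        # dissimilarity to every site
        d[i] = np.inf                    # never use the test sample itself
        if coords is not None and h > 0:
            geo = np.linalg.norm(coords - coords[i], axis=1)
            d[geo < h] = np.inf          # h-block: drop close neighbours
        analogues = np.argsort(d)[:k]    # k most similar assemblages
        pred[i] = env[analogues].mean()  # unweighted mean of their env
    return np.sqrt(np.mean((pred - env) ** 2))
```

[On a training set with a smooth, spatially autocorrelated environmental gradient, the h-block RMSEP is typically much worse than the naive leave-one-out RMSEP; the gap is the part of the apparent skill that comes from geographic proximity rather than ecology.]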
3) For verification purpose, we have used a region located in the middle of the environmental space. Telford and Birks (2011) criticized this choice as being arbitrary. It is not. Guiot and de Vernal (2007) chose extreme cold and warm regions to verify the extrapolation capability of the transfer functions, including parametric and non-parametric methods. Telford and Birks (2009; p. 1315) criticized also this approach arguing that samples located at the separation between calibration and verification datasets are geographically close, which induces some autocorrelation. This has been avoided in the 2011 paper by isolating a central region. Hence, despite the claims of Telford and Birks, the various verification tests made from different locations, central or distal, have proved the robustness of MAT. Furthermore, as complementary verification tests, we have also applied the h-block tests, which confirm the findings from other tests. This was clearly explained in Guiot and de Vernal (2011).
[A whole paragraph to complain about one word, “arbitrary”, in the rejoinder they objected to. Nothing of substance.]
4) Telford and Birks wrote “the claim made by Guiot and de Vernal (2011) that transfer function residuals should not have spatial structure is erroneous”. Correlation and thus spatial structure between the residuals of different variables could be due to what Telford and Birks called a nuisance variable. They mentioned that the absence of residual autocorrelations should be due to an inappropriate internalization of the nuisance spatial structure. Actually, when climatic variables are strongly correlated, they also have a similar spatial structure. So we found that latitude and longitude explained 98% of the winter temperature variance, 90% of the winter salinity, 99% of the summer temperature and 94% of the summer salinity. With such results, internalization is possible. With MAT, it is not a problem as we consider climate as a vector of four variables that we reconstruct together, assuming that they determine the assemblages. MAT is an integrative approach unlike WA-PLS, which reconstructs the four variables separately. With WA-PLS, internalization might thus be a problem and the remaining autocorrelation of the residuals is another one (see point 1 above).
[Either the text in GdV11 has been poorly translated, or Guiot and de Vernal are misinterpreting/misdescribing their own results. The numbers quoted in this paragraph are described by GdV11 as “The coefficient of correlation (R2) of the interpolation”, which I understood to be the correlation between the raw and gridded data. A high correlation between raw and gridded data does not necessarily imply a high correlation between the data and latitude and longitude. Instead, it means that the data are smooth.
It is perfectly fine to consider four environmental variables, or even more should you please, but don’t make the mistake of assuming that the organisms consider the variables, or that the reconstructions are valid just because the method is integrative.]
5) Telford and Birks (2011) proposed the simulated environmental variables as being a panacea. It is true that it is used in a number of statistical papers, as it is easier to work on simulated (idealized) data than on real data. However, the simulation of data must be done with cautious. In the present case, simulated data must reproduce the spatial structure of the environmental data. However, because 90–99% of the variance of the environmental data can be reproduced with only longitude and latitude variables, it is clear that simulated variables might be very close to the original data and that a transfer function is expected to reproduce reasonably well these simulated data. This is illustrated from Fig. 2 of Telford and Birks (2011), which is complemented by Fig. 1 below. Fig. 2b of Telford and Birks (2011) shows that the r2 of MAT applied on simulated variables is higher than it is when applied on observations in approximately 5% of the cases. Fig. 1 below shows that environmental variables simulated with a range of 2000 km or 5000 km have a correlation (in absolute value) with observed values higher than 0.31–0.43 (according to the variable) in 5% of the cases. This correlation increases to 0.43–0.54 when the range is increased to 5000 km. Large spatial ranges thus make the simulated data significantly closer to the real data. By nature, MAT is more sensitive than WA-PLS to this property. The argument of Telford and Birks (2005) and of Telford and Birks (2011), based on their Fig. 2, does not prove at all that WA-PLS performs better relative to the null model because simulated data are not expected to favour the null model.
Fig. 1. […]
[There are none so blind as those who will not see. With this paragraph, Guiot and de Vernal neatly demonstrate the problems inherent with spatially autocorrelated data: chance alone can make the relationship between variables appear to be strong.]
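[The point can be made in a few lines of code. Two series generated independently, so their true correlation is exactly zero, will often show a large sample correlation when both are spatially autocorrelated. In this Python sketch an AR(1) series along a transect stands in for a two-dimensional spatial field; the autocorrelation parameter, sample size, and number of trials are illustrative choices of mine.]

```python
import numpy as np

def ar1(n, phi, rng):
    """AR(1) series: a stand-in for a spatially autocorrelated variable."""
    x = np.empty(n)
    x[0] = rng.standard_normal()
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.standard_normal()
    return x

def r95(phi, n=100, trials=2000, seed=42):
    """95th percentile of |r| between two INDEPENDENT series,
    each with lag-1 autocorrelation phi."""
    rng = np.random.default_rng(seed)
    rs = [abs(np.corrcoef(ar1(n, phi, rng), ar1(n, phi, rng))[0, 1])
          for _ in range(trials)]
    return float(np.quantile(rs, 0.95))

# For phi = 0 the 95th percentile of |r| sits near the classical
# 1.96 / sqrt(n), about 0.2 here; for phi = 0.9 it is roughly three
# times larger: chance alone produces "significant-looking" correlations.
```

[This is exactly why a smooth simulated variable can correlate well with a smooth observed one: autocorrelation, not ecology, does the work.]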
6) It was argued by Telford (2006) that “The sites in the 940 dinoflagellate training set (de Vernal et al., 2005) are not evenly distributed in space, either geographically or environmentally, instead many sites are in clusters. This arrangement favours the modern analogue technique, which can select geographically local analogues, constrained to have very similar environmental characteristics”. This is precisely the reason why we have gridded the reference dataset! As each grid point is distant of approximately 130 km, the possibility to select too local analogues is reduced.
[Please forgive me for not anticipating in 2006 what Guiot and de Vernal would write in 2011. The gridded analysis is completely irrelevant as it has not been used by any of the papers I criticised, or any others, so any benefits gridding might have do not apply to them.]
7) Telford and Birks also criticized the use of the organic-walled dinoflagellate cysts (or dinocysts) that are “resting on the sea sediment” as tracers of sea-surface conditions. By doing so, Telford and Birks (2011) questioned the grounds of palaeoclimatological and paleoceanographical studies using any microfossil proxies. Whereas taphonomical processes cannot be ignored, the basic assumption that microfossil populations are related to the environments in which the original populations developed applies to the cysts of dinoflagellates as well as planktic foraminifers, diatoms or coccoliths (see for example the papers by Crosta and Koc, 2007, de Vernal and Marret, 2007, Giraudeau and Beaufort, 2007 and Kucera, 2007).
[This paragraph shows an egregious misreading of the rejoinder. The sentence referred to reads “It is, for example, unlikely that the overwintering cysts of coastal dinoflagellates resting on the sea sediment are directly sensitive to the conditions in winter at the sea surface, tens to hundreds of metres above them.” This clearly refers to the cyst as the resting or overwintering phase of the dinoflagellate life-cycle (i.e. dormant and thus not responsive to its environment), not to any taphonomic problem (which is possibly more severe in organic-walled dinocysts than in many other microfossils).]
For what concern dinoflagellates and their cysts, which occur in a wide range of environmental conditions, the analyses of surface sediment samples have shown that sea-surface temperature, salinity and seasonality, including winter to summer temperature contrasts and sea-ice cover extent, are playing important role on the distribution of microfossil assemblages (cf. for example, de Vernal et al., 1997, de Vernal et al., 2001, Rochon et al., 1999, Marret and Zonneveld, 2003, Holzwarth et al., 2007, Radi and de Vernal, 2008, Grøsfjeld et al., 2009, Bonnet et al., 2010, Elshanawany et al., 2010 and Shin et al., 2011).
[None of these papers have tested these variables correctly; all have used tests that assume that the observations are independent. Some of these variables certainly are ecologically important. Whether they are important enough to be meaningfully reconstructed is another issue, but one which is very important for the dinocyst user community.]
Finally, unlike what is suggested by Telford and Birks, it is not true that our paper is restricted to the comparison of two methods, MAT and WA-PLS. Our objective was to show the suitability of MAT behaviour according to spatial structure of environmental fields in response to Telford and Birks (2009) who concluded about MAT that “some of these transfer functions, with apparently good performance statistics, have no predictive power”.
[Since Telford & Birks (2006, 2009) showed that transfer functions trained with simulated environmental data could have good performance statistics, but obviously have no predictive power, there is a risk that some published transfer functions from autocorrelated environments are also spurious. The sunshine reconstructions of Fréchette et al. (2008) are an obvious candidate. I did not write, and do not believe, that all MAT reconstructions from autocorrelated environments are nonsense, but their uncertainty will be underestimated, perhaps greatly so.]
The main argument for such an assertion relied on comparison with WA-PLS, which is apparently the preferred technique of the Telford and Birks’ team, although it is not necessarily the most efficient approach for paleoceanographic reconstructions as shown by the various statistical tests we performed using planktic foraminifer assemblages (Guiot and de Vernal, 2007) and dinocyst assemblages (Guiot and de Vernal, 2011).
[I begin to doubt that Guiot and de Vernal actually read the rejoinder. It was explicitly written that the comparison between WA-PLS and MAT is not the main argument for problems with autocorrelation. The problems with autocorrelation could easily be shown if MAT were the only transfer function method.
The reason why WA-PLS does not appear to be as efficient as MAT is that WA-PLS is less able to use the autocorrelation structure in the data.]
By denigrating again the robustness of MAT that is still one of the most efficient and most currently used technique for paleoclimate reconstructions, Telford and Birks continue to shed confusion in the scientific community. Self criticism is important and we acknowledge uncertainties with MAT-based reconstructions as well as other transfer function techniques, but this should not prevent progresses towards the study of paleoclimates, which is still relevant in view of the caveats of the current paleoclimate simulation models.
[The confusion seems mainly to reside in the authors of the reply. Palaeoclimate studies can only progress when the causes of uncertainty, including autocorrelation, are properly assessed. Reconstructions with over-optimistic uncertainty estimates are of limited use for comparison with palaeoclimate simulations.]
Thus, until a higher performance transfer function approach is developed, for example on the basis of process models (Guiot et al., 2009 and Hughes et al., 2010), we can only encourage the paleoclimate community to continue using MAT, with all statistical precautions required, as it has been successful for the documenting recent Earth’s climate dynamics.
[I agree with the sentiment of this concluding sentence, but suspect the authors and I would disagree about what “all statistical precautions required” entails.]
[In addition to what they write about, it is worth examining what was not written about.
- That their test of which method was better when autocorrelation had been removed was irrelevant to the problem of how the methods behave when autocorrelation has not been removed.
- The bogus assumption that, for WA-PLS, the observations have a unimodal distribution along the gradient
- The recognition of spatial autocorrelation as an issue in every other scientific field – even Guiot’s other work.
Does this silence indicate agreement?]