I have largely neglected the chrysophyte-inferred reconstructions of winter severity and summer calcium concentrations/zonal wind speed from Lake Żabińskie even though they fall within the scope of my review of sub-decadal resolution reconstructions. This is not because I think this pair of reconstructions is jointly credible, but because no data have been archived despite several requests to the authors (and no useful calibration set data can be scraped from the publications).
Having shown that the correlation between instrumental temperature records and the pollen reconstruction from Mauntschas is no better than would be expected from autocorrelation and multiple testing alone, I wondered if the same could be true for the calcium/wind reconstruction.
Hernández-Almeida et al. (2015) report that a weighted averaging transfer function using their 50-lake calibration set has a bootstrap R² of 0.68 and an RMSEP of 0.143 (log10 units). This is reasonably good, but the distribution of sites along the gradient is very uneven, which will bias the RMSEP.
I’m not totally convinced that the relationship between the estimated and observed calcium concentration in figure 4 of the paper is cross-validated, which then makes me wonder if the performance statistics are cross-validated or not. With the data, this would be trivial to confirm. Without them, doubts multiply.
The authors smooth the reconstruction from Lake Żabińskie with a three-year triangular filter and compare it with climate series for different combinations of months, allowing for a lagged response. This procedure, developed earlier, generated 144 climate time series.
I wanted to test if this multiple testing on autocorrelated data could generate the reported correlation of 0.5 between the reconstruction and May-October wind.
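One way such a test could be set up is sketched below in Python. It is only an illustration under loudly stated assumptions, not the authors' procedure: the record length (100 years), the AR(1) coefficients, the filter weights (1/4, 1/2, 1/4) and, above all, the independence of the 144 candidate series are all made up for the example. The function names are hypothetical.

```python
import numpy as np

def ar1(n_series, n_years, phi, rng):
    """Simulate n_series independent, stationary AR(1) series (one per row)."""
    noise = rng.standard_normal((n_series, n_years))
    x = np.empty_like(noise)
    # Start from the stationary distribution so there is no spin-up effect.
    x[:, 0] = noise[:, 0] / np.sqrt(1.0 - phi**2)
    for t in range(1, n_years):
        x[:, t] = phi * x[:, t - 1] + noise[:, t]
    return x

def triangular3(x):
    """Three-point triangular filter; the 1/4, 1/2, 1/4 weights are an assumption."""
    return np.convolve(x, [0.25, 0.5, 0.25], mode="valid")

def correlations(y, X):
    """Pearson correlation between the vector y and each row of X."""
    yz = (y - y.mean()) / y.std()
    Xz = (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)
    return Xz @ yz / y.size

def max_abs_correlation(n_years, n_candidates, phi_recon, phi_clim, rng):
    """Largest |r| between one smoothed random 'reconstruction' and many candidates."""
    recon = triangular3(ar1(1, n_years + 2, phi_recon, rng)[0])
    climate = ar1(n_candidates, n_years, phi_clim, rng)
    return np.abs(correlations(recon, climate)).max()

# All numbers below are illustrative assumptions, not values from the paper.
rng = np.random.default_rng(42)
null = np.array([max_abs_correlation(100, 144, 0.3, 0.1, rng) for _ in range(200)])
print("median and 95th percentile of max |r|:", np.quantile(null, [0.5, 0.95]))
print("proportion of simulations with max |r| >= 0.5:", (null >= 0.5).mean())
```

Treating the 144 candidates as independent overstates the effective number of tests, because climate series built from overlapping months and lags are strongly correlated with one another. A fairer null would keep the real wind series and correlate them with surrogate reconstructions that share the autocorrelation of the real one.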
Even though the assemblage data are not yet available, I can test several ideas with just the wind data. The wind data come from the Twentieth Century Reanalysis Project. Other than reporting that the 1000 hPa level is used, no information is given about how the data were processed. I’ve assumed that the closest grid point to Lake Żabińskie has been used (22°E, 54°N) and downloaded the data and calculated the monthly means.
I had assumed that replicating the wind speed figure would be the easy part. Alas no. My wind speed data look nothing like the published curve. Then I realised that although the text discusses wind speed, it appears that the authors are actually using wind velocity, specifically the zonal (west-east) velocity. This would account for the otherwise impossible negative “speeds”. However, my mean velocity curve also looks nothing like the published curve, but at least it has a similar mean and variance.
Using mean velocity rather than mean speed presents some huge problems. A May-October mean velocity of 0 m/s could indicate flat calm conditions all summer or that easterly gales blew for half the summer and westerly gales blew for the remainder. The impacts of these two scenarios on lake mixing are quite different.
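The two scenarios can be put into numbers with a toy example. The values here are invented for illustration (184 daily values for May to October, gales of 20 m/s, no meridional component, so that speed is simply the absolute zonal velocity).

```python
import numpy as np

days = 184  # May to October, one value per day (illustrative)

# Scenario 1: flat calm all summer.
calm = np.zeros(days)

# Scenario 2: easterly gales for half the summer, westerly gales for the rest.
# Zonal velocity is positive for westerly winds (blowing towards the east).
gales = np.concatenate([np.full(days // 2, -20.0), np.full(days // 2, 20.0)])

for name, u in [("calm", calm), ("gales", gales)]:
    mean_velocity = u.mean()          # signed, so opposing winds cancel
    mean_speed = np.abs(u).mean()     # unsigned, assuming no meridional wind
    mean_sq_speed = (u ** 2).mean()   # closer to the wind's mixing potential
    print(name, mean_velocity, mean_speed, mean_sq_speed)
```

Both scenarios have a mean zonal velocity of 0 m/s, but the mean speed is 0 m/s in the first and 20 m/s in the second.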
Even if, as the paper suggests, westerly winds have much more impact on the lake than winds from other directions because of the catchment’s morphology, it is unphysical to expect mean wind velocity to be a useful predictor. As it stands, the result suggests that westerly winds mix the lake and easterly winds unmix it. Mean wind speed, weighted by the lake’s exposure in the direction of the wind, would be a much more physically relevant predictor.
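A rough sketch of what an exposure-weighted predictor might look like follows. The exposure function is entirely hypothetical (full weight for winds from the westerly sector, a small weight otherwise); a real one would have to be derived from the lake's fetch and the surrounding topography.

```python
import numpy as np

def wind_direction_from(u, v):
    """Meteorological direction the wind blows FROM, in degrees clockwise from north."""
    return np.degrees(np.arctan2(-u, -v)) % 360.0

def toy_exposure(direction_deg):
    """Hypothetical exposure weights: 1.0 for the westerly sector, 0.2 elsewhere."""
    westerly = (direction_deg >= 225.0) & (direction_deg <= 315.0)
    return np.where(westerly, 1.0, 0.2)

def exposure_weighted_speed(u, v, exposure=toy_exposure):
    """Mean wind speed, with each observation weighted by the lake's exposure."""
    speed = np.hypot(u, v)
    return np.mean(exposure(wind_direction_from(u, v)) * speed)

# Pure westerly and pure easterly winds of the same strength (illustrative values).
westerly = exposure_weighted_speed(np.full(10, 10.0), np.zeros(10))
easterly = exposure_weighted_speed(np.full(10, -10.0), np.zeros(10))
print(westerly, easterly)
```

Unlike mean velocity, this quantity is never negative: easterly winds contribute less mixing than westerly winds of the same strength, but they do not undo it.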
The paper appears to be a fishing trip, searching for the best of many predictors regardless of their physical plausibility. This is not a recipe for reproducible research. Had mean velocity from some combination of months not given a good performance, would the authors have realised that mean velocity was a daft idea and repeated the analysis with wind speed? And if that too failed, would they have used mean squared wind speed, to better correspond with the wind’s mixing potential?