In January 2015, shortly after Larocque-Tobler et al. (hereafter LT15) published their remarkably good chironomid-inferred August air temperature reconstruction from Lake Żabińskie, Poland, some data were posted online as a condition of publication. Since I found it difficult to believe that chironomids could be almost as good as thermometers, I took a look at these data expecting to find that the authors had made a mistake. Specifically, I suspected that a calibration-in-time model (a transfer function from well-dated chironomid assemblages and corresponding climate station data for the same years) had been accidentally reported.
It only took a few minutes to more-or-less replicate the reconstruction from the fossil data and the modern training set: I was satisfied that a calibration-in-space model had been used as reported. So then I looked at the data: they were as credible as a three-pound note drawn with a crayon.
I don’t think that it is contentious to say that the data posted in January are not the real chironomid data. They are only distantly related to the data now archived and to the data used in the paper. Hence the credibility of the January data has no direct bearing on the credibility of the current version of the data. The data archived in January are no longer available from Dr Larocque-Tobler’s website, but can be downloaded from here. Let’s take a look at them.
    library(readxl)
    zab <- read_excel('Zabinskie chiro 1886AD.xlsx', sheet = 1)
    chron <- zab$Chron
    zab$Chron <- NULL
    pos <- function(x) x[, x > 0] # extract +ve values
    t(pos(zab[24,])) # row 24 - 1987

                                  24
    Dicrotendipes nervosus   11.1111
    Microtendipes pedellus   11.1111
    Cladotanytarsus mancus1  11.1111
    Paratanytarsus           11.1111
    Tanytarsus lugens        11.1111
    Tanytarsus pallidicornis 11.1111
    Corynoneura              11.1111
    Parochlus                11.1111
    Procladius               11.1111
Nine taxa all with an abundance of one ninth. Curious.
Remembering that LT15 claimed the counts were of at least fifty chironomids, how often can we expect to see a count like this? Assuming the improbable but favourable case that the assemblage contains nine equally abundant taxa and that 54 chironomids were counted (the smallest multiple of nine greater than fifty), we can calculate the probability of getting such a count with all nine taxa having six individuals using the multinomial distribution.
    1/dmultinom(x = rep(6, 9), size = 54, prob = rep(1/9, 9))

    761675.4
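As a cross-check, the same number can be recovered by hand from the multinomial probability mass function, n!/(x1! ... xk!) multiplied by p1^x1 ... pk^xk. This is a minimal sketch that works on the log scale to avoid overflow in 54!; the variable names are chosen here for illustration only.

```r
# Multinomial probability of 6 individuals in each of 9 equally likely taxa,
# computed on the log scale: log(54!) - 9 * log(6!) + 54 * log(1/9)
n_taxa <- 9
per_taxon <- 6
n <- n_taxa * per_taxon  # 54 head capsules in total

log_p <- lfactorial(n) - n_taxa * lfactorial(per_taxon) + n * log(1 / n_taxa)
odds_by_hand <- 1 / exp(log_p)

# Compare with the dmultinom() calculation used in the post
odds_dmultinom <- 1 / dmultinom(rep(per_taxon, n_taxa), size = n,
                                prob = rep(1 / n_taxa, n_taxa))
all.equal(odds_by_hand, odds_dmultinom)
```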
One in seven hundred and sixty thousand. Counts with identical abundances for all taxa are going to be rare. I wouldn’t expect to see one again. But wait, exactly identical counts occur a further three times (and another sample has a different set of nine equally abundant taxa). How likely are we to see five counts of nine equally abundant taxa? Something like 1 in 10^22 under the most favourable (and highly improbable) case.
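A rough sketch of one way to get a number of this order, which is not necessarily the calculation behind the figure quoted above: treat each sample as an independent trial with the one-in-761675 probability, and ask how likely five or more such counts are. The 84 samples are taken from the sample total that appears later in this post; independence and a common probability for every sample are simplifying assumptions.

```r
# Probability of a single "all nine taxa equal" count of 54 (most favourable case)
p <- dmultinom(rep(6, 9), size = 54, prob = rep(1 / 9, 9))

n_samples <- 84  # assumption: number of samples in the January dataset

# P(at least 5 such counts among n_samples independent samples),
# using the upper tail of the binomial distribution
p_five_or_more <- pbinom(4, size = n_samples, prob = p, lower.tail = FALSE)

log10(p_five_or_more)  # order of magnitude of the probability
```

Under these assumptions the result should land close to the order of magnitude given in the text.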
We are still somewhat short of 2^276709 to one, but we haven’t looked at the other samples yet. One sample has seven equally abundant taxa; three more (including two duplicates) have six taxa with relative abundances of one seventh or two sevenths. As a final example, one sample has three taxa at 20% and one at 40%.
A related curiosity in these data is the dearth of rare taxa. With a count of fifty, a taxon represented by a single head capsule will represent 2% of the assemblage.
    min0 <- function(x) min(x[x > 0]) # smallest non-zero abundance in a sample
    mean(apply(zab, 1, min0) > 2) # proportion of samples with no taxon at or below 2%

    0.9285714
Ninety-three percent of samples (78/84) apparently have no taxa represented by a single head capsule. This is exceedingly unlikely.
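To get a feel for how unusual this is, one can simulate counts of fifty from an assemblage and see how often no taxon is represented by a single head capsule. The assemblage below (20 taxa with geometrically declining abundances) is entirely made up for illustration, and the answer depends on that choice, so treat this as a sketch rather than an estimate for Lake Żabińskie.

```r
set.seed(1)

# Hypothetical assemblage: 20 taxa, each 80% as abundant as the previous one
prob <- 0.8^(0:19)
prob <- prob / sum(prob)

# Simulate 10000 counts of 50 head capsules (rows = taxa, columns = samples)
counts <- rmultinom(n = 10000, size = 50, prob = prob)

# For each simulated sample, is there no taxon with exactly one head capsule?
no_singleton <- apply(counts, 2, function(x) !any(x == 1))

# Proportion of simulated samples without any singleton taxon,
# to compare with the 93% seen in the January data
mean(no_singleton)
```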
Almost every sample in the January dataset is unlikely to arise from a count of fifty chironomids. The most obvious explanation (which was admitted in the corrigendum) is that some of the counts were based on fewer than fifty head capsules, perhaps far fewer: a count with three taxa at 20% and one at 40% is only really likely to happen when the count sum is five.
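The reasoning about count sums can be made mechanical: for a vector of percentages, find the smallest count sum at which every percentage corresponds to a whole number of head capsules. The helper below, `min_count_sum()`, is a hypothetical function written for this illustration, and its tolerance is an assumption chosen to allow for percentages rounded to four decimal places.

```r
# Smallest count sum n for which every non-zero percentage is (nearly)
# a whole number of individuals; returns NA if none is found up to max_n
min_count_sum <- function(pct, max_n = 200, tol = 1e-3) {
  pct <- pct[pct > 0]
  for (n in seq_len(max_n)) {
    counts <- pct * n / 100
    whole <- abs(counts - round(counts)) < tol
    if (all(whole) && all(round(counts) >= 1)) return(n)
  }
  NA_integer_
}

min_count_sum(c(20, 20, 20, 40))  # 5
min_count_sum(rep(11.1111, 9))    # 9
```

With the spreadsheet loaded as `zab`, something like `apply(zab, 1, min_count_sum)` would give the minimum count sum compatible with each sample. The true count could be any multiple of that minimum, so this gives only a lower bound.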
I downloaded the January data expecting to find a mistake. Instead, I found that the data were almost certainly (one must obey Cromwell’s rule) not gathered according to the methods reported in LT15, and that the stratigraphy differs from that published. Yet the reconstruction broadly resembles the published one (though with substantial differences for some levels).
When I enquired whether the January data were the correct data, they were withdrawn without explanation and replaced by a succession of other versions before one was finally archived at NOAA. It is that version of the data that I’ll begin to examine in a subsequent post in this series.