Having shown that the archived chironomid data from Lake Żabińskie are strange in several ways and have bad reconstruction diagnostics, I want to see how well I can replicate the August air-temperature reconstruction.
First, I need to build the transfer function. I’m using WAPLS on square-root transformed species data. All taxa with more than one occurrence are included, and the lakes with declared low count sums are omitted.
```r
# Keep taxa with more than one occurrence
keep <- colSums(spp > 0) > 1

# WAPLS on square-root transformed species data, bootstrap cross-validated
mod1 <- crossval(
  WAPLS(sqrt(spp[, keep]), env),
  cv.method = "bootstrap",
  nboot = 5000,
  verbose = FALSE
)

knitr::kable(performance(mod1)$crossval[1:3, 1:4], digits = 2)
```
The bootstrap cross-validation performance is very similar to that reported by the corrigendum (WAPLS component 2: RMSEP = 2.3°C, r^2^ = 0.76).
The reconstruction is similar to that archived but not identical.
```r
# Predict from the second WAPLS component for the fossil data
wapls.sqrt <- predict(mod1, sqrt(fos))$fit[, "Comp02"]

reconstruction <- data.frame(
  chron = chron,
  Instrumental = instrumental$temperature,
  Archived = recon$temperature,
  Replication = wapls.sqrt
)

# Reshape to long format for plotting
reconstruction2 <- gather(reconstruction, key = "Series", value = "Temperature", -chron)

ggplot(reconstruction2, aes(chron, Temperature, colour = Series)) +
  geom_line(alpha = 0.5) +
  scale_colour_manual(
    limits = c("Instrumental", "Archived", "Replication"),
    values = c("black", "red", "blue")
  ) +
  labs(x = "Year CE", y = "Temperature °C", colour = "Series")
```
The means of the archived and replication reconstructions are very similar (16.92 vs. 16.88°C), but the variance of my replication is about 20% higher (2.19 vs. 2.67).
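The size of that variance difference can be checked directly from the quoted figures (a minimal sketch in base R; the two variances are the values reported above, not recomputed from the raw data):

```r
# Variances quoted in the text
var_archived    <- 2.19
var_replication <- 2.67

# Ratio of replication variance to archived variance
ratio <- var_replication / var_archived
round(ratio, 2)  # ≈ 1.22, i.e. roughly 20% higher
```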
One of the most noticeable differences is that the archived reconstruction has a value for 1925, whereas the chironomids have data for 1927. This 1925/1927 switch also occurred during the evolution of the fossil data.
The other differences between the archived reconstruction and my replication might be due to different sites included in the calibration set (LT15 omit nine lakes on the basis of a PCA, but it is not clear which lakes these are, nor whether they are also omitted for the corrigendum) or different species inclusion rules. The bootstrap cross-validation that LT15 use will also introduce some run-to-run variability.
Ideally, anybody with the raw data should be able to replicate the results of any paper exactly. Given the vague description of the methods in LT15 and the corrigendum, this replication is as good as can be expected.
This is probably my last post detailing oddities in the data archived by Larocque-Tobler et al. (2015). I think I have done enough to demonstrate that the data have unexpected properties that need explaining. In my next couple of posts in this series, I'll describe my quest to get that explanation, or how I mistook a yo-yo for a die.