Replicating the Lake Żabińskie reconstruction

Having shown that the archived chironomid data from Lake Żabińskie are strange in several ways and have poor reconstruction diagnostics, I want to see how well I can replicate the August air-temperature reconstruction.

First, I need to build the transfer function. I’m using WAPLS on square-root transformed species data. All taxa with more than one occurrence are included, and the lakes with declared low count sums are omitted.

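library(rioja) # provides WAPLS(), crossval() and performance()
# Keep taxa with more than one occurrence in the calibration set
keep <- colSums(spp > 0) > 1
# WAPLS on square-root transformed data, with bootstrap cross-validation
mod1 <- crossval(WAPLS(sqrt(spp[, keep]), env), cv.method = "bootstrap", nboot = 5000, verbose = FALSE)
knitr::kable(performance(mod1)$crossval[1:3, 1:4], digits = 2)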
         RMSE    R2  Avg.Bias  Max.Bias
Comp01   2.68  0.65     -0.06      9.97
Comp02   2.33  0.76     -0.04      6.75
Comp03   2.38  0.76     -0.08      6.27

The bootstrap cross-validation performance is very similar to that reported in the corrigendum (WAPLS component 2: RMSEP = 2.3°C, r² = 0.76).
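To make the taxon inclusion rule concrete, here is a minimal sketch of what `colSums(spp > 0) > 1` does before the square-root transformation. The `toy` matrix and its lake and taxon names are invented for illustration and are not the Lake Żabińskie calibration data.

```r
# Hypothetical toy counts: 4 lakes (rows) x 3 taxa (columns); not the real data
toy <- matrix(
  c(5, 0, 2,
    3, 0, 0,
    0, 1, 4,
    2, 0, 1),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("lake", 1:4), c("taxonA", "taxonB", "taxonC"))
)

occurrences <- colSums(toy > 0) # number of lakes in which each taxon occurs
keep_toy <- occurrences > 1     # TRUE for taxa found in at least two lakes
toy_sqrt <- sqrt(toy[, keep_toy]) # square-root transform the retained taxa

colnames(toy_sqrt) # taxonB, with a single occurrence, is dropped
```

A stricter or looser rule (for example a minimum abundance as well as a minimum number of occurrences) would change which columns survive, which is one way two analyses of the same calibration set can diverge.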

The reconstruction is similar to that archived but not identical.

library(tidyr)   # gather()
library(ggplot2) # plotting

# Reconstruct from the fossil data using the two-component WAPLS model
wapls.sqrt <- predict(mod1, sqrt(fos))$fit[, "Comp02"]
reconstruction <- data.frame(
  chron = chron,
  Instrumental = instrumental$temperature,
  Archived = recon$temperature,
  Replication = wapls.sqrt
)
# Reshape to long format: one row per year and series
reconstruction2 <- gather(reconstruction, key = "Series", value = "Temperature", -chron)
ggplot(reconstruction2, aes(chron, Temperature, colour = Series)) +
  geom_line(alpha = 0.5) +
  scale_colour_manual(limits = c("Instrumental", "Archived", "Replication"), values = c("black", "red", "blue")) +
  labs(x = "Year CE", y = "Temperature °C", colour = "Series")


The means of the archived and replication reconstructions are very similar (16.92 vs. 16.88), but the variance of my replication is about 20% higher (2.67 vs. 2.19 for the archived reconstruction).
One of the most noticeable differences is that the archived reconstruction has a value for 1925, whereas the chironomid data have a sample for 1927. This 1925/1927 switch also occurred during the evolution of the fossil data.
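The "about 20%" figure for the variance can be checked directly from the two values quoted above. This is only a sketch of the arithmetic; the variable names are mine, and the commented line assumes the `reconstruction` data frame built earlier in the post.

```r
# Variances quoted in the text
var_archived <- 2.19
var_replication <- 2.67

# Percentage by which the replication variance exceeds the archived variance
pct_higher <- 100 * (var_replication / var_archived - 1)
round(pct_higher)

# With the data to hand, the variances themselves could be obtained with, e.g.:
# sapply(reconstruction[c("Archived", "Replication")], var, na.rm = TRUE)
```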

The other differences between the archived reconstruction and my replication might be because of different sites being included in the calibration set (LT15 omit nine lakes on the basis of a PCA, but it is not clear which lakes these are or whether they are also omitted for the corrigendum) or because of different species inclusion rules. The bootstrap that LT15 use will also cause some variability.

Ideally, anybody who has the raw data should be able to replicate the results of any paper exactly. Given the vague description of the methods in LT15 and the corrigendum, the replication is as good as can be expected.

This is probably my last post detailing oddities in the data archived by Larocque-Tobler et al. (2015). I think I have done enough to demonstrate that the data have unexpected properties that need explaining. In my next couple of posts in this series, I’ll describe my quest to get that explanation, or how I mistook a yo-yo for a die.

About richard telford

Ecologist with interests in quantitative methods and palaeoenvironments
