The third chironomid calibration-in-time reconstruction

Most reconstructions of past environmental conditions derived from chironomid assemblages with transfer functions use a calibration-in-space approach, in which the calibration is a set of modern chironomid assemblages paired with modern environmental data. I am aware of three chironomid calibration-in-time reconstruction. Here the calibration set is well dates chironomid assemblages paired with historic environmental data.

I have already shown that the calibration-in-time reconstructions from Seebergsee and Silvaplana report the apparent performance rather than the cross-validated performance they claim to report, and that the actual skill is near zero. But what about the third reconstruction from Nurmijärvi.

Nurmijärvi is a varved lake in southern Finland from which the authors counted the chironomids at 1-11 year resolution over the instrumental period which began in the 1830s and reconstruct July air temperatures.

The paper reports that the calibration-in-time transfer function has an r2jack of 0.64 (Table 1). This is a fairly good performance given the limited temperature range in the calibration set. The paper also reports the correlation between the reconstructed and instrumental temperature as 0.51 (Table 2, Figure 4), which is equivalent to an r2 of 0.26. With a calibration-in-time model, the model r2 should be identical to the r2 between the reconstruction and the environmental data. It is not clear why they are not identical in this case.

Since the authors have not replied to emails enquiring about this discrepancy, nor to requests for the raw data, I digitised the stratigraphic diagram and the temperature data. There are of course inaccuracies in the digitised data, and I have only the 38 common taxa (the stratigraphic diagram omits 21 rare taxa).

I find that the model the authors use, a tolerance-weighted weighted-averaging model with inverse deshrinking, has an r2 of 0.57. Not too far off the 0.64 reported. But this is the apparent statistic – the leave-one-out cross-validated r2 is only 0.13.


Predicted vs measured July temperatures for the Nurmijärvi calibration-in-time transfer function

A model with an r2 of 0.13 has very limited utility, and this is before we consider the impact of autocorrelation etc.

It appears to me that the authors are, like the authors of the Silvaplana and Seebergsee papers, reporting the apparent rather than the cross-validation performance. If the authors believe that imperfections in the digitised data are responsible for the lack of reproducibility, they only need to send me the data and I will update this post.


Posted in Peer reviewed literature, transfer function | Tagged , , | Leave a comment

The tangled web of Silvaplana papers

I’ve not written much about the Lake Silvaplana papers although they are as questionable as the Seebergsee and Zabinskie papers.

I have now digitised the 150-year stratigraphy, my request to the authors for the data having been ignored. Together with the archived 420-year stratigraphy I can now answer questions that I couldn’t earlier.

First question, how large were the count sums? The 150-year paper reports that

The total number of head capsules is highly variable; between 15 and 125 head capsules were counted in samples of ca. 30 mg dry weight, the average being 30 head capsules per sample.

JoPL-06 (copy)

The 150-year stratigraphy.

This is easily demonstrated to be false. The stratigraphic diagram shows the count sums in the third curve from the right. Over 20 samples (of 94) have a count sum to the left of the tick mark at 12.5 head capsules. Some of the samples appear to have counts as low as three (samples 33 & 34 are both ~66% Cricotopus and ~33% one other species).

The 420-year paper makes even stronger claims:

Before sample no. 170 [~1775 CE], the minimum head capsule numbers was 50.

Perhaps the authors counted some more chironomids after publishing the first paper, but the stratigraphic diagram shows that most of the samples above sample 170 have counts smaller than fifty: some appear to be below fifteen.


The 420-year stratigraphy counts


All the papers report that samples with fewer than thirty head capsules were merged before the transfer function was applied. This presumably explains why the reconstruction in the 150-year paper has 69 values rather than one for each of the 94 levels in the stratigraphy. I assume the merged samples form (more or less) the upper 69 samples in the archived assemblage data from the 420-year paper (the oldest assemblage in the 150-year stratigraphy is identical to the 69th in the 420-year paper, which represents 1850).

Even after merging, the samples appear to have some counts below 30. The youngest sample in the 420-year dataset has eight species all with exactly 10% and one with 20% of the assemblage. It is more likely that this assemblage has a count sum of ten than a count sum of at least thirty. The youngest sample in the 150-year stratigraphy has the same composition within the error of digitisation. The reported count sum for this sample is about ten.

Reliable reconstructions cannot be expected from such small samples.

I cannot reproduce the calibration-in-space reconstructions derived from the 150-year or the 420-year stratigraphy as the calibration set has not been archived. Given that the chronological uncertainty is estimated at 15%, it is surprising that the reported correlation between the reconstruction and the instrumental record is so good.

I can, however, attempt to reproduce the calibration-in-time reconstruction shown in a subsequent paper. This reconstruction has 82 levels. I don’t know how the 94 levels were merged to make 82, so I’m going to use the 64 samples from the 420-year stratigraphy which fall into the instrumental period which starts in 1864 (five fall between 1864 and 1850). I also don’t know exactly which years are supposed to be represented by each sample, so I am just going to use the reported year with data from meteoSwiss.

I’m not expecting the result to be exactly the same as the authors, but it should be similar. My 2-component WAPLS model on square-root transformed assemblage data has an apparent R2 between the measured and estimated temperature of 0.45. The leave-one-out cross-validated reconstruction has an R2 of 0.01. This is somewhat short of the reported R2 of 0.51 for the whole record (0.71 for the first 90 years). It would appear, as at Seebergsee, that the authors have not cross-validated their model. In both Silvaplana and Seebergsee, the cross-validated model has no skill, and the published millennium-long reconstruction cannot be relied upon. If my analysis is supported by a re-analysis of the full data used in the calibration-in-time paper, then this paper should be retracted.


Posted in Peer reviewed literature | Tagged , , , , | 1 Comment

Extracting data from a PDF image

Some scientists archive their data. Some scientists email their data on request. Some editors cajole authors into releasing data to interested parties. And sometimes none of these approaches yields data.

What then? One option is to request data via the scientist’s head-of-department, another is to scrape the data out of published diagrams. Much has been written about how to extract data from raster formats, including stratigraphic diagrams, I only found a little information on how to extract data from vector formats. This is potentially a much more powerful option.

This is how I did it. First I downloaded Luoto and Ojala (2017) and checked that the stratigraphic plot was a vector format by greatly increasing the magnification. This would make the image appear pixelated if it was a raster.


The stratigraphic diagram from Luoto and Ojala (2017)

The pdf needs to be converted into a format that can be read into a text editor. This can be done with qpdf in the terminal. First I extract the page with the stratigraphic diagram to make further processing easier.

qpdf 0959683616675940.pdf  --pages 0959683616675940.pdf 4 -- outfile.pdf

qpdf outfile.pdf --stream-data=uncompress  outfile2.pdf

Now outfile2.pdf can be read into R

page4 = readLines("luoto/data/outfile2.pdf")

#Find start and end of image
start = grep("/PlacedGraphic", page4)[2]
end = grep("[(Figur)20(e 2.)]TJ", page4, fixed = TRUE)
page4 = page4[start:end]

fig2 = page4[grepl("re$", page4)] %>%
read.table(text = .) %>%
set_names(c("x", "y", "width", "height", "re"))

The bars are, within rounding error, the same height. Various other rectangles in the figure are different heights, so I need to filter the data I want.

fig2 = fig2 %>%
  filter(between(height, -2.4, -2.3))

Now I can plot the data.

fig2 %>%
  ggplot(aes(x = x + width/2, y = y, width = width, height = height, fill = factor(x, levels = sample(unique(x))))) +
  geom_tile(show.legend = FALSE)



The scraped data from the stratigraphic plot

The next step would be to assign taxon names to each unique value of x and scale the widths so they are in percent. When that is done, and the weather data digitised, I can test how well I can reproduce the calibration-in-time transfer function model. The calibration-in-space model will need to wait for data from several other papers to be scraped.

Posted in Peer reviewed literature, R | Tagged , , , | 2 Comments

Bergen: a year with some sunshine

May was glorious.  December less so.


The data are from the Geofysisk Institutt in Bergen. Here is the code I used

florida = read.csv("../climate_data//Florida_2017-06-21_2018-06-22_1529649176.csv", stringsAsFactors = FALSE) %>% as.tibble()
florida = florida %>%
mutate(date_time = ymd_hm(paste(Dato, Tid)),
date = ymd(Dato),
Tid = ymd_hm(paste("2000-01-01", Tid)),
Globalstraling = if_else(Globalstraling == 9999.99, NA_real_, Globalstraling)) %>%
complete(date, Tid) %>%#
arrange(date, Tid) %>%
mutate(Globalstraling = coalesce(Globalstraling, lag(Globalstraling, 24 * 6)))#fill gaps with previous day

ggplot(florida, aes(x = Tid, y = date, fill = Globalstraling)) +
geom_raster() +
scale_fill_gradient(low = "black", high = "yellow") +
scale_x_datetime(date_labels = "%H", expand = c(0, 0)) +
scale_y_date(date_labels = "%b", expand = c(0, 0)) +
labs(x = "Time", y = "Month", fill = expression(atop(Sunshine, Wm^-1)))

Posted in climate, R, Uncategorized | Tagged | Leave a comment

More trouble counting to fifty

Earlier this week at the palaeolimnology symposium, Gavin told me that it had not dawned on him that the count sum could be estimated from percent data using our knowledge of rank abundance curves. I only recently realised this; previously I did not imagine that this would ever be a useful thing to do.

And yet this morning, I found strong evidence that another chironomid analyst has problems keeping to the count sum promised in the paper (the tally is now three chironomid analysts, a diatomist, and a palynologist).

On Sunday, after I gave my presentation to the chironomid workshop, there was some discussion about what should be done with small counts (they do have some information), but there was unanimity that the paper must report it if some counts contain fewer than the target sum.

At the moment, I am not going to name the analyst whose data I examined this morning. This is only a temporary reprieve, when I write up the presentation I gave on Sunday, this case will be used as an example.

The paper reports that the minimum number of head capsules per sample was fifty. Eight of the thirty-four samples appear to have a count sum of less than fifty. In one sample, the rarest taxon has a relative abundance of 6.25%. The relative abundances of the other six taxa are all integer multiples of 6.25%. This is strong evidence that the count sum is sixteen.

There are three possibilities.

First that the counts are a great fluke. This is unlikely. The sample discussed above could be one in which every taxa was present with a multiple of four head capsules (i.e. the true count sum is 64). Even under extremely optimistic assumptions, the probability of getting such a sample is 1/(47) = 6 * 10-5. And then there are another seven samples, one of which would require all nineteen taxa to have multiples of three head capsules (9 * 10-10). Combining the probabilities of all these unlikely counts will give a very small number.

Neither of the remaining two possibilities is very pleasant.

It could be that the author(s) (whom I believe count(s) their own chironomids) are so negligent that they forgot when they wrote the paper that the count sums of almost a quarter of their samples were smaller than promised. If this is the case, the authors’ competence has to be doubted and we need to ask if anything the author(s) report should be trusted.

The other possibility is that the author(s) knowingly misreported the count sums. This could easily be construed as fabrication (“easily” that is for anyone except a university integrity officer), and data fabrication is a form of misconduct. Obviously this would not be the most serious case misconduct, perhaps akin to plagiarising a paragraph rather than a full paper.

The question remains as to what to do with this case of possible data falsification (and any other cases I find when I have time to import some more data). I am seeing three options: to describe the case in my forthcoming manuscript with a citation; to alert the journal that published the paper; and to advise the authors’ university’s integrity officer.

I ask my readers, both of you, to tell me in the comments or otherwise, how you think I should proceed in this case and what you think the outcome should be.


Posted in Misconduct, Peer reviewed literature | Tagged | 3 Comments

My presentation to IPA-IAL 2018

I’ve just given a presentation at the joint IPA-IAL conference in Stockholm

Sub-decadal resolution palaeoenvironmental reconstructions from microfossil assemblages

Download it!

The deadline for applying for a reward for finding typos has expired.

Posted in climate | Tagged , , , | Leave a comment

How many is fifty? Sanity checks for assemblage data.

This week I’m at the Palaeolimnology Symposium in Stockholm this week.

I have a couple of presentations. I gave the first this morning to the chironomid “DeadHead” meeting. I showed some sanity checks for assemblage data, some of which are related to Brown and Heathers’ (2018) GRIM test.  It can download it from figshare.

Posted in Data manipulation, EDA | Tagged , | 1 Comment