Falling for ggplot2

I spent a long time resisting the lure of ggplot2. I was proficient with the plotting functions in base graphics; why did I need to learn an entirely new graphics system? Yes, getting up colour ramps could be a real pain, but I was in complete control.

Recently I’ve had to learn how to use ggplot2. Students were asking me to help them fix their code (often copied off Stack Overflow), so I needed at least some understanding. I read the book – eller hente ebok frå Universitetsbiblioteket. Its very readable. Now I see why ggplot2 is so popular and have started to use it whenever possible.

Today I wanted to make a screeplot for a principal component analysis in ggplot. So I wrote my own function based on screeplot in the vegan package.

ggscreeplot <-
function (x, bstick = FALSE, npcs = min(10, if (is.null(x$CCA) || x$CCA$rank == 0) x$CA$rank else x$CCA$rank),
          xlab = "Component", ylab = "Inertia", title = "",  ...) {
  if (is.null(x$CCA) || x$CCA$rank == 0)
    eig.vals <- x$CA$eig
  else eig.vals <- x$CCA$eig
  ncomps <- length(eig.vals)   if (npcs > ncomps)
    npcs <- ncomps
  comps <- seq(len = npcs)

  df <- data.frame(Inertia = eig.vals[comps], names = reorder(factor(names(eig.vals[comps])), comps))
  ymax <- df$Inertia[1]   if (bstick && !is.null(x$CCA) && x$CCA$rank > 0) {
    warning("'bstick' unavailable for constrained ordination")
    bstick <- FALSE
  } else {
    df <- cbind(df, bstick = bstick(x)[comps])
    ymax <-max(ymax, df$bstick[1])
  g <- ggplot(df, aes(x = names)) +
                geom_bar(aes(y = Inertia), stat = "identity", fill = "grey50") +
                labs(x = xlab, y = ylab, title = title) +
                scale_y_continuous(limits = c(0, ymax * 1.04), expand = c(0, 0))

  if (bstick) {
                g <- g + geom_point(aes(y = bstick), colour = "red")
                g <- g + geom_line(aes(y = bstick, group =1), colour = "red")

Here it is in action

data(RLGH)#Round Loch of Glenhead diatom stratigraphy
ggscreeplot(rda(sqrt(RLGH$spec)), bstick = TRUE)

Screeplot of a PCA of the Round Loch of Glenhead diatom stratigraphy. Note that axis one is much longer than the other axes – it has almost half the total variance.

I’ll be using this code when I show ordinations of the Lake Żabińskie chironomid reconstruction. Spoiler – they don’t look much like this.

Posted in EDA, R | Leave a comment

Reconstruction diagnostics for the Lake Żabińskie chironomid reconstruction

The evaluation of reconstruction diagonstics is an essential part of the process of generating palaeoenvironmental reconstructions from microfossil assemblages using transfer functions. If the reconstruction diagnostics are bad, we should be especially cautious about interpreting the reconstruction. The problems is that “good” and “bad” are not well defined and we rely on various rules-of-thumb to guide us.

Since the chironomid-based reconstruction of August air temperature presented by Larocque-Tobler et al (2015; hereafter LT15) from Lake Żabińskie is so remarkably good, it should be an interesting case to test how well reconstruction diagnostics work.

LT15 use analogue quality as their main diagnostic method

For the combined transfer function, to determine whether the modern calibration models had adequate analogues for the fossil assemblages, the modern analogue technique (MAT) was performed using C2 (Juggins, 2005), with squared chord distance as the dissimilarity coefficient (DC) (Overpeck et al., 1985). Confidence intervals were based on minimum DC distance within the calibration sets (Laing et al., 1999). Fossil assemblages above the 95% confidence interval were considered to have no analogues in the calibration set; samples between 75% and 95% were considered to have fair analogues (Francis et al., 2006).

This text from LT15 is not entirely clear – what confidence intervals? Time to read Francis et al (2006).

In order to determine whether the modern calibration model had adequate analogs for the fossil assemblages, modern analog testing (MAT) was performed using the computer program C2, with squared chord distance as the dissimilarity coefficient (Overpeck et al., 1985). Confidence intervals were based on minimum DC distance within the calibration set following Laing et al. (1999). Fossil assemblages above the 95% confidence interval were considered to have no analogs in the calibration set, samples between 75% and 95% were considered to have fair analogs.

I get a slight sense that I’ve read this before somewhere. I’m sure it is just a coincidence. But having read it twice, I understand what is being done. The squared-chord distances between each fossil sample and its closest analogue in the calibration set is being compared with the 75th and 95th percentiles of the distribution of distances between each calibration-set sample and its nearest neighbour.

LT15 report that

No sample had chironomid assemblages outside the 95% confidence interval suggesting that the transfer function can be applied to the downcore samples.

but don’t show this with a figure (no complaint here, I would do the same if the analogues were good). I want to see a figure showing the analogue distances.

First we need to load the data, which can be downloaded from NOAA.


fname <- "zabinskie2015cit.xls"
spp <- read_excel(fname, sheet = "Training species")
env <- read_excel(fname, sheet = "Training temperature")
fos <- read_excel(fname, sheet = "Chironomids Zabinsk percentages")
recon <- read_excel(fname, sheet = "Reconstruction ")
names(recon) <- c("date", "temperature")

rownames(spp) <- spp[, 1]
spp[, 1] <- NULL
rownames(env) <- env[, 1]
env <- env[, 2, drop = FALSE]

lowCount <- c("SAL", "LEK", "TRZ", "WAS", "SZOS", "GOR", "KOS", "ZAB")
spp <- spp[!rownames(spp) %in% lowCount, ]
env <- env[!rownames(env) %in% lowCount, , drop  = FALSE]
identical(rownames(spp), rownames(env))
env <- env$Temp

chron <- fos[, 1]
fos <- fos[, -c(1, ncol(fos))]

####check names####
setdiff(names(fos), names(spp))
setdiff(names(spp), names(fos))

Distances to the nearest analogue are easily calculated with the rioja package which should give the same result as C2.


matmod <- MAT(spp, env)
matpred <- predict(matmod, fos)
goodpoorbad <- quantile(matmod$dist.n[, 1], prob=c(0.75, 0.95))
qualitybands <- data.frame(xmin = rep(-Inf, 3),
xmax = rep(Inf, 3),
ymax = c(goodpoorbad, Inf),
ymin = c(-Inf, goodpoorbad),
fill = factor(c("Good", "Fair", "None"), levels = c("None", "Fair", "Good")))

fillscale <-  scale_fill_manual(values = c("salmon", "lightyellow", "skyblue"), name = "Analogue Quality")

g <- ggplot(data.frame(chron, analogue =  matpred$dist.n[,1])) +
geom_point(aes(x = chron, y = analogue)) +
labs(x = "Date CE", y = "Squared chord distance to nearest analogue") +
geom_rect(aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax, fill = fill), qualitybands, alpha = .5) +

Analogue quality against date

Thirty eight of the 89 samples have no modern analogues under the definition used by LT15. Only 11 samples have good analogues. This is difficult to reconcile with the claim by LT15 that

No sample had chironomid assemblages outside the 95% confidence interval

The reconstruction of August air-temperature in LT15 is remarkably good, almost as good as what would be expected if chironomids were perfect thermometers, yet the analogue quality is fairly awful (squared residual length is also fairly awful). Does this mean that these diagnostics are utterly useless guides to the utility of reconstructions? Or is this another remarkable feature of the Lake Żabińskie chironomid reconstruction?

Perhaps some ordinations would be useful to investigate what is going on. I’ll show some in a future post.

Posted in Peer reviewed literature, transfer function | Tagged , , , | Leave a comment

Curious chironomid counts

In January 2015, shortly after Larocque-Tobler et al published their remarkably good chironomid-inferred August air temperature reconstruction from Lake Żabińskie, Poland, some data were posted online as a condition of publication. Since I found it difficult to believe that chironomids could be almost as good as thermometers, I took a look at these data expecting to find that the authors had made a mistake. Specifically, I suspected that a calibration-in-time model (a transfer function from well-dated chironomid assemblages and corresponding climate station data for the same years) had been accidentally reported.

It only took a few minutes to more-or-less replicate the reconstruction from the fossil data and the modern training set: I was satisfied that a calibration-in-space model had been used as reported. So then I looked at the data: they were as credible as a three-pound note drawn with a crayon.

I don’t think that it is contentious to say that the data posted in January are not the real chironomid data. They are only distantly related to the data now archived and to the data used in the paper. Hence the credibility of the January data have no direct bearing on the credibility of the current version of the data. The data archived in January are no longer available from Dr Larocque-Tobler’s website, but can be downloaded from here. Let’s take a look at them.

zab <- read_excel('Zabinskie chiro 1886AD.xlsx', sheet = 1)
chron <- zab$Chron
zab$Chron <- NULL

pos <- function(x) x[, x > 0] #extract +ve values
t(pos(zab[24,])) # row 24 - 1987
Dicrotendipes nervosus   11.1111
Microtendipes pedellus   11.1111
Cladotanytarsus mancus1  11.1111
Paratanytarsus           11.1111
Tanytarsus lugens        11.1111
Tanytarsus pallidicornis 11.1111
Corynoneura              11.1111
Parochlus                11.1111
Procladius               11.1111

Nine taxa all with an abundance of one ninth. Curious.

Remembering that LT15 claimed the counts were of at least fifty chironomids, how often can we expect to see a count like this? Assuming the improbable but favourable case that the assemblage contains nine equally abundant taxa and that 54 chironomids were counted (the smallest multiple of nine greater than fifty), we can calculate the probability of getting such a count with all nine taxa having six individuals using the multinomial distribution.

1/dmultinom(x = rep(6, 9), size = 54, prob = rep(1/9, 9))
[1] 761675.4

One in seven hundred and sixty thousand. Counts with identical abundances for all taxa are going to be rare. I wouldn’t expect to see one again. But wait, exactly identical counts occur a further three times (and another sample has a different set of nine equally abundant taxa). How likely are we to see five counts of nine equally abundant taxa? Something like 1 in 1022 under the most favourable (and highly improbable) case.

We are still somewhat short of 2276709 to one, but we haven’t looked at the other samples yet. One sample has seven equally abundant taxa, three more (including two duplicates) have six taxa with relative abundances of one seventh or two sevenths. A final example, one sample has three taxa at 20% and one at 40%.

A related curiosity in these data is the dearth of rare taxa. With a count of fifty, a taxon represented by a single head capsule will represent 2% of the assemblage.

min0 <- function(x) min(x[x > 0])
mean(apply(zab, 1, min0) > 2)

Ninety three percent of samples (78/84) apparently have no taxa represented by a single head capsule. This is exceedingly unlikely.

Almost every sample in the January dataset is unlikely to arise from a count of fifty chironomids. The most obvious explanation (which was admitted in the corrigendum), is that some of the counts were based on less than fifty head capsules. Perhaps much less – a count with three taxa at 20% and one at 40% is only really likely to happen when the count sum is five.

I downloaded the January data expecting to find a mistake. Instead, I found that the data are almost certainly (one must obey Cromwell’s rule) not gathered according to the methods reported in LT15 and the stratigraphy differs from that published, yet the reconstruction broadly resembles the published reconstruction (though with substantial differences for some levels).

When I enquired whether the January data were the correct data, they were withdrawn without explanation and replaced by a sequence of other versions of the data before finally a version of the data was archived at NOAA. It is that version of the data that I’ll begin to examine in a subsequent post in this series.

Posted in Peer reviewed literature, transfer function | Tagged , | Leave a comment

The limits of high-resolution quantitative palaeoecology and Larocque-Tobler et al (2015)

The first quantitative palaeoclimate reconstructions from microfossil assemblages were low resolution, with a sample spacing of hundreds or thousands of years and focused on seasonal climate. Reconstructions focusing on a single month are now published at annual resolution. Can resolution be further improved? Few would argue that seasonal reconstructions are impossible (notwithstanding a sensible choice of season); only the foolish would believe a reconstruction focused on 4th July temperatures (even though it might have impressive transfer function performance statistics).

I’m going to use Larocque-Tobler et al (2015) (hereafter LT15) as an example of high-resolution palaeoecology (I discussed the corrigendum on this paper recently).

LT15 reconstruct August air temperature from chironomid (non-biting midges) assemblages from Lake Żabińskie in eastern Poland using a transfer function calibrated on chironomid assemblages in lakes in Canada and Poland. As the lake sediments are varved, LT15 are able to make a reconstruction on a near-annual basis for the last century.

The correlation between the reconstruction and instrumental data from local weather stations is remarkably good at 0.74. The mean absolute difference between the instrumental and the inferred August air temperatures is only 0.75 °C, much smaller than the root mean squared error of prediction of the transfer function (2.3 °C; the original paper erroneously reported the RMSEP as 1.3 °C, which the corrigendum persists in using). This correlation is not the result of a shared trend in both time series – the trend in the instrumental data is slight (r2 = 0.05) and the residuals from this trend have low autocorrelation (ar1 = 0.14).

To put this correlation into context, the 99th percentile of positive correlations between tree-ring widths and summer temperature is 0.63 (St. George 2014). Even though tree-rings are a much simpler proxy than chironomid assemblages, few ring-width series have a better correlation with climate than the Lake Żabińskie chironomid reconstruction.

What are the physical, ecological and taphonomic limits to reconstruction resolution?


Transfer functions are (usually) an example of space-time substitution: spatial patterns in biotic assemblages and climate are used to estimate climatic changes in time from fossil assemblages. The problem is that the space-time substitution is imperfect: the strength of the correlation between different climatic variables in space and in time differ. For example, the correlation in space between June and August mean temperature across northern Europe is 0.99, whereas the year-on-year correlation in time between June and August temperatures at Vilnius (close to Lake Żabińskie) is only 0.16.

This is not in the least surprising. Places where the climatological mean temperature is warm in June inevitably have warm August mean temperatures. Conversely, if June is warm in Vilnius this year there is no reason to expect August to be warm as well. In short, this is the difference between climate (long-term mean) and weather.

This difference between climate and weather has serious implications for transfer-function development and application. Transfer functions trained on August, June or the whole summer will have very similar performances regardless of which month or months are ecologically important for the biota as they depend on the climatological means. In contrast, if August temperature is reconstructed, but June temperature is the ecologically important month, the correlation between the reconstruction and instrumental records will be woeful. This is a particular problem for high-resolution reconstructions – at decadal and longer scales the effect of weather will be averaged out. An identical problem occurs with the correlation between environmental variables such as dissolved organic carbon and climate; it might be strong in space but cannot be expected to correlate as well on a year-on-year basis.

I don’t know which months chironomids are most sensitive to, but, if they are temperature sensitive it is reasonable to assume that they respond to the whole summer rather than only August. The correlation at Vilnius between August temperature and the whole summer (JJA) temperature is 0.76. If this whole summer assumption is correct, the correlation reported by LT15 between the reconstruction and the instrumental data is on the edge of what could physically be expected if chironomids were perfect thermometers.


The ideal proxy for highly focused reconstructions would be directly sensitive to the environmental variable we want to reconstruct; only be sensitive to this variable during the time of year we want to make the reconstruction; not be sensitive to variability in other environmental variables; have a short generation time so they can quickly respond to the environmental conditions; and be sufficiently abundant that large counts can be made.

Some algal proxies might meet most of these criteria. Chironomids don’t. The larval phase of chironomids is indirectly sensitive to air temperature via water temperature and oxygen availability; they are probably sensitive (and producing head-capsules) to temperature throughout the summer; they are sensitive to anoxia; the larval phase can last over a year; and often only small counts are possible (one of the issues in the corrigendum).


Taphonomic processes will tend to degrade the signal in biotic assemblages. In anoxic lakes, like Żabińskie, bioturbation will be minimised, but sediment reworking before its final deposition in the centre of the lake could blur the record. Reworking might be a minor problem for planktonic proxies where the majority of the organisms are deposited without reworking, but large in littoral proxies that have to be reworked to reach the core site.

Lake Żabińskie is 44.4 m deep yet the chironomid assemblages are dominated by littoral taxa, probably because few chironomids can survive in the anoxic hypolimnion in the deep central part of the lake where the core was collected. The littoral chironomids must have been transported into the deep water. Water currents when the lake’s thermal stratification breaks down in autumn can transport the chironomids. The taphonomic problem is that to retain the annual signal, the currents need to just transport the chironomids that lived that year. If the currents transport chironomids that lived in different years, the assemblages will include chironomids from different years, smoothing the stratigraphy and weakening the correlation with instrumental records. This is only a problem for high resolution records. It is possible to imagine lakes where autumn currents would only transport the chironomids that lived in the preceding summer, for example one where all the sediment deposited in shallow water was scoured every year, but that does not seem possible for anything other than a lake in a rock basin.


High-resolution reconstructions depend on the chronology. The varve-based chronology for Żabińskie is excellent with an uncertainty of a couple of percent. However, even an error of one year could ruin the correlation between the reconstruction and the instrumental record. An error near the top of the core would be worst, affecting more of the record. An error in the lower resolution section at the bottom of the record would have much less impact.


Excellent sites are needed for very high-resolution palaeoecological studies. Lake Żabińskie with its excellent chronology seems to be an ideal site.

Identifying biotic proxies that are sensitive to a single month’s weather will prove difficult. Planktonic algal proxies are probably best as they might reduce taphonomic problems and have short generations times.

Chironomids would not appear to be ideal for very high-resolution analyses of monthly climate variables. Despite this, the reconstruction of August air temperatures from Żabińskie is remarkably good, at the limit of what is physically possible and with an error substantially lower than expected given the uncertainty of the transfer function.

Fortunately, the authors have archived some data. In a subsequent post, I will take a look at these data to test if the results can be reproduced and if the remarkably good performance can be explained.

Posted in Peer reviewed literature, transfer function | Tagged , | 1 Comment

Detecting anomalous patterns in data

In the period 2007-2008, almost 70% of cases with the US Office of Research Integrity involved allegations of image manipulation. Are data fabricators and falsifiers mainly showing off their photoshop skills rather than fiddling other types of data? I doubt it. I suspect the prevalence of image issues reflects the availability of tools to detect flaw in images which some journals use to check every accepted manuscript. About 1% of manuscripts are rejected in this final check.

Would it be possible to create equivalent tools and procedures for detecting anomalies in none-image data? It is going to be difficult given the immense diversity of data types and data properties. Worse, whereas suspect images can often be compared against the originals making manipulation obvious (the intent of manipulation may be harder to gauge), which may not be possible for data (how do you tell an Excel file is the original?).

One well known tool for identifying falsified data is Benford’s Law which concerns the frequency of the digits 0-9 in the first and subsequent positions in a number. Sometimes it is useful for detecting malpractice, but it would be a disaster if all data sets were tested against Benford’s law. Many data sets would fail the test because Benford’s law is not appropriate null expectation for them. No-one expects, for example, the first digit of the weight of adult humans to follow Benford’s law. The last digit of tree diameters should follow Benford’s law, but the measurements are read from a tape, it is probable that discrepancies will creep in due to difficulties in precisely estimating this least important digit (one dataset I checked had an understandable excess of zeros and fives). Determining which datasets should be expected to conform to Benford’s law would not be trivial. Even for data sets which are expected to follow Benford’s law, there would be a large number of false positives unless the p-value used to identify anomalies was set to be very low, in which case the analysis would lack power.

The same problems would occur with any mass testing. A more focused approach is needed.

A recent paper by Joel Pitt and Helene Hill with the understated title Statistical analysis of numerical preclinical radiobiological data, explores some methods they are useful for testing the credibility of data with three replicates. They find that one of the scientists they investigate reported data where one of the replicates to match (within rounding) the mean of all three replicates far more often either than expected under a null model or found by other scientists working with equivalent data (the scientist in question seems to have picked the mean they wanted and assigned this to the first value and then picked high and low values roughly equidistant from the first). The final digit of this scientist’s data also fails to conform to a uniform distribution as expected and found for the other scientists’ data. The comparison of patterns in the suspect data with both other data and null models is a powerful approach.

The tests Pitt and Hill develop are in response to anomalies they detect in the data.

The first step in using statistical techniques to identify fabricated data is to look for anomalous patterns of data values in a given data set (or among statistical summaries presented for separate data sets), patterns that are inconsistent with those that might ordinarily appear in genuine empirical data.

This is a critical problem. It amounts to a post-hoc test of the patterns with the consequent increased risk of a false positives. Analyse any data set in enough ways and some unexpected patterns will present themselves.

There are solutions to this problem. The first to account for the post-hoc nature of the analysis by only getting excited by anomalies with very low p-values: p = 0.05 is just not interesting, p = 10-6 is interesting, p = 10-50 is becoming very interesting. Better still is to split the available data into a test set explored for anomalies and a verification set where the most unexpected pattern found by the exploratory analysis is tested for. The verification set could be data from a second paper.

No statistical analysis will ever prove data falsification, still less the motivation for any malpractice. But it can give the evidence needed to initiate a thorough investigation by those with the authority to demand to see samples, lab notes, computer files and if necessary attempt to replicate experiments.

Posted in Peer reviewed literature | Tagged | Leave a comment

Skiing before breakfast

Because I can.

This week I’m attending a workshop on measuring resilience, resistance, sensitivity, recovery rates in the palaeoecological record at the Finse Research Station.

Posted in Uncategorized | Tagged | Leave a comment

50 chironomids or fewer – corrigendum for Larocque-Tobler et al 2015

Last year, Isabelle Larocque-Tobler and coauthors published a chironomid-inferred August air-temperature reconstruction for the 20th century from a Polish lake in Quaternary Science Reviews. The correlation between instrumental temperature records and the near-annual resolution reconstruction from Lake Żabińskie is remarkably good at 0.74. The mean absolute error is only 0.75°C, considerably lower than the cross-validation uncertainty in the Polish-Canadian calibration set (RMSEP = 2.3 °C).

Yesterday a corrigendum was published, reporting three problems with Larocque-Tobler et al (2015; hereafter LT15).

The chironomid stratigraphy was not consistent with the online data table

More precisely, LT15 used an incorrect version of the chironomid stratigraphy from Lake Żabińskie. The corrigendum reports several taxonomic and other changes between the published stratigraphy and the data now archived at the Palaeoclimate Data Centre (I think this is the first chironomid calibration set to be archived publicly). No explanation of how the wrong data came to be used is given.

The reconstruction obtained from the online data differed from the reconstruction published

More precisely, the reconstruction in LT15 was incorrect as there were several data processing errors including: misspelt taxa (which will then be excluded from the analysis); inclusion of lakes with low chironomid counts; and the inclusion of temperature as a species in the calibration set (this will make much less difference to the reconstruction than one might suppose).

The number of head capsules per sample used for the temperature reconstruction was below 50, the typical number used for temperature inferences (Heiri and Lotter, 2001, Larocque, 2001 and Quinlan and Smol, 2001)

This directly contradicts the claim in LT15 that

At least 50 head capsules (Heiri et al., 2001 and Larocque et al., 2001) were mounted. (Methods 3.1)

If a couple of levels in the stratigraphy had 48 chironomid head capsules rather than the requisite 50, it would matter not. Unfortunately, the counts are often far lower, ranging from 19 to 68.5. A third of counts are below 30 chironomids; only an eighth of are 50 or more. This matters because small counts are expected to have larger counting errors and hence larger reconstruction uncertainties.More importantly, readers must be able to rely upon claims made by the authors.

The corrigendum provides a new stratigraphy and a new August air-temperature reconstruction which is, like the old one, remarkably good.

LT15 has been cited eight times.

Posted in Peer reviewed literature, transfer function | Tagged , | 5 Comments